Lexer Submodule
SmartGameFormat.Lexer — Module.
The Lexer sub-module is concerned with transcribing a given stream of characters into a sequence of domain-specific lexical units called "tokens".
Basic methodology:
1. Wrap a plain IO object into a Lexer.CharStream.
2. Call Lexer.next_token to collect another Lexer.Token from the character stream.
3. Go to 2, unless end of file is reached.
For convenience, the above process is simplified by the type Lexer.TokenStream, which supports eof, read, and Lexer.peek.
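The three steps above can be sketched as follows. This is a minimal illustration that uses only the names documented on this page (Lexer.CharStream, Lexer.next_token, Lexer.Token) and relies on the documented behavior that next_token throws an EOFError once no tokens remain. The input string is an arbitrary example, and the alias Lexer is introduced here for brevity.

```julia
using SmartGameFormat
const Lexer = SmartGameFormat.Lexer   # local alias, assumed for this sketch

# Step 1: wrap a plain IO object into a CharStream.
cs = Lexer.CharStream(IOBuffer("(;FF[4]GM[1])"))

# Steps 2 and 3: collect tokens until the end of the input is reached.
tokens = Lexer.Token[]
try
    while true
        push!(tokens, Lexer.next_token(cs))
    end
catch err
    # An EOFError signals the end of the stream; anything else is re-thrown.
    err isa EOFError || rethrow()
end
```

A LexicalError raised by malformed input is deliberately not caught here, so it propagates to the caller.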
Types
SmartGameFormat.Lexer.CharStream — Type.
CharStream(io::IO)
Stateful decorator around io to keep track of some context information, as well as to allow the use of peek (i.e. looking at the next character without consuming it).
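To illustrate the decorator idea, here is a self-contained sketch of a peekable character reader. It is not the package's implementation: the type PeekableChars, its fields, and the functions peekchar and readchar are hypothetical names chosen for this example, and the line counter merely stands in for the "context information" mentioned above.

```julia
# Hypothetical stand-in for the idea behind CharStream; not the package's code.
mutable struct PeekableChars
    io::IO
    buffer::Union{Char,Nothing}   # character that was looked at but not yet consumed
    line::Int                     # example of tracked context information
end

PeekableChars(io::IO) = PeekableChars(io, nothing, 1)

# Look at the next character without consuming it.
function peekchar(pc::PeekableChars)
    if pc.buffer === nothing
        pc.buffer = read(pc.io, Char)   # throws EOFError at end of input
    end
    return pc.buffer
end

# Consume and return the next character, updating the line counter.
function readchar(pc::PeekableChars)
    c = peekchar(pc)
    pc.buffer = nothing
    c == '\n' && (pc.line += 1)
    return c
end

pc = PeekableChars(IOBuffer("(;\nB[aa])"))
```

Repeated calls to peekchar return the same character until readchar consumes it, which is the property the lexer needs when deciding how to interpret upcoming input.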
TokenStream(cs::CharStream)
Stateful decorator around cs to allow the use of peek (i.e. looking at the next Token without consuming it).
It uses the function next_token to create a new Token from the current position of cs onwards.
SmartGameFormat.Lexer.Token — Type.
Token(name::Char, [value::String])
An SGF-specific lexical token. It can be one of the following:
Token('\0'): Empty token to denote trailing whitespace.
Token(';'): Separator for nodes.
Token('(') and Token(')'): Delimiters for game trees.
Token('[') and Token(']'): Delimiters for property values.
Token('I', "AB1"): Identifier for properties. In general these are made up of one or more uppercase letters. However, digits are also allowed in every position except the first. This is done in order to support older FF versions.
Token('S', "abc 23(\)"): Any property value between '[' and ']'. This includes moves, numbers, simple text, and text.
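The token kinds listed above can be made concrete with a small, self-contained sketch. It is a simplified illustration and not the package's lexer: the type Tok and the function tokenize are hypothetical, the whole input is processed at once instead of being streamed, whitespace between tokens is skipped rather than reported as a '\0' token, and a plain ArgumentError stands in for the package's own exception.

```julia
# Hypothetical sketch of the token kinds; not the package's implementation.
struct Tok
    name::Char
    value::String
end
Tok(name::Char) = Tok(name, "")

function tokenize(text::AbstractString)
    chars = collect(text)
    toks = Tok[]
    i = 1
    while i <= length(chars)
        c = chars[i]
        if c in (';', '(', ')', ']')
            push!(toks, Tok(c))
            i += 1
        elseif c == '['
            push!(toks, Tok('['))
            i += 1
            # Everything up to the next unescaped ']' forms one property value.
            buf = IOBuffer()
            while i <= length(chars) && chars[i] != ']'
                if chars[i] == '\\' && i < length(chars)
                    write(buf, chars[i])   # keep the backslash ...
                    i += 1
                end
                write(buf, chars[i])       # ... and the character it escapes
                i += 1
            end
            push!(toks, Tok('S', String(take!(buf))))
        elseif isuppercase(c)
            # Identifier: an uppercase letter followed by uppercase letters or digits.
            j = i
            while j < length(chars) && (isuppercase(chars[j + 1]) || isdigit(chars[j + 1]))
                j += 1
            end
            push!(toks, Tok('I', String(chars[i:j])))
            i = j + 1
        elseif isspace(c)
            i += 1                         # whitespace between tokens is skipped
        else
            throw(ArgumentError("unexpected character '$c'"))
        end
    end
    return toks
end

toks = tokenize("(;AB1[a\\]b])")
```

Note how the escaped ']' inside the property value does not end the value; only the following unescaped ']' does.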
Functions
SmartGameFormat.Lexer.peek — Function.
peek(cs::CharStream, ::Type{Char}) -> Char
Return the next Char in cs without consuming it, which means that the next time peek or read is called, the same Char will be returned.
SmartGameFormat.Lexer.next_token — Function.
next_token(cs::CharStream) -> Token
Reads and returns the next Token from the given character stream cs. If no more tokens are available, an EOFError is thrown.
Note that the lexer should support versions FF[1] to FF[4]. If an unambiguously illegal character sequence is encountered, the function throws a LexicalError.
Exceptions
LexicalError(msg)
The string or stream passed to Lexer.next_token was not a valid sequence of characters according to the Smart Game Format.