dex.cf.lexer

CF Lexer - Tokenizer for CF Configuration Files

This module provides a lexer that converts CF source text into a stream of tokens. It implements the input range interface for convenient iteration and preserves comments as tokens for roundtrip-preserving document model support.

The lexer handles:

  • Whitespace (space, tab, CR, LF) - LF emits NEWLINE tokens for separator detection
  • Line comments (hash and double-slash) - preserved as COMMENT tokens
  • Block comments (slash-star) - preserved as BLOCK_COMMENT tokens
  • Punctuation: braces, brackets, equals, colon, comma, semicolon
  • String literals (double, single, triple, raw quoted)
  • Numbers (decimal, hexadecimal, octal, and binary integers; floats)
  • Identifiers and keywords
  • Temporal literals (date, time, datetime)
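The whitespace and line-comment rules from the list above can be sketched in isolation. This is a minimal illustration, not the module's implementation: the `Kind` enum and `scanKinds` function are hypothetical names, and everything that is not whitespace or a line comment is lumped into a single `other` kind.

```d
import std.stdio : writeln;

// Hypothetical token kinds for this sketch; the real TokenType enum is larger.
enum Kind { newline, comment, other }

// Scans `src` and records the kind of each token found, in order.
Kind[] scanKinds(string src) @safe pure
{
    Kind[] kinds;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (c == ' ' || c == '\t' || c == '\r')
        {
            ++i; // insignificant whitespace is skipped
        }
        else if (c == '\n')
        {
            kinds ~= Kind.newline; // LF is kept as a separator token
            ++i;
        }
        else if (c == '#' || (c == '/' && i + 1 < src.length && src[i + 1] == '/'))
        {
            kinds ~= Kind.comment; // a line comment runs up to, not including, the LF
            while (i < src.length && src[i] != '\n')
                ++i;
        }
        else
        {
            kinds ~= Kind.other; // anything else: consume up to the next whitespace
            while (i < src.length && src[i] != ' ' && src[i] != '\t'
                    && src[i] != '\r' && src[i] != '\n')
                ++i;
        }
    }
    return kinds;
}

void main()
{
    writeln(scanKinds("a = 1 # note\nb = 2\r\n"));
}
```

Note that CR is dropped while LF survives as a token, so Windows-style `\r\n` line endings produce the same separator stream as plain `\n`.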

Types 1

struct Lexer

Lexer for CF source text.

Converts CF source into a stream of tokens, implementing the input range interface. Comments are preserved as tokens to support the roundtrip-preserving document model.

Example:

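import std.stdio : writeln;
import dex.cf.lexer : Lexer;

// Lex a small CF document and print each token in order.
auto lexer = Lexer(`{ key = "value" }`);
foreach (token; lexer) {
    writeln(token);
}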
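The `foreach` loop works because the struct exposes the three input range primitives `front`, `popFront`, and `empty`. The following self-contained sketch shows that pattern on a toy type. `WordRange` and its `atEof` flag are hypothetical, and the lazy first-token setup and final `"EOF"` element are only a guess at what the `initialized` and `eofReturned` fields are for; the source does not confirm this.

```d
import std.range.primitives : isInputRange;
import std.stdio : writeln;

// Toy stand-in for Lexer: yields space-separated words, then one "EOF" marker.
struct WordRange
{
    string source;
    size_t pos;
    string current;
    bool atEof;       // is `current` the end-of-input marker?
    bool initialized; // has the first element been produced yet?
    bool eofReturned; // has the end-of-input marker been consumed?

    // Produces the next word, or the "EOF" marker once the input is used up.
    private void advance() @safe pure
    {
        while (pos < source.length && source[pos] == ' ')
            ++pos;
        if (pos >= source.length)
        {
            current = "EOF";
            atEof = true;
            return;
        }
        immutable start = pos;
        while (pos < source.length && source[pos] != ' ')
            ++pos;
        current = source[start .. pos];
    }

    // Makes sure the first element exists before it is read or skipped.
    private void ensureInitialized() @safe pure
    {
        if (!initialized)
        {
            advance();
            initialized = true;
        }
    }

    string front() @safe pure
    {
        ensureInitialized();
        return current;
    }

    void popFront() @safe pure
    {
        ensureInitialized();
        if (atEof)
            eofReturned = true; // the marker has now been handed out
        else
            advance();
    }

    bool empty() @safe pure
    {
        return eofReturned;
    }
}

// Compile-time check that the three primitives are enough for Phobos.
static assert(isInputRange!WordRange);

void main()
{
    foreach (word; WordRange("key = value"))
        writeln(word);
}
```

Because the type satisfies `isInputRange`, it also composes with range algorithms from `std.algorithm` and `std.range`, which is the practical benefit of exposing a lexer this way.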

Fields
string source
string filename
size_t pos
size_t line
size_t col
Token currentToken
bool initialized
bool eofReturned
Methods
Token front() @safe pure - Returns the current token.
void popFront() @safe pure - Advances to the next token.
bool empty() @safe pure - Checks if the token stream is exhausted.
private void advance() @safe pure - Advances the lexer to produce the next token.
private void skipWhitespaceExceptNewline() @safe pure - Skips whitespace characters except newlines. Newlines are handled separately as they can be significant separators.
private void lexLineComment() @safe pure - Lexes a line comment (# or //).
private void lexBlockComment() @safe pure - Lexes a block comment (slash-star to star-slash).
private void lexString() @safe pure - Lexes a string literal (double or single quoted, including triple-quoted). Validates escape sequences and rejects unescaped control characters.
private bool validateAndConsumeEscape() @safe pure - Validates and consumes an escape sequence.
private bool consumeHexDigits(size_t n) @safe pure - Consumes exactly n hex digits.
private bool isControlChar(char c) @safe pure nothrow - Checks if a character is a control character (U+0000 to U+001F, except tab U+0009).
private void lexTripleQuotedString(char quote) @safe pure - Lexes a triple-quoted string.
private void lexRawString() @safe pure - Lexes a raw string literal (r"..." or r'...').
private void lexNumber() @safe pure - Lexes a number (integer or float).
private void lexDecimalNumber(size_t startPos, Location startLoc) @safe pure - Lexes a decimal number (integer or float).
private void lexHexNumber(size_t startPos, Location startLoc) @safe pure - Lexes a hexadecimal number.
private void lexOctalNumber(size_t startPos, Location startLoc) @safe pure - Lexes an octal number.
private void lexBinaryNumber(size_t startPos, Location startLoc) @safe pure - Lexes a binary number.
private void lexEnvVar() @safe pure - Lexes an environment variable reference: ${VAR}, ${VAR:-default}, ${VAR:?message}
void lexIdentifierOrKeyword() @safe pure - Lexes an identifier or keyword.
private bool looksLikeTemporalStart() @safe pure nothrow - Checks if the current position looks like the start of a temporal literal. Pattern: YYYY-MM-DD (4 digits, hyphen, 2 digits, hyphen, 2 digits).
private bool looksLikeTimeStart() @safe pure nothrow - Checks if the current position looks like a time literal start.
private void lexTime() @safe pure - Lexes a standalone time literal (HH:MM:SS with optional fractional seconds).
private void lexTemporal() @safe pure - Lexes a temporal literal (date, time, or datetime).
private TokenType classifyKeyword(string text) @safe pure nothrow - Classifies a keyword or returns IDENTIFIER.
private Token makeToken(TokenType type, string value) @safe pure - Creates a token with the current location.
private Location currentLocation() @safe pure - Returns the current source location.
private void consumeChar() @safe pure - Consumes a character and updates position tracking.
private bool isDigit(char c) @safe pure nothrow - Checks if a character is a digit.
private bool isHexDigit(char c) @safe pure nothrow - Checks if a character is a hex digit.
private bool isOctalDigit(char c) @safe pure nothrow - Checks if a character is an octal digit.
private bool isIdStart(char c) @safe pure nothrow - Checks if a character is a valid identifier start. Per UAX #31: XID_Start or underscore. Uses std.uni.isAlpha for Unicode support.
private bool isIdStartAt(size_t idx) @safe pure - Checks if the UTF-8 character at position idx is a valid identifier start. Decodes multi-byte UTF-8 sequences for proper Unicode support.
private bool isIdContinue(char c) @safe pure nothrow - Checks if a character is a valid identifier continuation. Per UAX #31: XID_Continue or hyphen (CF-specific extension). Uses std.uni.isAlphaNum for Unicode support.
private bool isIdContinueAt(size_t idx) @safe pure - Checks if the UTF-8 character at position idx is a valid identifier continuation. Decodes multi-byte UTF-8 sequences for proper Unicode support.
private void consumeUtf8Char() @safe pure - Consumes a UTF-8 character (potentially multi-byte) and updates position/column.
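Several of the private helpers above are simple character predicates or fixed-width lookahead checks. The sketch below shows how such helpers might look as free functions, following only the one-line descriptions given here. The bodies are assumptions, and `looksLikeDateAt` is a hypothetical name that takes the source and position explicitly instead of reading lexer state.

```d
import std.stdio : writeln;

// Control characters U+0000..U+001F, with tab (U+0009) allowed.
bool isControlChar(char c) @safe pure nothrow
{
    return c < 0x20 && c != '\t';
}

bool isDigit(char c) @safe pure nothrow
{
    return c >= '0' && c <= '9';
}

bool isHexDigit(char c) @safe pure nothrow
{
    return isDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Lookahead only: true when the 10 characters starting at `pos` have the
// shape YYYY-MM-DD. Does not check that the month or day are in range.
bool looksLikeDateAt(string src, size_t pos) @safe pure nothrow
{
    if (pos > src.length || src.length - pos < 10)
        return false;
    foreach (i; 0 .. 10)
    {
        immutable c = src[pos + i];
        // Positions 4 and 7 must be hyphens; every other position a digit.
        immutable ok = (i == 4 || i == 7) ? c == '-' : isDigit(c);
        if (!ok)
            return false;
    }
    return true;
}

void main()
{
    writeln(isControlChar('\n'));              // true
    writeln(isControlChar('\t'));              // false
    writeln(looksLikeDateAt("2024-01-15", 0)); // true
    writeln(looksLikeDateAt("2024-1-15", 0));  // false
}
```

A shape-only lookahead like this lets a lexer decide between the number path and the temporal path before consuming any input; range validation can then happen while the literal is actually lexed.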
Constructors
this(string input, string sourceFilename = "") - Constructs a lexer for the given CF source text.

Functions 2

fn Lexer tokenize(string input, string filename = "") @safe pure - Creates a lexer for the given CF source text.
fn Token[] tokenizeAll(string input, string filename = "") @safe pure - Tokenizes the input and eagerly collects all tokens into an array.
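The difference between the two functions is lazy versus eager evaluation. The sketch below illustrates that distinction using only Phobos: `splitter` stands in for the range returned by `tokenize`, and `std.array.array` stands in for the collection step of `tokenizeAll`. Whether `tokenizeAll` is implemented this way is an assumption.

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.range : take;
import std.stdio : writeln;

void main()
{
    // Stand-in for tokenize(): a lazy range that does no work until iterated.
    auto lazyWords = "a = 1 ; b = 2".splitter(' ');

    // Lazy consumption: only the first three elements are ever produced.
    writeln(lazyWords.take(3));

    // Stand-in for tokenizeAll(): walk the whole range and store every element.
    string[] allWords = lazyWords.array;
    writeln(allWords.length);
}
```

The lazy form suits streaming consumers such as a parser that stops at the first error; the eager form is convenient when tokens need random access or multiple passes.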