dex.cf.lexer

CF Lexer - Tokenizer for CF Configuration Files

This module provides a lexer that converts CF source text into a stream of tokens. It implements the input range interface for convenient iteration and preserves comments as tokens for roundtrip-preserving document model support.

The lexer handles:

Whitespace (space, tab, CR, LF) - LF emits NEWLINE tokens for separator detection
Line comments (hash and double-slash) - preserved as COMMENT tokens
Block comments (slash-star) - preserved as BLOCK_COMMENT tokens
Punctuation: braces, brackets, equals, colon, comma, semicolon
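String literals (double-quoted, single-quoted, triple-quoted, and raw)
Numbers (decimal integers and floats, plus hexadecimal, octal, and binary forms)
Identifiers and keywords
Environment variable references (${VAR}, ${VAR:-default}, ${VAR:?message})
Temporal literals (date, time, datetime)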

Copyright: © 2025 Dejan Lekić
License: BSD-3-Clause

Types

struct Lexer

Lexer for CF source text.

Converts CF source into a stream of tokens, implementing the input range interface. Comments are preserved as tokens to support the roundtrip-preserving document model.

Example:

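import std.stdio : writeln;
import dex.cf.lexer : Lexer;

// Lexer is an input range, so foreach pulls one token at a time.
auto lexer = Lexer(`{ key = "value" }`);
foreach (token; lexer) {
    writeln(token);
}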

Fields

string source

string filename

size_t pos

size_t line

size_t col

Token currentToken

bool initialized

bool eofReturned

Methods

Token front() @safe pure
Returns the current token.

void popFront() @safe pure
Advances to the next token.

bool empty() @safe pure
Checks if the token stream is exhausted.
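The three methods above are D's input range primitives. As a minimal sketch of that protocol, the following standalone type splits text on spaces and primes its first element in the constructor. WordRange and its fields are hypothetical names chosen for illustration; they are not part of dex.cf.lexer, and the real Lexer may initialize lazily instead.

```d
import std.range.primitives : isInputRange;

/// Hypothetical word splitter, used only to illustrate the range protocol.
struct WordRange
{
    private string source;
    private size_t pos;
    private string current;
    private bool done;

    this(string input) @safe pure nothrow
    {
        source = input;
        advance(); // prime the first element
    }

    /// Current element; only meaningful while !empty.
    @property string front() const @safe pure nothrow { return current; }

    /// True once the input is exhausted.
    @property bool empty() const @safe pure nothrow { return done; }

    /// Moves to the next element.
    void popFront() @safe pure nothrow { advance(); }

    private void advance() @safe pure nothrow
    {
        // Skip separators, then slice out the next run of non-space bytes.
        while (pos < source.length && source[pos] == ' ')
            pos++;
        if (pos >= source.length)
        {
            done = true;
            return;
        }
        immutable start = pos;
        while (pos < source.length && source[pos] != ' ')
            pos++;
        current = source[start .. pos];
    }
}

// Compile-time check that the type satisfies the input range interface.
static assert(isInputRange!WordRange);
```

Any type shaped like this works with foreach and with the range algorithms in Phobos, which is what makes a range-based lexer convenient to iterate.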

private void advance() @safe pure
Advances the lexer to produce the next token.

private void skipWhitespaceExceptNewline() @safe pure
Skips whitespace characters except newlines. Newlines are handled separately because they can be significant separators.
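A minimal sketch of this kind of skipping, written as a free function rather than a Lexer method: it steps over space, tab, and carriage return but stops at a line feed so the caller can still emit a NEWLINE token. The name skipInlineWhitespace and its signature are assumptions made for this example.

```d
/// Returns the index of the first byte at or after `pos` that is not a
/// space, tab, or carriage return. Line feeds are deliberately not skipped.
/// (Hypothetical helper, not the library's private method.)
size_t skipInlineWhitespace(string source, size_t pos) @safe pure nothrow @nogc
{
    while (pos < source.length)
    {
        immutable c = source[pos];
        if (c != ' ' && c != '\t' && c != '\r')
            break;
        pos++;
    }
    return pos;
}
```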

private void lexLineComment() @safe pure
Lexes a line comment (# or //).

private void lexBlockComment() @safe pure
Lexes a block comment (slash-star to star-slash).

private void lexString() @safe pure
Lexes a string literal (double or single quoted, including triple-quoted). Validates escape sequences and rejects unescaped control characters.

private bool validateAndConsumeEscape() @safe pure
Validates and consumes an escape sequence.

private bool consumeHexDigits(size_t n) @safe pure
Consumes exactly n hex digits.

private bool isControlChar(char c) @safe pure nothrow
Checks if a character is a control character (U+0000 to U+001F, except tab U+0009).
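The two checks described above, "exactly n hex digits" and "control character other than tab", can be sketched as standalone functions. The names takeHexDigits and isUnescapedControl, and the choice to pass the source and position explicitly, are assumptions for this example; the real methods operate on the lexer's own state.

```d
import std.ascii : isHexDigit;

/// True for U+0000 through U+001F, except tab (U+0009).
bool isUnescapedControl(char c) @safe pure nothrow @nogc
{
    return c < 0x20 && c != '\t';
}

/// Advances `pos` past exactly `n` hex digits and returns true.
/// Returns false and leaves `pos` untouched if fewer than `n` hex digits follow.
bool takeHexDigits(string source, ref size_t pos, size_t n) @safe pure nothrow @nogc
{
    if (pos > source.length || source.length - pos < n)
        return false;
    foreach (i; 0 .. n)
    {
        if (!isHexDigit(source[pos + i]))
            return false;
    }
    pos += n;
    return true;
}
```

Checking all n digits before moving the position keeps a failed escape from leaving the cursor halfway through the sequence.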

private void lexTripleQuotedString(char quote) @safe pure
Lexes a triple-quoted string.

private void lexRawString() @safe pure
Lexes a raw string literal (r"..." or r'...').

private void lexNumber() @safe pure
Lexes a number (integer or float).

private void lexDecimalNumber(size_t startPos, Location startLoc) @safe pure
Lexes a decimal number (integer or float).

private void lexHexNumber(size_t startPos, Location startLoc) @safe pure
Lexes a hexadecimal number.

private void lexOctalNumber(size_t startPos, Location startLoc) @safe pure
Lexes an octal number.

private void lexBinaryNumber(size_t startPos, Location startLoc) @safe pure
Lexes a binary number.

private void lexEnvVar() @safe pure
Lexes an environment variable reference: ${VAR}, ${VAR:-default}, ${VAR:?message}.

void lexIdentifierOrKeyword() @safe pure
Lexes an identifier or keyword.
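To make the three environment variable reference forms concrete, here is a rough sketch that splits an already isolated reference into a name, an operator, and an operand. EnvRef and splitEnvRef are hypothetical; how the real lexer represents these parts in its tokens is not documented here, and this sketch does not handle nested braces.

```d
/// Hypothetical result type for illustration only.
struct EnvRef
{
    string name;     // variable name
    char   op = 0;   // '-' for a default, '?' for an error message, 0 for neither
    string operand;  // text after ":-" or ":?"
}

/// Splits a complete reference such as "${PORT:-8080}" into its parts.
/// Returns false if the text is not of the form ${NAME}, ${NAME:-x} or ${NAME:?x}.
bool splitEnvRef(string text, out EnvRef result) @safe pure
{
    if (text.length < 4 || text[0 .. 2] != "${" || text[$ - 1] != '}')
        return false;
    string inner = text[2 .. $ - 1];

    // Find the first colon, if any, scanning byte by byte.
    size_t colon = inner.length;
    foreach (i, c; inner)
    {
        if (c == ':')
        {
            colon = i;
            break;
        }
    }
    if (colon == inner.length)
    {
        result.name = inner;
        return true;
    }
    // A colon must be followed by '-' or '?'.
    if (colon + 1 >= inner.length)
        return false;
    immutable op = inner[colon + 1];
    if (op != '-' && op != '?')
        return false;
    result = EnvRef(inner[0 .. colon], op, inner[colon + 2 .. $]);
    return true;
}
```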

private bool looksLikeTemporalStart() @safe pure nothrow
Checks if the current position looks like the start of a temporal literal. Pattern: YYYY-MM-DD (4 digits, hyphen, 2 digits, hyphen, 2 digits).
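The YYYY-MM-DD pattern is a pure shape test, so it can be sketched as a short lookahead that never consumes input. The function name looksLikeDateAt and the explicit source and pos parameters are assumptions for this standalone example; calendar validity (month 13, day 40) is not checked here.

```d
import std.ascii : isDigit;

/// True if `source` at `pos` starts with four digits, a hyphen, two digits,
/// a hyphen, and two digits. Only the shape is checked.
bool looksLikeDateAt(string source, size_t pos) @safe pure nothrow @nogc
{
    static immutable string shape = "dddd-dd-dd"; // d = digit, - = literal hyphen
    if (pos > source.length || source.length - pos < shape.length)
        return false;
    foreach (i, kind; shape)
    {
        immutable c = source[pos + i];
        if (kind == 'd' ? !isDigit(c) : c != '-')
            return false;
    }
    return true;
}
```

A check like this lets the lexer decide between a number and a temporal literal before committing to either path.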

private bool looksLikeTimeStart() @safe pure nothrow
Checks if the current position looks like a time literal start.

private void lexTime() @safe pure
Lexes a standalone time literal (HH:MM:SS with optional fractional seconds).

private void lexTemporal() @safe pure
Lexes a temporal literal (date, time, or datetime).

private TokenType classifyKeyword(string text) @safe pure nothrow
Classifies a keyword or returns IDENTIFIER.

private Token makeToken(TokenType type, string value) @safe pure
Creates a token with the current location.

private Location currentLocation() @safe pure
Returns the current source location.

private void consumeChar() @safe pure
Consumes a character and updates position tracking.
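Position tracking of this kind usually means bumping a byte offset on every character and resetting the column after a line feed. The sketch below shows one way to do that; Cursor is a hypothetical type whose field names mirror the documented Lexer fields, and the 1-based line and column numbering is an assumption.

```d
/// Hypothetical cursor for illustration; not the library's Lexer.
struct Cursor
{
    string source;
    size_t pos;
    size_t line = 1; // 1-based numbering is an assumption
    size_t col = 1;

    /// Consumes one byte. After a line feed the line number goes up
    /// and the column restarts; otherwise only the column advances.
    void consumeChar() @safe pure nothrow @nogc
    {
        if (pos >= source.length)
            return; // nothing left to consume
        if (source[pos] == '\n')
        {
            line++;
            col = 1;
        }
        else
        {
            col++;
        }
        pos++;
    }
}
```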

private bool isDigit(char c) @safe pure nothrow
Checks if a character is a digit.

private bool isHexDigit(char c) @safe pure nothrow
Checks if a character is a hex digit.

private bool isOctalDigit(char c) @safe pure nothrow
Checks if a character is an octal digit.

private bool isIdStart(char c) @safe pure nothrow
Checks if a character is a valid identifier start. Per UAX #31: XID_Start or underscore. Uses std.uni.isAlpha for Unicode support.

private bool isIdStartAt(size_t idx) @safe pure
Checks if the UTF-8 character at position idx is a valid identifier start. Decodes multi-byte UTF-8 sequences for proper Unicode support.

private bool isIdContinue(char c) @safe pure nothrow
Checks if a character is a valid identifier continuation. Per UAX #31: XID_Continue or hyphen (CF-specific extension). Uses std.uni.isAlphaNum for Unicode support.

private bool isIdContinueAt(size_t idx) @safe pure
Checks if the UTF-8 character at position idx is a valid identifier continuation. Decodes multi-byte UTF-8 sequences for proper Unicode support.
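The position-based checks can be sketched with std.utf.decode, which reads one full code point from a UTF-8 string, followed by the std.uni classifiers the descriptions mention. The function names and explicit source parameter are assumptions for this standalone example, and treating underscore as a valid continuation character is also an assumption (it is part of XID_Continue but not of isAlphaNum).

```d
import std.uni : isAlpha, isAlphaNum;
import std.utf : decode;

/// True if the code point starting at byte `idx` may begin an identifier:
/// a Unicode letter or underscore. Throws UTFException on invalid UTF-8.
bool startsIdentifierAt(string source, size_t idx) @safe pure
{
    if (idx >= source.length)
        return false;
    size_t next = idx; // decode advances this past the whole code point
    immutable dchar cp = decode(source, next);
    return cp == '_' || isAlpha(cp);
}

/// True if the code point starting at byte `idx` may continue an identifier:
/// a Unicode letter or digit, underscore, or hyphen.
bool continuesIdentifierAt(string source, size_t idx) @safe pure
{
    if (idx >= source.length)
        return false;
    size_t next = idx;
    immutable dchar cp = decode(source, next);
    return cp == '_' || cp == '-' || isAlphaNum(cp);
}
```

Decoding before classifying matters because a single char holds only one UTF-8 code unit, so a letter such as é would never match a byte-wise test.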

private void consumeUtf8Char() @safe pure
Consumes a UTF-8 character (potentially multi-byte) and updates position/column.

Constructors

this(string input, string sourceFilename = "")
Constructs a lexer for the given CF source text.

Functions

fn Lexer tokenize(string input, string filename = "") @safe pure
Creates a lexer for the given CF source text.

fn Token[] tokenizeAll(string input, string filename = "") @safe pure
Tokenizes the input and eagerly collects all tokens into an array.
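The difference between the two functions is laziness: one hands back a range that produces elements on demand, the other drains such a range into an array up front. The sketch below shows the same split using only Phobos, with space-separated words standing in for tokens. lazyWords and allWords are hypothetical names, and nothing here is the library's implementation.

```d
import std.algorithm.iteration : splitter;
import std.array : array;

/// Lazy: returns a range that finds each word only when it is asked for.
auto lazyWords(string input)
{
    return input.splitter(' ');
}

/// Eager: drains the lazy range into an array in one go.
string[] allWords(string input)
{
    return lazyWords(input).array;
}
```

The lazy form suits streaming through large inputs or stopping early; the eager form is simpler when the caller needs random access or several passes.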
