ddn.data.xml.internal.lexer

Internal: XML tokenization.

This module is not part of the public API and may change at any time.

Types

enum XmlTokenKind

Token kinds produced by XmlLexer.

EOF: End-of-input.
TEXT: Text content outside markup.
START_TAG: Start tag like `<a>` or empty-element tag like `<a/>`.
END_TAG: End tag like `</a>`.
COMMENT: Comment `<!-- ... -->` (content excludes the delimiters).
CDATA: CDATA section `<![CDATA[ ... ]]>` (content excludes the delimiters).
PROCESSING_INSTRUCTION: Processing instruction `<?target ...?>` (raw content excludes the delimiters).
DOCTYPE: `DOCTYPE` declaration `<!DOCTYPE ...>` (raw content excludes the delimiters).

struct XmlTokenAttribute

An attribute parsed from a start tag.

Fields
string name: Attribute name (QName).
string value: Raw attribute value (without the surrounding quotes).
XmlLocation location: Location of the start of the attribute name.
struct XmlToken

A single lexical token.

Fields
XmlTokenKind kind: Token kind.
string text: Raw content (meaning depends on `kind`).
string name: Optional name (for start/end tags and processing instructions).
bool isEmptyElement: `true` if `kind == START_TAG` and the tag ended with `/>`.
XmlLocation location: Location of the token start.
XmlTokenAttribute[] attributes: Attributes for `START_TAG` tokens (may be empty).
Constructors
this(XmlTokenKind kind, string text, string name, bool isEmptyElement, XmlLocation location, XmlTokenAttribute[] attributes = null): Constructs an `XmlToken`.

struct XmlLexer

A minimal XML lexer.

This tokenizer is intentionally conservative and is used internally by the parser.
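Driving the lexer looks roughly like the following sketch. This is hypothetical usage of the names documented in this module (the module is internal, so none of this is a stable API), and it assumes `XmlLexer` is constructed directly from the input string:

```d
import std.stdio : writeln;

// Sketch: pump tokens until end-of-input. `XmlLexer`, `XmlToken`,
// and `XmlTokenKind` are the internal types documented in this module.
void dumpTokens(string xml)
{
    auto lexer = XmlLexer(xml, "example.xml");
    while (!lexer.eof)
    {
        XmlToken tok = lexer.next();
        if (tok.kind == XmlTokenKind.EOF)
            break;
        writeln(tok.kind, ": ", tok.text);
    }
}
```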

Fields
private string _input
private size_t _i
private ulong _line
private ulong _column
private ulong _byteOffset
private string _systemId
Methods
bool eof() const @safe nothrow: Returns `true` if the lexer is at end-of-input.
XmlLexer save() @property const @safe: Saves the current state of the lexer.
XmlToken next() @safe: Reads the next token.
XmlLocation currentLocation() const @safe nothrow
char peekChar() const @safe nothrow
bool startsWithAt(const string needle) const @safe nothrow
void advance() @safe nothrow
void advanceN(size_t n) @safe nothrow
void skipWhitespace() @safe nothrow
string readName() @safe
string readUntil(const string terminator, XmlErrorCode eofCode) @safe
Constructors
this(string input, string systemId = ""): Constructs a lexer.

enum XmlIncrementalLexerState

Identifies which high-level scan the incremental lexer is currently performing.

When the lexer is in the middle of a multi-byte token that spans chunk boundaries, this enum records what kind of token is being assembled so that the scan can be resumed when more data arrives.

IDLE: Not inside any partial token; ready to classify the next byte.
TEXT: Scanning text content (outside markup).
COMMENT: Inside a comment `<!-- ... -->`; tracking dash sequences.
CDATA: Inside a CDATA section `<![CDATA[ ... ]]>`.
PI: Inside a processing instruction `<?target ...?>`.
DOCTYPE: Inside a DOCTYPE declaration `<!DOCTYPE ...>`.
TAG: Inside a start tag `<name ...>` or end tag `</name ...>`.
ERROR: An error was encountered during tokenization.

struct XmlPartialTagState

Mutable state carried across chunk boundaries for a partially-scanned tag.

Only valid when the enclosing lexer state is XmlIncrementalLexerState.TAG.

Fields
bool isEndTag: `true` if this is an end tag (`</name ...>`).
bool sawOpenAngle: `true` if the opening `<` (and optional `/`) have been consumed.
bool nameComplete: `true` if the element name has been fully read.
string name: The element name accumulated so far.
bool isEmptyElement: `true` if a self-closing `/>` has been detected.
int phase: Current sub-phase within the tag scan.
char attrQuote: Quote character used for the current attribute value (`'\''` or `'"'`).
XmlTokenAttribute[] attributes: Attributes accumulated so far.

struct XmlPartialDelimitedState

Mutable state for a partially-scanned comment, CDATA section, or processing instruction.

These constructs are all delimited by a fixed terminator string:

  • Comment: `"-->"`
  • CDATA: `"]]>"`
  • PI: `"?>"`

The struct tracks how many characters of the terminator have been matched consecutively so the scan can resume correctly when data arrives mid-terminator.

Fields
size_t terminatorMatched: Number of consecutive terminator characters matched so far (0 .. terminator.length).
string terminator: Terminator string for the current construct.
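As an illustration (not the module's actual code), the `terminatorMatched` bookkeeping can be sketched as a standalone function. The fallback logic relies on the fact that in `"-->"`, `"]]>"`, and `"?>"` only the leading character can repeat:

```d
// Resumable terminator match. `matched` carries state across calls;
// returns true once the full terminator has been seen.
bool feedChunk(ref size_t matched, string terminator, const(char)[] chunk)
{
    foreach (c; chunk)
    {
        if (c == terminator[matched])
        {
            if (++matched == terminator.length)
                return true; // terminator complete
        }
        else if (matched > 0 && c == terminator[0]
                 && terminator[matched - 1] == terminator[0])
        {
            // A longer run of the repeated lead character ('-' or ']'):
            // the match depth stays where it is, e.g. "]]]>" still ends
            // a CDATA section.
        }
        else
        {
            matched = (c == terminator[0]) ? 1 : 0; // restart the match
        }
    }
    return false; // need more data; `matched` resumes the scan later
}
```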

struct XmlPartialDoctypeState

Mutable state for a partially-scanned DOCTYPE declaration.

A DOCTYPE declaration is not terminated by a fixed string, so this state instead tracks bracket depth and quote state to correctly handle internal subsets and quoted strings that may contain `>`.

Fields
int bracketDepth: Nesting depth of `[` brackets (internal subset).
char quote: Current quote character (`'\''` or `'"'`).
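For illustration (again, not the module's actual code), the bracket-depth and quote tracking might be sketched like this, with `depth` and `quote` carried across chunk boundaries and `quote == 0` meaning "not inside a quoted string":

```d
// Scan DOCTYPE content; returns the index one past the closing '>',
// or size_t.max if the declaration continues into the next chunk.
size_t scanDoctype(ref int depth, ref char quote, const(char)[] data)
{
    foreach (i, c; data)
    {
        if (quote != 0)
        {
            if (c == quote)
                quote = 0;            // closing quote
        }
        else if (c == '\'' || c == '"')
            quote = c;                // opening quote
        else if (c == '[')
            ++depth;                  // enter internal subset
        else if (c == ']')
            --depth;                  // leave internal subset
        else if (c == '>' && depth == 0)
            return i + 1;             // end of the declaration
    }
    return size_t.max;                // need more data
}
```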

struct XmlPartialTokenState

Aggregate of all partial-token states for the incremental lexer.

Only one member is active at a time, determined by the enclosing XmlIncrementalLexerState. The active member should be considered invalid when the lexer state does not correspond to it.

Fields
XmlPartialTagState tag: State for TAG scans. Valid when the lexer state is `TAG`.
XmlPartialDelimitedState delimited: State for comment/CDATA/PI scans. Valid when the lexer state is `COMMENT`, `CDATA`, or `PI`.
XmlPartialDoctypeState doctype: State for DOCTYPE scans. Valid when the lexer state is `DOCTYPE`.
string content: Text content accumulated so far for the current partial token.

struct XmlLexerResult

Result of an incremental lexer tryNext() attempt.

Fields
XmlToken token: The token (valid only when `ready` is `true`); a placeholder EOF token when `ready` is `false`.
bool ready: `true` when a complete token was produced; `false` when more data is needed.

An incremental (resumable) XML lexer that accepts data in chunks.

Unlike XmlLexer, which requires the entire input up front, this class maintains an internal growable buffer and supports the feed() / compact() / markEndOfStream() lifecycle required for streaming parsing.

Tokenization (tryNext()) is built on top of the buffer-management and query primitives this class provides.

Ownership: feed() copies caller data into internal storage. The caller may free or overwrite its buffer immediately after feed() returns.

Fields
private string _systemId
private char[] _buf
private size_t _readPos
private size_t _writePos
private bool _eos
private ulong _line
private ulong _column
private ulong _byteOffset
private XmlPartialTokenState _partial
private XmlLocation _tokenStartLoc
Methods
void feed(const(char)[] chunk) @safe: Appends `chunk` to the internal unread buffer.
void markEndOfStream() @safe pure nothrow: Signals that no more data will be fed.
bool isEndOfStream() const @safe pure nothrow: Returns `true` if `markEndOfStream()` has been called.
void compact() @safe pure nothrow: Moves unread bytes to the front of the internal buffer to reclaim space.
size_t available() const @safe pure nothrow: Returns the number of unread bytes currently in the buffer.
bool isDrained() const @safe pure nothrow: Returns `true` when there are no unread bytes and the stream has been marked as ended.
ulong line() const @safe pure nothrow: Returns the current line number (1-based).
ulong column() const @safe pure nothrow: Returns the current column number (1-based).
ulong byteOffset() const @safe pure nothrow: Returns the cumulative byte offset from the start of the first `feed()`.
string systemId() const @safe pure nothrow: Returns the system identifier passed at construction.
XmlLocation currentLocation() const @safe pure nothrow: Returns the current location as an `XmlLocation`.
const(char)[] unread() const @safe pure nothrow: Returns a slice of the unread portion of the internal buffer.
void enforceNotEos() const @safe: Throws `XmlException` if the stream has been marked as ended.
void ensureCapacity(size_t additional) pure nothrow @trusted: Ensures the internal buffer has room for `additional` bytes beyond the current write position, growing by doubling capacity as needed.
char peek() @safe pure nothrow: Peeks at the next unread byte without consuming it.
void advanceOne() @safe pure nothrow: Advances the read cursor by one byte, updating line/column/offset counters.
void advanceN(size_t n) @safe pure nothrow: Advances the read cursor by `n` bytes, updating location counters.
bool startsWithAt(string needle, size_t fromReadPos) const @safe pure nothrow: Returns `true` if at least `needle.length` unread bytes starting at the given offset match `needle`.
void skipWhitespace() @safe pure nothrow: Skips whitespace bytes in the unread buffer without producing a token.
XmlLexerResult tryNext() @safe: Attempts to produce the next token from the buffered input.
XmlLexerResult tokenReady(XmlToken tok) @trusted: Builds an `XmlLexerResult` signalling that a token was successfully produced.
XmlLexerResult unexpectedEof() @safe: Sets the lexer to the `ERROR` state and throws an `UNEXPECTED_EOF` exception.
bool scanName(ref string dest) @safe: Reads a name starting at the current read position into `dest`.
XmlLexerResult tryNextFromIdle() @safe: Attempts to classify and begin scanning the next token from the `IDLE` state.
XmlLexerResult tryResumeText() @safe: Resumes scanning a TEXT token across chunk boundaries.
XmlLexerResult tryResumeTag() @safe: Resumes scanning a START_TAG or END_TAG token across chunk boundaries.
XmlLexerResult scanTagAttributes() @safe: Scans attributes inside a start tag, resuming across chunk boundaries.
XmlLexerResult tryResumePI() @safe: Resumes scanning a PROCESSING_INSTRUCTION token across chunk boundaries.
XmlLexerResult tryResumeDelimited(XmlTokenKind kind) @safe: Resumes scanning a delimited token (COMMENT or CDATA) across chunk boundaries.
XmlLexerResult tryResumeDoctype() @safe: Resumes scanning a DOCTYPE declaration across chunk boundaries.
Constructors
this(string systemId = ""): Constructs an incremental lexer.
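A streaming drive loop over the feed()/tryNext()/compact() lifecycle might look like the sketch below. Because this listing does not name the incremental lexer class, the lexer type is taken as a template parameter, and `handleToken` is a hypothetical callback:

```d
// Sketch: feed chunks as they arrive, draining complete tokens
// between feeds and compacting the buffer afterwards.
void pump(Lexer)(Lexer lexer, const(char)[][] chunks,
                 void delegate(XmlToken) @safe handleToken)
{
    foreach (chunk; chunks)
    {
        lexer.feed(chunk);
        for (auto r = lexer.tryNext(); r.ready; r = lexer.tryNext())
            handleToken(r.token);     // complete token available
        lexer.compact();              // reclaim consumed buffer space
    }
    lexer.markEndOfStream();
    // With end-of-stream known, trailing tokens can now be finalized.
    for (auto r = lexer.tryNext(); r.ready; r = lexer.tryNext())
        handleToken(r.token);
}
```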