ddn.data.xml.internal.lexer

Internal: XML tokenization.

This module is not part of the public API and may change at any time.

Types

enum XmlTokenKind

Token kinds produced by XmlLexer.

EOF: End-of-input.
TEXT: Text content outside markup.
START_TAG: Start tag like `<a>` or empty-element tag like `<a/>`.
END_TAG: End tag like `</a>`.
COMMENT: Comment `<!-- ... -->` (content excludes the delimiters).
CDATA: CDATA section `<![CDATA[ ... ]]>` (content excludes the delimiters).
PROCESSING_INSTRUCTION: Processing instruction `<?target ...?>` (raw content excludes the delimiters).
DOCTYPE: `DOCTYPE` declaration `<!DOCTYPE ...>` (raw content excludes the delimiters).

struct XmlTokenAttribute

An attribute parsed from a start tag.

Fields
string name: Attribute name (QName).
string value: Raw attribute value (without the surrounding quotes).
XmlLocation location: Location of the start of the attribute name.
struct XmlToken

A single lexical token.

Fields
XmlTokenKind kind: Token kind.
string text: Raw content (meaning depends on `kind`).
string name: Optional name (for start/end tags and processing instructions).
bool isEmptyElement: `true` if `kind == START_TAG` and the tag ended with `/>`.
XmlLocation location: Location of the token start.
XmlTokenAttribute[] attributes: Attributes for `START_TAG` tokens (may be empty).
Constructors
this(XmlTokenKind kind, string text, string name, bool isEmptyElement, XmlLocation location, XmlTokenAttribute[] attributes = null): Constructs an `XmlToken`.

struct XmlLexer

A minimal XML lexer.

This tokenizer is intentionally conservative and is used internally by the parser.
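Driving the lexer looks roughly like the following sketch. This is hypothetical usage of the names documented in this module (the module is internal, so none of this is a stable API), and it assumes `XmlLexer` is constructed directly from the input string:

```d
import std.stdio : writeln;

// Sketch: pump tokens until end-of-input. `XmlLexer`, `XmlToken`,
// and `XmlTokenKind` are the internal types documented in this module.
void dumpTokens(string xml)
{
    auto lexer = XmlLexer(xml, "example.xml");
    while (!lexer.eof)
    {
        XmlToken tok = lexer.next();
        if (tok.kind == XmlTokenKind.EOF)
            break;
        writeln(tok.kind, ": ", tok.text);
    }
}
```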

Fields
private string _input
private size_t _i
private ulong _line
private ulong _column
private ulong _byteOffset
private string _systemId
Methods
bool eof() const @safe nothrow: Returns `true` if the lexer is at end-of-input.
XmlLexer save() @property const @safe: Saves the current state of the lexer.
XmlToken next() @safe: Reads the next token.
XmlLocation currentLocation() const @safe nothrow
char peekChar() const @safe nothrow
bool startsWithAt(const string needle) const @safe nothrow
void advance() @safe nothrow
void advanceN(size_t n) @safe nothrow
void skipWhitespace() @safe nothrow
string readName() @safe
string readUntil(const string terminator, XmlErrorCode eofCode) @safe
Constructors
this(string input, string systemId = ""): Constructs a lexer.

enum XmlIncrementalLexerState

Identifies which high-level scan the incremental lexer is currently performing.

When the lexer is in the middle of a multi-byte token that spans chunk boundaries, this enum records what kind of token is being assembled so that the scan can be resumed when more data arrives.

IDLE: Not inside any partial token; ready to classify the next byte.
TEXT: Scanning text content (outside markup).
COMMENT: Inside a comment `<!-- ... -->`; tracking dash sequences.
CDATA: Inside a CDATA section `<![CDATA[ ... ]]>`.
PI: Inside a processing instruction `<?target ...?>`.
DOCTYPE: Inside a DOCTYPE declaration `<!DOCTYPE ...>`.
TAG: Inside a start tag `<name ...>` or end tag `</name ...>`.
ERROR: An error was encountered during tokenization.

struct XmlPartialTagState

Mutable state carried across chunk boundaries for a partially-scanned tag.

Only valid when the enclosing lexer state is XmlIncrementalLexerState.TAG.

Fields
bool isEndTag: `true` if this is an end tag (`</name ...>`).
bool sawOpenAngle: `true` if the opening `<` (and optional `/`) have been consumed.
bool nameComplete: `true` if the element name has been fully read.
string name: The element name accumulated so far.
bool isEmptyElement: `true` if a self-closing `/>` has been detected.
int phase: Current sub-phase within the tag scan.
char attrQuote: Quote character used for the current attribute value (`'\''` or `'"'`).
XmlTokenAttribute[] attributes: Attributes accumulated so far.

struct XmlPartialDelimitedState

Mutable state for a partially-scanned comment, CDATA section, or processing instruction.

These constructs are all delimited by a fixed terminator string:

  • Comment: `"-->"`
  • CDATA: `"]]>"`
  • PI: `"?>"`

The struct tracks how many characters of the terminator have been matched consecutively so the scan can resume correctly when data arrives mid-terminator.

Fields
size_t terminatorMatched: Number of consecutive terminator characters matched so far (0 .. terminator.length).
string terminator: Terminator string for the current construct.
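As an illustration (not the module's actual code), the `terminatorMatched` bookkeeping can be sketched as a standalone function. The fallback logic relies on the fact that in `"-->"`, `"]]>"`, and `"?>"` only the leading character can repeat:

```d
// Resumable terminator match. `matched` carries state across calls;
// returns true once the full terminator has been seen.
bool feedChunk(ref size_t matched, string terminator, const(char)[] chunk)
{
    foreach (c; chunk)
    {
        if (c == terminator[matched])
        {
            if (++matched == terminator.length)
                return true; // terminator complete
        }
        else if (matched > 0 && c == terminator[0]
                 && terminator[matched - 1] == terminator[0])
        {
            // A longer run of the repeated lead character ('-' or ']'):
            // the match depth stays where it is, e.g. "]]]>" still ends
            // a CDATA section.
        }
        else
        {
            matched = (c == terminator[0]) ? 1 : 0; // restart the match
        }
    }
    return false; // need more data; `matched` resumes the scan later
}
```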

struct XmlPartialDoctypeState

Mutable state for a partially-scanned DOCTYPE declaration.

A DOCTYPE declaration is not terminated by a fixed string, so this state instead tracks bracket depth and quote state to correctly handle internal subsets and quoted strings that may contain `>`.

Fields
int bracketDepth: Nesting depth of `[` brackets (internal subset).
char quote: Current quote character (`'\''` or `'"'`).
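For illustration (again, not the module's actual code), the bracket-depth and quote tracking might be sketched like this, with `depth` and `quote` carried across chunk boundaries and `quote == 0` meaning "not inside a quoted string":

```d
// Scan DOCTYPE content; returns the index one past the closing '>',
// or size_t.max if the declaration continues into the next chunk.
size_t scanDoctype(ref int depth, ref char quote, const(char)[] data)
{
    foreach (i, c; data)
    {
        if (quote != 0)
        {
            if (c == quote)
                quote = 0;            // closing quote
        }
        else if (c == '\'' || c == '"')
            quote = c;                // opening quote
        else if (c == '[')
            ++depth;                  // enter internal subset
        else if (c == ']')
            --depth;                  // leave internal subset
        else if (c == '>' && depth == 0)
            return i + 1;             // end of the declaration
    }
    return size_t.max;                // need more data
}
```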

struct XmlPartialTokenState

Aggregate of all partial-token states for the incremental lexer.

Only one member is active at a time, determined by the enclosing XmlIncrementalLexerState. The active member should be considered invalid when the lexer state does not correspond to it.

Fields
XmlPartialTagState tag: State for TAG scans. Valid when the lexer state is `TAG`.
XmlPartialDelimitedState delimited: State for comment/CDATA/PI scans. Valid when the lexer state is `COMMENT`, `CDATA`, or `PI`.
XmlPartialDoctypeState doctype: State for DOCTYPE scans. Valid when the lexer state is `DOCTYPE`.
string content: Text content accumulated so far for the current partial token.

struct XmlLexerResult

Result of an incremental lexer tryNext() attempt.

Fields
XmlToken token: The token (valid only when `ready` is `true`); a placeholder EOF token when `ready` is `false`.
bool ready: `true` when a complete token was produced; `false` when more data is needed.

An incremental (resumable) XML lexer that accepts data in chunks.

Unlike XmlLexer, which requires the entire input up front, this class maintains an internal growable buffer and supports the feed() / compact() / markEndOfStream() lifecycle required for streaming parsing.

Tokenization (tryNext()) is built on top of the buffer-management and query primitives this class provides.

Ownership: feed() copies caller data into internal storage. The caller may free or overwrite its buffer immediately after feed() returns.

Fields
private string _systemId
private char[] _buf
private size_t _readPos
private size_t _writePos
private bool _eos
private ulong _line
private ulong _column
private ulong _byteOffset
private XmlPartialTokenState _partial
private XmlLocation _tokenStartLoc
Methods
void feed(const(char)[] chunk) @safe: Appends `chunk` to the internal unread buffer.
void markEndOfStream() @safe pure nothrow: Signals that no more data will be fed.
bool isEndOfStream() const @safe pure nothrow: Returns `true` if `markEndOfStream()` has been called.
void compact() @safe pure nothrow: Moves unread bytes to the front of the internal buffer to reclaim space.
size_t available() const @safe pure nothrow: Returns the number of unread bytes currently in the buffer.
bool isDrained() const @safe pure nothrow: Returns `true` when there are no unread bytes and the stream has been marked as ended.
ulong line() const @safe pure nothrow: Returns the current line number (1-based).
ulong column() const @safe pure nothrow: Returns the current column number (1-based).
ulong byteOffset() const @safe pure nothrow: Returns the cumulative byte offset from the start of the first `feed()`.
string systemId() const @safe pure nothrow: Returns the system identifier passed at construction.
XmlLocation currentLocation() const @safe pure nothrow: Returns the current location as an `XmlLocation`.
const(char)[] unread() const @safe pure nothrow: Returns a slice of the unread portion of the internal buffer.
void enforceNotEos() const @safe: Throws `XmlException` if the stream has been marked as ended.
void ensureCapacity(size_t additional) pure nothrow @trusted: Ensures the internal buffer has room for `additional` bytes beyond the current write position, growing by doubling capacity as needed.
char peek() @safe pure nothrow: Peeks at the next unread byte without consuming it.
void advanceOne() @safe pure nothrow: Advances the read cursor by one byte, updating line/column/offset counters.
void advanceN(size_t n) @safe pure nothrow: Advances the read cursor by `n` bytes, updating location counters.
bool startsWithAt(string needle, size_t fromReadPos) const @safe pure nothrow: Returns `true` if at least `needle.length` unread bytes starting at the given offset match `needle`.
void skipWhitespace() @safe pure nothrow: Skips whitespace bytes in the unread buffer without producing a token.
XmlLexerResult tryNext() @safe: Attempts to produce the next token from the buffered input.
XmlLexerResult tokenReady(XmlToken tok) @trusted: Builds an `XmlLexerResult` signalling that a token was successfully produced.
XmlLexerResult unexpectedEof() @safe: Sets the lexer to the `ERROR` state and throws an `UNEXPECTED_EOF` exception.
bool scanName(ref string dest) @safe: Reads a name starting at the current read position into `dest`.
XmlLexerResult tryNextFromIdle() @safe: Attempts to classify and begin scanning the next token from the `IDLE` state.
XmlLexerResult tryResumeText() @safe: Resumes scanning a TEXT token across chunk boundaries.
XmlLexerResult tryResumeTag() @safe: Resumes scanning a START_TAG or END_TAG token across chunk boundaries.
XmlLexerResult scanTagAttributes() @safe: Scans attributes inside a start tag, resuming across chunk boundaries.
XmlLexerResult tryResumePI() @safe: Resumes scanning a PROCESSING_INSTRUCTION token across chunk boundaries.
XmlLexerResult tryResumeDelimited(XmlTokenKind kind) @safe: Resumes scanning a delimited token (COMMENT or CDATA) across chunk boundaries.
XmlLexerResult tryResumeDoctype() @safe: Resumes scanning a DOCTYPE declaration across chunk boundaries.
Constructors
this(string systemId = ""): Constructs an incremental lexer.
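A streaming drive loop over the feed()/tryNext()/compact() lifecycle might look like the sketch below. Because this listing does not name the incremental lexer class, the lexer type is taken as a template parameter, and `handleToken` is a hypothetical callback:

```d
// Sketch: feed chunks as they arrive, draining complete tokens
// between feeds and compacting the buffer afterwards.
void pump(Lexer)(Lexer lexer, const(char)[][] chunks,
                 void delegate(XmlToken) @safe handleToken)
{
    foreach (chunk; chunks)
    {
        lexer.feed(chunk);
        for (auto r = lexer.tryNext(); r.ready; r = lexer.tryNext())
            handleToken(r.token);     // complete token available
        lexer.compact();              // reclaim consumed buffer space
    }
    lexer.markEndOfStream();
    // With end-of-stream known, trailing tokens can now be finalized.
    for (auto r = lexer.tryNext(); r.ready; r = lexer.tryNext())
        handleToken(r.token);
}
```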