ddn.data.xml.internal.lexer
Internal: XML tokenization.
This module is not part of the public API and may change at any time.
Types

XmlTokenKind
Token kinds produced by XmlLexer.
XmlTokenAttribute
An attribute parsed from a start tag.

- string name: Attribute name (QName).
- string value: Raw attribute value (without surrounding quotes).
- XmlLocation location: Location of the attribute name start.

XmlToken
A single lexical token.

- XmlTokenKind kind: Token kind.
- string text: Raw content (meaning depends on `kind`).
- string name: Optional name (for start/end tags and processing instructions).
- bool isEmptyElement: `true` if `kind == START_TAG` and the tag ended with `/>`.
- XmlLocation location: Location of the token start.
- XmlTokenAttribute[] attributes: Attributes for `START_TAG` tokens (may be empty).

this(XmlTokenKind kind, string text, string name, bool isEmptyElement, XmlLocation location, XmlTokenAttribute[] attributes = null)
Constructs an `XmlToken`.

XmlLexer
A minimal XML lexer.
This tokenizer is intentionally conservative and is used internally by the parser.
Private fields: string _input, size_t _i, ulong _line, ulong _column, ulong _byteOffset, string _systemId.

Methods:
- string readName() @safe
- string readUntil(const string terminator, XmlErrorCode eofCode) @safe
- XmlToken readComment() @safe
- XmlToken readProcessingInstruction() @safe
- XmlToken readEndTag() @safe
- XmlToken readStartTag() @safe
- XmlToken readDoctype() @safe
- this(string input, string systemId = ""): Constructs a lexer.

XmlIncrementalLexerState
Identifies which high-level scan the incremental lexer is currently performing.
When the lexer stops in the middle of a token that spans chunk boundaries, this enum records which kind of token is being assembled so that the scan can resume when more data arrives.
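The resume mechanism can be pictured as a small state machine that carries both the enum state and the partial data between calls. A minimal toy sketch (hypothetical names, not the module's actual implementation):

```d
// Sketch only: an enum state plus partial data lets tokenization resume
// mid-token when a chunk ends. Hypothetical names.
enum ScanState { idle, tag }

struct MiniScanner
{
    ScanState state = ScanState.idle;
    string partialName;   // element name accumulated across chunks
    string[] names;       // completed element names

    void feed(const(char)[] chunk)
    {
        foreach (c; chunk)
        {
            final switch (state)
            {
            case ScanState.idle:
                if (c == '<') { state = ScanState.tag; partialName = null; }
                break;
            case ScanState.tag:
                if (c == '>') { names ~= partialName; state = ScanState.idle; }
                else partialName ~= c;
                break;
            }
        }
    }
}

void main()
{
    auto s = MiniScanner();
    s.feed("<ab");      // chunk ends mid-name: state stays ScanState.tag
    s.feed("c><d");     // first name completes, a second tag begins
    s.feed(">");
    assert(s.names == ["abc", "d"]);
}
```

Because `state` and `partialName` live in the struct rather than on the stack of a scan function, a chunk boundary can fall anywhere without losing progress.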
XmlPartialTagState
Mutable state carried across chunk boundaries for a partially scanned tag.
Only valid when the enclosing lexer state is XmlIncrementalLexerState.TAG.
- bool isEndTag: `true` if this is an end tag (`</name ...>`).
- bool sawOpenAngle: `true` if the opening `<` (and optional `/`) have been consumed.
- bool nameComplete: `true` if the element name has been fully read.
- string name: The element name accumulated so far.
- bool isEmptyElement: `true` if a self-closing `/>` has been detected.
- int phase: Current sub-phase within the tag scan.
- char attrQuote: Quote character used for the current attribute value (`'\''` or `'"'`).
- XmlTokenAttribute[] attributes: Attributes accumulated so far.

XmlPartialDelimitedState
Mutable state for a partially scanned comment, CDATA section, or processing instruction.
These constructs are all delimited by a fixed terminator string:
- Comment: `"-->"`
- CDATA: `"]]>"`
- PI: `"?>"`
The struct tracks how many characters of the terminator have been matched consecutively so the scan can resume correctly when data arrives mid-terminator.
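One way to make such a resumable match correct even when the terminator overlaps itself is a KMP-style fallback on mismatch. A minimal sketch, where `matched` plays the role of `terminatorMatched` (hypothetical helper, not the module's actual code):

```d
// Sketch only: resumable terminator matching across chunk boundaries.
// A KMP-style fallback keeps `matched` correct when the terminator
// overlaps itself, e.g. the "-->" inside "--->".
struct DelimitedScan
{
    string terminator;   // "-->", "]]>", or "?>"
    size_t matched;      // terminator characters matched so far

    // Consumes one chunk; returns the index just past the terminator,
    // or -1 if the chunk ended before the terminator completed.
    ptrdiff_t feed(const(char)[] chunk)
    {
        foreach (i, c; chunk)
        {
            while (matched > 0 && c != terminator[matched])
                matched = borderLength(matched);
            if (c == terminator[matched])
                ++matched;
            if (matched == terminator.length)
                return cast(ptrdiff_t)(i + 1);
        }
        return -1;
    }

    // Longest proper border (prefix == suffix) of terminator[0 .. len].
    private size_t borderLength(size_t len) const
    {
        outer: for (size_t b = len - 1; b > 0; --b)
        {
            foreach (k; 0 .. b)
                if (terminator[k] != terminator[len - b + k])
                    continue outer;
            return b;
        }
        return 0;
    }
}

void main()
{
    auto s = DelimitedScan("-->");
    assert(s.feed("<!-- note -") == -1);  // chunk ends mid-terminator
    assert(s.feed("-> tail") == 2);       // "->" completes "-->"

    auto t = DelimitedScan("-->");
    assert(t.feed("a--->") == 5);         // self-overlap handled
}
```

A naive "reset to zero on mismatch" would miss the terminator in inputs like `--->`; the border fallback avoids that without rescanning earlier chunks.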
- size_t terminatorMatched: Number of consecutive terminator characters matched so far (0 .. terminator.length).
- string terminator: Terminator string for the current construct.

XmlPartialDoctypeState
Mutable state for a partially scanned DOCTYPE declaration.
DOCTYPE is not terminated by a fixed string. Instead, the scan tracks bracket depth and quote state so that internal subsets, and quoted strings that may contain `>`, are handled correctly.
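The depth/quote rule can be sketched as a small resumable scanner mirroring `bracketDepth` and `quote` (hypothetical helper, not the module's actual code):

```d
// Sketch only: find the '>' that ends a DOCTYPE, honouring the
// bracketed internal subset and quoted strings.
struct DoctypeScan
{
    int bracketDepth;   // nesting of '[' (internal subset)
    char quote = '\0';  // current quote char, '\0' when outside quotes

    // Consumes one chunk; returns the index just past the closing '>',
    // or -1 if the chunk ended before the declaration did.
    ptrdiff_t feed(const(char)[] chunk)
    {
        foreach (i, c; chunk)
        {
            if (quote != '\0')
            {
                if (c == quote) quote = '\0';       // closing quote
            }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '[') ++bracketDepth;
            else if (c == ']') --bracketDepth;
            else if (c == '>' && bracketDepth == 0)
                return cast(ptrdiff_t)(i + 1);      // true end of DOCTYPE
        }
        return -1;
    }
}

void main()
{
    auto s = DoctypeScan();
    // '>' inside quotes or inside [ ... ] must not end the declaration.
    assert(s.feed(`!DOCTYPE r [ <!ENTITY e "a>b"`) == -1);
    assert(s.feed(`> ]> x`) == 4);
}
```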
- int bracketDepth: Nesting depth of `[` brackets (internal subset).
- char quote: Current quote character (`'\''` or `'"'`).

XmlPartialTokenState
Aggregate of all partial-token states for the incremental lexer.
Only one member is active at a time, determined by the enclosing XmlIncrementalLexerState. The active member should be considered invalid when the lexer state does not correspond to it.
- XmlPartialTagState tag: State for TAG scans. Valid when the lexer state is `TAG`.
- XmlPartialDelimitedState delimited: State for comment/CDATA/PI scans. Valid when the lexer state is `COMMENT`, `CDATA`, or `PI`.
- XmlPartialDoctypeState doctype: State for DOCTYPE scans. Valid when the lexer state is `DOCTYPE`.
- string content: Text content accumulated so far for the current partial token.

XmlLexerResult
Result of an incremental lexer tryNext() attempt.
- XmlToken token: The token produced when `ready` is `true`. When `ready` is `false`, contains a placeholder EOF token.
- bool ready: `true` when a complete token was produced; `false` when more data is needed.

Incremental lexer
An incremental (resumable) XML lexer that accepts data in chunks.
Unlike XmlLexer, which requires the entire input up front, this class maintains an internal growable buffer and supports the feed() / compact() / markEndOfStream() lifecycle required for streaming parsing.
Tokenization (tryNext()) is built on top of this substrate in T8/T9. This class (T7) provides only the buffer management and query primitives.
Ownership: feed() copies caller data into internal storage. The caller may free or overwrite its buffer immediately after feed() returns.
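The buffer discipline described above can be reduced to a toy: feed() copies the caller's bytes, compact() slides unread data to the front, and capacity grows by doubling. A sketch under those assumptions (hypothetical stand-in, not the real class):

```d
// Sketch only: growable chunk buffer with copy-on-feed semantics.
struct ChunkBuffer
{
    char[] buf;
    size_t readPos;    // next unread byte
    size_t writePos;   // one past the last written byte

    void feed(const(char)[] data)
    {
        ensureCapacity(data.length);
        buf[writePos .. writePos + data.length] = data[]; // copy: caller keeps its buffer
        writePos += data.length;
    }

    void compact()
    {
        if (readPos == 0) return;
        foreach (i; 0 .. writePos - readPos)  // element-wise: regions may overlap
            buf[i] = buf[readPos + i];
        writePos -= readPos;
        readPos = 0;
    }

    size_t available() const { return writePos - readPos; }
    const(char)[] unread() const { return buf[readPos .. writePos]; }

    private void ensureCapacity(size_t additional)
    {
        immutable need = writePos + additional;
        if (need <= buf.length) return;
        size_t cap = buf.length ? buf.length : 64;
        while (cap < need) cap *= 2;    // grow by doubling
        buf.length = cap;               // may reallocate; contents preserved
    }
}

void main()
{
    auto b = ChunkBuffer();
    b.feed("hello");
    b.readPos = 2;          // pretend two bytes were consumed
    b.compact();
    assert(b.unread() == "llo" && b.available() == 3);
    b.feed(" world");
    assert(b.unread() == "llo world");
}
```

Copying in feed() is what makes the ownership rule above safe: the caller may reuse its buffer immediately, at the cost of one memcpy per chunk.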
Private fields: string _systemId, char[] _buf, size_t _readPos, size_t _writePos, bool _eos, ulong _line, ulong _column, ulong _byteOffset, XmlIncrementalLexerState _state, XmlPartialTokenState _partial, XmlLocation _tokenStartLoc.

Methods:
- void compact() @safe pure nothrow: Moves unread bytes to the front of the internal buffer to reclaim space.
- size_t available() const @safe pure nothrow: Returns the number of unread bytes currently in the buffer.
- bool isDrained() const @safe pure nothrow: Returns `true` when there are no unread bytes and the stream has been marked as ended.
- ulong byteOffset() const @safe pure nothrow: Returns the cumulative byte offset from the start of the first `feed()`.
- XmlLocation currentLocation() const @safe pure nothrow: Returns the current location as an `XmlLocation`.
- const(char)[] unread() const @safe pure nothrow: Returns a slice of the unread portion of the internal buffer.
- void ensureCapacity(size_t additional) pure nothrow @trusted: Ensures the internal buffer has room for `additional` bytes beyond the current write position, growing by doubling capacity as needed.
- void advanceOne() @safe pure nothrow: Advances the read cursor by one byte, updating line/column/offset counters.
- void advanceN(size_t n) @safe pure nothrow: Advances the read cursor by `n` bytes, updating location counters.
- bool startsWithAt(string needle, size_t fromReadPos) const @safe pure nothrow: Returns `true` if at least `needle.length` unread bytes starting at the given offset match `needle`.
- void skipWhitespace() @safe pure nothrow: Skips whitespace bytes in the unread buffer without producing a token.
- XmlLexerResult needMore() @trusted: Builds an `XmlLexerResult` signalling that more data is needed.
- XmlLexerResult tokenReady(XmlToken tok) @trusted: Builds an `XmlLexerResult` signalling that a token was successfully produced.
- XmlLexerResult unexpectedEof() @safe: Sets the lexer to `ERROR` state and throws an `UNEXPECTED_EOF` exception.
- bool scanName(ref string dest) @safe: Reads a name starting at the current read position into `partial.tag.name`.
- XmlLexerResult tryNextFromIdle() @safe: Attempts to classify and begin scanning the next token from `IDLE` state.
- XmlLexerResult tryResumeTag() @safe: Resumes scanning a `START_TAG` or `END_TAG` token across chunk boundaries.
- XmlLexerResult scanTagAttributes() @safe: Scans attributes inside a start tag, resuming across chunk boundaries.
- XmlLexerResult tryResumePI() @safe: Resumes scanning a `PROCESSING_INSTRUCTION` token across chunk boundaries.
- XmlLexerResult tryResumeDelimited(XmlTokenKind kind) @safe: Resumes scanning a delimited token (`COMMENT` or `CDATA`) across chunk boundaries.
- XmlLexerResult tryResumeDoctype() @safe: Resumes scanning a `DOCTYPE` declaration across chunk boundaries.
- this(string systemId = ""): Constructs an incremental lexer.