ddn.data.csv — High‑performance CSV reader/writer

Quick example (newline policy):

import ddn.data.csv;
// Reader detects CRLF/LF by default
auto d1 = CsvDialect.init; // newlinePolicy = detect
// Force the writer to emit LF newlines
auto d2 = CsvDialect.init;
d2.newlinePolicy = NewlinePolicy.forceLF;
This module will provide a fast, RFC 4180–compliant CSV reader and writer with configurable dialect options and performance features such as buffered I/O and zero‑copy parsing.
RFC 4180 Compliance -------------------
- Record delimiters: CRLF per RFC; the reader will detect CRLF, LF, and legacy CR in
  detect mode.
- Header row: optional; same field count as data rows when enabled.
- Field delimiter: comma by default; configurable to other single‑byte delimiters.
- Quoted fields: fields containing delimiter, quote, or newline are quoted.
- Quote escaping: doubled quotes within quoted fields.
- Embedded newlines: supported inside quoted fields.
- Spaces: spaces are data; optional trimming for unquoted fields.
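The quoting and quote-doubling rules above can be sketched as two standalone helpers. These are illustrative only; `escapeField` and `unescapeField` are hypothetical names, not part of the module's API.

```d
import std.algorithm.searching : canFind;
import std.array : appender;

// Quote a field when it contains the delimiter, the quote character, or a
// newline; double every embedded quote (the RFC 4180 rules listed above).
string escapeField(const(char)[] field, char delimiter = ',', char quote = '"')
{
    if (!field.canFind(delimiter, quote, '\r', '\n'))
        return field.idup; // nothing special: emit as-is
    auto app = appender!string();
    app.put(quote);
    foreach (c; field)
    {
        if (c == quote)
            app.put(quote); // double the embedded quote
        app.put(c);
    }
    app.put(quote);
    return app.data;
}

// Collapse doubled quotes inside the body of a quoted field.
string unescapeField(const(char)[] raw, char quote = '"')
{
    auto app = appender!string();
    for (size_t i = 0; i < raw.length; ++i)
    {
        app.put(raw[i]);
        if (raw[i] == quote && i + 1 < raw.length && raw[i + 1] == quote)
            ++i; // skip the second quote of the pair
    }
    return app.data;
}
```

Note that `unescapeField` operates on the body of an already-unquoted field, matching the `needsUnescape` notion used later in this module.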
Dialect Options --------------------------------------------
- delimiter: char — default `','`
- quote: char — default `'"'`
- doubleQuote: bool — default `true`
- trimWhitespace: bool — default `false` (unquoted fields only)
- newlinePolicy: NewlinePolicy — default `detect` (recognize CRLF/LF); values: `detect`, `forceCRLF`, `forceLF`
- escapeStyle: EscapeStyle — default `none`; values: `none`, `backslash` (extension for non‑RFC datasets)
- header: bool — default `false`
Additional toggles:
- strictFieldCount: bool — strict mode vs. permissive field-count handling
- acceptUtf8BOM: bool — accept and skip a leading UTF‑8 BOM
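The acceptUtf8BOM toggle only needs a three-byte check at the start of the input. A standalone sketch (`skipUtf8Bom` is a hypothetical helper name, not the module's API):

```d
// Skip a leading UTF-8 BOM (bytes EF BB BF) when present; return the rest.
const(ubyte)[] skipUtf8Bom(const(ubyte)[] data)
{
    static immutable ubyte[3] bom = [0xEF, 0xBB, 0xBF];
    if (data.length >= 3 && data[0 .. 3] == bom[])
        return data[3 .. $];
    return data;
}
```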
Notes and Examples ------------------
- A comprehensive compliance and options document is available at
docs/rfc4180-compliance.md in this repository.
- Performance tips and tuning guidance are collected in
docs/performance-notes.md.
- Writer will quote any field that contains the delimiter, quote character, or
newline, and will double internal quotes.
- Reader will support embedded CRLF within quoted fields and mixed line endings
under newlinePolicy = detect.
Example:

import ddn.data.csv;

void example()
{
    auto dialect = CsvDialect(
        delimiter: ',',
        quote: '"',
        doubleQuote: true,
        trimWhitespace: false,
        newlinePolicy: NewlinePolicy.detect,
        escapeStyle: EscapeStyle.none,
        header: true
    );
}
Types (24)
Newline policy controls how record boundaries are detected and, for the writer, how newlines are emitted.
Planned values match RFC 4180 defaults while allowing practical overrides.
Examples
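A detect-style terminator check can be sketched as a standalone helper (illustrative only; `terminatorLength` is not part of the module):

```d
// Under a detect-style policy, CRLF and bare LF both end a record.
// Returns the terminator length at position `i` (0 when none is present).
size_t terminatorLength(const(char)[] buf, size_t i)
{
    if (i < buf.length && buf[i] == '\n')
        return 1; // LF
    if (i + 1 < buf.length && buf[i] == '\r' && buf[i + 1] == '\n')
        return 2; // CRLF
    return 0;
}
```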
Escape style for non‑RFC datasets (optional extension).
- none — only RFC 4180 double‑quote escaping inside quoted fields.
- backslash — treat backslash as an escape for delimiter/quote/newline in unquoted fields.
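For the backslash extension, resolving escapes in an unquoted field could look like the following sketch. This assumes the semantics stated above (backslash protects the next character); `resolveBackslashes` is a hypothetical helper, not the module's API.

```d
import std.array : appender;

// Resolve backslash escapes in an unquoted field (non-RFC extension):
// "\<c>" yields "<c>" literally, so "\," survives field splitting.
string resolveBackslashes(const(char)[] field)
{
    auto app = appender!string();
    for (size_t i = 0; i < field.length; ++i)
    {
        if (field[i] == '\\' && i + 1 < field.length)
            ++i; // drop the backslash, keep the escaped character
        app.put(field[i]);
    }
    return app.data;
}
```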
Example:
import ddn.data.csv;
auto d = CsvDialect.init;
d.escapeStyle = EscapeStyle.none; // RFC 4180 behavior (default)

Reader error handling mode.

- PERMISSIVE (default): malformed rows are skipped and errors are counted;
  iteration continues. Optionally collects diagnostics.
- FAIL_FAST: stop iteration at the first error and surface it via
  CsvReader.stats.
Convenience aliases for common public types.
These aliases have no runtime cost and exist purely for readability in user code and documentation.
Examples
import ddn.data.csv;
CsvField f; // alias for FieldView
CsvRow row; // alias for RowView
CsvResultT!int ri; // alias for CsvResult!int
// Parametric reader/writer helpers
alias Reader = CsvReaderOf!(const(char)[]);
alias Writer = CsvWriterTo!(typeof((const(char)[] s){})); // any sink with put()

Shorthand for CsvResult!T.
Shorthand for CsvReader!R.
Shorthand for CsvWriter!S.
Lightweight non‑owning view over a CSV field.
const(char)[] data
Slice pointing into an underlying buffer (non-owning, zero-copy).
bool wasQuoted
True when the source field was quoted.
bool needsUnescape
True when the field contains doubled quotes that must be unescaped to obtain the logical text.

Header index providing fast name -> column index lookup.
Built once from a set of header fields (typically the first CSV record when CsvDialect.header == true) and attached to subsequent RowView instances via RowView.attachHeader to enable row.byName("col") without per-row overhead.
Policy for duplicates: the first occurrence wins; hasDuplicates is set to true when the header contains duplicate (normalized) names.
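The first-occurrence-wins policy can be sketched as a standalone type (illustrative; the module's HeaderIndex adds normalization and CsvResult-based lookup):

```d
// Minimal first-occurrence-wins header map.
struct SimpleHeader
{
    size_t[string] map;
    bool hasDuplicates;

    this(string[] names)
    {
        foreach (i, n; names)
        {
            if (n in map)
                hasDuplicates = true; // later duplicates are ignored
            else
                map[n] = i; // first occurrence wins
        }
    }
}
```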
private string[] _names
private size_t[string] _map
private bool _caseSensitive
private bool _hasDuplicates
bool hasDuplicates() @property const @safe pure nothrow @nogc
Whether the header contains duplicate names (after normalization).
inout(string) nameAt(size_t i) inout @safe pure nothrow @nogc
Return the original, unescaped name at index `i`.
CsvResult!size_t indexOf(scope const(char)[] name) const @safe pure nothrow @nogc
Look up the index of a column by `name`. Returns `unknownColumn` on a miss.
string _normalize(scope const(char)[] s) const @safe
Normalize a name for lookup depending on the case-sensitivity policy.

Lightweight non‑owning view over a parsed CSV row.
void attachHeader(const(HeaderIndex)* header) @safe pure nothrow @nogc
Attach a header index to this row to enable name-based field access.

Error codes describing common CSV parsing and configuration issues.
Structured error information returned in CsvResult!T.
CsvErrorCode code
Error code.
size_t line
1‑based line counter if available (0 when unknown).
size_t column
1‑based column counter if available (0 when unknown).
string message
Optional human‑readable message.

Exception type used by optional throwing helpers.
This exception wraps a CsvError and is thrown by convenience APIs such as CsvResult!T.valueOrThrow() for users who prefer exception‑driven control flow instead of explicit result checking. Hot paths should prefer CsvResult without throwing for performance.
CsvError error
The underlying structured error information.

Result container for error‑aware APIs without throwing.
Holds either a value of type T (when isOk is true) or an error. For compile‑time friendliness we store both value and err; users should consult isOk before accessing value.
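The check-isOk-before-access contract, together with a valueOrThrow convenience, can be sketched as a minimal standalone container (illustrative; the module's CsvResult carries a structured CsvError rather than a plain string):

```d
import std.exception : enforce;

// Minimal result container in the spirit described above.
struct Result(T)
{
    bool isOk;
    T value;   // meaningful only when isOk == true
    string err; // meaningful only when isOk == false

    // Convenience for exception-driven callers; hot paths check isOk instead.
    T valueOrThrow()
    {
        enforce(isOk, err.length ? err : "operation failed");
        return value;
    }
}
```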
bool isOk
True when the operation succeeded and `value` is valid.
T value
The result value (meaningful when `isOk == true`).
CsvError err
The error (meaningful when `isOk == false`).

CSV dialect configuration.
Represents the set of options that control how CSV is parsed and written. Defaults follow RFC 4180: comma delimiter, double‑quote for quoting, CRLF/LF detection on read, RFC‑only escaping, and no header row by default.
Fields (all public):
- delimiter: Single‑byte field separator (default: `,`).
- quote: Quote character (default: `"`).
- doubleQuote: When `true` (default), two consecutive quotes inside
  quoted fields represent a literal quote character.
- trimWhitespace: When `true`, trim leading/trailing whitespace for
  unquoted fields. Default: `false`.
- newlinePolicy: Controls record boundary handling and writer emission
  policy. Default: NewlinePolicy.detect.
- escapeStyle: Optional extension for non‑RFC datasets. Default: `none`.
- header: When `true`, the first record is interpreted as a header row.
Construction:
- The type can be used with its `.init` value for RFC defaults, e.g.
  auto d = CsvDialect.init;
  and fields mutated afterwards.
- A convenience constructor allows specifying any prefix of arguments (positional
  with defaults) while the remaining options fall back to defaults, e.g.:
  auto d = CsvDialect(';'); // semicolon delimiter
Validation:
isValid() returns `true` if the options are self‑consistent. At minimum it
ensures delimiter != quote.
Examples
import ddn.data.csv;
// Customize only the delimiter (semicolon-separated values)
auto d1 = CsvDialect(';');
// Fully configure a dialect
auto d2 = CsvDialect(',', '\'', true, false, NewlinePolicy.forceLF);
assert(d1.isValid && d2.isValid);

char delimiter
Field delimiter (single‑byte), default `,`.
char quote
Quote character, default `"`.
bool doubleQuote
Whether doubled quotes inside quoted fields represent a literal quote.
bool trimWhitespace
Trim leading/trailing whitespace in unquoted fields.
NewlinePolicy newlinePolicy
Newline detection/emission policy.
EscapeStyle escapeStyle
Optional non‑RFC escape policy.
bool header
Interpret the first record as a header row.
ErrorMode errorMode
Error handling mode for the reader.
bool strictFieldCount
When `true`, enforce a consistent number of fields per row.
bool collectDiagnostics
When `true`, the reader will collect per-error diagnostics in `stats`.
bool isValid() const @safe pure nothrow @nogc
Returns `true` when the dialect options are self‑consistent.
this(
    char delimiter,
    char quote = DEFAULT_QUOTE,
    bool doubleQuote = DEFAULT_DOUBLE_QUOTE,
    bool trimWhitespace = DEFAULT_TRIM_WHITESPACE,
    NewlinePolicy newlinePolicy = DEFAULT_NEWLINE_POLICY,
    EscapeStyle escapeStyle = DEFAULT_ESCAPE_STYLE,
    bool header = DEFAULT_HEADER
)
Convenience constructor. Specify any prefix of arguments to customize options and rely on defaults for the rest.

High‑throughput CSV reader over an input range of bytes.
Models a forward input range over RowView. Instances created without a source (e.g. via `.init`) yield an empty range.
Examples
import ddn.data.csv;
// Parse two rows, sum first column lazily (no allocations on hot path)
const csv = "x,y\n10,20\n30,40\n";
long sum = 0;
auto r = byRows(csv);
// Skip header
r.popFront();
while (!r.empty)
{
auto row = r.front;
auto a = fromCsv!long(row[0]);
auto b = fromCsv!long(row[1]);
if (a.isOk && b.isOk) sum += a.value + b.value;
r.popFront();
}
assert(sum == 100);

private CsvDialect _dialect
private CsvReadStats _stats
Reading statistics and diagnostics (see `CsvReadStats`).
private size_t _line
Number of successfully yielded rows so far (1-based line numbers use `line + 1` for the next row).
private size_t _expectedFieldCount
Expected field count when `strictFieldCount` is enabled.
private bool _expectedSet
private Range _source
inout(CsvDialect) dialect() @property ref return inout @safe pure nothrow
Access the dialect (read/write).
inout(CsvReadStats) stats() @property ref return inout @safe pure nothrow
Access read statistics and, optionally, diagnostics collected during iteration.
inout(RowView) front() @property inout @safe pure nothrow @nogc
Current row view. Valid until the next call to `popFront`.
void _recordError(CsvError e) @safe nothrow @nogc
Record an error into the reader statistics and diagnostics as configured.
this(Range source, CsvDialect dialect = CsvDialect.init)
Construct a reader over `source` with an optional `dialect`.

Efficient CSV writer to an output range of bytes.
Implements a buffered writer core that minimizes calls to the underlying sink by batching output into a large internal buffer. The buffer size is configurable via the constructor; a sensible default is used when not specified.
Notes:
- Fields are written as provided. Quoting rules per RFC 4180.
- Newline emission follows dialect.newlinePolicy: detect and
  forceCRLF emit CRLF; forceLF emits LF.
Examples
import ddn.data.csv;
static class Sink { string data; void put(const(char)[] s) { data ~= s; } }
auto s = new Sink();
auto w = CsvWriter!(Sink)(s); // default RFC dialect
assert(w.writeRow(["id", "name"]).isOk);
assert(w.writeRow(["1", "Doe, Jane"]).isOk); // quoted due to comma
w.flush();
assert(s.data.startsWith("id,name"));

private OutputRange _sink
private CsvDialect _dialect
private char[] _buf
private size_t _pos
private size_t _flushes
size_t DEFAULT_BUFFER_SIZE
Default buffer size (64 KiB).
inout(CsvDialect) dialect() @property ref return inout @safe pure nothrow
Access the dialect (read/write).
size_t flushes() @property const @safe pure nothrow @nogc
Number of flushes performed so far (diagnostic/testing aid).
void flush()
Ensure pending buffered data is sent to the underlying sink.
void writeRaw(scope const(char)[] s)
Write raw bytes through the buffer, flushing as necessary.
void writeNewline()
Write the dialect-specific newline (CRLF for detect/forceCRLF, LF for forceLF).
bool needsQuoting(scope const(char)[] s) const @safe pure nothrow @nogc
Determine whether `s` must be quoted given the current dialect.
CsvResult!bool writeField(scope const(char)[] s)
Write a single field, applying quoting/escaping per RFC 4180 and the dialect.
CsvResult!bool writeRow(T)(auto ref T t) if (isTuple!T)
Write a row from a `std.typecons.Tuple` of arbitrary values.
CsvResult!bool writeRow(T)(auto ref T s) if (isAggregateType!T && !isTuple!T && !isSomeString!T && !is(T == RowView))
Write a row from a user `struct` using field declaration order.
CsvResult!bool writeRow(Args...)(auto ref Args args) if (!(Args.length == 1 && (is(Args[0] == string[]) || is(Args[0] == const(string)[]) || is(Args[0] == FieldView[]) || is(Args[0] == RowView))))
Variadic overload to write a row from heterogeneous values.
CsvResult!bool writeValue(T)(auto ref T v)
Emit a single value as a CSV field using the appropriate conversion.
this(OutputRange sink, CsvDialect dialect = CsvDialect.init, size_t bufferSize = DEFAULT_BUFFER_SIZE)
Construct a writer over `sink` with an optional `dialect` and `bufferSize`.
~this
Destructor flushes any remaining bytes.

Package-internal buffer manager that batches read() calls into large contiguous buffers to reduce syscall overhead and improve throughput.
Usage pattern (illustrative):
auto bm = BufferManager!MyReader(reader, 64 * 1024);
for (;;)
{
auto chunk = bm.nextChunk();
if (chunk.length == 0) break; // EOF
// Consume data from `chunk` (do not hold past next call).
bm.popFront(chunk.length);
}

Notes:
- Buffer size is configurable via the constructor; defaults to 64 KiB.
- nextChunk returns a slice of the internal buffer; its content becomes
  invalid after the next nextChunk call.
- Metrics readCalls and bytesRead are available for tests/diagnostics.
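The refill-once-per-exhaustion idea behind the batching can be sketched with a minimal in-memory reader (illustrative only; this is not the module's BufferManager, and MemoryReader/MiniBuffer are hypothetical names):

```d
// A toy in-memory source with the read(ubyte[]) shape the manager expects.
struct MemoryReader
{
    const(ubyte)[] data;
    size_t off;

    size_t read(ubyte[] dst)
    {
        import std.algorithm.comparison : min;
        auto n = min(dst.length, data.length - off);
        dst[0 .. n] = data[off .. off + n];
        off += n;
        return n;
    }
}

// Core of a batching buffer: one large read() per refill amortizes
// per-call (e.g. syscall) overhead across many consumed bytes.
struct MiniBuffer(Reader)
{
    Reader reader;
    ubyte[] buf;
    size_t fill, pos;
    size_t readCalls; // diagnostic, mirroring the metrics above

    const(ubyte)[] nextChunk()
    {
        if (pos == fill) // buffer exhausted: refill with a single read()
        {
            pos = 0;
            fill = reader.read(buf);
            ++readCalls;
        }
        return buf[pos .. fill];
    }

    void popFront(size_t n) { pos += n; }
}
```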
private Reader _reader
private ubyte[] _buf
private size_t _fill
private size_t _pos
private bool _eof
private size_t _readCalls
private size_t _bytesRead
size_t DEFAULT_BUFFER_SIZE
Default buffer size (64 KiB).
size_t readCalls() @property const @safe pure nothrow @nogc
Number of low-level `read()` calls performed so far.
size_t bytesRead() @property const @safe pure nothrow @nogc
Number of bytes read from the underlying reader so far.
bool empty() @property const @safe pure nothrow @nogc
Returns `true` when EOF has been reached and no data remains in the buffer.
size_t available() @property const @safe pure nothrow @nogc
Bytes currently available in the buffer (without more I/O).
const(ubyte)[] nextChunk()
Obtain the next available chunk. If the internal buffer is exhausted, this method refills it by calling the underlying `read()` once.
void popFront(size_t n) @safe pure nothrow
Advance the current position by `n` bytes (consumed by the caller).

Package‑internal scan result for a single CSV row.
- row: A RowView referencing the provided buffer slice; valid until the
  next scan or buffer reuse.
- consumed: Number of bytes consumed from the start of the buffer,
  including the record terminator (CRLF/LF) when found.
- hasRow: `true` when a complete row was found; `false` otherwise (no
  terminator encountered in the provided chunk per newlinePolicy).

RowView row
size_t consumed
bool hasRow
bool badRow
True when the scanned row exhibits a parsing anomaly (e.g., invalid quote sequence).
CsvErrorCode errCode
Error code describing the anomaly when `badRow == true`.
size_t totalFields
Total number of fields in the row (may exceed row.length if truncated).

Zero‑copy row scanner that splits a single row into fields without allocations and without quote/escape handling.
Notes:
- Handles CRLF and/or LF per CsvDialect.newlinePolicy.
- The delimiter is taken from CsvDialect.delimiter.
- trimWhitespace is applied to all fields (only unquoted fields are
  relevant at this stage).
- Produces FieldView slices that reference the input buffer; callers must
  ensure the buffer remains valid until they finish using the row.
- The scanner maintains an internal fixed array to avoid GC allocations in
  hot paths. If the number of fields exceeds the fixed capacity, a dynamic-array fallback is used for that row (rare); @nogc tests should keep field counts below the fixed capacity to avoid allocations.
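A simplified zero-copy split can be sketched as follows. It tracks quote state only to avoid splitting on delimiters inside quoted fields, and performs no unescaping, in the spirit of the scanner described above (illustrative; `splitFields` is not the module's API and allocates the slice array, unlike the fixed-capacity scanner):

```d
// Split one row into non-owning slices of the input (no unescaping).
const(char)[][] splitFields(const(char)[] row, char delim = ',', char quote = '"')
{
    const(char)[][] fields;
    size_t start = 0;
    bool inQuotes = false;
    foreach (i, c; row)
    {
        if (c == quote)
            inQuotes = !inQuotes; // a doubled quote toggles twice, net no change
        else if (c == delim && !inQuotes)
        {
            fields ~= row[start .. i];
            start = i + 1;
        }
    }
    fields ~= row[start .. $];
    return fields;
}
```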
bool isTrimChar(char c) @safe pure nothrow @nogc
Returns true if character `c` is considered whitespace for trimming.
bool isTerminator(const(char)[] buf, size_t i, NewlinePolicy p, ref size_t termLen) @safe pure nothrow @nogc
Detects a record terminator at position `i` and returns its length via `termLen`.
ScanResult scan(const(char)[] chunk, const bool eofIsTerminator = false) @trusted pure nothrow @nogc
Scan the provided `chunk` for a single CSV row using the scanner's dialect. Returns a `ScanResult`. When `hasRow` is false, the caller should provide more data (append the next chunk) and retry.
this(CsvDialect dialect)

Reading statistics and optional diagnostics collected by CsvReader.
Fields:
- rows: number of successfully yielded rows.
- badRows: number of malformed rows encountered.
- errors: total error count (equals badRows in the current implementation).
- lastError: the most recent error encountered.
- diagnostics: per-error list (populated only when collectDiagnostics is enabled).
size_t rows
size_t badRows
size_t errors
CsvError lastError
private size_t DIAG_CAPACITY
private CsvError[DIAG_CAPACITY] _diagnostics
private size_t _diagLen
private size_t _diagDropped
inout(CsvError)[] diagnostics() @property return inout @safe pure nothrow @nogc
View over collected diagnostics (non-owning slice).
size_t diagnosticsDropped() @property const @safe pure nothrow @nogc
Number of diagnostics dropped due to capacity limits.
void _pushDiagnostic(CsvError e) @safe nothrow @nogc
Internal: push a diagnostic if capacity allows; otherwise count it as dropped.

Mode for opening a CsvFile.
Convenience wrapper for reading/writing CSV files.
Examples
import ddn.data.csv;
import std.file : remove, exists;
// Write a small CSV file
CsvFile outFile = CsvFile("./t23_sample.csv", CsvOpenMode.write); // `out` is a D keyword
string[][] rows = [["a","b"],["1","x,y"],["2","z"]];
assert(outFile.writeRows(rows).isOk);
assert(outFile.close().isOk);
// Read it back (buffered path)
CsvFile f = CsvFile("./t23_sample.csv", CsvOpenMode.read);
auto rr = f.reader();
size_t cnt = 0; while (!rr.empty) { ++cnt; rr.popFront(); }
assert(cnt == rows.length);
// Cleanup
if (exists("./t23_sample.csv")) remove("./t23_sample.csv");

string path
Filesystem path to the CSV file.
CsvOpenMode mode
Open mode (read/write/append).
CsvDialect dialect
Dialect used for parsing/formatting.
bool memoryMapped
When true, memory mapping is used for reading where supported.
private File _file
private MmFile _mm
private const(char)[] _mapped
CsvResult!bool close()
Close the file if it is open. Safe to call multiple times. Returns: success, or `ioFailure` if the OS close operation fails.
CsvWriter!(FileSink) _makeWriter()
CsvResult!bool writeRows(R)(R rows)
Write a sequence of rows to the file using a `CsvWriter` under the hood.
CsvResult!bool appendRow(const string[] fields)
Append a single row given as `string[]`. See `CsvWriter.writeRow`.
CsvResult!bool appendRow(Args...)(auto ref Args args) if (!(Args.length == 1 && (is(Args[0] == string[]) || is(Args[0] == const(string)[]) || is(Args[0] == FieldView[]) || is(Args[0] == RowView))))
Variadic convenience to append a heterogeneous row, e.g. `appendRow("id", 42, true)`.
CsvResult!(CsvReader!(const(char)[])) _bufferedReader()
Buffered non-mapped reading using BufferManager.
this(string path, CsvOpenMode mode = CsvOpenMode.read, CsvDialect dialect = CsvDialect.init, bool memoryMapped = false)
Construct a `CsvFile` with the given parameters.
~this
Destructor: best-effort close of the file handle.
FileSink

Functions (4)
bool ddnCanMemoryMapSize(size_t fileSize) @safe pure nothrow @nogc
Decide whether memory mapping should be attempted for a file of `fileSize` bytes on the current platform.
HeaderIndex makeHeaderIndex(RowView headerRow, bool caseSensitive = true) @safe
Convenience: build a `HeaderIndex` from a parsed header `RowView`.
CsvResult!T fromCsv(T)(FieldView f) @safe nothrow @nogc
Convert a `FieldView` to the requested type `T` lazily, without allocations.
auto byRows(Source)(Source source, CsvDialect dialect = CsvDialect.init) @safe nothrow @nogc
Helper to construct a `CsvReader` range over rows for common sources.

Variables (8)
DDN_MAX_MAP_SIZE32 = cast(size_t) 1_500_000_000UL
Maximum file size to attempt memory mapping on 32‑bit platforms (heuristic).
Mapping very large files on 32‑bit OSes can fail due to limited virtual address space. We apply a conservative cap (≈1.5 GB) and fall back to buffered reading beyond this size.
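This gate can be sketched as a platform-conditional check (illustrative; `shouldMemoryMap` and `MAX_MAP_SIZE_32` are hypothetical names mirroring the heuristic above):

```d
// ~1.5 GB cap, mirroring DDN_MAX_MAP_SIZE32 above.
enum size_t MAX_MAP_SIZE_32 = cast(size_t) 1_500_000_000UL;

bool shouldMemoryMap(size_t fileSize)
{
    static if (size_t.sizeof == 4)
        return fileSize <= MAX_MAP_SIZE_32; // 32-bit: spare the address space
    else
        return true; // 64-bit: address space is effectively unconstrained
}
```

Callers falling on the `false` side would use the buffered-read path instead.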
DEFAULT_DELIMITER = ','
Default field delimiter (RFC 4180).
DEFAULT_QUOTE = '"'
Default quote character (RFC 4180).
DEFAULT_DOUBLE_QUOTE = true
Default: interpret doubled quotes inside quoted fields as a literal quote.
DEFAULT_TRIM_WHITESPACE = false
Default: do not trim whitespace in unquoted fields.
DEFAULT_NEWLINE_POLICY = NewlinePolicy.detect
Default newline policy: detect CRLF and LF on read.
DEFAULT_ESCAPE_STYLE = EscapeStyle.none
Default escape style: RFC 4180 only.
DEFAULT_HEADER = false
Default: no header row.
Templates (1)
Trait that evaluates to true when R provides a read(ubyte[]) method returning the number of bytes read (size_t). This matches common D I/O patterns (e.g., file descriptors, custom sources) and enables our buffered reader to minimize read calls.
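One way such a trait could be written is with the `is(typeof(...))` idiom (a sketch; the module's actual trait name and definition may differ):

```d
// True when R provides read(ubyte[]) returning a byte count (size_t).
enum isBlockReader(R) =
    is(typeof((R r, ubyte[] buf) { size_t n = r.read(buf); cast(void) n; }));

// Example source satisfying the trait: fills the buffer with zeros.
struct ZeroReader
{
    size_t read(ubyte[] buf)
    {
        buf[] = 0;
        return buf.length;
    }
}
```

A buffered reader constrained on such a trait can accept file descriptors, sockets, or in-memory sources alike, as long as they expose the `read(ubyte[]) -> size_t` shape.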