ddn.compressor.bzip2
ddn.compressor.bzip2
High-performance BZIP2 compression provider for ddn.api.compressor. This is a pure D implementation of the BZIP2 algorithm designed to match the performance of the system bzip2 tool.
The BZIP2 algorithm consists of the following stages:
- Run-Length Encoding (RLE1): Initial pass limiting runs to 255.
- Burrows-Wheeler Transform (BWT): Reversible block sort.
- Move-To-Front (MTF): Exploits locality in BWT output.
- Zero Run-Length Encoding (RLE2): Encodes zeros as RUNA/RUNB.
- Huffman Coding: Multiple tables with selectors.
Features:
- Supports both streaming and block compression.
- Compression levels map to bzip2's blockSize100k (1-9).
- Fully DDoc-documented.
- All public functions and types have unittests.
Copyright
Module Initializers 1
()Types 13
BWT transform result.
ubyte[] outputuint origPtrMTF transform result.
ubyte[] outputbool[256] inUseint nInUseRLE2 encode result.
ushort[] symbolsint nInUseHuffman encoding table entry.
uint codeubyte lenHuffman decoding table.
uint[] limituint[] baseuint[] permint minLenint maxLenBit writer for MSB-first bit packing.
ubyte[] buffersize_t posuint bsBuffint bsLiveconst(ubyte)[] consumeOutput() nothrow @nogc @safeFlush complete bytes and return them, keeping partial bits in buffer.Bit reader for MSB-first bit unpacking.
const(ubyte)[] datasize_t bytePosint bitPosuint bsBuffint bsLiveCompressed block result.
ubyte[] datauint blockCrcDetermine number of Huffman tables based on data size.
Parameters
nMTF | Number of MTF symbols. |
Returns
Result of multi-table Huffman selection.
ubyte[][] lengthsuint[][] codesint[] minLensint[] maxLensubyte[] selectorsubyte[] selectorsMtfint nGroupsint nSelectorsAssign canonical Huffman codes from code lengths.
Matches bzip2's BZ2_hbAssignCodes: codes assigned in order of increasing length, and within the same length, in symbol index order.
uint[] codesint minLenint maxLenDecompressed block result.
ubyte[] datauint blockCrcbool isEosBZIP2 streaming compressor.
CompressionOptions _optsOutputSink _sinkProgressCallback _progressubyte[] _blockBuffersize_t _blockPosulong _bytesInulong _bytesOutuint _combinedCrcbool _headerWrittenbool _finishedint _blockSize100kuint _blockSizeBitWriter _bitWritervoid setOutputSink(OutputSink sink)void setProgressCallback(ProgressCallback callback)void write(const(ubyte)[] data)void finish()void reset()void writeHeader()void flushBlock()void writeTrailer()this(CompressionOptions opts)Create a new BZIP2 compressor.BZIP2 streaming decompressor.
DecompressionOptions _optsOutputSink _sinkProgressCallback _progressubyte[] _inBuffersize_t _inPosulong _bytesInulong _bytesOutuint _combinedCrcuint _expectedCombinedCrcint _blockSize100kbool _headerParsedbool _finishedBitReader _bitReaderbool _readerInitedvoid setOutputSink(OutputSink sink)void setProgressCallback(ProgressCallback callback)void write(const(ubyte)[] data)void finish()void reset()void processInput()bool parseHeader()bool processBlock()this(DecompressionOptions opts)Create a new BZIP2 decompressor.Functions 44
uint bz2Crc32Update(uint crc, ubyte b) pure nothrow @nogc @safeCompute BZIP2 CRC32 for a single byte.size_t rle1EncodeCalcSize(const(ubyte)[] input) pure nothrow @nogc @safeCalculate the size of RLE1 encoded output.size_t rle1Encode(const(ubyte)[] input, ubyte[] output) pure nothrow @nogc @safeEncode data using RLE1.int compareRotationsFrom(const(ubyte)[] block, size_t i, size_t j,
size_t startOffset) pure nothrow @nogcCompare two rotations starting from a given offset.int compareRotations(const(ubyte)[] block, size_t i, size_t j) pure nothrow @nogcCompare two rotations of the input block lexicographically.void bwtIntrosort(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi, size_t depthLimit) pure nothrowIntrosort implementation for BWT sorting.void bwtIntrosortFrom(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi, size_t depthLimit, size_t compareOffset) pure nothrowIntrosort implementation that skips already-compared prefix bytes.size_t bwtPartition(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi) pure nothrowPartition for quicksort using median-of-three pivot.size_t bwtPartitionFrom(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi, size_t compareOffset) pure nothrowPartition for quicksort with prefix skip optimization.void bwtSwap(uint[] indices, size_t i, size_t j) pure nothrow @nogcSwap two elements in the indices array.void bwtHeapsort(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi) pure nothrowHeapsort for BWT indices (fallback for pathological cases).void bwtHeapsortFrom(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi, size_t compareOffset) pure nothrowHeapsort with prefix skip optimization.void bwtSiftDown(uint[] indices, const(ubyte)[] block,
size_t lo, size_t start, size_t n) pure nothrowSift down operation for heapsort.void bwtSiftDownFrom(uint[] indices, const(ubyte)[] block,
size_t lo, size_t start, size_t n, size_t compareOffset) pure nothrowSift down with prefix skip optimization.void bwtInsertionSort(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi) pure nothrowInsertion sort for small ranges.void bwtInsertionSortFrom(uint[] indices, const(ubyte)[] block,
size_t lo, size_t hi, size_t compareOffset) pure nothrowInsertion sort with prefix skip optimization.void bwtRadixSortDeep(uint[] indices, const(ubyte)[] block,
uint lo, uint hi, int depth, uint[] temp) pure nothrowDeeper radix sort for large buckets with repetitive data. Uses single-byte radix at each level with efficient batch skipping.void bwtTernaryRadixSort(uint[] indices, const(ubyte)[] block,
uint lo, uint hi, int depth, uint[] temp) pure nothrowTernary radix quicksort for BWT indices.void bwtSort(uint[] indices, const(ubyte)[] block) pure nothrowSort an array of rotation indices based on the rotations they represent.BwtResult bwtTransform(const(ubyte)[] input) purePerform the Burrows-Wheeler Transform on the input data.ubyte[] mtfInverseTransform(const(ubyte)[] indices, const bool[256] inUse, int nInUse) pure @safeInverse MTF transform.Rle2Result rle2Encode(const(ubyte)[] mtfIndices, int nInUse) pure @safeEncode MTF output using RLE2 (zero run encoding).ubyte[] generateHuffmanLengths(const(uint)[] frequencies, int alphaSize, int maxLen = BZ_MAX_CODE_LEN) pureGenerate Huffman code lengths from frequencies.void heapPush(ref uint[] heapFreq, ref int[] heapNode, int heapSize, uint freq, int node) purePush a node onto the min-heap.void heapPop(ref uint[] heapFreq, ref int[] heapNode, int heapSize, ref uint freq, ref int node) purePop the minimum node from the min-heap.void limitCodeLengths(ref ubyte[] lengths, int alphaSize, int maxLength) pureLimit Huffman code lengths using a simple heuristic. Ensures no code exceeds maxLength by redistributing.HuffEncEntry[] buildHuffmanEncodeTable(const(ubyte)[] lengths) pure @safeBuild Huffman encoding table.HuffDecTable buildHuffmanDecodeTable(const(ubyte)[] lengths) pure @safeBuild Huffman decoding table.void initializeHuffmanTables(ref ubyte[][] lengths,
const(uint)[] mtfFreqs,
int nGroups,
int alphaSize,
int nMTF) pure nothrowInitialize Huffman table code lengths based on symbol frequency partitioning.ubyte[] mtfEncodeSelectors(const(ubyte)[] selectors, int nGroups) pure nothrowMTF-encode the selector sequence.HuffmanTableSelection selectHuffmanTables(const(ushort)[] symbols,
const(uint)[] mtfFreqs,
int alphaSize) pureSelect Huffman tables for MTF-encoded symbols using iterative optimization.CompressedBlock compressBlock(const(ubyte)[] input, int blockSize100k, uint originalDataCrc = 0)Compress a block of RLE1-encoded data into a BZIP2 block.CompressedBlock compressBlockToWriter(const(ubyte)[] input, int blockSize100k, uint originalDataCrc, BitWriter * bw)Compress a block into a provided BitWriter without flushing.Variables 15
ubyte[3] BZ_HEADER_MAGICBZIP2 stream header magic bytes ("BZh")
ubyte[6] BZ_BLOCK_MAGICBZIP2 block header magic (48 bits: digits of pi 3.14159265359)
ubyte[6] BZ_EOS_MAGICBZIP2 end-of-stream magic (48 bits: sqrt(pi) 1.77245385090)
BZ_MAX_ALPHA_SIZE = 258Maximum alphabet size for Huffman coding
BZ_MAX_GROUPS = 6Number of Huffman tables (2-6)
BZ_MAX_CODE_LEN = 23Number of Huffman code lengths
BZ_G_SIZE = 50Symbols per group for Huffman table selection
BZ_N_ITERS = 4Number of iterations for Huffman table optimization
BZ_N_OVERSHOOT = 34Overshoot for safe BWT comparison past block end
BZ_RUNA = 0RUNA symbol for zero run encoding
BZ_RUNB = 1RUNB symbol for zero run encoding
BZ_EOB = 0End of block symbol
BZ_LESSER_ICOST = 0Initial cost for symbols within a table's assigned range
BZ_GREATER_ICOST = 15Initial cost for symbols outside a table's assigned range
uint[256] BZ_CRC32_TABLE