ddn.compressor.bzip2

High-performance BZIP2 compression provider for ddn.api.compressor. This is a pure D implementation of the BZIP2 algorithm designed to match the performance of the system bzip2 tool.

The BZIP2 algorithm consists of the following stages:

Run-Length Encoding (RLE1): Initial pass limiting runs to 255.
Burrows-Wheeler Transform (BWT): Reversible block sort.
Move-To-Front (MTF): Exploits locality in BWT output.
Zero Run-Length Encoding (RLE2): Encodes zeros as RUNA/RUNB.
Huffman Coding: Multiple tables with selectors.

Features:

Supports both streaming and block compression.
Compression levels map to bzip2's blockSize100k (1-9).
Fully DDoc-documented.
All public functions and types have unittests.

Copyright

BSD-3-Clause

var BZ_BLOCK_MAGIC BZ_CRC32_TABLE BZ_EOB BZ_EOS_MAGIC BZ_G_SIZE BZ_GREATER_ICOST BZ_HEADER_MAGIC BZ_LESSER_ICOST BZ_MAX_ALPHA_SIZE BZ_MAX_CODE_LEN BZ_MAX_GROUPS BZ_N_ITERS BZ_N_OVERSHOOT BZ_RUNA BZ_RUNB

struct BitReader BitWriter BwtResult CanonicalCodes CompressedBlock DecompressedBlock HuffDecTable HuffEncEntry HuffmanTableSelection MtfResult Rle2Result

class Bzip2Compressor Bzip2Decompressor

fn assignCanonicalCodes buildHuffmanDecodeTable buildHuffmanEncodeTable bwtHeapsort bwtHeapsortFrom bwtInsertionSort bwtInsertionSortFrom bwtIntrosort bwtIntrosortFrom bwtInverseTransform bwtLog2 bwtPartition bwtPartitionFrom bwtRadixSortDeep bwtSiftDown bwtSiftDownFrom bwtSort bwtSwap bwtTernaryRadixSort bwtTransform bz2Crc32 bz2Crc32Update compareRotations compareRotationsFrom compressBlock compressBlockToWriter decompressBlock determineNumGroups generateHuffmanLengths heapPop heapPush initializeHuffmanTables limitCodeLengths makeBzip2Compressor makeBzip2Decompressor mtfEncodeSelectors mtfInverseTransform mtfTransform rle1Decode rle1Encode rle1EncodeCalcSize rle2Decode rle2Encode selectHuffmanTables

Module Initializers 1

shared static this()

Types 13

structBwtResult

BWT transform result.

Fields

ubyte[] output

uint origPtr

structMtfResult

MTF transform result.

Fields

ubyte[] output

bool[256] inUse

int nInUse

structRle2Result

RLE2 encode result.

Fields

ushort[] symbols

int nInUse

structHuffEncEntry

Huffman encoding table entry.

Fields

uint code

ubyte len

structHuffDecTable

Huffman decoding table.

Fields

uint[] limit

uint[] base

uint[] perm

int minLen

int maxLen

structBitWriter

Bit writer for MSB-first bit packing.

Fields

ubyte[] buffer

size_t pos

uint bsBuff

int bsLive

Methods

void init(size_t initialSize = 4096) pure @safeInitialize the writer.

void ensureCapacity(size_t extra) pure @safeEnsure buffer has space.

void writeBits(int n, uint value) pure nothrow @nogc @safeWrite n bits from value (MSB-first).

void writeBit(uint bit) pure nothrow @nogc @safeWrite a single bit.

void writeByte(ubyte b) pure nothrow @nogc @safeWrite a byte (8 bits).

void writeUInt24(uint value) pure nothrow @nogc @safeWrite a 24-bit value.

void writeUInt32(uint value) pure nothrow @nogc @safeWrite a 32-bit value.

void writeUnary(int count) pure nothrow @nogc @safeWrite unary encoded value.

void writeBitmap16(ushort bitmap) pure nothrow @nogc @safeWrite bitmap for 16 symbols.

void flush() pure nothrow @nogc @safeFlush remaining bits to output.

const(ubyte)[] getOutput() pure nothrow @nogc @safe returnGet output bytes.

const(ubyte)[] consumeOutput() nothrow @nogc @safeFlush complete bytes and return them, keeping partial bits in buffer.

void reset() pure nothrow @nogc @safeReset to initial state.

structBitReader

Bit reader for MSB-first bit unpacking.

Fields

const(ubyte)[] data

size_t bytePos

int bitPos

uint bsBuff

int bsLive

Methods

void setInput(const(ubyte)[] input) pure nothrow @nogc @safeSet input data.

uint readBits(int n) pure nothrow @nogc @safeRead n bits.

uint readBit() pure nothrow @nogc @safeRead a single bit.

ubyte readByte() pure nothrow @nogc @safeRead a byte.

uint readUInt24() pure nothrow @nogc @safeRead a 24-bit value.

uint readUInt32() pure nothrow @nogc @safeRead a 32-bit value.

bool hasBits(int n) pure nothrow @nogc @safeCheck if we have n bits available.

size_t bytesConsumed() pure nothrow @nogc @safeGet bytes consumed.

structCompressedBlock

Compressed block result.

Fields

ubyte[] data

uint blockCrc

private structHuffmanTableSelection

Determine number of Huffman tables based on data size.

Parameters

nMTF Number of MTF symbols.

Returns

Number of tables (2-6).

Result of multi-table Huffman selection.

Fields

ubyte[][] lengths

uint[][] codes

int[] minLens

int[] maxLens

ubyte[] selectors

ubyte[] selectorsMtf

int nGroups

int nSelectors

private structCanonicalCodes

Assign canonical Huffman codes from code lengths.

Matches bzip2's BZ2_hbAssignCodes: codes assigned in order of increasing length, and within the same length, in symbol index order.

Fields

uint[] codes

int minLen

int maxLen

structDecompressedBlock

Decompressed block result.

Fields

ubyte[] data

uint blockCrc

bool isEos

classBzip2Compressor : Compressor

BZIP2 streaming compressor.

Fields

CompressionOptions _opts

OutputSink _sink

ProgressCallback _progress

ubyte[] _blockBuffer

size_t _blockPos

ulong _bytesIn

ulong _bytesOut

uint _combinedCrc

bool _headerWritten

bool _finished

int _blockSize100k

uint _blockSize

BitWriter _bitWriter

Methods

CompressionOptions options() const pure nothrow

void setOutputSink(OutputSink sink)

void setProgressCallback(ProgressCallback callback)

void write(const(ubyte)[] data)

void flush(FlushMode mode = FlushMode.SYNC)

void finish()

void reset()

ulong bytesInTotal() const pure nothrow

ulong bytesOutTotal() const pure nothrow

bool setDictionary(const(ubyte)[] dict) pure nothrow

bool isFinished() const pure nothrow

private void writeHeader()

private void flushBlock()

private void writeTrailer()

Constructors

this(CompressionOptions opts)Create a new BZIP2 compressor.

classBzip2Decompressor : Decompressor

BZIP2 streaming decompressor.

Fields

DecompressionOptions _opts

OutputSink _sink

ProgressCallback _progress

ubyte[] _inBuffer

size_t _inPos

ulong _bytesIn

ulong _bytesOut

uint _combinedCrc

uint _expectedCombinedCrc

int _blockSize100k

bool _headerParsed

bool _finished

BitReader _bitReader

bool _readerInited

Methods

DecompressionOptions options() const pure nothrow

void setOutputSink(OutputSink sink)

void setProgressCallback(ProgressCallback callback)

void write(const(ubyte)[] data)

void finish()

void reset()

ulong bytesInTotal() const pure nothrow

ulong bytesOutTotal() const pure nothrow

bool setDictionary(const(ubyte)[] dict) pure nothrow

bool isFinished() const pure nothrow

private void processInput()

private bool parseHeader()

private bool processBlock()

Constructors

this(DecompressionOptions opts)Create a new BZIP2 decompressor.

Functions 44

fnuint bz2Crc32Update(uint crc, ubyte b) pure nothrow @nogc @safeCompute BZIP2 CRC32 for a single byte.

fnuint bz2Crc32(const(ubyte)[] data) pure nothrow @nogc @safeCompute BZIP2 CRC32 for a buffer.

fnsize_t rle1EncodeCalcSize(const(ubyte)[] input) pure nothrow @nogc @safeCalculate the size of RLE1 encoded output.

fnsize_t rle1Encode(const(ubyte)[] input, ubyte[] output) pure nothrow @nogc @safeEncode data using RLE1.

fnubyte[] rle1Decode(const(ubyte)[] input) pure @safeDecode RLE1 encoded data.

private fn

int compareRotationsFrom(const(ubyte)[]  block,  size_t i,  size_t j,
       size_t startOffset)

pure nothrow @nogcCompare two rotations starting from a given offset.

private fnint compareRotations(const(ubyte)[] block, size_t i, size_t j) pure nothrow @nogcCompare two rotations of the input block lexicographically.

private fnsize_t bwtLog2(size_t n) pure nothrow @nogcCompute floor(log2(n)) for depth limiting.

private fn

void bwtIntrosort(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi,  size_t depthLimit)

pure nothrowIntrosort implementation for BWT sorting.

private fn

void bwtIntrosortFrom(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi,  size_t depthLimit,  size_t compareOffset)

pure nothrowIntrosort implementation that skips already-compared prefix bytes.

private fn

size_t bwtPartition(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi)

pure nothrowPartition for quicksort using median-of-three pivot.

private fn

size_t bwtPartitionFrom(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi,  size_t compareOffset)

pure nothrowPartition for quicksort with prefix skip optimization.

private fnvoid bwtSwap(uint[] indices, size_t i, size_t j) pure nothrow @nogcSwap two elements in the indices array.

private fn

void bwtHeapsort(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi)

pure nothrowHeapsort for BWT indices (fallback for pathological cases).

private fn

void bwtHeapsortFrom(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi,  size_t compareOffset)

pure nothrowHeapsort with prefix skip optimization.

private fn

void bwtSiftDown(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t start,  size_t n)

pure nothrowSift down operation for heapsort.

private fn

void bwtSiftDownFrom(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t start,  size_t n,  size_t compareOffset)

pure nothrowSift down with prefix skip optimization.

private fn

void bwtInsertionSort(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi)

pure nothrowInsertion sort for small ranges.

private fn

void bwtInsertionSortFrom(uint[]  indices,  const(ubyte)[]  block,
       size_t lo,  size_t hi,  size_t compareOffset)

pure nothrowInsertion sort with prefix skip optimization.

private fn

void bwtRadixSortDeep(uint[]  indices,  const(ubyte)[]  block,
       uint  lo,  uint  hi,  int  depth,  uint[]  temp)

pure nothrowDeeper radix sort for large buckets with repetitive data. Uses single-byte radix at each level with efficient batch skipping.

private fn

void bwtTernaryRadixSort(uint[]  indices,  const(ubyte)[]  block,
       uint  lo,  uint  hi,  int  depth,  uint[]  temp)

pure nothrowTernary radix quicksort for BWT indices.

private fnvoid bwtSort(uint[] indices, const(ubyte)[] block) pure nothrowSort an array of rotation indices based on the rotations they represent.

fnBwtResult bwtTransform(const(ubyte)[] input) purePerform the Burrows-Wheeler Transform on the input data.

fnubyte[] bwtInverseTransform(const(ubyte)[] input, uint origPtr) pure @safeInverse BWT.

fnMtfResult mtfTransform(const(ubyte)[] input) pure @safePerform MTF transform.

fnubyte[] mtfInverseTransform(const(ubyte)[] indices, const bool[256] inUse, int nInUse) pure @safeInverse MTF transform.

fnRle2Result rle2Encode(const(ubyte)[] mtfIndices, int nInUse) pure @safeEncode MTF output using RLE2 (zero run encoding).

fnubyte[] rle2Decode(const(ushort)[] symbols, int nInUse) pure @safeDecode RLE2 to MTF indices.

fnubyte[] generateHuffmanLengths(const(uint)[] frequencies, int alphaSize, int maxLen = BZ_MAX_CODE_LEN) pureGenerate Huffman code lengths from frequencies.

private fnvoid heapPush(ref uint[] heapFreq, ref int[] heapNode, int heapSize, uint freq, int node) purePush a node onto the min-heap.

private fnvoid heapPop(ref uint[] heapFreq, ref int[] heapNode, int heapSize, ref uint freq, ref int node) purePop the minimum node from the min-heap.

private fnvoid limitCodeLengths(ref ubyte[] lengths, int alphaSize, int maxLength) pureLimit Huffman code lengths using a simple heuristic. Ensures no code exceeds maxLength by redistributing.

fnHuffEncEntry[] buildHuffmanEncodeTable(const(ubyte)[] lengths) pure @safeBuild Huffman encoding table.

fnHuffDecTable buildHuffmanDecodeTable(const(ubyte)[] lengths) pure @safeBuild Huffman decoding table.

private fnCanonicalCodes assignCanonicalCodes(const(ubyte)[] lengths) pure

private fn

void initializeHuffmanTables(ref  ubyte[][]  lengths,
       const(uint)[]  mtfFreqs,
       int  nGroups,
       int  alphaSize,
       int  nMTF)

pure nothrowInitialize Huffman table code lengths based on symbol frequency partitioning.

private fnubyte[] mtfEncodeSelectors(const(ubyte)[] selectors, int nGroups) pure nothrowMTF-encode the selector sequence.

private fn

HuffmanTableSelection selectHuffmanTables(const(ushort)[]  symbols,
       const(uint)[]  mtfFreqs,
       int  alphaSize)

pureSelect Huffman tables for MTF-encoded symbols using iterative optimization.

fnint determineNumGroups(int nMTF) pure nothrow @nogc

fnCompressedBlock compressBlock(const(ubyte)[] input, int blockSize100k, uint originalDataCrc = 0)Compress a block of RLE1-encoded data into a BZIP2 block.

private fn

CompressedBlock compressBlockToWriter(const(ubyte)[]  input,  int  blockSize100k,  uint  originalDataCrc,  BitWriter *  bw)

Compress a block into a provided BitWriter without flushing.

fnDecompressedBlock decompressBlock(ref BitReader reader)Decompress a single block.

fnCompressor makeBzip2Compressor(CompressionOptions opts)Create a BZIP2 compressor.

fnDecompressor makeBzip2Decompressor(DecompressionOptions opts)Create a BZIP2 decompressor.

Variables 15

private varubyte[3] BZ_HEADER_MAGIC

BZIP2 stream header magic bytes ("BZh")

private varubyte[6] BZ_BLOCK_MAGIC

BZIP2 block header magic (48 bits: digits of pi 3.14159265359)

private varubyte[6] BZ_EOS_MAGIC

BZIP2 end-of-stream magic (48 bits: sqrt(pi) 1.77245385090)

private enumvarBZ_MAX_ALPHA_SIZE = 258

Maximum alphabet size for Huffman coding

private enumvarBZ_MAX_GROUPS = 6

Number of Huffman tables (2-6)

private enumvarBZ_MAX_CODE_LEN = 23

Number of Huffman code lengths

private enumvarBZ_G_SIZE = 50

Symbols per group for Huffman table selection

private enumvarBZ_N_ITERS = 4

Number of iterations for Huffman table optimization

private enumvarBZ_N_OVERSHOOT = 34

Overshoot for safe BWT comparison past block end

private enumvarBZ_RUNA = 0

RUNA symbol for zero run encoding

private enumvarBZ_RUNB = 1

RUNB symbol for zero run encoding

private enumvarBZ_EOB = 0

End of block symbol

private enumvarBZ_LESSER_ICOST = 0

Initial cost for symbols within a table's assigned range

private enumvarBZ_GREATER_ICOST = 15

Initial cost for symbols outside a table's assigned range

private varuint[256] BZ_CRC32_TABLE