The design of the LZ compression kernel is shown in the following figure:
The following is a description of the LZ-based compression kernel design:
- Input data is divided into multiple blocks with a default block size of 64 KB (user configurable). The blocks are distributed to the individual compression engines in a round-robin fashion and compressed concurrently.
- The Input unit (mm2s block) reads the uncompressed blocks from global memory (512 bits wide) and distributes them across the parallel compression engines. The Output unit (s2mm block) reads the compressed blocks from the compression engines and writes them to global memory.
- Each compression engine contains a series of sub-modules that process data in parallel and work in a pipelined fashion. Each sub-module transfers data to the next through HLS streams and is designed to process 1 byte per clock cycle, which, combined with the pipelined processing, gives each compression engine a throughput of 1 byte per clock cycle.
- Data read from global memory is converted to a byte stream by the mm2s block, and converted back from a stream to the memory-mapped format by the s2mm block before being written to global memory.
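The block-splitting and round-robin distribution described above can be sketched in software as follows. This is an illustrative model only; `BLOCK_SIZE` matches the 64 KB default, while `NUM_ENGINES` and the `Block`/`partition` names are assumptions, not names from the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK_SIZE  = 64 * 1024; // 64 KB default block size
constexpr std::size_t NUM_ENGINES = 8;         // assumed engine count

struct Block {
    std::size_t offset; // byte offset into the input buffer
    std::size_t length; // BLOCK_SIZE, or less for the final block
    std::size_t engine; // compression engine the block is routed to
};

// Split the input into fixed-size blocks and assign engines round-robin.
std::vector<Block> partition(std::size_t input_size) {
    std::vector<Block> blocks;
    for (std::size_t off = 0, i = 0; off < input_size; off += BLOCK_SIZE, ++i) {
        std::size_t len = std::min(BLOCK_SIZE, input_size - off);
        blocks.push_back({off, len, i % NUM_ENGINES}); // round-robin assignment
    }
    return blocks;
}
```

A 200 KB input, for example, yields three full 64 KB blocks plus one 8 KB tail block, routed to engines 0 through 3.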
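The stream-connected sub-module structure inside each engine can be modeled in plain C++ as below, with `std::queue` standing in for `hls::stream`. In the actual HLS kernel the stages run concurrently under a dataflow pragma, each consuming and producing one byte per cycle; the stage bodies here are trivial placeholders, not the library's real sub-modules.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Software stand-in for hls::stream<uint8_t>.
using ByteStream = std::queue<uint8_t>;

// Stage 1 (illustrative): feed the raw bytes downstream, one per "cycle".
void stage1(const std::vector<uint8_t>& in, ByteStream& out) {
    for (uint8_t b : in) out.push(b);
}

// Stage 2 (illustrative): a pass-through standing in for a real sub-module
// such as match search or encoding.
void stage2(ByteStream& in, std::vector<uint8_t>& out) {
    while (!in.empty()) { out.push_back(in.front()); in.pop(); }
}

// The engine chains the stages through streams (DATAFLOW-style in HLS).
std::vector<uint8_t> runEngine(const std::vector<uint8_t>& in) {
    ByteStream s;
    std::vector<uint8_t> out;
    stage1(in, s);
    stage2(s, out);
    return out;
}
```

Because every stage sustains one byte per cycle and the stages overlap in hardware, the chain as a whole also sustains one byte per cycle after the pipeline fills.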
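The mm2s/s2mm width conversion amounts to splitting each 512-bit (64-byte) memory word into single bytes and packing bytes back into words. A minimal sketch, assuming a little-endian byte order within the word (the function names mirror the block names, but the interfaces are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t WORD_BYTES = 512 / 8; // 64 bytes per memory word

using Word = std::array<uint8_t, WORD_BYTES>;

// mm2s: memory-mapped words -> byte stream (trailing padding dropped).
std::vector<uint8_t> mm2s(const std::vector<Word>& words, std::size_t total_bytes) {
    std::vector<uint8_t> bytes;
    for (const Word& w : words)
        for (uint8_t b : w) {
            if (bytes.size() == total_bytes) return bytes;
            bytes.push_back(b);
        }
    return bytes;
}

// s2mm: byte stream -> memory-mapped words (last word zero-padded).
std::vector<Word> s2mm(const std::vector<uint8_t>& bytes) {
    std::vector<Word> words((bytes.size() + WORD_BYTES - 1) / WORD_BYTES);
    for (std::size_t i = 0; i < bytes.size(); ++i)
        words[i / WORD_BYTES][i % WORD_BYTES] = bytes[i];
    return words;
}
```

The byte count must travel alongside the data so the final, partially filled word can be trimmed on the way back.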
The compression engine design remains the same for all LZ-based compression algorithms. The only difference is the Encoding sub-module within the compression engine, which is unique to each algorithm.
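One way to picture this reuse is an engine whose pipeline is fixed while the algorithm-specific encoder is supplied as a template parameter. This is a sketch of the idea only; the encoder interface and names below are assumptions, and a real LZ4 or Snappy encoder would pack literals and matches into that format's token layout.

```cpp
#include <cstdint>
#include <vector>

// Shared engine front-end; only the Encoder type changes per algorithm.
template <typename Encoder>
std::vector<uint8_t> compressBlock(const std::vector<uint8_t>& block) {
    Encoder enc;
    std::vector<uint8_t> out;
    for (uint8_t b : block) enc.encode(b, out); // algorithm-specific encoding
    enc.flush(out);
    return out;
}

// Trivial stand-in encoder: emits literals unchanged.
struct PassThroughEncoder {
    void encode(uint8_t b, std::vector<uint8_t>& out) { out.push_back(b); }
    void flush(std::vector<uint8_t>&) {}
};
```

Swapping `PassThroughEncoder` for another encoder type changes the output format without touching the rest of the pipeline, which is the property the shared engine design relies on.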