The CSV parser is implemented as a multi-PU (processing unit) architecture to provide high throughput. The following diagram illustrates the design.
The full CSV file must first be loaded into a compacted buffer. So that the PUs can execute in parallel, the read block divides the input file into several chunks based on the file size. The line parser is an FSM-based module that parses out each field at a rate of one byte per cycle; trivial characters are also removed at this stage. For each input data type, a dedicated parse unit translates the raw bytes into the corresponding value. In the final stage, each selected field is merged into one full column before being structured into the output object-stream protocol.
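The data flow described above can be sketched in software as a rough illustration. This is a behavioral model only, not the hardware implementation. The names `split_chunks`, `parse_chunk`, `PARSE_UNITS`, and `parse_csv` are hypothetical. Several choices are assumptions rather than documented behavior: chunks are extended to the next newline, quotes and carriage returns are the "trivial characters", no quoted field contains a newline or an escaped quote, and only three data types exist. The object-stream output protocol is not modeled; the sketch simply returns one list per selected column.

```python
from enum import Enum, auto


class State(Enum):
    FIELD = auto()   # reading an unquoted field
    QUOTED = auto()  # reading inside a double-quoted field


def split_chunks(data: bytes, num_pu: int) -> list:
    """Read block: split data into at most num_pu chunks of similar size.

    Assumption: each chunk is extended to the next newline so that no
    line is shared between two PUs.
    """
    chunks = []
    target = max(1, -(-len(data) // num_pu))  # ceiling division
    start = 0
    while start < len(data):
        end = min(start + target, len(data))
        newline = data.find(b"\n", end - 1)
        end = len(data) if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def parse_chunk(chunk: bytes) -> list:
    """Line parser: a two-state FSM that consumes one byte per step."""
    rows, row, field = [], [], bytearray()
    state = State.FIELD
    for byte in chunk:
        if state is State.QUOTED:
            if byte == ord('"'):
                state = State.FIELD      # closing quote, dropped
            else:
                field.append(byte)       # delimiters are literal here
        elif byte == ord('"'):
            state = State.QUOTED         # opening quote, dropped
        elif byte == ord(","):
            row.append(bytes(field))
            field.clear()
        elif byte == ord("\n"):
            row.append(bytes(field))
            field.clear()
            rows.append(row)
            row = []
        elif byte == ord("\r"):
            pass                         # assumed trivial character
        else:
            field.append(byte)
    if field or row:                     # last line without a newline
        row.append(bytes(field))
        rows.append(row)
    return rows


# One parse unit per data type (hypothetical type names).
PARSE_UNITS = {
    "int": lambda raw: int(raw.decode("ascii")),
    "float": lambda raw: float(raw.decode("ascii")),
    "str": lambda raw: raw.decode("utf-8"),
}


def parse_csv(data: bytes, schema: list, num_pu: int = 4) -> list:
    """schema lists (field index, type name) for each selected field."""
    columns = [[] for _ in schema]
    # Chunks are handled sequentially here; in hardware each PU would
    # process its own chunk in parallel.
    for chunk in split_chunks(data, num_pu):
        for row in parse_chunk(chunk):
            for column, (index, type_name) in zip(columns, schema):
                column.append(PARSE_UNITS[type_name](row[index]))
    return columns
```

For example, `parse_csv(data, [(0, "int"), (2, "float")], num_pu=2)` selects the first and third fields of every line and returns them as two typed columns. Because the chunks are visited in file order, the merged columns preserve the original row order.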