The hardware JSON parser is implemented as a multi-PU (processing unit) architecture to provide high throughput. The dataflow diagram of the kernel is illustrated below:
The whole JSON file must first be pre-loaded into a compact buffer. To let the PUs execute in parallel, the reading block automatically divides the input file into several chunks according to its size.

The line parser is an FSM-based module that parses out each key-value pair at a throughput of 1 byte/cycle. For an array at a leaf node, it labels each element with an incrementing index, and the last element is labeled with all F's to indicate the end of the array. Each data type has a dedicated parse unit that translates the raw bytes into a value of that type.

In the final stage, the values of each selected field are merged into one full column, which is then structured according to the output object-stream protocol. Each row of the selected output columns corresponds to one input JSON line, so a key or element missing from a given JSON line is indicated by a null object.