The hardware JSON parser is implemented as a multi-PU architecture to provide high throughput. The dataflow of the kernel is illustrated as follows:
The whole JSON file must first be preloaded into a compact buffer. So that the PUs can execute in parallel, the reading block automatically divides the input file into several chunks according to its size. The line parser is an FSM-based module that extracts each key-value pair at a throughput of 1 byte/cycle. For an array on a leaf node, it labels each element with an incrementing index; the last element is labeled with all F's to mark the end of the array. Each value data type has one dedicated parse unit that translates the raw bytes into a typed value. In the final stage, each selected field is merged into one full column before being packed into the output object-stream protocol. Each row in the selected output columns corresponds to one input JSON line, so a key or element that is missing from a given JSON line is indicated by a null object.
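The behaviour described above can be approximated in software. The following Python sketch is only an illustrative model, not the kernel's implementation. The function names `split_into_chunks`, `label_array`, and `select_columns` are hypothetical. The 16-bit width of the end-of-array marker and the alignment of chunk boundaries to newlines are assumptions; the text only states that the last element is labeled with all F's and that chunks are divided by size.

```python
import json

# Assumed 16-bit index width; the text only says "all F's".
END_OF_ARRAY = 0xFFFF


def split_into_chunks(buf: bytes, num_pu: int) -> list:
    """Split buf into at most num_pu chunks of roughly equal size.

    Assumption: each cut is moved forward to the next newline so that
    no JSON line straddles two chunks.
    """
    chunks = []
    target = max(1, -(-len(buf) // num_pu))  # ceiling division
    start = 0
    while start < len(buf):
        end = min(start + target, len(buf))
        newline = buf.find(b"\n", end - 1)
        end = len(buf) if newline == -1 else newline + 1
        chunks.append(buf[start:end])
        start = end
    return chunks


def label_array(elements: list) -> list:
    """Tag each element with an incrementing index; the last gets END_OF_ARRAY."""
    last = len(elements) - 1
    return [(END_OF_ARRAY if i == last else i, e) for i, e in enumerate(elements)]


def select_columns(chunk: bytes, keys: list) -> list:
    """Return one row per JSON line; a missing key becomes None (null object)."""
    rows = []
    for line in chunk.splitlines():
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        row = []
        for key in keys:
            value = obj.get(key)
            row.append(label_array(value) if isinstance(value, list) else value)
        rows.append(row)
    return rows


# Three JSON lines; the second lacks "tags" and the third lacks "id".
data = b'{"id": 1, "tags": ["a", "b", "c"]}\n{"id": 2}\n{"tags": ["x"]}\n'
chunks = split_into_chunks(data, num_pu=2)
rows = [row for c in chunks for row in select_columns(c, ["id", "tags"])]
```

In this model each chunk stands in for the work given to one PU, and concatenating the per-chunk rows in order plays the role of the final merge into full columns. A real JSON `null` value and a missing key both map to `None` here, which is a simplification of the sketch.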