Naive Bayes training is a general primitive that accelerates multinomial naive Bayes by taking advantage of the high bandwidth available in Xilinx FPGAs.
The top-level diagram is shown below. The workload is distributed to the processing data paths based on the LSBs of the feature value in each sample, so that each path can work independently.
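As a rough software illustration of this dispatch rule, the sketch below routes entries by the low bits of the feature index. The function name `dispatch`, the `(class, feature, count)` tuple layout, and the choice of four paths are assumptions made for the example, not details of the hardware design.

```python
def dispatch(entries, num_paths=4):
    """Route (class, feature, count) entries to data paths by the LSBs of feature.

    num_paths is a hypothetical parameter; it must be a power of two so that
    the low bits of the feature index select exactly one path.
    """
    if num_paths <= 0 or num_paths & (num_paths - 1) != 0:
        raise ValueError("num_paths must be a power of two")
    mask = num_paths - 1
    paths = [[] for _ in range(num_paths)]
    for cls, feature, count in entries:
        # The same feature always lands on the same path, so each path can
        # keep its own counters without talking to the others.
        paths[feature & mask].append((cls, feature, count))
    return paths
```

Because a given feature index always maps to the same path, the per-path counters never overlap, which is what lets the paths run independently.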
The dispatcher and merge module is fed with a compact data format carried in a 64-bit stream. Each 64-bit word is composed of a 12-bit class (starting from 0), a 20-bit feature (starting from 1), and a 32-bit count value. The input feature vector can be sparse or dense. The end of each sample must be tagged with -1 in the 20-bit feature slot.
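The 64-bit word format can be sketched in software as follows. The text gives only the field widths, so the bit positions used here (class in the top 12 bits, feature in the next 20, count in the low 32) are an assumption, as are the helper names `pack`, `unpack`, and `encode_sample`, the use of a separate trailing word for the end tag, and the zero count in that word.

```python
CLASS_BITS, FEATURE_BITS, COUNT_BITS = 12, 20, 32

# -1 in a 20-bit two's-complement slot is all ones.
END_OF_SAMPLE = (1 << FEATURE_BITS) - 1  # 0xFFFFF


def pack(cls, feature, count):
    """Pack one entry into a 64-bit word (assumed layout: class | feature | count)."""
    if not 0 <= cls < (1 << CLASS_BITS):
        raise ValueError("class does not fit in 12 bits")
    if not 0 <= feature < (1 << FEATURE_BITS):
        raise ValueError("feature does not fit in 20 bits")
    if not 0 <= count < (1 << COUNT_BITS):
        raise ValueError("count does not fit in 32 bits")
    return (cls << (FEATURE_BITS + COUNT_BITS)) | (feature << COUNT_BITS) | count


def unpack(word):
    """Split a 64-bit word back into (class, feature, count)."""
    count = word & ((1 << COUNT_BITS) - 1)
    feature = (word >> COUNT_BITS) & ((1 << FEATURE_BITS) - 1)
    cls = word >> (FEATURE_BITS + COUNT_BITS)
    return cls, feature, count


def encode_sample(cls, features):
    """Encode a sparse sample {feature_id: count} and append the end-of-sample tag."""
    words = [pack(cls, f, c) for f, c in sorted(features.items())]
    # Assumption: the end tag travels in its own word with a zero count.
    words.append(pack(cls, END_OF_SAMPLE, 0))
    return words
```

A dense sample can be encoded the same way by listing every feature index, including those with a zero count.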
The counter module counts the number of times each feature appears across all samples, and the collect module counts the total number of features for each class. The statistical results from every data path are gathered in the following module. Finally, the logarithms of the likelihood and of the prior probability are streamed out separately.
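The counting and logarithm steps correspond to the textbook multinomial naive Bayes estimate, which the following sketch reproduces in software. It is not the hardware implementation: the function name `train`, the Laplace smoothing term `alpha`, and the choice to derive the prior from the number of samples per class are assumptions of this example.

```python
import math


def train(samples, num_classes, num_features, alpha=1.0):
    """Return (log_likelihood, log_prior) for multinomial naive Bayes.

    samples: list of (class_id, {feature_id: count}), classes numbered from 0
    and features from 1. alpha is an assumed Laplace smoothing constant.
    """
    feature_count = [[0] * num_features for _ in range(num_classes)]
    class_total = [0] * num_classes    # total feature count per class
    class_samples = [0] * num_classes  # number of samples per class

    for cls, features in samples:
        class_samples[cls] += 1
        for f, c in features.items():
            if not 1 <= f <= num_features:
                raise ValueError("feature index out of range")
            feature_count[cls][f - 1] += c  # per-feature counting
            class_total[cls] += c           # per-class total

    log_likelihood = [
        [
            math.log((feature_count[c][f] + alpha)
                     / (class_total[c] + alpha * num_features))
            for f in range(num_features)
        ]
        for c in range(num_classes)
    ]
    n = len(samples)
    log_prior = [
        math.log(class_samples[c] / n) if class_samples[c] > 0 else float("-inf")
        for c in range(num_classes)
    ]
    return log_likelihood, log_prior
```

With smoothing, the likelihoods of each class sum to one over all features, which is a convenient sanity check on the counters.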
The following figure shows the top-level structure of the naive Bayes classifier. The training model is streamed in first, before the actual prediction process, and the whole model is cached in on-chip memory. Only the 32-bit count values of each test sample are streamed into the classifier primitive, and only dense feature vectors are supported. The matrix multiplication is handled in the tree cluster module, and the argmax module predicts the result for each sample.
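In software terms, the prediction stage amounts to a matrix-vector product followed by an argmax, as in the sketch below. The function name `predict` and the decision to add the log prior to each class score are assumptions based on the standard multinomial naive Bayes rule, not on the hardware description.

```python
def predict(log_prior, log_likelihood, samples):
    """Classify dense count vectors with a cached naive Bayes model.

    log_prior: one value per class.
    log_likelihood: one row per class, one column per feature.
    samples: list of dense count vectors, one count per feature.
    """
    results = []
    for counts in samples:
        if len(counts) != len(log_likelihood[0]):
            raise ValueError("dense sample must carry one count per feature")
        # Matrix multiplication step: one dot product per class.
        scores = [
            prior + sum(c * w for c, w in zip(counts, row))
            for prior, row in zip(log_prior, log_likelihood)
        ]
        # Argmax step: choose the class with the highest score.
        results.append(max(range(len(scores)), key=scores.__getitem__))
    return results
```

Because the samples are dense, only the count values are needed; the feature index of each value is implied by its position in the vector.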