Meanwhile, benchmark results at 267 MHz on an Alveo U200 board with the 2019.2 shell are shown below:
Dataset | Samples | Classes | Features | Spark (4 threads) | Spark (8 threads) | Spark (16 threads) | Spark (32 threads) | Spark (56 threads) | FPGA (ms) |
RCV1 | 697614 | 2 | 47236 | 6937 (18.6X) | 7751 (26.2X) | 5636 (12.6X) | 6500 (22.0X) | 5425 (12.2X) | 371 |
webspam | 350000 | 2 | 254 | 4676 (21.9X) | 5823 (22.6X) | 6869 (40.4X) | 5381 (20.1X) | 5848 (35.3X) | 214 |
news20 | 19928 | 20 | 62061 | 4249 (361X) | 4728 (453X) | 4256 (319X) | 4388 (332X) | 4308 (391X) | 12 |
Attention
For the training primitive, zero-valued 64-bit padding words are added to the multi-channel input data stream when the total length of the feature vectors of all samples, including the ending -1 tag, is not evenly divisible by 8. In addition, the product of the number of classes and the number of features currently cannot exceed 2 million.
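As a rough illustration of the training constraints above, the following Python sketch computes how many zero words would be needed to reach the next multiple of 8 and checks the class-by-feature limit. The function names are hypothetical and not part of the library, and "2 million" is assumed here to mean exactly 2,000,000.

```python
# Hypothetical helpers, not part of the library API.
TRAIN_ALIGNMENT = 8                      # divisor stated in the text
TRAIN_MAX_CLASS_X_FEATURE = 2_000_000    # assumption: "2 million" read as 2,000,000


def train_padding_words(total_words: int) -> int:
    """Return the number of zero 64-bit words needed to reach a multiple of 8.

    total_words is the length of the whole input stream in 64-bit words,
    counted including the ending -1 tag.
    """
    if total_words < 0:
        raise ValueError("total_words must be non-negative")
    return (-total_words) % TRAIN_ALIGNMENT


def train_size_ok(num_classes: int, num_features: int) -> bool:
    """Check that classes * features stays within the assumed training limit."""
    return num_classes * num_features <= TRAIN_MAX_CLASS_X_FEATURE
```

For example, a stream of 13 words would need 3 padding words, while a stream of 16 words would need none. With the news20 figures from the table (20 classes, 62061 features) the product is 1,241,220, which is within the assumed training limit.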
For the predict primitive, the same kind of zero padding, here as 32-bit data, is added when the length of the feature vector of each sample is not evenly divisible by the number of channels. The product of the number of classes and the number of features cannot exceed 1 million.
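The per-sample padding for the predict primitive can be sketched in the same way. This is only an illustration: `pad_sample` is a hypothetical helper, the channel count is passed as a parameter because the text does not state it, and the 32-bit values are modelled as plain Python integers.

```python
from typing import List


def pad_sample(features: List[int], num_channels: int) -> List[int]:
    """Return a copy of one sample's feature vector, zero-padded so that
    its length is a multiple of num_channels (hypothetical helper)."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    pad = (-len(features)) % num_channels   # zeros needed to fill the last group
    return list(features) + [0] * pad
```

With an assumed 4 channels, a sample of 5 feature values would be extended by 3 zeros to a length of 8, and a sample whose length is already a multiple of 4 would be left unchanged.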