Implementation - 2025.1 English

Vitis Libraries

Release Date
2025-06-04
Version
2025.1 English

The random forest framework is shown in the following figure:

Figure 1 Random Framewok architecture on FPGA

Separate random forest int sampling module and decision tree module. In sampling module, you design some different implementations of bagging methods, each of which implements different sampling methods. Instance sampling: withoutplacement instance sampling, sampling ratio can be set by user.

In particular, withplacement instance sampling is implemented by multi withoutplacement instance samplings.

Feature Quantize: To save bandwith and memory, you learn from spark binned dataset, quantize each feature into a fixed point integer.

Figure 2 Quantize modules on FPGA
In quantize method, each feauture value is binned into a fixed point integer; for example, if a feature value is 0.45, while the feature splitting array is [-0.1,0.25,0.50,0.80], the binning rule includes:
  1. < -0.1 -> 0
  2. -0.1 ~ 0.25 -> 1
  3. 0.25 ~ 0.50 -> 2
  4. 0.50 ~ 0.80 -> 3
  5. > 0.80 -> 4

so after quantizing, the binned feature value is 2 (0,25<0.45<0.5). In the quantize module, the main part is searching the inverval of each value, Figure 2 shows a detailed implementation by unrolling a binary search tree.

In the decision tree module, feature sampling support is added, so that each tree point reserves its own feature subsets. When splitting a tree node, it can only select the best feature from the specific subset. In current version, decision tree in random forest implements the quntize optimization for more kernel frequency and performance. You can get a special random forest with one single tree to implement a decision tree. Actually, decision tree from this method can ease IO bound compared with a non-quantized decision tree.

Figure 3 RF data header on FPGA
Figure 4 DT data header on FPGA

Figure 3 and Figure 4 shows the data layout in the ddr buffer. In Figure 3, you reserve last 288 bits in the data header for multi-seeds, by setting seedid, the function read corresponding seed from the header. After one call is done, the corresponding seed will write back an updated seed. The trick is for multi-sampling module kernel calls and PCIe® data write only once.

Figure 3 random forest tree based ping-pong mult-kernels calls

In general, you can only put 1~4 individual trees on board. Figure 5 shows the host implementaion mode of a random forest tree. In this mode, you can overlap the PCIe read, kernel call, and PCIe write, making the most of the kernel/hardware resources.

Caution

The current rf decision tree has the same limitations with the orignal decision tree. For thousands of features, the current framework cannot support all features saving in an individual tree, so a feature sampling module is implemented for further extention.