The DPUCVDX8G consists of the PL and the AI Engine. The AI Engine performs the convolution operations in neural networks, while data transfers, instruction scheduling, pooling, element-wise sum, and depth-wise convolution are executed in the PL.
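To make this division of labor concrete, the following minimal sketch tabulates which engine executes each operator class described above. The operator names are illustrative placeholders only, not Vitis AI compiler identifiers.

# Illustrative mapping of DPUCVDX8G operator classes to execution engines.
# Operator names are examples, not Vitis AI compiler identifiers.
OPERATOR_ENGINE_MAP = {
    "conv2d":           "AI Engine",           # convolution runs on the AI Engine array
    "depthwise_conv2d": "PL",                  # depth-wise convolution stays in the PL
    "eltwise_sum":      "PL",
    "pooling":          "PL",
    "load/save":        "PL",                  # feature-map data movement
    "instruction_scheduling": "PL",
}

def engine_for(op: str) -> str:
    """Return the engine that executes a given operator class."""
    return OPERATOR_ENGINE_MAP.get(op, "unknown")

for op, engine in OPERATOR_ENGINE_MAP.items():
    print(f"{op:24s} -> {engine}")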
The DPUCVDX8G can be configured with multiple batch handlers. Each batch handler is paired with a corresponding AI Engine array and the related AI Engine interface tile resources. On the PL side, the DPUCVDX8G is split into two parts: the batch handler and the shared logic. The batch handler mainly processes the feature maps, handling operations such as loading, saving, and pooling. Its Arithmetic Logic Unit (ALU) module performs the pooling, element-wise sum, and depth-wise convolution operations on the feature maps. Feature maps are stored in the IMG BANK, which is built from on-chip RAM. The image sender and weights sender modules prepare the data for the AI Engine array. The shared logic includes the Permuter module and the Scheduler module. The Scheduler fetches instructions from the NoC and dispatches them to the batch handlers and the Permuter module. The Permuter loads the weights and biases from the NoC and sends the specific weight data to the AI Engine array for each calculation iteration. For more information, see Table 1.
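The component relationships described above can be summarized in a short structural sketch. This is a conceptual model only, with made-up class names and a placeholder IMG BANK size; it does not mirror the actual RTL hierarchy or configuration parameters.

from dataclasses import dataclass, field
from typing import List

@dataclass
class BatchHandler:
    """Per-batch PL logic paired with one AI Engine array (illustrative)."""
    img_bank_kb: int                        # on-chip RAM holding feature maps (placeholder size)
    has_alu: bool = True                    # pooling / element-wise sum / depth-wise conv
    senders: List[str] = field(             # image and weights senders feed the AI Engine array
        default_factory=lambda: ["image", "weights"])

@dataclass
class SharedLogic:
    """PL logic shared across all batch handlers (illustrative)."""
    scheduler: str = "fetches instructions from the NoC, dispatches to handlers and Permuter"
    permuter: str = "loads weights/biases from the NoC, feeds the AI Engine array per iteration"

@dataclass
class DPUCVDX8G:
    batch_handlers: List[BatchHandler]
    shared: SharedLogic = field(default_factory=SharedLogic)

# Example: a configuration with two batch handlers.
dpu = DPUCVDX8G(batch_handlers=[BatchHandler(img_bank_kb=512) for _ in range(2)])
print(len(dpu.batch_handlers), "batch handler(s)")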
The DPUCVDX8G can also be configured with multiple compute units to run different models simultaneously. Note that this requires additional PL NMU interfaces on the NoC (refer to Table 1).
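For example, with multiple compute units, a host application can dispatch two models concurrently. The sketch below assumes the standard VART Python API (the xir and vart modules shipped with Vitis AI) and uses placeholder .xmodel file names; input/output buffer preparation is elided.

import threading
import xir
import vart

def run_model(xmodel_path: str) -> None:
    """Create a runner for the DPU subgraph of one model and execute it."""
    graph = xir.Graph.deserialize(xmodel_path)
    # Pick the subgraphs mapped to the DPU by the Vitis AI compiler.
    subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                 if s.has_attr("device") and s.get_attr("device") == "DPU"]
    runner = vart.Runner.create_runner(subgraphs[0], "run")
    # ... prepare input/output buffers matching runner.get_input_tensors() ...
    # job_id = runner.execute_async(inputs, outputs)
    # runner.wait(job_id)

# Two models on two compute units, dispatched concurrently (placeholder names).
threads = [threading.Thread(target=run_model, args=(path,))
           for path in ("model_a.xmodel", "model_b.xmodel")]
for t in threads:
    t.start()
for t in threads:
    t.join()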
After start-up, the DPUCVDX8G fetches instructions from the NoC to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, which performs substantial optimizations.
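As an illustration, a quantized model is typically compiled into these instructions with the vai_c_xir front end of the Vitis AI compiler. The sketch below invokes it from Python; the file paths and network name are placeholders, and the flags are assumed from the compiler's usage documentation.

import subprocess

# Compile a quantized model for the DPUCVDX8G with the Vitis AI compiler.
# All paths and the network name below are placeholders.
subprocess.run(
    [
        "vai_c_xir",
        "-x", "quantized_model.xmodel",   # quantized input model
        "-a", "arch.json",                # target architecture (fingerprint) file
        "-o", "compiled_model",           # output directory
        "-n", "resnet50",                 # name of the compiled network
    ],
    check=True,
)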
On-chip memory buffers the input, intermediate, and output data to achieve high throughput and efficiency. Data is reused to reduce the external memory bandwidth, and the computing engine uses a deeply pipelined design.
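The general idea behind such buffering is a ping-pong scheme: while the engine computes on one on-chip bank, the next data tile is fetched into the other. The sketch below illustrates only that abstract pattern; it is not a model of the DPUCVDX8G pipeline, and in hardware the load and compute steps proceed concurrently rather than sequentially as in this Python loop.

# Generic ping-pong (double) buffering pattern: stand-in functions only.
def load_tile(i):      # stand-in for an external-memory -> on-chip RAM transfer
    return f"tile-{i}"

def compute(tile):     # stand-in for the deeply pipelined computing engine
    return f"result({tile})"

def process(num_tiles: int):
    if num_tiles == 0:
        return []
    results = []
    buffers = [None, None]                           # two on-chip banks
    buffers[0] = load_tile(0)                        # prime the pipeline
    for i in range(num_tiles):
        if i + 1 < num_tiles:
            buffers[(i + 1) % 2] = load_tile(i + 1)  # prefetch the next tile
        results.append(compute(buffers[i % 2]))      # compute the current tile
    return results

print(process(4))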
The detailed hardware architecture of the DPUCVDX8G is shown in the following figure.