The detailed hardware architecture of the DPUCVDX8H is shown in the following figure. Each implementation can have one DPU instance, and each DPU instance can have two, four, six, or eight processing engine instances. The number of instances depends on the available FPGA resources.
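As an illustration of the configuration space described above, the following minimal sketch models a single DPU instance whose processing engine count is restricted to the values listed. The `DpuConfig` class and `ALLOWED_PE_COUNTS` constant are hypothetical names chosen for this example; they are not part of Vitis AI or any Xilinx tool, and the sketch does not model how FPGA resources limit the choice.

```python
from dataclasses import dataclass

# Hypothetical constant: processing engine counts a DPU instance may have,
# taken from the text above.
ALLOWED_PE_COUNTS = (2, 4, 6, 8)


@dataclass(frozen=True)
class DpuConfig:
    """Illustrative description of one DPUCVDX8H instance (not a real API)."""

    num_pes: int

    def __post_init__(self) -> None:
        # Reject any processing engine count the architecture does not offer.
        if self.num_pes not in ALLOWED_PE_COUNTS:
            raise ValueError(
                f"num_pes must be one of {ALLOWED_PE_COUNTS}, got {self.num_pes}"
            )


if __name__ == "__main__":
    print(DpuConfig(num_pes=6))  # valid configuration
    try:
        DpuConfig(num_pes=3)     # not an allowed processing engine count
    except ValueError as err:
        print(err)
```

Validating in `__post_init__` keeps an invalid configuration from ever being constructed, which mirrors the fact that only these discrete sizes exist.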
The Conv computing unit is implemented on the AI Engine. The Conv control unit, Load unit, and Save unit are implemented in programmable logic. The MISC unit, which handles pooling and element-wise processing, is implemented either on the AI Engine or in programmable logic. All processing engines share the weight unit and the scheduler unit, both of which are implemented in programmable logic. DRAM serves as system memory and stores the network instructions, input images, output results, and intermediate data. After bring-up, the DPU fetches instructions from system memory to control the operations of the computing engine.
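The unit placement and instruction-driven control flow described above can be summarized in a small Python model. This is a toy sketch, not a description of the real hardware: the opcodes (`LOAD`, `CONV`, `POOL`, `ELEW`, `SAVE`), the `run` function, and the strictly sequential fetch with no branching are all hypothetical simplifications, since the actual DPU instruction set is not documented here. Only the mapping of units to the AI Engine or programmable logic comes from the text.

```python
from enum import Enum


class Fabric(Enum):
    AIE = "AI Engine"
    PL = "programmable logic"


# Where each unit is implemented, per the description above.
# MISC (pooling, element-wise) may be placed on either fabric.
UNIT_PLACEMENT = {
    "conv_compute": {Fabric.AIE},
    "conv_control": {Fabric.PL},
    "load": {Fabric.PL},
    "save": {Fabric.PL},
    "misc": {Fabric.AIE, Fabric.PL},
    "weight": {Fabric.PL},     # shared by all processing engines
    "scheduler": {Fabric.PL},  # shared by all processing engines
}

# Hypothetical opcodes; the real DPU instruction set is not described here.
OPCODE_TO_UNIT = {
    "LOAD": "load",
    "CONV": "conv_compute",
    "POOL": "misc",
    "ELEW": "misc",
    "SAVE": "save",
}


def run(system_memory: list[str]) -> list[str]:
    """Fetch instructions from a list standing in for system memory, in order,
    and dispatch each to the unit that executes it. Returns the units used."""
    trace = []
    for opcode in system_memory:  # sequential fetch; no branches in this toy model
        unit = OPCODE_TO_UNIT[opcode]
        trace.append(unit)
    return trace


if __name__ == "__main__":
    program = ["LOAD", "CONV", "POOL", "SAVE"]
    print(run(program))
```

Running the script prints `['load', 'conv_compute', 'misc', 'save']`. Keeping placement as a set per unit makes the MISC unit's two implementation options explicit rather than hiding them behind a single value.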