The detailed hardware architecture of the DPUCZDX8G is shown in the following figure. After start-up, the DPUCZDX8G fetches instructions from off-chip memory to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, which performs substantial optimizations including layer fusion.
On-chip memory is used to buffer input activations, intermediate feature maps, and output data to achieve high throughput and efficiency. Data is reused as much as possible to reduce external memory bandwidth requirements. The computing engine uses a deeply pipelined design. The processing elements (PEs) take full advantage of the fine-grained building blocks in Xilinx devices, such as multipliers, adders, and accumulators.
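The bandwidth benefit of on-chip data reuse can be illustrated with a small software model. The following Python sketch is not DPUCZDX8G code and does not reflect its actual buffer organization. It is a toy 1-D convolution in which the function names, the `reads` counter, and the sliding-window buffer are all assumptions made for illustration. Weights are assumed to be held on chip already in both variants, so only input fetches are counted.

```python
from collections import deque


def conv1d_no_reuse(signal, kernel):
    """Hypothetical baseline: every multiply-accumulate fetches its
    input operand from external memory again."""
    reads = 0
    out = []
    for i in range(len(signal) - len(kernel) + 1):
        acc = 0
        for k, w in enumerate(kernel):
            acc += signal[i + k] * w  # one external read per MAC
            reads += 1
        out.append(acc)
    return out, reads


def conv1d_with_buffer(signal, kernel):
    """Hypothetical buffered variant: a sliding window stands in for
    on-chip memory, so each input is fetched from external memory once."""
    window = deque(maxlen=len(kernel))  # toy model of an on-chip buffer
    reads = 0
    out = []
    for x in signal:
        window.append(x)  # the only external read for this input
        reads += 1
        if len(window) == len(kernel):
            # Multiply-accumulate over data that is already on chip.
            out.append(sum(v * w for v, w in zip(window, kernel)))
    return out, reads


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]
    kernel = [1, 0, -1]
    print(conv1d_no_reuse(signal, kernel))
    print(conv1d_with_buffer(signal, kernel))
```

Both functions produce the same outputs, but the buffered variant reads each input once instead of once per kernel tap. In this model the number of external reads drops from (outputs × kernel length) to the input length, which is the kind of saving that buffering and reuse are intended to provide.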