The I/O bandwidth requirement of the DPUCZDX8G depends on which neural network is being executed,
and it also varies between the layers of a single neural network. The I/O bandwidth requirements
of several neural networks, averaged by layer, have been measured with one DPUCZDX8G
core running at full speed. The table below lists the peak and average I/O bandwidth
requirements of four neural networks. Only the figures for two commonly used
DPUCZDX8G architectures (B1152 and B4096) are shown.
Note: When multiple DPUCZDX8G cores run in parallel, each core might not be able to run at full
speed due to I/O bandwidth limitations.
Network Model | B1152 Peak (MB/s) | B1152 Average (MB/s) | B4096 Peak (MB/s) | B4096 Average (MB/s) |
---|---|---|---|---|
Inception-v1 | 1704 | 890 | 4626 | 2474 |
ResNet50 | 2052 | 1017 | 5298 | 3132 |
SSD ADAS VEHICLE | 1516 | 684 | 5724 | 2049 |
YOLO-V3-VOC | 2076 | 986 | 6453 | 3290 |
For one DPUCZDX8G core to run at full speed, the peak I/O bandwidth requirement must be met. The I/O bandwidth is mainly used for accessing data through the AXI master interfaces (DPU0_M_AXI_DATA0 and DPU0_M_AXI_DATA1).
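The peak figures can be turned into a rough budget check. The following Python sketch is illustrative only: the bandwidth values are copied from the table, while the function names, the `available_mbps` parameter, and the assumption that the requirements of parallel cores simply add up (all cores peaking at the same time) are hypothetical and not part of the DPUCZDX8G documentation. Real systems share memory bandwidth with other masters, so treat the result as a first estimate.

```python
# (peak, average) I/O bandwidth in MB/s for one core, copied from the table.
BANDWIDTH_MBPS = {
    "B1152": {
        "Inception-v1": (1704, 890),
        "ResNet50": (2052, 1017),
        "SSD ADAS VEHICLE": (1516, 684),
        "YOLO-V3-VOC": (2076, 986),
    },
    "B4096": {
        "Inception-v1": (4626, 2474),
        "ResNet50": (5298, 3132),
        "SSD ADAS VEHICLE": (5724, 2049),
        "YOLO-V3-VOC": (6453, 3290),
    },
}


def can_run_at_full_speed(arch, model, num_cores, available_mbps):
    """Return True if the summed peak requirement fits in the budget.

    Assumption (not from the source): the peak requirements of parallel
    cores add linearly, i.e. the worst case where all cores peak at once.
    """
    peak, _average = BANDWIDTH_MBPS[arch][model]
    return num_cores * peak <= available_mbps


def max_full_speed_cores(arch, model, available_mbps):
    """Largest core count whose summed peak requirement fits in the budget."""
    peak, _average = BANDWIDTH_MBPS[arch][model]
    return int(available_mbps // peak)
```

For example, with a hypothetical budget of 10000 MB/s, `can_run_at_full_speed("B4096", "ResNet50", 2, 10000)` is false because two cores would need 2 × 5298 MB/s at peak, whereas `max_full_speed_cores("B1152", "ResNet50", 10000)` yields four B1152 cores under the same budget.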