Alveo U280 Data Center Accelerator Card

Vitis AI Library User Guide (UG1354)

Document ID: UG1354
Release Date: 2021-12-11
Version: 1.4.1

The Xilinx® Alveo U280 Data Center accelerator cards are Peripheral Component Interconnect Express (PCIe®) Gen3x16 compliant and Gen4x8 compatible cards featuring Xilinx 16 nm UltraScale+ technology. In this release, the DPU is implemented in programmable logic for deep learning inference acceleration.

Note: Some models cannot run at the DPU's highest frequency and require a reduced DPU frequency. See the For Edge section for the DPU frequency reduction procedure.

U280 Performance with 14E@300 MHz DPUCAHX8H

The following table lists the throughput (in frames per second, fps) of various neural network samples on the U280 with DPUCAHX8H running at 14E@300 MHz.

Table 1. U280 Performance with 14E@300 MHz DPUCAHX8H
No Neural Network Input Size GOPS DPU Frequency (MHz) Performance (fps, multiple threads)
1 densebox_320_320 320x320 0.49 300x0.5 4845.4
2 densebox_640_360 360x640 1.1 300x0.5 2135.5
3 ENet_cityscapes_pt 512x1024 8.6 300x0.5 123.0
4 face_landmark 96x72 0.14 300x0.5 12527.7
5 face-quality 80x60 0.06 300x0.5 28736.4
6 face-quality_pt 80x60 0.06 300x0.5 28616.8
7 facerec_resnet20 112x96 3.5 300x0.5 1599.5
8 facerec-resnet20_mixed_pt 112x96 3.5 300x0.5 1598.9
9 facerec_resnet64 112x96 11 300x0.5 579.2
10 facereid-large_pt 96x96 0.5 300x0.5 11573.5
11 facereid-small_pt 80x80 0.09 300x0.5 30606.7
12 fpn 256x512 8.9 300x0.5 478.4
13 FPN_Res18_Medical_segmentation 320x320 45.3 300x0.5 116.6
14 FPN-resnet18_covid19-seg_pt 352x352 22.7 300x0.5 263.5
15 inception_resnet_v2_tf 299x299 26.4 300x0.5 192.5
16 inception_v1 224x224 3.2 300x0.5 1378.1
17 inception_v1_tf 224x224 3 300x0.5 1430.6
18 inception_v2 224x224 4 300x0.5 1100.6
19 inception_v3 299x299 11.4 300x0.5 443.8
20 inception_v3_pt 299x299 5.7 300x0.5 444.2
21 inception_v3_tf 299x299 11.5 300x0.5 444.3
22 inception_v3_tf2 299x299 11.5 300x0.5 454.1
23 inception_v4 299x299 24.5 300x0.5 207.9
24 inception_v4_2016_09_09_tf 299x299 24.6 300x0.5 208.2
25 medical_seg_cell_tf2 128x128 5.3 300x0.5 1287.3
26 MLPerf_resnet50_v1.5_tf 224x224 8.19 300x0.5 645.7
27 mlperf_ssd_resnet34_tf 1200x1200 433 300x0.5 16.4
28 multi_task 288x512 14.8 300x0.5 353.8
29 openpose_pruned_0_3 368x368 49.9 300x0.5 37.4
30 personreid-res18_pt 176x80 1.1 300x0.5 4540.5
31 personreid-res50_pt 256x128 5.4 300x0.5 1042.6
32 plate_detection 320x320 0.49 300x0.5 8251.1
33 plate_num 96x288 1.75 300x0.5 1497.6
34 pmg_pt 224x224 2.28 300x0.5 1270.3
35 refinedet_baseline 480x360 123 300x0.5 60.5
36 RefineDet-Medical_EDD_tf 320x320 9.8 300x0.5 513.7
37 refinedet_pruned_0_8 360x480 25 300x0.5 220.3
38 refinedet_pruned_0_92 360x480 10.1 300x0.5 458.3
39 refinedet_pruned_0_96 360x480 5.1 300x0.5 647.4
40 refinedet_VOC_tf 320x320 81.9 300x0.5 87.9
41 reid 80x160 0.95 300x0.5 4770.5
42 resnet18 224x224 3.7 300x0.5 1658.3
43 resnet50 224x224 7.7 300x0.6 800.1
44 resnet50_pt 224x224 4.1 300x0.6 774.9
45 resnet50_tf2 224x224 7.7 300x0.5 666.2
46 resnet_v1_101_tf 224x224 14.4 300x0.5 389.0
47 resnet_v1_152_tf 224x224 21.8 300x0.5 259.4
48 resnet_v1_50_tf 224x224 7 300x0.5 750.0
49 salsanext_pt 64x2048 20.4 300x0.9 108.5
50 salsanext_v2_pt 64x2048 32 300x0.5 54.2
51 SemanticFPN_cityscapes_pt 256x512 10 300x0.5 540.2
52 semantic_seg_citys_tf2 512x1024 54 300x0.5 67.1
53 sp_net 128x224 0.55 300x0.5 4071.6
54 squeezenet 227x227 0.76 300x0.5 3961.0
55 squeezenet_pt 224x224 0.82 300x0.5 2187.5
56 ssd_adas_pruned_0_95 360x480 6.3 300x0.5 656.4
57 ssd_pedestrian_pruned_0_97 360x360 5.9 300x0.5 575.9
58 ssd_resnet_50_fpn_coco_tf 640x640 178.4 300x0.5 37.2
59 ssd_traffic_pruned_0_9 360x480 11.6 300x0.5 435.2
60 tiny_yolov3_vmss 416x416 5.46 300x0.5 1040.8
61 unet_chaos-CT_pt 512x512 23.3 300x0.5 132.5
62 vgg_16_tf 224x224 31 300x0.5 188.6
63 vgg_19_tf 224x224 39.3 300x0.5 157.5
64 vpgnet_pruned_0_99 480x640 2.5 300x0.5 729.8
65 yolov2_voc 448x448 34 300x0.5 202.9
66 yolov2_voc_pruned_0_66 448x448 11.6 300x0.5 499.6
67 yolov2_voc_pruned_0_71 448x448 9.9 300x0.5 582.3
68 yolov2_voc_pruned_0_77 448x448 7.8 300x0.5 694.9
69 yolov3_adas_pruned_0_9 256x512 5.5 300x0.5 817.1
70 yolov3_bdd 288x512 53.7 300x0.5 94.0
71 yolov3_voc 416x416 65.4 300x0.5 97.3
72 yolov3_voc_tf 416x416 65.6 300x0.5 97.2
73 yolov4_leaky_spp_m 416x416 60.1 300x0.5 100.3
74 yolov4_leaky_spp_m_pruned_0_36 416x416 38.2 300x0.5 123.5
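Because GOPS is the per-frame workload, multiplying it by the measured fps gives the compute rate the DPU sustains on each network, and 1000/fps gives the average time between completed frames. The Python sketch below applies this to three rows copied from Table 1. It assumes that the GOPS column counts giga-operations per inference; the `Row` type and the helper names are hypothetical and are not part of the Vitis AI Library.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One table row (hypothetical helper type, not a Vitis AI API)."""
    name: str
    gops: float  # workload per frame in giga-operations (assumed meaning of the GOPS column)
    fps: float   # multi-thread throughput in frames per second


def achieved_gops(row: Row) -> float:
    """Sustained compute rate: per-frame workload multiplied by frames per second."""
    return row.gops * row.fps


def frame_time_ms(row: Row) -> float:
    """Average time between completed frames (inverse throughput, not single-frame latency)."""
    return 1000.0 / row.fps


# Rows copied from Table 1 (DPUCAHX8H, 14E@300 MHz)
rows = [
    Row("resnet50", 7.7, 800.1),
    Row("inception_v3", 11.4, 443.8),
    Row("yolov3_voc", 65.4, 97.3),
]

for r in rows:
    print(f"{r.name:14s} {achieved_gops(r):8.1f} GOPS  {frame_time_ms(r):7.3f} ms/frame")
```

For resnet50 this works out to 7.7 × 800.1 ≈ 6161 GOPS, roughly 6.2 TOPS. Because the fps figures are multi-thread throughput, the per-frame time is an inverse throughput rather than the latency of a single inference.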

U280 Performance with 2E@250 MHz DPUCAHX8L

The following table lists the throughput (in frames per second, fps) of various neural network samples on the U280 with DPUCAHX8L running at 2E@250 MHz.

Table 2. U280 Performance with 2E@250 MHz DPUCAHX8L
No Neural Network Input Size GOPS DPU Frequency (MHz) Performance (fps, multiple threads)
1 ENet_cityscapes_pt 512x1024 8.6 250 8.1
2 face_landmark 96x72 0.14 250 9381.2
3 face-quality 80x60 0.06 250 13875.5
4 face-quality_pt 80x60 0.06 250 13837.3
5 facerec_resnet20 112x96 3.5 250 477.2
6 facerec-resnet20_mixed_pt 112x96 3.5 250 475.5
7 facerec_resnet64 112x96 11 250 228.3
8 facereid-small_pt 80x80 0.09 250 8875.4
9 fpn 256x512 8.9 250 49.9
10 FPN_Res18_Medical_segmentation 320x320 45.3 250 16.5
11 FPN-resnet18_covid19-seg_pt 352x352 22.7 250 128.5
12 inception_resnet_v2_tf 299x299 26.4 250 63.9
13 inception_v1 224x224 3.2 250 640.9
14 inception_v1_tf 224x224 3 250 651.9
15 inception_v2 224x224 3.88 250 339.1
16 inception_v3 299x299 11.4 250 188.5
17 inception_v3_pt 299x299 5.7 250 188.0
18 inception_v3_tf 299x299 11.5 250 188.9
19 inception_v3_tf2 299x299 11.5 250 180.0
20 inception_v4 299x299 24.5 250 102.7
21 inception_v4_2016_09_09_tf 299x299 24.6 250 102.8
22 medical_seg_cell_tf2 128x128 5.3 250 136.5
23 MLPerf_resnet50_v1.5_tf 224x224 8.19 250 161.8
24 mlperf_ssd_resnet34_tf 1200x1200 433 250 12.4
25 mobilenet_1_0_224_tf2 224x224 1.1 250 3426.1
26 mobilenet_v1_0_5_160_tf 160x160 0.15 250 9877.1
27 mobilenet_v1_1_0_224_tf 224x224 1.1 250 3485.4
28 mobilenet_v2 224x224 0.6 250 2042.4
29 mobilenet_v2_1_0_224_tf 224x224 0.6 250 2038.4
30 mobilenet_v2_1_4_224_tf 224x224 1.2 250 1493.0
31 multi_task 288x512 14.8 250 25.7
32 openpose_pruned_0_3 368x368 49.9 250 28.0
33 personreid-res50_pt 256x128 5.4 250 186.6
34 plate_detection 320x320 0.49 250 2277.9
35 rcan_pruned_tf 360x640 86.95 250 7.7
36 refinedet_baseline 480x360 123 250 51.9
37 RefineDet-Medical_EDD_tf 320x320 9.8 250 244.6
38 refinedet_pruned_0_8 360x480 25 250 111.9
39 refinedet_pruned_0_92 360x480 10.1 250 130.7
40 refinedet_pruned_0_96 360x480 5.1 250 156.1
41 refinedet_VOC_tf 320x320 81.9 250 79.2
42 reid 80x160 0.95 250 922.6
43 resnet18 224x224 3.7 250 542.0
44 resnet50 224x224 7.7 250 155.6
45 resnet50_pt 224x224 4.1 250 156.8
46 resnet50_tf2 224x224 7.7 250 155.5
47 resnet_v1_101_tf 224x224 14.4 250 102.0
48 resnet_v1_152_tf 224x224 21.8 250 69.3
49 resnet_v1_50_tf 224x224 7 250 178.6
50 retinaface 360x640 1.11 250 278.4
51 salsanext_pt 64x2048 20.4 250 19.3
52 SemanticFPN_cityscapes_pt 256x512 10 250 43.6
53 SemanticFPN_Mobilenetv2_pt 512x1024 5.4 250 12.7
54 semantic_seg_citys_tf2 512x1024 54 250 7.7
55 sp_net 128x224 0.55 250 12681.8
56 squeezenet 227x227 0.76 250 1541.1
57 squeezenet_pt 224x224 0.82 250 942.5
58 ssd_adas_pruned_0_95 360x480 6.3 250 200.3
59 ssdlite_mobilenet_v2_coco_tf 300x300 1.5 250 993.4
60 ssd_mobilenet_v1_coco_tf 300x300 2.5 250 1559.3
61 ssd_mobilenet_v2 360x480 6.6 250 230.7
62 ssd_mobilenet_v2_coco_tf 300x300 3.8 250 374.0
63 ssd_pedestrian_pruned_0_97 360x360 5.9 250 189.1
64 ssd_traffic_pruned_0_9 360x480 11.6 250 201.7
65 vgg_16_tf 224x224 31 250 106.2
66 vgg_19_tf 224x224 39.3 250 93.7
67 vpgnet_pruned_0_99 480x640 2.5 250 114.5
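Many networks appear in both Table 1 and Table 2, so their throughput on the two DPU configurations can be compared directly. The following Python sketch computes the DPUCAHX8H-to-DPUCAHX8L fps ratio for a few of those networks, using values copied from the two tables, and lists any network present in only one of the inputs. The dictionary names and the `compare` helper are hypothetical; this is plain arithmetic on the published numbers, not a benchmarking tool.

```python
from __future__ import annotations

# fps copied from Table 1 (DPUCAHX8H, 14E@300 MHz)
H_FPS = {
    "resnet50": 800.1,
    "inception_v3": 443.8,
    "vgg_16_tf": 188.6,
    "face_landmark": 12527.7,
}

# fps copied from Table 2 (DPUCAHX8L, 2E@250 MHz)
L_FPS = {
    "resnet50": 155.6,
    "inception_v3": 188.5,
    "vgg_16_tf": 106.2,
    "face_landmark": 9381.2,
    "mobilenet_v2": 2042.4,  # listed only for DPUCAHX8L
}


def compare(h: dict[str, float], l: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Return H/L fps ratios for networks in both inputs, plus networks found in only one."""
    common = sorted(h.keys() & l.keys())
    ratios = {name: h[name] / l[name] for name in common}
    only_one = sorted(h.keys() ^ l.keys())  # symmetric difference of the key sets
    return ratios, only_one


ratios, only_one = compare(H_FPS, L_FPS)
for name, ratio in ratios.items():
    print(f"{name:14s} DPUCAHX8H/DPUCAHX8L = {ratio:.2f}x")
print("listed for only one DPU:", ", ".join(only_one))
```

For these rows the ratio ranges from about 1.3x (face_landmark) to about 5.1x (resnet50), so the relative advantage of DPUCAHX8H depends strongly on the network. The tables do not state whether all other test conditions were identical, so treat such ratios as indicative only.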