Alveo U50LV Data Center Accelerator Card

Vitis AI Library User Guide (UG1354)

Document ID: UG1354
Release Date: 2022-01-20
Version: 2.0 English

The Xilinx® Alveo™ U50LV data center accelerator cards are Peripheral Component Interconnect Express (PCIe®) Gen3x4 compliant cards featuring the Xilinx 16 nm UltraScale+ technology. In this release, the DPU is implemented in programmable logic for deep learning inference acceleration.

Note: Some models cannot run at the highest DPU frequency and require the DPU frequency to be reduced. See "For Cloud (Alveo U50LV/U55C Cards, Versal VCK5000 Card)" for the frequency reduction procedure.

U50LV Performance with 10PE@275 MHz DPUCAHX8H

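The following table shows the throughput (in frames per second, fps) for various neural network samples on the U50LV Gen3x4 card with DPUCAHX8H running at 10PE@275 MHz. The DPU Frequency column lists the nominal clock together with the reduction factor applied for each model; for example, 275*0.7 corresponds to 192.5 MHz.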

Table 1. U50LV Performance with 10PE@275 MHz DPUCAHX8H
No. Neural Network Input Size GOPS DPU Frequency (MHz) Performance (fps, multiple threads)
1 densebox_320_320 320x320 0.49 275*0.7 3040.52
2 densebox_640_360 360x640 1.1 275*0.7 1316.35
3 drunet_pt 528x608 2.59 275*0.7 372.39
4 ENet_cityscapes_pt 512x1024 8.6 275*0.7 95.44
5 face_landmark 96x72 0.14 275*0.7 11396.40
6 face-quality 80x60 0.06 275*0.7 21514.90
7 face-quality_pt 80x60 0.06 275*0.7 21385.40
8 facerec_resnet20 112x96 3.5 275*0.7 1453.98
9 facerec-resnet20_mixed_pt 112x96 3.5 275*0.7 1453.13
10 facerec_resnet64 112x96 11 275*0.7 530.59
11 facereid-large_pt 96x96 0.5 275*0.7 8417.59
12 facereid-small_pt 80x80 0.09 275*0.7 23548.70
13 FairMot_pt 640x480 36 275*0.6 160.70
14 fpn 256x512 8.9 275*0.7 474.75
15 FPN_Res18_Medical_segmentation 320x320 45.3 275*0.7 110.01
16 FPN-resnet18_covid19-seg_pt 352x352 22.7 275*0.7 247.60
17 inception_resnet_v2_tf 299x299 26.4 275*0.7 184.68
18 inception_v1 224x224 3.2 275*0.7 1340.78
19 inception_v1_tf 224x224 3 275*0.7 1359.83
20 inception_v2 224x224 4 275*0.7 1056.64
21 inception_v3 299x299 11.4 275*0.7 436.96
22 inception_v3_pt 299x299 5.7 275*0.7 437.18
23 inception_v3_tf 299x299 11.5 275*0.7 438.49
24 inception_v3_tf2 299x299 11.5 275*0.7 444.04
25 inception_v4 299x299 24.5 275*0.7 198.69
26 inception_v4_2016_09_09_tf 299x299 24.6 275*0.7 198.88
27 medical_seg_cell_tf2 128x128 5.3 275*0.7 1250.55
28 MLPerf_resnet50_v1.5_tf 224x224 8.19 275*0.7 596.07
29 mlperf_ssd_resnet34_tf 1200x1200 433 275*0.7 16.44
30 multi_task 288x512 14.8 275*0.7 361.93
31 ofa_resnet50_0_9B_pt 160x160 0.9 275*0.7 1767.30
32 openpose_pruned_0_3 368x368 49.9 275*0.7 34.78
33 person-orientation_pruned_558m_pt 176x80 0.558 275*0.7 6733.67
34 personreid-res18_pt 176x80 1.1 275*0.7 4052.47
35 personreid-res50_pt 256x128 5.4 275*0.7 960.45
36 plate_detection 320x320 0.49 275*0.7 6369.09
37 plate_num 96x288 1.75 275*0.7 1396.99
38 pmg_pt 224x224 2.28 275*0.7 1170.22
39 refinedet_baseline 480x360 123 275*0.7 59.07
40 RefineDet-Medical_EDD_tf 320x320 9.8 275*0.7 502.22
41 refinedet_pruned_0_8 360x480 25 275*0.7 235.88
42 refinedet_pruned_0_92 360x480 10.1 275*0.7 480.04
43 refinedet_pruned_0_96 360x480 5.1 275*0.7 703.73
44 refinedet_VOC_tf 320x320 81.9 275*0.7 84.72
45 reid 80x160 0.95 275*0.7 4270.81
46 resnet18 224x224 3.7 275*0.7 1514.25
47 resnet50 224x224 7.7 275*0.7 691.01
48 resnet50_pt 224x224 4.1 275*0.7 596.10
49 resnet50_tf2 224x224 7.7 275*0.7 691.62
50 resnet_v1_101_tf 224x224 14.4 275*0.7 358.87
51 resnet_v1_152_tf 224x224 21.8 275*0.7 239.28
52 resnet_v1_50_tf 224x224 7 275*0.7 690.78
53 salsanext_pt 64x2048 20.4 275*0.7 145.11
54 salsanext_v2_pt 64x2048 32 275*0.7 44.21
55 SemanticFPN_cityscapes_pt 256x512 10 275*0.7 505.79
56 semantic_seg_citys_tf2 512x1024 54 275*0.7 63.02
57 SESR_S_pt 360x640 7.48 275*0.7 184.14
58 sp_net 128x224 0.55 275*0.7 3594.50
59 squeezenet 227x227 0.76 275*0.7 3932.12
60 squeezenet_pt 224x224 0.82 275*0.7 4186.64
61 ssd_adas_pruned_0_95 360x480 6.3 275*0.7 731.10
62 ssd_pedestrian_pruned_0_97 360x360 5.9 275*0.7 678.48
63 ssd_resnet_50_fpn_coco_tf 640x640 178.4 275*0.7 37.78
64 ssd_traffic_pruned_0_9 360x480 11.6 275*0.7 461.08
65 tiny_yolov3_vmss 416x416 5.46 275*0.7 1035.44
66 tsd_yolox_pt 640x640 73 275*0.6 72.22
67 ultrafast_pt 288x800 8.4 275*0.7 288.60
68 unet_chaos-CT_pt 512x512 23.3 275*0.7 85.58
69 vgg_16_tf 224x224 31 275*0.7 176.15
70 vgg_19_tf 224x224 39.3 275*0.7 146.51
71 vpgnet_pruned_0_99 480x640 2.5 275*0.7 646.59
72 yolov2_voc 448x448 34 275*0.6 193.58
73 yolov2_voc_pruned_0_66 448x448 11.6 275*0.7 489.60
74 yolov2_voc_pruned_0_71 448x448 9.9 275*0.7 575.05
75 yolov2_voc_pruned_0_77 448x448 7.8 275*0.7 696.65
76 yolov3_adas_pruned_0_9 256x512 5.5 275*0.6 742.51
77 yolov3_bdd 288x512 53.7 275*0.6 89.10
78 yolov3_voc 416x416 65.4 275*0.6 78.56
79 yolov3_voc_tf 416x416 65.6 275*0.6 78.70
80 yolov4_leaky_spp_m 416x416 60.1 275*0.6 82.96
81 yolov4_leaky_spp_m_pruned_0_36 416x416 38.2 275*0.7 91.36
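
To work with a row of Table 1 numerically, the frequency expression can be evaluated and the throughput converted into operations delivered per second. The following is a minimal Python sketch and is not part of the Vitis AI Library. The names `Row` and `parse_row` are hypothetical, and the code assumes that the frequency column means nominal MHz multiplied by a reduction factor and that the GOPS column gives giga-operations per frame.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    """One whitespace-separated row of the performance table (hypothetical type)."""
    number: int
    network: str
    input_size: str
    gops: float         # assumed: giga-operations per frame
    nominal_mhz: float  # clock before reduction, e.g. 275
    scale: float        # reduction factor, e.g. 0.7
    fps: float

    @property
    def effective_mhz(self) -> float:
        # Assumed meaning of "275*0.7": nominal clock times reduction factor.
        return self.nominal_mhz * self.scale

    @property
    def delivered_gops_per_sec(self) -> float:
        # Operations per frame times frames per second.
        return self.gops * self.fps


def parse_row(line: str) -> Row:
    """Split a table row into typed fields; accepts '*' or 'x' in the frequency."""
    number, network, input_size, gops, freq, fps = line.split()
    nominal, scale = re.split(r"[*x]", freq)
    return Row(int(number), network, input_size, float(gops),
               float(nominal), float(scale), float(fps))


row = parse_row("47 resnet50 224x224 7.7 275*0.7 691.01")
print(f"{row.network}: {row.effective_mhz:.1f} MHz, "
      f"{row.delivered_gops_per_sec:.1f} GOP/s")
# prints: resnet50: 192.5 MHz, 5320.8 GOP/s
```

Because the parser accepts both `*` and `x` as the multiplication sign, the same function should also read rows written in the `275x0.5` notation.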

U50LV Performance with 8PE@275 MHz DPUCAHX8H-DWC

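The following table shows the throughput (in frames per second, fps) for various neural network samples on the U50LV Gen3x4 card with DPUCAHX8H-DWC running at 8PE@275 MHz. In the DPU Frequency column, 275x0.5 denotes the nominal 275 MHz clock multiplied by a reduction factor of 0.5, that is, 137.5 MHz.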

Table 2. U50LV Performance with 8PE@275 MHz DPUCAHX8H-DWC
No. Neural Network Input Size GOPS DPU Frequency (MHz) Performance (fps, multiple threads)
1 densebox_320_320 320x320 0.49 275x0.5 2180.70
2 densebox_640_360 360x640 1.1 275x0.5 954.51
3 drunet_pt 528x608 2.59 275x0.5 214.20
4 ENet_cityscapes_pt 512x1024 8.6 275x0.5 59.20
5 face_landmark 96x72 0.14 275x0.5 6770.42
6 face-quality 80x60 0.06 275x0.5 17806.60
7 face-quality_pt 80x60 0.06 275x0.5 17811.40
8 facerec_resnet20 112x96 3.5 275x0.5 837.13
9 facerec-resnet20_mixed_pt 112x96 3.5 275x0.5 838.06
10 facerec_resnet64 112x96 11 275x0.5 304.40
11 facereid-large_pt 96x96 0.5 275x0.5 4997.13
12 facereid-small_pt 80x80 0.09 275x0.5 15164.50
13 FairMot_pt 640x480 36 275x0.5 92.15
14 fpn 256x512 8.9 275x0.5 275.09
15 FPN_Res18_Medical_segmentation 320x320 45.3 275x0.5 63.27
16 FPN-resnet18_covid19-seg_pt 352x352 22.7 275x0.5 142.42
17 inception_resnet_v2_tf 299x299 26.4 275x0.5 106.01
18 inception_v1 224x224 3.2 275x0.5 776.98
19 inception_v1_tf 224x224 3 275x0.5 788.10
20 inception_v2 224x224 4 275x0.5 608.44
21 inception_v3 299x299 11.4 275x0.5 250.90
22 inception_v3_pt 299x299 5.7 275x0.5 251.21
23 inception_v3_tf 299x299 11.5 275x0.5 251.22
24 inception_v3_tf2 299x299 11.5 275x0.5 254.60
25 inception_v4 299x299 24.5 275x0.5 114.06
26 inception_v4_2016_09_09_tf 299x299 24.6 275x0.5 113.95
27 medical_seg_cell_tf2 128x128 5.3 275x0.5 719.06
28 MLPerf_resnet50_v1.5_tf 224x224 8.19 275x0.5 342.16
29 mlperf_ssd_resnet34_tf 1200x1200 433 275x0.5 9.41
30 mobilenet_1_0_224_tf2 224x224 1.1 275x0.5 2021.57
31 mobilenet_v1_0_25_128_tf 128x128 0.027 275x0.5 13994.40
32 mobilenet_v1_0_5_160_tf 160x160 0.15 275x0.5 7508.79
33 mobilenet_v1_1_0_224_tf 224x224 1.1 275x0.5 2021.15
34 mobilenet_v2 224x224 0.6 275x0.5 1883.93
35 mobilenet_v2_1_0_224_tf 224x224 0.6 275x0.5 1862.71
36 mobilenet_v2_1_4_224_tf 224x224 1.2 275x0.5 1252.61
37 multi_task 288x512 14.8 275x0.5 207.78
38 ofa_resnet50_0_9B_pt 160x160 0.9 275x0.5 1044.55
39 openpose_pruned_0_3 368x368 49.9 275x0.5 20.05
40 person-orientation_pruned_558m_pt 176x80 0.558 275x0.5 3981.36
41 personreid-res18_pt 176x80 1.1 275x0.5 2361.76
42 personreid-res50_pt 256x128 5.4 275x0.5 552.22
43 plate_detection 320x320 0.49 275x0.5 4798.49
44 plate_num 96x288 1.75 275x0.5 884.38
45 pmg_pt 224x224 2.28 275x0.5 672.85
46 refinedet_baseline 480x360 123 275x0.5 33.86
47 RefineDet-Medical_EDD_tf 320x320 9.8 275x0.5 289.07
48 refinedet_pruned_0_8 360x480 25 275x0.5 135.29
49 refinedet_pruned_0_92 360x480 10.1 275x0.5 275.63
50 refinedet_pruned_0_96 360x480 5.1 275x0.5 403.98
51 refinedet_VOC_tf 320x320 81.9 275x0.5 48.37
52 reid 80x160 0.95 275x0.5 2488.00
53 resnet18 224x224 3.7 275x0.5 871.54
54 resnet50 224x224 7.7 275x0.5 396.71
55 resnet50_pt 224x224 4.1 275x0.5 341.97
56 resnet50_tf2 224x224 7.7 275x0.5 397.03
57 resnet_v1_101_tf 224x224 14.4 275x0.5 205.59
58 resnet_v1_152_tf 224x224 21.8 275x0.5 136.93
59 resnet_v1_50_tf 224x224 7 275x0.5 396.44
60 retinaface 360x640 1.11 275x0.5 1171.76
61 salsanext_pt 64x2048 20.4 275x0.5 113.08
62 salsanext_v2_pt 64x2048 32 275x0.5 24.91
63 SemanticFPN_cityscapes_pt 256x512 10 275x0.5 290.10
64 SemanticFPN_Mobilenetv2_pt 512x1024 5.4 275x0.5 120.07
65 semantic_seg_citys_tf2 512x1024 54 275x0.5 39.18
66 SESR_S_pt 360x640 7.48 275x0.5 135.92
67 sp_net 128x224 0.55 275x0.5 2117.40
68 squeezenet 227x227 0.76 275x0.5 2299.17
69 squeezenet_pt 224x224 0.82 275x0.5 2447.79
70 ssd_adas_pruned_0_95 360x480 6.3 275x0.5 430.43
71 ssdlite_mobilenet_v2_coco_tf 300x300 1.5 275x0.5 873.85
72 ssd_mobilenet_v1_coco_tf 300x300 2.5 275x0.5 943.25
73 ssd_mobilenet_v2 360x480 6.6 275x0.5 255.39
74 ssd_mobilenet_v2_coco_tf 300x300 3.8 275x0.5 530.97
75 ssd_pedestrian_pruned_0_97 360x360 5.9 275x0.5 389.98
76 ssd_resnet_50_fpn_coco_tf 640x640 178.4 275x0.5 21.61
77 ssd_traffic_pruned_0_9 360x480 11.6 275x0.5 264.78
78 tiny_yolov3_vmss 416x416 5.46 275x0.5 594.69
79 tsd_yolox_pt 640x640 73 275x0.5 48.44
80 ultrafast_pt 288x800 8.4 275x0.5 164.96
81 unet_chaos-CT_pt 512x512 23.3 275x0.5 61.21
82 vgg_16_tf 224x224 31 275x0.5 101.28
83 vgg_19_tf 224x224 39.3 275x0.5 84.05
84 vpgnet_pruned_0_99 480x640 2.5 275x0.5 424.78
85 yolov2_voc 448x448 34 275x0.5 110.82
86 yolov2_voc_pruned_0_66 448x448 11.6 275x0.5 281.58
87 yolov2_voc_pruned_0_71 448x448 9.9 275x0.5 330.35
88 yolov2_voc_pruned_0_77 448x448 7.8 275x0.5 399.89
89 yolov3_adas_pruned_0_9 256x512 5.5 275x0.5 435.65
90 yolov3_bdd 288x512 53.7 275x0.5 50.93
91 yolov3_voc 416x416 65.4 275x0.5 52.35
92 yolov3_voc_tf 416x416 65.6 275x0.5 52.53
93 yolov4_leaky_spp_m 416x416 60.1 275x0.5 55.46
94 yolov4_leaky_spp_m_pruned_0_36 416x416 38.2 275x0.5 63.78
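
Table 1 and Table 2 differ in both PE count and frequency reduction factor, so raw fps values are not directly comparable between them. One rough way to compare them is to normalize throughput by the number of PEs and the effective clock. The sketch below does this for the resnet50 rows. The helper `fps_per_pe_mhz` and the metric itself are illustrative assumptions rather than anything defined by Xilinx, and the effective clock is assumed to be the nominal 275 MHz times the listed factor.

```python
def fps_per_pe_mhz(fps: float, pe_count: int,
                   nominal_mhz: float, scale: float) -> float:
    """Throughput normalized by PE count and effective clock (nominal * scale)."""
    return fps / (pe_count * nominal_mhz * scale)


# resnet50 rows copied from Table 1 (10PE, 275*0.7) and Table 2 (8PE, 275x0.5).
table1 = fps_per_pe_mhz(691.01, pe_count=10, nominal_mhz=275, scale=0.7)
table2 = fps_per_pe_mhz(396.71, pe_count=8, nominal_mhz=275, scale=0.5)

print(f"10PE DPUCAHX8H:     {table1:.3f} fps per PE per MHz")  # 0.359
print(f"8PE  DPUCAHX8H-DWC: {table2:.3f} fps per PE per MHz")  # 0.361
```

For this one model the normalized values are close, which suggests that most of the fps difference between the two tables comes from PE count and clock. Other models may behave differently, so the same calculation would need to be repeated per row before drawing broader conclusions.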