Vitis AI provides several C++ and Python
examples to demonstrate the use of the unified cloud-edge runtime programming APIs.
Note: The sample
code helps you get started with the new runtime (VART). It is not meant for
performance benchmarking.
To familiarize yourself with the unified APIs, use
the VART examples. These examples are intended only to illustrate the APIs and are not tuned for
high performance. The APIs are compatible between the edge and cloud, but the route to higher
performance differs: cloud boards may rely on software optimizations such as batching, whereas
edge boards require multi-threading. If you need higher performance,
see the Vitis AI Library samples and demo software.
If you want to optimize for high performance, here are some suggestions:
- Rearrange the thread pipeline structure so that every DPU thread has its own DPU runner object.
- Optimize the display thread so that it skips some frames when the DPU FPS is higher than the display rate. 200 FPS is too high for video display.
- Pre-decode the video. The video file might be H.264 encoded, and the decoder is slower than the DPU and consumes a lot of CPU resources. Decode the video file first and convert it to raw format.
- Batch mode on Alveo boards needs special consideration because it may cause video frame jittering. ZCU102 has no batch mode support.
- OpenCV cv::imshow is slow, so use libdrm.so instead. This applies only to local display, not to display through an X server.
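The display-thread suggestion above can be illustrated with a minimal sketch. It is not taken from the Vitis AI samples: the `DisplayThrottle` class and its `should_show` method are hypothetical names introduced here, and the injectable `clock` parameter is an assumption added only to make the logic easy to exercise without a real display.

```python
import time


class DisplayThrottle:
    """Hypothetical helper: caps how often frames are handed to the display.

    Frames arriving faster than the cap are skipped rather than queued,
    so the display thread never falls behind the DPU threads.
    """

    def __init__(self, max_display_fps, clock=time.monotonic):
        # Minimum time, in seconds, that must pass between two shown frames.
        self.min_interval = 1.0 / max_display_fps
        self.clock = clock
        self.last_shown = None

    def should_show(self):
        """Return True if the current frame should be displayed."""
        now = self.clock()
        if self.last_shown is None or now - self.last_shown >= self.min_interval:
            self.last_shown = now
            return True
        # Too soon after the previous shown frame: skip this one.
        return False
```

In a display loop, the thread would call `should_show()` once per finished frame and draw only when it returns `True`. A time-based check is used here instead of a fixed "every Nth frame" rule so that the behavior adapts if the DPU frame rate varies.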
The following table describes these Vitis AI examples.
ID | Example Name | Models | Framework | Notes |
---|---|---|---|---|
1 | resnet50 | ResNet50 | Caffe | Image classification with Vitis AI unified C++ APIs. |
2 | resnet50_mt_py | ResNet50 | TensorFlow | Multi-threading image classification with Vitis AI unified Python APIs. |
3 | inception_v1_mt_py | Inception-v1 | TensorFlow | Multi-threading image classification with Vitis AI unified Python APIs. |
4 | pose_detection | SSD, Pose detection | Caffe | Pose detection with Vitis AI unified C++ APIs. |
5 | video_analysis | SSD | Caffe | Traffic detection with Vitis AI unified C++ APIs. |
6 | adas_detection | YOLO-v3 | Caffe | ADAS detection with Vitis AI unified C++ APIs. |
7 | segmentation | FPN | Caffe | Semantic segmentation with Vitis AI unified C++ APIs. |
The typical code snippet to deploy models with Vitis AI unified C++ high-level APIs is as follows:
// get the DPU subgraph by parsing the model file
auto runner = vart::Runner::create_runner(subgraph, "run");
// populate input/output tensors
auto job_id = runner->execute_async(inputsPtr, outputsPtr);
runner->wait(job_id.first, -1);
// process outputs
The typical code snippet to deploy models with Vitis AI unified Python high-level APIs is shown below:
dpu_runner = runner.Runner(subgraph, "run")
# populate input/output tensors
jid = dpu_runner.execute_async(fpgaInput, fpgaOutput)
dpu_runner.wait(jid)
# process fpgaOutput
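The multi-threading suggestion given earlier (every DPU thread owns its own runner) can be sketched around the same `execute_async`/`wait` call shape as the snippet above. This is an illustration under stated assumptions, not the code of the `resnet50_mt_py` example: `StubRunner`, `worker`, and `run_threads` are hypothetical names, and `StubRunner` is a stand-in that only doubles its inputs so the sketch runs without a DPU. On real hardware, the factory passed to `run_threads` would presumably construct a runner from the subgraph as shown above.

```python
import threading


class StubRunner:
    """Hypothetical stand-in for a DPU runner (no hardware required).

    It mimics only the execute_async/wait call shape and fakes inference
    by doubling each input value.
    """

    def __init__(self):
        self._next_job = 0

    def execute_async(self, inputs, outputs):
        # Fake inference: write doubled inputs into the output buffer.
        outputs[:] = [x * 2 for x in inputs]
        job_id = self._next_job
        self._next_job += 1
        return job_id

    def wait(self, job_id):
        # The stub finishes synchronously, so there is nothing to wait for.
        return 0


def worker(make_runner, batches, results, index):
    """Process all batches assigned to one thread with a private runner."""
    runner = make_runner()  # created inside the thread, never shared
    done = []
    for inputs in batches:
        outputs = [None] * len(inputs)
        jid = runner.execute_async(inputs, outputs)
        runner.wait(jid)
        done.append(outputs)
    results[index] = done  # each thread writes only its own slot


def run_threads(make_runner, per_thread_batches):
    """Start one thread per batch list and collect the results in order."""
    results = [None] * len(per_thread_batches)
    threads = [
        threading.Thread(target=worker, args=(make_runner, batches, results, i))
        for i, batches in enumerate(per_thread_batches)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_threads(StubRunner, [[[1, 2]], [[3], [4, 5]]])` starts two threads, each with its own runner instance. Passing a factory rather than a runner instance is the design choice that keeps runner objects from being shared between threads.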