ONNX Runtime
Vitis AI Execution Provider (Vitis AI EP) offers hardware-accelerated AI inference with AMD's DPU. It enables users to run quantized ONNX models directly on the target board. The current Vitis AI EP inside ONNX Runtime enables Neural Network model inference acceleration on embedded devices, including Zynq UltraScale+ MPSoC, Versal, Versal AI Edge, and Kria cards.
Vitis AI ONNXRuntime Engine (VOE) serves as the implementation library of Vitis AI EP.
Features
- Supports ONNX Opset version 18, ONNX Runtime 1.16.0 and ONNX version 1.13
- C++ and Python APIs (Python 3 supported)
- Supports incorporating other execution providers, such as the ACL EP, in addition to the Vitis AI EP to accelerate inference alongside the AMD DPU
- Supports computation on the ARM64 Cortex®-A72 cores; the supported target in Vitis AI 3.5 is the VEK280
Benefits
- Versatility: You can deploy subgraphs on the AMD DPU while using other execution providers, such as Arm® NN and Arm® ACL, for additional operators, as illustrated in the sketch after this list. This flexibility enables the deployment of models that might not be directly supported on target boards.
- Improved performance: By leveraging specialized execution providers such as the AMD DPU for specific operations and using other providers for the remaining operators, you can achieve optimized performance for your models.
- Expanded model support: Enhancing ONNX Runtime enables the deployment of models with operators that the DPU does not natively support. By incorporating additional execution providers, you can execute many models, including those from the ONNX model zoo.
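As an illustration of this flexibility, the following sketch builds the provider list with the Vitis AI EP first so that operators it cannot take fall back to the next provider in the list. This is a minimal sketch, not taken from the Vitis AI examples: the ACL entry assumes your ONNX Runtime build includes the Arm ACL execution provider, the model path is a placeholder, and the config_file option is described under Runtime Options below.

```python
import onnxruntime

# Providers are tried in order; operators the Vitis AI EP does not take are
# offloaded to the next provider, ending with the CPU provider as a fallback.
providers = [
    "VitisAIExecutionProvider",
    "ACLExecutionProvider",      # assumption: only present in builds with the ACL EP
    "CPUExecutionProvider",
]

session = onnxruntime.InferenceSession(
    "model.onnx",                # placeholder: your quantized ONNX model
    providers=providers,
    provider_options=[
        {"config_file": "/etc/vaip_config.json"},  # Vitis AI EP options
        {},                                        # ACL EP options
        {},                                        # CPU EP options
    ],
)

print(session.get_providers())   # shows which providers were actually enabled
```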
Runtime Options
Vitis AI ONNX Runtime integrates a compiler responsible for compiling the model graph and weights into a micro-coded executable. This executable is deployed on the target accelerator.
During the initiation of the ONNX Runtime session, the model is compiled, and this compilation process must be completed before the first inference pass. The compilation duration might vary, but it could take a few minutes. After the model is compiled, the model executable is cached. For subsequent inference runs, you can use the cached executable model.
Several runtime variables can be set to configure the inference session, as listed in the following table and illustrated in the sketch after it. The config_file variable is required and must point to the location of the configuration file. The cacheDir and cacheKey variables are optional.
Runtime Variable | Default Value | Details |
---|---|---|
config_file | "" | Required. The configuration file path; the configuration file vaip_config.json is contained in vitis_ai_2023.1-r3.5.0.tar.gz. |
cacheDir | /tmp/{user}/vaip/.cache/ | Optional. Cache directory. |
cacheKey | {onnx_model_md5} | Optional. Cache key, used to distinguish between different models. |
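For reference, these runtime variables are passed to the Vitis AI EP as provider options when the session is created. The following is a minimal Python sketch, assuming the same keys are accepted through provider_options as through the C++ options map shown later in this section; the cacheDir and cacheKey values are illustrative placeholders.

```python
import onnxruntime

# Pass the runtime variables from the table above as Vitis AI EP provider options.
session = onnxruntime.InferenceSession(
    "resnet50_pt.onnx",                            # placeholder: your quantized ONNX model
    providers=["VitisAIExecutionProvider"],
    provider_options=[{
        "config_file": "/etc/vaip_config.json",    # required
        "cacheDir": "/tmp/my_cache",               # optional cache directory
        "cacheKey": "resnet50_pt",                 # optional, e.g. the model's md5
    }],
)
```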
The final cache directory is {cacheDir}/{cacheKey}. In addition, environment variables can be set to customize the Vitis AI EP.
Environment Variable | Default Value | Details |
---|---|---|
XLNX_ENABLE_CACHE | 1 | Whether to use the cache. If set to 0, the cached executable is ignored and the model is recompiled. |
XLNX_CACHE_DIR | /tmp/$USER/vaip/.cache/{onnx_model_md5} | Optional. Configures the cache path. |
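For example, to ignore a previously cached executable and force recompilation into a custom cache location, these environment variables can be set before the session is created. The following is a minimal Python sketch with placeholder paths; exporting the variables in the shell before launching the program has the same effect.

```python
import os

# Must be set in the process environment before the inference session is created.
os.environ["XLNX_ENABLE_CACHE"] = "0"   # ignore the cached executable and recompile
os.environ["XLNX_CACHE_DIR"] = "/tmp/my_user/vaip/.cache/resnet50_pt"  # placeholder path

import onnxruntime

session = onnxruntime.InferenceSession(
    "resnet50_pt.onnx",                            # placeholder: quantized ONNX model
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "/etc/vaip_config.json"}],
)
```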
Vitis AI 3.5 offers over ten deployment examples based on ONNX Runtime. You can find the examples at https://github.com/Xilinx/Vitis-AI/tree/v3.5/examples/vai_library/samples_onnx. The following steps describe how to use VOE to deploy the ONNX model:
- Prepare the quantized model in ONNX format. Use Vitis AI Quantizer to quantize the model and output the quantized model in ONNX format.
- Download the ONNX Runtime package vitis_ai_2023.1-r3.5.0.tar.gz and install it on the target board:
```bash
tar -xzvf vitis_ai_2023.1-r3.5.0.tar.gz -C /
```
Then, download voe-0.1.0-py3-none-any.whl and onnxruntime_vitisai-1.16.0-py3-none-any.whl. Ensure the device is online and install them:
```bash
pip3 install voe*.whl
pip3 install onnxruntime_vitisai*.whl
```
A quick check that the installation succeeded is sketched after these steps.
- Vitis AI 3.5 supports the ONNX Runtime C++ API and Python API. For details on the ONNX Runtime API, refer to https://onnxruntime.ai/docs/api/. The following is an ONNX model deployment code snippet based on the C++ API:
```cpp
// ...
#include <experimental_onnxruntime_cxx_api.h>
// include user header files
// ...
std::string onnx_model_path = "resnet50_pt.onnx";

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50_pt");
auto session_options = Ort::SessionOptions();

auto options = std::unordered_map<std::string, std::string>({});
options["config_file"] = "/etc/vaip_config.json";
// optional, eg: cache path : /tmp/my_cache/abcdefg
// Replace abcdefg with your model name, eg. onnx_model_md5
options["cacheDir"] = "/tmp/my_cache";
options["cacheKey"] = "abcdefg";

// Create an inference session using the Vitis AI execution provider
session_options.AppendExecutionProvider("VitisAI", options);
auto session = Ort::Experimental::Session(env, onnx_model_path, session_options);

auto input_shapes = session.GetInputShapes();
// preprocess input data
// ...
// Create input tensors and populate input data
std::vector<Ort::Value> input_tensors;
input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
    input_data.data(), input_data.size(), input_shapes[0]));

auto output_tensors = session.Run(session.GetInputNames(), input_tensors,
                                  session.GetOutputNames());
// postprocess output data
// ...
```
To leverage the Python APIs, use the following example for reference:
```python
import onnxruntime
# Add other imports
# ...

# Load inputs and do preprocessing
# ...

# Create an inference session using the Vitis AI execution provider
session = onnxruntime.InferenceSession(
    '[model_file].onnx',
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "/etc/vaip_config.json"}])

input_shape = session.get_inputs()[0].shape
input_name = session.get_inputs()[0].name

# Load inputs and do preprocessing by input_shape
input_data = [...]
result = session.run([], {input_name: input_data})
```
- Create a build.sh file or copy one from the Vitis AI Library ONNX examples and modify it. Then, build the program:
```bash
result=0 && pkg-config --list-all | grep opencv4 && result=1
if [ $result -eq 1 ]; then
  OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv4)
else
  OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv)
fi

lib_x=" -lglog -lunilog -lvitis_ai_library-xnnpp -lvitis_ai_library-model_config -lprotobuf -lxrt_core -lvart-xrt-device-handle -lvaip-core -lxcompiler-core -labsl_city -labsl_low_level_hash -lvart-dpu-controller -lxir -lvart-util -ltarget-factory -ljson-c"
lib_onnx=" -lonnxruntime"
lib_opencv=" -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui -lopencv_imgproc -lopencv_core "

if [[ "$CXX" == *"sysroot"* ]]; then
  inc_x="-I=/usr/include/onnxruntime -I=/install/Release/include/onnxruntime -I=/install/Release/include -I=/usr/include/xrt"
  link_x=" -L=/install/Release/lib"
else
  inc_x=" -I/usr/include/onnxruntime -I/usr/include/xrt"
  link_x=" "
fi

name=$(basename $PWD)
CXX=${CXX:-g++}

$CXX -O2 -fno-inline -I. \
  ${inc_x} \
  ${link_x} \
  -o ${name}_onnx -std=c++17 \
  $PWD/${name}_onnx.cpp \
  ${OPENCV_FLAGS} \
  ${lib_opencv} \
  ${lib_x} \
  ${lib_onnx}
```
- Copy the executable program and the quantized ONNX model to the target. Then, run the program.
Note: For the ONNX model deployment, the input model is the quantized ONNX model. If the environment variable WITH_XCOMPILER is on, the model is first compiled online when you run the program, which might take some time.
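Referring back to the installation step, a quick way to confirm that VOE and the Vitis AI EP are visible to ONNX Runtime on the target is to list the available providers. A minimal sketch:

```python
import onnxruntime

# After installing the voe and onnxruntime_vitisai wheels on the target,
# the Vitis AI execution provider should appear among the available providers.
print(onnxruntime.__version__)                 # 1.16.0 for this release
available = onnxruntime.get_available_providers()
print(available)

assert "VitisAIExecutionProvider" in available, \
    "Vitis AI EP not found; re-check the wheel installation on the target"
```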