Vitis AI ONNX Runtime Engine (VOE) is a new feature in Vitis AI 3.0. It allows users to run a quantized ONNX model directly on the target board. The Vitis AI execution provider (VitisAI EP) is provided to accelerate inference with the Xilinx DPU. The following figure gives an overview of VOE in Vitis AI.
Figure 1. VOE Overview
Vitis AI 3.0 provides more than 10 deployment examples based on ONNX Runtime. Users can find the examples at https://github.com/Xilinx/Vitis-AI/tree/v3.0/examples/vai_library/samples_onnx. The following steps show how to use VOE to deploy an ONNX model.
- Prepare the quantized model in ONNX format. Use the Vitis AI quantizer to quantize the model and output the quantized model in ONNX format.
- Download the ONNX Runtime package vitis_ai_2022.2-r3.0.0.tar.gz and install it on the target board.
tar -xzvf vitis_ai_2022.2-r3.0.0.tar.gz -C /
- Use the ONNX Runtime C++ API to create the application program. For details of the ONNX Runtime API, refer to https://onnxruntime.ai/docs/api/. The following code snippet shows how a segmentation model is deployed with the C++ API.
C++ example
// Create a session
// Select a set of execution providers (EP) if any; "VITISAI_EP" is selected
env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Segmentation");
session_options = Ort::SessionOptions();
CheckStatus(OrtSessionOptionsAppendExecutionProvider_VITISAI(session_options, ""));
std::string model_name_(model_name);
session = std::unique_ptr<Ort::Experimental::Session>(
    new Ort::Experimental::Session(env, model_name_, session_options));

// Do the pre-process and set the input
cv::Mat resize_image;
auto height = input_shapes[0][2];
auto width = input_shapes[0][3];
auto size = cv::Size((int)width, (int)height);
cv::resize(image[0], resize_image, size);
set_input_image(resize_image, input_tensor_values.data());
if (input_tensors.size()) {
  input_tensors[0] = Ort::Experimental::Value::CreateTensor<float>(
      input_tensor_values.data(), input_tensor_values.size(), input_shapes[0]);
} else {
  input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
      input_tensor_values.data(), input_tensor_values.size(), input_shapes[0]));
}

// Run the session
output_tensors = session->Run(session->GetInputNames(), input_tensors,
                              session->GetOutputNames());
output_tensor_ptr[0] = output_tensors[0].GetTensorMutableData<float>();

// Get the output and do the post-process
auto oc = output_shapes[0][1];
auto oh = output_shapes[0][2];
auto ow = output_shapes[0][3];
auto hwc = permute(output_tensor_ptr[0], oc, oh, ow);
cv::Mat result(oh, ow, CV_8UC1);
max_index_c(hwc.data(), oc, oh * ow, result.data);
- Create a build.sh file as shown below, or copy one from the Vitis AI Library ONNX examples and modify it. Then, build the program.
result=0 && pkg-config --list-all | grep opencv4 && result=1
if [ $result -eq 1 ]; then
  OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv4)
else
  OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv)
fi

lib_x=" -lglog -lunilog -lvitis_ai_library-xnnpp -lvitis_ai_library-model_config -lprotobuf -lxrt_core -lvart-xrt-device-handle -lvaip-core -lxcompiler-core -labsl_city -labsl_low_level_hash -lvart-dpu-controller -lxir -lvart-util -ltarget-factory -ljson-c"
lib_onnx=" -lonnxruntime"
lib_opencv=" -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui -lopencv_imgproc -lopencv_core "

if [[ "$CXX" == *"sysroot"* ]]; then
  inc_x="-I=/usr/include/onnxruntime -I=/install/Release/include/onnxruntime -I=/install/Release/include -I=/usr/include/xrt"
  link_x=" -L=/install/Release/lib"
else
  inc_x=" -I/usr/include/onnxruntime -I/usr/include/xrt"
  link_x=" "
fi

name=$(basename $PWD)

CXX=${CXX:-g++}
$CXX -O2 -fno-inline -I. \
  ${inc_x} \
  ${link_x} \
  -o ${name}_onnx -std=c++17 \
  $PWD/${name}_onnx.cpp \
  ${OPENCV_FLAGS} \
  ${lib_opencv} \
  ${lib_x} \
  ${lib_onnx}
- Copy the executable program and the quantized ONNX model to the target. Then, run the program.
Note: For ONNX model deployment, the input model is the quantized ONNX model. The model is compiled online first when you run the program, and this compilation may take some time.
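The C++ snippet in the third step calls two post-processing helpers, permute and max_index_c, without showing their bodies. The following is a minimal sketch of what such helpers could look like. It assumes that the output tensor is laid out as NCHW with a batch size of 1, that permute reorders it to HWC, and that max_index_c writes the per-pixel argmax over the channels as the class label. The function names and argument order follow the snippet, but the bodies are illustrative assumptions, not the implementation shipped with the Vitis AI Library samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed behavior of permute(): copy a CHW float buffer into HWC order,
// so that all channel scores of one pixel become contiguous.
std::vector<float> permute(const float* chw, std::size_t c, std::size_t h,
                           std::size_t w) {
  std::vector<float> hwc(c * h * w);
  for (std::size_t ci = 0; ci < c; ++ci) {
    for (std::size_t hi = 0; hi < h; ++hi) {
      for (std::size_t wi = 0; wi < w; ++wi) {
        hwc[(hi * w + wi) * c + ci] = chw[(ci * h + hi) * w + wi];
      }
    }
  }
  return hwc;
}

// Assumed behavior of max_index_c(): for each pixel, store the index of the
// channel with the highest score. The output is one byte per pixel, which
// matches a CV_8UC1 image buffer; this limits the sketch to 256 classes.
void max_index_c(const float* hwc, std::size_t c, std::size_t pixels,
                 std::uint8_t* out) {
  assert(c > 0 && c <= 256);
  for (std::size_t p = 0; p < pixels; ++p) {
    const float* scores = hwc + p * c;
    std::size_t best = 0;
    for (std::size_t ci = 1; ci < c; ++ci) {
      if (scores[ci] > scores[best]) best = ci;
    }
    out[p] = static_cast<std::uint8_t>(best);
  }
}
```

With these definitions, the calls in the snippet (permute(output_tensor_ptr[0], oc, oh, ow) followed by max_index_c(hwc.data(), oc, oh * ow, result.data)) would fill result with one class index per pixel. Reordering to HWC first keeps the inner argmax loop on contiguous memory; an argmax computed directly on the CHW buffer would also work and would avoid the extra copy.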