Xilinx Intermediate Representation (XIR) is a graph-based intermediate representation of AI algorithms, designed for compilation and efficient deployment of the Domain-specific Processing Unit (DPU) on FPGA platforms. It is composed of the Op, Tensor, Graph, and Subgraph libraries. In the future, the Vitis™ AI quantizer, compiler, runtime, and many other tools will use XIR to exchange data. Advanced users can also achieve Whole Application Acceleration, getting more out of the FPGA, by extending XIR to support customized IP in the Vitis AI flow. Currently, the DPUCAHX8H is enabled by the XIR-based flow. This section describes the DPUCAHX8H compiler and the steps to use the common VAI_C interface to create a compiled xmodel from the vai_quantizer outputs.
The XIR-based compiler for DPUCAHX8H takes the quantized TensorFlow or Caffe model as input. It first transforms the input model into the XIR format, which is the foundation of the subsequent processes: most of the variations among the different frameworks are eliminated and translated into a unified XIR representation. The compiler then applies various optimizations to the graph and partitions it into several subgraphs based on whether each operator can be executed on the DPU. Additional architecture-aware optimizations are applied to each subgraph. For each DPU subgraph, the compiler generates the instruction stream and attaches it to the subgraph. Finally, the optimized graph, carrying the necessary information and instructions for VART, is serialized into a compiled xmodel file.
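To make this partitioning concrete, a compiled xmodel can be inspected with the XIR Python bindings shipped with Vitis AI. The following is a minimal sketch, not part of the compilation flow itself; "netname.xmodel" is a placeholder for an xmodel produced by the compile steps described below.

```python
# Minimal inspection sketch; assumes the Vitis AI XIR Python bindings are
# installed and "netname.xmodel" is a compiled model produced by VAI_C.
import xir

graph = xir.Graph.deserialize("netname.xmodel")

# The children of the root subgraph are the partitions created by the
# compiler; each carries a "device" attribute such as "DPU" or "CPU".
for sg in graph.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "unknown"
    print(f"{sg.get_name()}: device={device}")
```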
The steps to compile Caffe or TensorFlow models for DPUCAHX8H with VAI_C are the same as for previous DPUs. It is assumed that you have successfully installed the Vitis AI package, including VAI_C, and compressed your model with vai_quantizer.
Caffe
For Caffe, vai_q_caffe generates a PROTOTXT file (deploy.prototxt) and a MODEL file (deploy.caffemodel). Make sure you specify the “-keep_fixed_neuron” option for vai_q_caffe, which is essential for the DPUCAHX8H compiler. Then the following command is almost everything you need to get the compiled xmodel.
vai_c_caffe -p /PATH/TO/deploy.prototxt -c /PATH/TO/deploy.caffemodel -a /PATH/TO/arch/DPUCAHX8H/PLATFORM/arch.json -o /OUTPUTPATH -n netname
The compiler creates three files in the OUTPUTPATH directory. ‘netname_org.xmodel’ is the pre-compiled xmodel generated by the compiler front end. ‘netname.xmodel’ is the compiled xmodel, which contains the instructions and other necessary information. ‘meta.json’ is for the runtime.
See Model Deployment Overview for more information on deploying the network on the DPU with these files.
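For a rough idea of how the runtime consumes these files, the sketch below creates a VART runner from the DPU subgraph of the compiled xmodel and runs a single batch of zeros. It is a hedged illustration only: the expected input data type and layout depend on your model, and real applications add pre- and post-processing as described in Model Deployment Overview.

```python
# Hedged deployment sketch using the VART and XIR Python bindings from
# Vitis AI; "netname.xmodel" is a placeholder for the compiled output.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("netname.xmodel")
dpu_subgraphs = [
    sg for sg in graph.get_root_subgraph().toposort_child_subgraph()
    if sg.has_attr("device") and sg.get_attr("device").upper() == "DPU"
]
runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")

in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]
# int8 buffers are an assumption here; the expected dtype depends on the model.
input_data = [np.zeros(tuple(in_tensor.dims), dtype=np.int8)]
output_data = [np.zeros(tuple(out_tensor.dims), dtype=np.int8)]

job_id = runner.execute_async(input_data, output_data)
runner.wait(job_id)
print("output shape:", output_data[0].shape)
```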
TensorFlow
For TensorFlow, vai_q_tensorflow generates a pb file (quantize_eval_model.pb). Note that vai_q_tensorflow produces two pb files; the quantize_eval_model.pb file is the proper one for the DPUCAHX8H compiler, which differs from DPUCZDX8G in this respect. The compilation command is similar.
vai_c_tensorflow -f /PATH/TO/quantize_eval_model.pb -a /PATH/TO/arch/DPUCAHX8H/PLATFORM/arch.json -o /OUTPUTPATH -n netname
The outputs are the same as for Caffe.
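For reference, the quantize_eval_model.pb above is produced by a vai_q_tensorflow run along these lines. This is a hedged example only: the node names, input shape, calibration input_fn, and iteration count are placeholders that must be adapted to your model.

vai_q_tensorflow quantize --input_frozen_graph /PATH/TO/float_model.pb --input_nodes input --output_nodes output --input_shapes ?,224,224,3 --input_fn my_input_fn.calib_input --calib_iter 100 --output_dir /PATH/TO/quantize_results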
PyTorch
For PyTorch, the NNDCT quantizer outputs the quantized model in XIR format directly. Use vai_c_xir to compile it.
vai_c_xir -i /PATH/TO/quantized.xmodel -a /PATH/TO/arch/DPUCAHX8H/PLATFORM/arch.json -o /OUTPUTPATH -n netname
The outputs are the same as for Caffe and TensorFlow.
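The quantized.xmodel consumed by vai_c_xir above is exported by the NNDCT quantizer (vai_q_pytorch). The following is a minimal sketch of that export step, assuming the pytorch_nndct package from the Vitis AI environment; the toy model, input shape, and the omitted calibration loop are placeholders.

```python
# Hedged sketch of exporting an XIR-format quantized model with the NNDCT
# (vai_q_pytorch) quantizer; the tiny model and input shape are placeholders.
import torch
from pytorch_nndct.apis import torch_quantizer

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()
).eval()                                      # stand-in for your float model
dummy_input = torch.randn(1, 3, 224, 224)     # placeholder input shape

# 1) Calibration pass: run the quantized model over a calibration set.
quantizer = torch_quantizer("calib", model, (dummy_input,))
quant_model = quantizer.quant_model
quant_model(dummy_input)                      # replace with a real calibration loop
quantizer.export_quant_config()

# 2) Test pass, then export the xmodel consumed by vai_c_xir.
quantizer = torch_quantizer("test", model, (dummy_input,))
quant_model = quantizer.quant_model
quant_model(dummy_input)                      # at least one forward pass before export
quantizer.export_xmodel(output_dir="quantize_result")
```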
Currently Supported Operators
Typical Layers in CNN | Parameters | DPU Support |
---|---|---|
Convolution (Caffe: Convolution) (TensorFlow: Conv2D, SeparableConv2D, …) | Kernel size | W: [1, 8], H: [1, 8] |
 | Strides | W: [1, 4], H: [1, 4] |
 | Paddings | Left, Right: [1, kernel_w-1]; Top, Bottom: [1, kernel_h-1] |
 | In/Out Size | Arbitrary |
 | In/Out Channels | [1, 256 * channel_parallel] |
 | Activation | ReLU, LeakyReLU or ReLU6 |
 | Dilation | dilation * input_channel <= 256 * channel_parallel && stride == 1 |
 | Group* (Caffe) | Group == 1 |
Deconvolution (Caffe: Deconvolution) (TensorFlow: Conv2DTranspose) | Kernel size | W: [1, 8], H: [1, 8] |
 | Strides | W: [1, 4], H: [1, 4] |
 | Paddings | Left, Right: [1, kernel_w-1]; Top, Bottom: [1, kernel_h-1] |
 | In/Out Size | Arbitrary |
 | In/Out Channels | [1, 256 * channel_parallel] |
 | Activation | ReLU, LeakyReLU or ReLU6 |
Max Pooling (Caffe: Pooling) (TensorFlow: MaxPool2D) | Kernel size | W: [1, 8], H: [1, 8] |
 | Strides | W: [1, 4], H: [1, 4] |
 | Paddings | Left, Right: [1, kernel_w-1]; Top, Bottom: [1, kernel_h-1] |
Average Pooling (Caffe: Pooling) (TensorFlow: AveragePooling2D, Mean) | Kernel size | W: [1, 8], H: [1, 8] |
 | Strides | W: [1, 4], H: [1, 4] |
 | Paddings | Left, Right: [1, kernel_w-1]; Top, Bottom: [1, kernel_h-1] |
Element-wise Sum (Caffe: Eltwise) (TensorFlow: Add) | Input Size | Arbitrary |
 | Input Channel | [1, 256 * channel_parallel] |
 | Activation | ReLU or LeakyReLU |
Concat (Caffe: Concat) (TensorFlow: Concatenate) | Number, Axis | Arbitrary |
 | Out Channel | [1, 256 * channel_parallel] |
Reorg* (Caffe) | Strides* | stride ^ 2 * input_channel <= 256 * channel_parallel |
 | Scale*, Reverse* | Arbitrary |
Fully Connected (Caffe: InnerProduct) (TensorFlow: MatMul, Mul) | Input Channel | input_channel < 2048 * channel_parallel |
 | Output Channel | Arbitrary |
The operators listed above are commonly used in CNN models, and the DPU supports many configurations of these operators.
The operators below are natively defined in the different deep learning frameworks. The compiler automatically parses these operators and assigns them to the DPU or the CPU. They are only partially supported by the tools and are listed here for your reference.
Operators | Framework | Parameters | DPU Support |
---|---|---|---|
Const | TensorFlow | - | Arbitrary |
Shape | TensorFlow | - | Arbitrary |
Identity | TensorFlow | - | Arbitrary |
Batchnorm+ | Caffe | - | Arbitrary |
Neg* | TensorFlow | - | Partially |
Mul* | TensorFlow | - | Partially |
Sub* | TensorFlow | - | Partially |
Gstiling* | Caffe | reverse, stride | Partially |
Permute* | Caffe | order | Partially |
Flatten* | Caffe/TensorFlow | start_dim, end_dim | Partially |
Squeeze* | TensorFlow | dims | Partially |
Reshape* | TensorFlow | shape | Partially |
Stack* | TensorFlow | axis | Partially |
Matmul* | TensorFlow | transpose_a, transpose_b | Partially |
Strided_Slice* | TensorFlow | begin, end, strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask | Partially |
Mean* | TensorFlow | dims, keep_dims | Avgpool-like configurations |
Resize* | TensorFlow | scale, align_corners, mode | scale = 2, false, NEAREST |
Pad* | TensorFlow | pad, pad_mode, constant_value | “Constant” and pad with 0, “SYMMETRIC” |
Resize_nearest* | TensorFlow | align_corners | False |
DeephiResize* | Caffe | scale, mode | Scale = 2, NEAREST |
Upsample2D** | TensorFlow | align_corners | - |
Resize_bilinear** | TensorFlow | align_corners | - |
Space_to_batch** | TensorFlow | block_shape, Paddings | - |
Batch_to_space** | TensorFlow | block_shape, Paddings | - |
Prior_box** | Caffe | - | - |
Softmax** | TensorFlow | axis | - |