Starting with version 2.5, Vitis AI supports PyTorch and TensorFlow 2 models that contain custom OPs. The basic workflow for custom OPs is shown below.
Figure 1. Custom Op Workflow
The following are the steps in the workflow:
1. Define the OP as a custom OP unknown to XIR and then quantize the model.
2. Compile the quantized model.
3. Register and implement the custom OP.
4. Deploy the model with the graph_runner APIs.
Note: To implement an accelerated (PL or AI Engine) function for a custom OP, still make it a CPU OP, but issue the PL/AI Engine calls from inside that CPU OP's implementation.
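For step 3, the runtime looks up a registered implementation by the OP's type name and invokes its compute entry point. The sketch below mirrors that shape in plain Python; the registry, the `register_op` decorator, and the `customop_softmax` type name are illustrative stand-ins, not the actual Vitis AI registration API (which is provided by the VART/XIR libraries in C++ or Python).

```python
import math

# Hypothetical registry standing in for the Vitis AI OP-registration
# mechanism; all names here are illustrative only.
OP_REGISTRY = {}

def register_op(op_type):
    """Register an implementation class under an XIR op-type name."""
    def wrap(cls):
        OP_REGISTRY[op_type] = cls
        return cls
    return wrap

@register_op("customop_softmax")
class SoftmaxOp:
    """A custom OP executed on the CPU. A PL/AI Engine-accelerated
    version would issue the kernel call from inside calculate()."""

    def __init__(self, attrs=None):
        self.attrs = attrs or {}

    def calculate(self, inputs):
        # Numerically stable softmax over a flat list of floats.
        m = max(inputs)
        exps = [math.exp(x - m) for x in inputs]
        s = sum(exps)
        return [e / s for e in exps]

# At run time, the framework resolves the OP by type and invokes it:
op = OP_REGISTRY["customop_softmax"]()
probs = op.calculate([1.0, 2.0, 3.0])
```

Because the accelerator call is hidden inside `calculate()`, the graph runner treats a PL/AI Engine OP exactly like any other CPU OP.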
For step 4, the graph_runner APIs support both C++ and Python. When you deploy a model with custom OPs through the graph_runner API, the runtime is optimized with zero-copy between DPU OPs and CPU OPs: adjacent layers share buffer addresses rather than copying data between them.
The following model structures support zero-copy.
| Type | Output of OP | Input of OP | Using zero-copy |
|---|---|---|---|
| a | Single DPU OP | Single CPU OP | Yes |
| b | Single CPU OP | Single DPU OP | Yes |
| c | Single CPU OP | Single CPU OP | Yes |
| d | Single DPU OP | Multiple CPU OPs | Yes |
| e | Multiple CPU OPs and multiple DPU OPs | Single CPU OP | Yes |
Note: Model structure types a-e are shown in the following figure.
Figure 2. Model Structure Types
Note: Whether zero-copy applies to other model structures depends on the situation.
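The effect of zero-copy can be illustrated with NumPy array views: the downstream OP reads the upstream OP's output buffer directly instead of receiving a copy. This is a conceptual sketch of address sharing between layers, not the actual Vitis AI tensor-buffer mechanism.

```python
import numpy as np

# Upstream (e.g., DPU) OP writes its result into a tensor buffer.
dpu_output = np.arange(8, dtype=np.float32)

# With zero-copy, the downstream (CPU) OP's input is a view of the
# same memory -- no data is copied between the two layers.
cpu_input = dpu_output[:]            # view: shares the buffer

# Without zero-copy, the runtime would hand the CPU OP a copy
# in a separate buffer.
cpu_input_copy = dpu_output.copy()
```

Here `np.shares_memory(dpu_output, cpu_input)` is true while `np.shares_memory(dpu_output, cpu_input_copy)` is false, which is exactly the distinction the table above draws.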
The following two example models demonstrate the custom OP flow:
- MNIST model based on TensorFlow 2
- PointPillars model based on PyTorch