For the edge DPU, you can use the Vitis AI unified APIs to develop deep learning
applications. Alternatively, you can use the advanced low-level APIs to
flexibly meet the requirements of various scenarios. Note that these advanced APIs
require the legacy DNNDK N2Cube runtime. For more details on advanced API usage,
see Advanced Programming Interface.
With the Vitis AI advanced low-level APIs, you need to perform the following operations:
- Call APIs to manage DPU kernels and tasks:
  - DPU kernel creation and destruction
  - DPU task creation and destruction
  - Manipulation of DPU input and output tensors
- Deploy layers/operators that the DPU does not support on the CPU side.
- Implement pre-processing to feed input data to the DPU, and post-processing
to consume output data from the DPU.
Using ResNet-50 as an example, the code for managing the DPU kernel
and task is written in the main() function as follows:

    int main(void) {
        /* DPU Kernel/Task for running ResNet-50 */
        DPUKernel* kernel;
        DPUTask* task;

        /* Attach to DPU device and prepare for running */
        dpuOpen();

        /* Create DPU Kernel for ResNet-50 */
        kernel = dpuLoadKernel("resnet50");

        /* Create DPU Task for ResNet-50 */
        task = dpuCreateTask(kernel, 0);

        /* Run DPU Task for ResNet-50 */
        runResnet50(task);

        /* Destroy DPU Task & release resources */
        dpuDestroyTask(task);

        /* Destroy DPU Kernel & release resources */
        dpuDestroyKernel(kernel);

        /* Detach DPU device & release resources */
        dpuClose();

        return 0;
    }

The operations inside main() include:
- Call dpuOpen() to open the DPU device.
- Call dpuLoadKernel() to load the DPU resnet50 kernel.
- Call dpuCreateTask() to create a task for the DPU kernel.
- Call dpuDestroyTask() and dpuDestroyKernel() to destroy the DPU task and kernel and release their resources.
- Call dpuClose() to close the DPU device.
The image classification takes place within the runResnet50()
function, which performs the following operations:
- Fetch an image using the OpenCV function imread() and set it as the input to the DPU kernel resnet50 by calling dpuSetInputImage2() for a Caffe model. For a TensorFlow model, implement the pre-processing yourself (instead of calling dpuSetInputImage2() directly) to feed the input image into the DPU.
- Call dpuRunTask() to run the task for the ResNet-50 model.
- Perform the softmax calculation on the Arm® CPU using the output data from the DPU.
- Calculate the top-5 classification categories and their corresponding probabilities.
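For the TensorFlow case, where dpuSetInputImage2() is not used, the pre-processing has to turn image pixels into the signed 8-bit values that the DPU input tensor holds. The following is only a minimal sketch of that idea, not the sample's actual code. The function name quantizeInput(), the per-channel mean values, and the scale value are all assumptions for illustration. In a real application the scale and the destination buffer would come from the DPU task's input tensor, and any resizing or channel reordering the model expects would have to happen first.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the DNNDK API): converts an interleaved
// 3-channel, 8-bit image buffer into signed 8-bit fixed-point values.
//   pixels : interleaved channel data, length must be a multiple of 3
//   mean   : per-channel mean to subtract (model-specific assumption)
//   scale  : quantization scale of the input tensor (model-specific assumption)
std::vector<std::int8_t> quantizeInput(const std::vector<std::uint8_t>& pixels,
                                       const float mean[3], float scale) {
    assert(pixels.size() % 3 == 0);
    std::vector<std::int8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        // Subtract the mean of the channel this element belongs to, then scale.
        float v = (static_cast<float>(pixels[i]) - mean[i % 3]) * scale;
        // Round to nearest and saturate to the INT8 range.
        v = std::round(v);
        v = std::min(127.0f, std::max(-128.0f, v));
        out[i] = static_cast<std::int8_t>(v);
    }
    return out;
}
```

With the assumed means {104, 117, 123} and a scale of 0.5, a first-channel pixel value of 255 maps to 76. Values that fall outside the INT8 range after scaling are saturated rather than wrapped.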
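The core of runResnet50() for a Caffe model looks like this:

    /* Load the image and set it as the DPU task input */
    Mat image = imread(baseImagePath + imageName);
    dpuSetInputImage2(task, INPUT_NODE, image);

    /* Run the ResNet-50 task on the DPU */
    dpuRunTask(task);

    /* Get FC result and convert from INT8 to FP32 format */
    dpuGetOutputTensorInHWCFP32(task, FC_OUTPUT_NODE, FCResult, channel);

    /* Softmax and top-5 selection run on the CPU */
    CPUCalcSoftmax(FCResult, channel, softmax);
    TopK(softmax, channel, 5, kinds);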
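CPUCalcSoftmax() and TopK() are helpers belonging to the sample and are not shown in this section. As a rough illustration of the CPU-side post-processing they stand for, the sketch below computes a softmax over the FP32 outputs of the fully connected layer and then picks the k most probable classes. The names softmaxSketch() and topKSketch() are hypothetical, and the sample's own helpers take raw arrays and may be implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the softmax step: numerically stable softmax
// over the FP32 outputs of the last fully connected layer.
std::vector<float> softmaxSketch(const std::vector<float>& logits) {
    assert(!logits.empty());
    // Subtracting the maximum keeps exp() from overflowing.
    const float maxVal = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxVal);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// Hypothetical stand-in for the top-k step: returns (class index, probability)
// pairs for the k largest probabilities, highest first.
std::vector<std::pair<int, float>> topKSketch(const std::vector<float>& probs,
                                              std::size_t k) {
    std::vector<int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min(k, idx.size());
    // Only the first k positions need to be sorted.
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&probs](int a, int b) { return probs[a] > probs[b]; });
    std::vector<std::pair<int, float>> result;
    for (std::size_t i = 0; i < k; ++i) {
        result.emplace_back(idx[i], probs[idx[i]]);
    }
    return result;
}
```

For the top-5 result described above, the call would be topKSketch(softmaxSketch(fcOutput), 5), where fcOutput is an assumed vector holding the FP32 values read back from the DPU. The returned indices still need to be mapped to class names, which the sample does through its kinds argument.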