VAI_C generates the following kernel information, which is useful when deploying models on the edge DPU.
- Kernel ID
- The ID of each kernel generated by VAI_C after compilation. Every kernel has a unique ID assigned by VAI_C. The neural network model is compiled into several kernels, depending on which operators the DPU supports.
- Kernel Topology
- The kernel topology description file describes the kernels in the kernel graph view when compilation is finished. The kernel_graph file is saved in standard JPEG format with file extension .jpg in the output directory specified by the VAI_C --output_dir option. If graphviz is not installed on the host system, VAI_C will output a DOT (graph description language) format file with extension .gv instead. You can convert the .gv format file to a JPEG file using the following command:
dot -Tjpg -o kernel_graph.jpg kernel_graph.gv
- Kernel Name
- The name of the current kernel. For each DPU kernel, VAI_C produces one corresponding ELF object file named dpu_kernelName.elf. For example, dpu_resnet50_0.elf and dpu_resnet50_2.elf are for DPU kernels resnet50_0 and resnet50_2, respectively. The kernel name is used in Vitis AI programming so that the DPU runtime can identify different DPU kernels correctly. As the container for a DPU kernel, the DPU ELF file encapsulates the DPU instruction code and parameters for the network model.
- Kernel Type
- The type of kernel. Three types of kernel are supported by VAI_C.
- Code Size
- DPU instruction code size in the unit of MB, KB, or bytes for the DPU kernel.
- Param Size
- The size of parameters for this kernel in the unit of MB for the DPU kernel.
- Workload MACs
- The total computation workload in the unit of MOPS for the DPU kernel.
- Mean Value
- The mean values for the DPU kernel.
- I/O Memory Space
- Only available for a DPU kernel compiled as unique memory model. It is the total size of input tensors, intermediate feature maps, and output tensors in the unit of MB. For split IO memory model, refer to the other three fields: Input Mem Size, Output Mem Size, and Feature Map Mem Size, which are described below.
- Input Mem Size
- The total size of all the input tensors in the unit of MB(B). It is only available for DPU kernel compiled as split IO memory model.
- Output Mem Size
- The total size of all the output tensors in the unit of MB(B). It is only available for DPU kernel compiled as split IO memory model.
- Feature Map Mem Size
- The total size of the intermediate feature maps in the unit of MB(B). It is only available for DPU kernel compiled as split IO memory model.
- Total Node Count
- The number of DPU nodes for the DPU kernel.
- Total Tensor Count
- The number of DPU tensors for the DPU kernel.
- Boundary Input Tensors
- All input tensors of the kernel are listed together with their shape information in HWC (height*width*channel) format. The input tensor name can be used to retrieve the DPUTensor via the dpuGetBoundaryIOTensor() API. For ResNet50, the input tensor is data:0.
- Boundary Output Tensors
- All output tensors of the kernel are listed together with their shape information in HWC (height*width*channel) format. The output tensor name can be used to retrieve the DPUTensor via the dpuGetBoundaryIOTensor() API. For ResNet50, the output tensor is fc1000:0. Note that, for historical reasons in the edge DPU design, the VAI_C compiler always produces an even number of channels for an output tensor that has an odd number of channels. The one additional channel is always filled with zeros.
- Input nodes
- All input nodes of the current DPU kernel are listed together with the shape information of each node in height*width*channel format. For kernels not supported by the DPU, the user must get the output of the preceding kernel through its output nodes and feed it into the input nodes of the current kernel, using the APIs provided by N2Cube.
- Output nodes
- All output nodes of the current DPU kernel are listed together with the shape information of each node in height*width*channel format. The address and size of output nodes can be extracted using the APIs provided by N2Cube.
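The even-channel rule described under Boundary Output Tensors can be sketched as follows. This is a minimal illustration rather than N2Cube code: the helper names are hypothetical, and the sketch assumes that the extra zero channel is stored last within each pixel of the HWC layout, which the text above does not state.

```python
def padded_channels(channels: int) -> int:
    """Round an odd channel count up to the next even number."""
    return channels + (channels % 2)


def strip_padding(flat_hwc: list, height: int, width: int, channels: int) -> list:
    """Drop the extra zero channel from a flat HWC buffer.

    Assumption: flat_hwc holds height * width * padded_channels(channels)
    values in HWC order, with the padding channel last in each pixel.
    """
    stride = padded_channels(channels)
    if len(flat_hwc) != height * width * stride:
        raise ValueError("buffer length does not match the padded HWC shape")
    out = []
    for pixel in range(height * width):
        start = pixel * stride
        # Keep only the real channels of this pixel.
        out.extend(flat_hwc[start:start + channels])
    return out


# A 1x2 output with 3 real channels is stored with 4 channels per pixel.
padded = [1, 2, 3, 0, 4, 5, 6, 0]
print(padded_channels(3))
print(strip_padding(padded, 1, 2, 3))
```

If the channel count is already even, `padded_channels` returns it unchanged and `strip_padding` copies the buffer as-is.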
Note: The Code Size, Param Size, Workload MACs, Mean Value, Total Node Count, and Total Tensor Count fields in the VAI_C compilation log are only available for DPU kernels.
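The file naming conventions from the Kernel Name and Kernel Topology fields can be captured in a few lines of Python. This is only a sketch: the helper names are hypothetical and not part of any Vitis AI tool. The only external command used is the Graphviz `dot` invocation quoted under Kernel Topology.

```python
import shutil
import subprocess
from pathlib import Path


def dpu_elf_name(kernel_name: str) -> str:
    """Return the ELF file name VAI_C uses for a DPU kernel (dpu_<kernelName>.elf)."""
    return f"dpu_{kernel_name}.elf"


def dot_to_jpeg_command(gv_path: str) -> list:
    """Build the 'dot -Tjpg -o <name>.jpg <name>.gv' command as an argument list."""
    jpg_path = str(Path(gv_path).with_suffix(".jpg"))
    return ["dot", "-Tjpg", "-o", jpg_path, gv_path]


def convert_kernel_graph(gv_path: str) -> bool:
    """Run the conversion if Graphviz is installed and the .gv file exists."""
    if shutil.which("dot") is None or not Path(gv_path).is_file():
        return False
    subprocess.run(dot_to_jpeg_command(gv_path), check=True)
    return True


print(dpu_elf_name("resnet50_0"))
print(" ".join(dot_to_jpeg_command("kernel_graph.gv")))
```

`convert_kernel_graph` returns `False` instead of raising when `dot` is missing, so a deployment script could fall back to shipping the .gv file.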
For ResNet-50, the kernel graph in JPEG format is shown in the following figure. Each node in the kernel graph describes a kernel ID and its type, while each edge shows the relationship between two kernels as a pair of tuples. The first tuple represents the output tensor from the source kernel, while the second represents the input tensor to the destination kernel. Each tuple contains two parts: the name of the input/output node bound to the tensor, and the tensor index of that input/output node. Using the node name and index provided in the tuple, users can call the APIs provided by N2Cube to get the input or output tensor address.
Figure 1. DPU Kernel Graph for ResNet-50
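The edge annotation described above, two tuples that each hold a node name and a tensor index, can be modeled as a small data structure. The sketch below is purely illustrative: the type names, the `describe` helper, and the node names in the example are hypothetical, and it does not parse the file that VAI_C writes.

```python
from typing import NamedTuple


class TensorRef(NamedTuple):
    """One tuple on a kernel graph edge: (node name, tensor index)."""
    node_name: str
    tensor_index: int


class KernelEdge(NamedTuple):
    """An edge between two kernels in the kernel graph."""
    source: TensorRef       # output tensor of the source kernel
    destination: TensorRef  # input tensor of the destination kernel


def describe(edge: KernelEdge) -> str:
    """Render an edge as readable text for logging or debugging."""
    return (
        f"output {edge.source.tensor_index} of node '{edge.source.node_name}' "
        f"feeds input {edge.destination.tensor_index} of node "
        f"'{edge.destination.node_name}'"
    )


# Hypothetical node names, chosen only for illustration.
edge = KernelEdge(TensorRef("node_a", 0), TensorRef("node_b", 0))
print(describe(edge))
```

Keeping the node name and index together in one value mirrors how the tuple is used: both parts are needed to look up a tensor address.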
For details on the operations supported by the edge DPU, refer to the Zynq DPU v3.1 IP Product Guide (PG338). After VAI_C compilation, network models are normally transformed into the following three kinds of kernels.
- DPUKernel
- Kernel running on the edge DPU.
- CPUKernel
- Kernel running on the CPU side. It consists of the layers/operators not supported by the DPU, which the user must deploy onto the CPU.
- ParamKernel
- Same as CPUKernel, but also generates weight and bias parameters for the layers/operators not supported by the DPU.
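As a rough illustration of how the three kernel types split the deployment work, the sketch below lists the kernels that the user has to implement on the CPU. The enum, the function name, and the kernel list are all hypothetical. In particular, treating resnet50_1 as a CPUKernel is an assumption made for the example, not something taken from a VAI_C log.

```python
from enum import Enum


class KernelType(Enum):
    DPU = "DPUKernel"
    CPU = "CPUKernel"
    PARAM = "ParamKernel"


def user_deployed_kernels(kernels: list) -> list:
    """Return the names of kernels that do not run on the DPU.

    kernels is a list of (kernel_name, KernelType) pairs in graph order.
    """
    return [name for name, kind in kernels if kind is not KernelType.DPU]


# Illustrative kernel list; the types assigned here are assumptions.
kernels = [
    ("resnet50_0", KernelType.DPU),
    ("resnet50_1", KernelType.CPU),
    ("resnet50_2", KernelType.DPU),
]
print(user_deployed_kernels(kernels))
```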