- DSight
- DSight is the Vitis AI performance profiler for edge DPU
and is a visual analysis tool for model performance profiling. The following
figure shows its usage.
Figure 1. DSight Help Info
- DExplorer
- DExplorer is a utility running on the target board. It provides DPU running
mode configuration, DNNDK version checking, DPU status checking, and DPU core
signature checking. The following figure shows the help information for the
usage of DExplorer.
Figure 2. DExplorer Usage Options
- Check DNNDK Version
- Running dexplorer -v displays version information for each component in DNNDK, including N2cube, DPU driver, DExplorer, and DSight.
- Check DPU Status
- DExplorer provides DPU status information, including running mode of N2cube,
DPU timeout threshold, DPU debugging level, DPU core status, DPU register
information, DPU memory resource, and utilization. The following figure shows a
screenshot of DPU status.
Figure 3. DExplorer Status
- Configuring DPU Running Mode
- The edge DPU runtime N2cube supports three DPU execution modes to help
developers debug and profile Vitis AI applications.
- Normal Mode
- In normal mode, the DPU application can get the best performance without any overhead.
- Profile Mode
- In profile mode, the DPU turns on the profiling switch. When a deep learning application runs in profile mode, N2cube prints layer-by-layer performance data to the console while executing the neural network and, at the same time, writes a profile named dpu_trace_[PID].prof to the current folder. This file can be used with the DSight tool.
- Debug Mode
- In this mode, the DPU dumps raw data for each DPU computation node during execution, including DPU instruction code in binary format, network parameters, DPU input tensor, and output tensor. This makes it possible to debug and locate issues in a DPU application.
Note: Profile mode and debug mode are available only for network models compiled into
debug-mode DPU ELF objects by the Vitis AI compiler.
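The profile file name and the note above can be captured in a short script. The following is a minimal Python sketch and is not part of the Vitis AI tooling: the helper names find_dpu_traces and mode_allowed are hypothetical, the [PID] placeholder is assumed to be a decimal process ID, and the mode strings are illustrative labels rather than actual tool arguments.

```python
import re
from pathlib import Path

# File name pattern from the text: dpu_trace_[PID].prof.
# Assumption: [PID] is written as decimal digits.
TRACE_RE = re.compile(r"^dpu_trace_(\d+)\.prof$")


def find_dpu_traces(folder="."):
    """Return {pid: path} for profile files found directly in `folder`."""
    traces = {}
    for path in Path(folder).iterdir():
        match = TRACE_RE.match(path.name)
        if match and path.is_file():
            traces[int(match.group(1))] = path
    return traces


def mode_allowed(running_mode, kernel_mode):
    """Check a running mode against the mode a kernel was compiled in.

    Per the note: profile and debug modes need a DEBUG-mode DPU ELF object.
    Assumption: normal mode has no such restriction.
    """
    if running_mode not in ("normal", "profile", "debug"):
        raise ValueError(f"unknown running mode: {running_mode!r}")
    if kernel_mode not in ("NORMAL", "DEBUG"):
        raise ValueError(f"unknown kernel mode: {kernel_mode!r}")
    return running_mode == "normal" or kernel_mode == "DEBUG"


if __name__ == "__main__":
    # List any profile files left in the current folder by earlier runs.
    for pid, path in sorted(find_dpu_traces().items()):
        print(f"PID {pid}: {path}")
```

Run from the folder where the application was started, the script lists each profile file with the process ID that produced it, which helps pick the right file to open in DSight.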
- Checking DPU Signature
- New DPU cores have been introduced to meet various deep learning acceleration requirements across different Xilinx® FPGA devices. For example, DPU architectures B1024F, B1152F, B1600F, B2304F, and B4096F are available. Each DPU architecture can implement a different version of the DPU instruction set (named as a DPU target version) to support the rapid improvements in deep learning algorithms.
- DDump
- DDump is a utility that dumps the information encapsulated in a DPU ELF file, hybrid executable, or DPU shared library, helping users analyze and debug various issues. Refer to DPU Shared Library for more details.
- Check DPU Kernel Info
- DDump can dump the following information for each DPU kernel from a DPU ELF
file, hybrid executable, or DPU shared library.
- Mode
- The mode in which the DPU kernel was compiled by the VAI_C compiler: NORMAL or DEBUG.
- Code Size
- The DPU instruction code size of the DPU kernel, in MB, KB, or bytes.
- Param Size
- The parameter size of the DPU kernel, including weights and biases, in MB, KB, or bytes.
- Workload MACs
- The computation workload of the DPU kernel, in MOPS.
- IO Memory Space
- The DPU memory space required for the intermediate feature maps, in MB, KB, or bytes. For each DPU task created, N2cube automatically allocates a DPU memory buffer for the intermediate feature maps.
- Mean Value
- The mean values for DPU kernel.
- Node Count
- The total number of DPU nodes for DPU kernel.
- Tensor Count
- The total number of DPU tensors for DPU kernel.
- Tensor In(H*W*C)
- The DPU input tensor list and their shape information in the format of height*width*channel.
- Tensor Out(H*W*C)
- The DPU output tensor list and their shape information in the format of height*width*channel.
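The height*width*channel shape format described above is easy to process in scripts. The following minimal Python sketch assumes each shape is printed as three integers separated by "*"; the helper names parse_shape and element_count are hypothetical, the exact layout of the surrounding DDump output is not assumed, and the 224*224*3 shape is only an illustrative value.

```python
from math import prod


def parse_shape(text):
    """Parse an 'H*W*C' string into a (height, width, channel) tuple."""
    parts = text.strip().split("*")
    if len(parts) != 3:
        raise ValueError(f"expected height*width*channel, got {text!r}")
    height, width, channel = (int(part) for part in parts)
    return height, width, channel


def element_count(shape):
    """Number of elements in a tensor with the given (H, W, C) shape."""
    return prod(shape)


if __name__ == "__main__":
    shape = parse_shape("224*224*3")  # illustrative shape, not tool output
    print(shape, element_count(shape))
```

Multiplying the three dimensions gives the number of elements in the tensor, which is a quick sanity check when comparing a kernel's input and output shapes against the original network definition.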
- Check DPU Arch Info
- The VAI_C compiler automatically wraps the DPU configuration information from the DPU DCF into the DPU ELF file for each DPU kernel. VAI_C then generates the appropriate DPU instructions according to the DPU configuration parameters. Refer to Zynq DPU v3.1 IP Product Guide (PG338) for more details about the configurable DPU descriptions.
- Check VAI_C Info
- VAI_C version information is automatically embedded into the DPU ELF file when a network model is compiled. DDump can dump this version information, which users can provide to the Xilinx AI support team for debugging purposes.
- Legacy Support
- DDump also supports dumping information from legacy DPU ELF files, hybrid executables, and DPU shared libraries. The main difference is that legacy files contain no detailed DPU architecture information.
- DLet
- DLet is a host tool designed to parse and extract various edge DPU
configuration parameters from the DPU hardware handoff (HWH) file generated by
Vivado. The following figure shows the
usage information of DLet.
Figure 9. DLet Usage Options