After a model is deployed on the edge DPU, the running results may not be as desired, for example showing lower accuracy than expected. In this situation, first check the model's accuracy after quantization by the Vitis AI quantizer. If that accuracy is fine, two suspects remain to be debugged. One is the deployment source code, which should be checked very carefully. The other is DPU execution itself. This section illustrates how to debug the DPU running results. Normally, this involves the following five steps.
- Run the Vitis AI quantizer to generate the golden baseline from the quantized model.
- Build the model as a debug-mode DPU kernel with the Vitis AI compiler, specifying the option --dump fused_graph_info.
- Before launching the DPU application, run the command dexplorer -m debug to switch the runtime N2Cube into debug mode, or call dpuEnableTaskDebug() to enable debug mode for the dedicated DPU task only, leaving other tasks unaffected (a minimal sketch follows this list).
- Run the DPU application and collect the raw dump data for each node of the DPU task.
- Compare the DPU raw dump data with the golden baseline from the quantizer.
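As referenced in the third step, the per-task switch can be sketched as follows. This is only an assumed fragment built around the dpuEnableTaskDebug() call named above; it presumes a DPUTask that has already been created through the DNNDK N2Cube API.

#include <dnndk/dnndk.h>

/* Per-task alternative to the system-wide "dexplorer -m debug" switch:
   only this task runs in debug mode; other DPU tasks are unaffected. */
void enable_debug_for(DPUTask *task) {
    dpuEnableTaskDebug(task);
}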
The DNNDK sample debugging is delivered within the Vitis AI package to demonstrate how to debug the DPU. The TensorFlow Inception-v1 model is deployed within this sample, which contains two sub-folders: decent_golden and dpu_deployment. The folder decent_golden holds all the files required to generate the golden baseline, together with the evaluation version of the model, quantize_eval_model.pb, generated by the quantizer (the deployable version of the model cannot be used). Run the script decent_dump_golden.sh to dump the golden baseline for the input image decent_golden/dataset/images/cropped_224x224.jpg and save it into the folder decent_golden/dump_golden/dump_results_0/.
For a Caffe model, the golden baseline is dumped with the following vai_q_caffe command; the results are saved into the folder dump_gpu by default.

DECENT_DEBUG=5 vai_q_caffe test -model quantize_model/quantize_train_test.prototxt \
                                -weights quantize_model/quantize_train_test.caffemodel \
                                -test_iter 1 \
                                2>&1 | tee ./log/dump.log
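Here, the environment variable DECENT_DEBUG=5 raises the quantizer's debug level so that the layer-by-layer simulation results of the quantized model are dumped during the test run, and -test_iter 1 limits the run to a single iteration.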
With the option --dump fused_graph_info specified to the Vitis AI compiler while compiling the Inception-v1 model, a file named fused_graph_kernel_0.txt is produced together with the DPU kernel dpu_tf_inception_v1_0. The folder dpu_deployment holds the deployment source code for Inception-v1, in which dpuEnableTaskDump() is used to enable DPU raw data dumping. Going through the code in the source file main.cc, notice that pre-processing and post-processing for the Inception-v1 model are not included; this helps isolate the effects of the deployment code while debugging the DPU.
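The sample's actual main.cc is not reproduced here; the following is a minimal sketch of such a deployment skeleton, assuming the DNNDK N2Cube API and the kernel name used in this sample (tf_inception_v1_0, as given by the dump-file prefix later in this section).

#include <dnndk/dnndk.h>

int main(void) {
    dpuOpen();                                              /* attach to the DPU driver */
    DPUKernel *kernel = dpuLoadKernel("tf_inception_v1_0"); /* debug-mode kernel */
    DPUTask *task = dpuCreateTask(kernel, T_MODE_NORMAL);

    dpuEnableTaskDump(task);  /* dump raw data for each node of this task */

    /* No pre-processing: the Int8 input is fed directly from the quantizer's
       golden dump (see below), so deployment code cannot skew the comparison. */
    dpuRunTask(task);

    /* No post-processing: the raw node outputs in the dump folder are
       compared against the quantizer's golden baseline instead. */
    dpuDestroyTask(task);
    dpuDestroyKernel(kernel);
    dpuClose();
    return 0;
}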
The file fused_graph_kernel_0.txt describes the mapping between each DPU node (or super-layer), which may contain several fused layers or operators, and the quantized model's layers or operators, which are divided into two types: in and out. For a Caffe model, the layer names are identical to those of the original floating-point model. For a TensorFlow model, the operator names differ slightly from the original floating-point model because the Vitis AI quantizer performs some operator fusion. With the name of a quantized model's layer or operator, you can locate its corresponding dump files in the quantizer's golden baseline.
For the kernel dpu_tf_inception_v1_0.elf of the TensorFlow Inception-v1 model, the mapping information for its input node input and output node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D is shown below. For the input node input, the out operator is input; for the output node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, the out operator is InceptionV1_Logits_Conv2d_0c_1x1_Conv2D.
input :
{
in(0): null
out(0): input
};
InceptionV1_Logits_Conv2d_0c_1x1_Conv2D :
{
in(0): InceptionV1_Logits_AvgPool_0a_7x7_AvgPool
out(0): InceptionV1_Logits_Conv2d_0c_1x1_Conv2D
};
For the out type operator input, the corresponding text-format dump file from the Vitis AI quantizer is input_aquant_int8.txt (_aquant_int8 is the added suffix), which can be found under decent_golden/dump_golden/dump_results_0/. Feed the Int8 input data from input_aquant_int8.txt into the DPU input node input (a sketch of this step is shown below).

After compiling and running this DPU application, raw data for each DPU node is dumped into a folder such as dump_2134 (the number 2134 is the process ID). For the last DPU node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, locate the DPU Int8 running result in the file tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin (the prefix tf_inception_v1_0_ is the kernel name, and the suffix out0 indicates the first output tensor of this DPU node).
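A minimal sketch of the input-feeding step mentioned above, using the DNNDK dpuSetInputTensorInHWCInt8() call. It assumes the text dump holds one integer value per line (verify the actual layout of the quantizer's dump file) and an existing DPUTask named task, such as the one created in the skeleton earlier.

#include <dnndk/dnndk.h>
#include <cstdint>
#include <fstream>
#include <vector>

/* Read the quantizer's Int8 golden input (text format, assumed one value
   per line) and feed it to the DPU input node, bypassing pre-processing. */
void feed_golden_input(DPUTask *task) {
    std::ifstream fin(
        "decent_golden/dump_golden/dump_results_0/input_aquant_int8.txt");
    std::vector<int8_t> data;
    int value;
    while (fin >> value) {
        data.push_back(static_cast<int8_t>(value));
    }
    /* Data is fed in HWC order; its size must match the input tensor size. */
    dpuSetInputTensorInHWCInt8(task, "input", data.data(),
                               static_cast<int>(data.size()));
}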
For the last DPU node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, use its out type operator InceptionV1_Logits_Conv2d_0c_1x1_Conv2D to find the golden output file from the quantizer. The quantizer may fuse operators while quantizing a TensorFlow model; for the Inception-v1 model, the dump file with the similar name is InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin (Conv2d and BiasAdd are two adjacent operators within the model, and _aquant_int8 is the added suffix). Lastly, check whether the DPU output tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin and the quantizer output InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin are equal.
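Any byte-level comparison works for this last check (the Linux cmp command, for example); below is a small self-contained sketch using the two file names from this sample.

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

/* Byte-compare the DPU raw dump against the quantizer's golden output. */
int main() {
    const char *dpu_out =
        "tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin";
    const char *golden =
        "InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin";

    std::ifstream fa(dpu_out, std::ios::binary);
    std::ifstream fb(golden, std::ios::binary);
    if (!fa || !fb) {
        std::cerr << "cannot open dump files" << std::endl;
        return 2;
    }
    std::vector<char> a((std::istreambuf_iterator<char>(fa)),
                        std::istreambuf_iterator<char>());
    std::vector<char> b((std::istreambuf_iterator<char>(fb)),
                        std::istreambuf_iterator<char>());

    std::cout << (a == b ? "MATCH: DPU result equals golden baseline"
                         : "MISMATCH: check DPU execution") << std::endl;
    return a == b ? 0 : 1;
}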
If they are identical, it is confirmed that Inception-v1 runs on the DPU as expected. Otherwise, a potential issue exists in DPU execution; contact Xilinx and report the bug.