For each DPU task in this mode, all of its boundary input and output tensors, together with its intermediate feature maps, stay within one physically contiguous memory buffer, which is allocated automatically when dpuCreateTask() is called to instantiate a DPU task from a DPU kernel. This DPU memory buffer can be cached in order to optimize memory access from the ARM CPU side. Cache flushing and invalidation are handled by N2Cube, so you do not need to take care of DPU memory management or cache manipulation. This makes it very easy to deploy models with the unique memory model, which is the case for most of the Vitis™ AI samples.
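The following is a minimal sketch of this lifecycle. The kernel name "resnet50" is a hypothetical placeholder, and the header path may differ across releases; the API calls follow the N2Cube reference.

```cpp
#include <dnndk/dnndk.h>

int main() {
    dpuOpen();                                      // attach to the DPU runtime
    DPUKernel *kernel = dpuLoadKernel("resnet50");  // load a compiled DPU kernel

    // dpuCreateTask() allocates the single physically contiguous buffer that
    // holds the boundary tensors and intermediate feature maps; N2Cube takes
    // care of cache flushing/invalidation for this buffer.
    DPUTask *task = dpuCreateTask(kernel, T_MODE_NORMAL);

    // ... fill the boundary input tensor(s), then run ...
    dpuRunTask(task);

    dpuDestroyTask(task);                           // releases the task buffer
    dpuDestroyKernel(kernel);
    dpuClose();
    return 0;
}
```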
The unique memory model requires that you copy the input data, after pre-processing, into the boundary input tensors of the DPU task's memory buffer, and then launch the DPU task. This may bring additional overhead, because in some situations the pre-processed Int8 input data already resides in a physically contiguous memory buffer that the DPU can access directly. One example is a camera-based deep learning application: pre-processing of each input image from the camera sensor, such as image scaling, model normalization, and Float32-to-Int8 quantization, can be accelerated by FPGA logic, with the result data written directly into a physically contiguous memory buffer. With the unique memory model, this data must nevertheless be copied into the DPU input memory buffer again.
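The sketch below illustrates that extra copy. The node name "conv1" is a hypothetical placeholder for your model's boundary input node, and the helper function is illustrative only:

```cpp
#include <cstring>
#include <dnndk/dnndk.h>

void feedInput(DPUTask *task, const int8_t *preprocessed, int len) {
    int8_t *in  = dpuGetInputTensorAddress(task, "conv1");
    int    size = dpuGetInputTensorSize(task, "conv1");

    // This memcpy is the overhead described above: under the unique memory
    // model it cannot be avoided, even though `preprocessed` already lives
    // in a physically contiguous buffer the DPU could in principle read.
    std::memcpy(in, preprocessed, len < size ? len : size);
    dpuRunTask(task);
}
```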