The split IO memory model is introduced to overcome the limitation of the unique memory model, so that data coming from other physical memory buffers can be consumed by the DPU directly.
When dpuCreateTask() is called to create a DPU task from a DPU kernel compiled with the option -split-io-mem, N2Cube allocates DPU memory buffers only for the intermediate feature maps. It is up to the users to allocate the physically contiguous memory buffers for the boundary input tensors and output tensors individually. The sizes of the input and output memory buffers can be found in the compiler build log under the fields Input Mem Size and Output Mem Size. The users also need to take care of cache coherence if these memory buffers are cacheable.
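
The following sketch, modeled on the DNNDK C++ samples, shows how a task might be created from a kernel built with -split-io-mem. The kernel name used here is a hypothetical placeholder; substitute the kernel name generated by the compiler for your own model.

    #include <dnndk/dnndk.h>   /* header path as used in the DNNDK samples */

    /* Placeholder kernel name: use the name reported by the compiler for
     * your own model built with the -split-io-mem option.                */
    #define KERNEL_NAME "ssd"

    int main() {
        dpuOpen();                                    /* attach to the DPU driver */
        DPUKernel *kernel = dpuLoadKernel(KERNEL_NAME);
        DPUTask   *task   = dpuCreateTask(kernel, 0); /* N2Cube allocates only the
                                                         intermediate feature-map
                                                         buffers for this task    */

        /* For a split-IO kernel, the boundary input/output buffers are not
         * created here: the application must supply physically contiguous
         * memory sized per the Input Mem Size and Output Mem Size fields of
         * the compiler build log, and keep it cache-coherent if cached.    */

        dpuDestroyTask(task);
        dpuDestroyKernel(kernel);
        dpuClose();
        return 0;
    }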
The DNNDK sample split_io provides a programming reference for the split IO memory model, using the TensorFlow SSD model. The SSD model has one input tensor, image:0, and two output tensors, ssd300_concat:0 and ssd300_concat_1:0. From the compiler build log, you can see that the size of the DPU input memory buffer (for tensor image:0) is 270000 bytes and the size of the DPU output memory buffer (for output tensors ssd300_concat:0 and ssd300_concat_1:0) is 218304 bytes. dpuAllocMem() is then used to allocate memory buffers for them.
dpuBindInputTensorBaseAddress() and dpuBindOutputTensorBaseAddress() are subsequently used to bind the input and output memory buffer addresses to the DPU task before launching its execution. After the input data is fed into the DPU input memory buffer, dpuSyncMemToDev() is called to flush the cache lines. When the DPU task completes running, dpuSyncDevToMem() is called to invalidate the cache lines.
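
The following sketch outlines this sequence for the SSD example, assuming a DPUTask created from the split-IO kernel as shown above. The API names are those used in the split_io sample, but the exact prototypes of dpuAllocMem(), dpuSyncMemToDev(), dpuSyncDevToMem(), and dpuFreeMem() are assumed here; refer to the split_io sample source and n2cube.h in your DNNDK release for the authoritative signatures.

    #include <cstdint>
    #include <dnndk/dnndk.h>

    /* Run one SSD inference with user-managed boundary buffers.
     * "task" is a DPUTask created from the split-IO SSD kernel.          */
    void runSplitIO(DPUTask *task) {
        const int inSize  = 270000;  /* Input Mem Size for tensor image:0   */
        const int outSize = 218304;  /* Output Mem Size for ssd300_concat:0
                                        and ssd300_concat_1:0 combined      */

        int8_t *inVirt, *inPhy, *outVirt, *outPhy;

        /* Allocate physically contiguous buffers (demonstration helpers;
         * assumed to return virtual and physical addresses by reference). */
        dpuAllocMem(inSize,  inVirt,  inPhy);
        dpuAllocMem(outSize, outVirt, outPhy);

        /* Bind the user-managed buffers to the task before launching it. */
        dpuBindInputTensorBaseAddress(task, inVirt, inPhy);
        dpuBindOutputTensorBaseAddress(task, outVirt, outPhy);

        /* Fill inVirt with the pre-processed input for tensor image:0, then
         * flush the CPU cache so the DPU reads up-to-date data from DDR.  */
        /* prepareInput(inVirt, inSize);        hypothetical helper        */
        dpuSyncMemToDev(inVirt, 0, inSize);

        dpuRunTask(task);

        /* Invalidate the cache before the CPU reads the DPU results.      */
        dpuSyncDevToMem(outVirt, 0, outSize);
        /* postProcess(outVirt, outSize);       hypothetical helper        */

        /* Release the demonstration buffers.                              */
        dpuFreeMem(inVirt);
        dpuFreeMem(outVirt);
    }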
dpuAllocMem(), dpuFreeMem(), dpuSyncMemToDev(), and dpuSyncDevToMem() are provided only for demonstration purposes with the split IO memory model. They are not expected to be used directly in your production environment. It is up to you whether you want to implement such functionality to better meet your customized requirements.