input
- Parameters: shape
- Compiled to: data
- Attributes: shape, data_type
- DPU implementation: Allocate memory for input data.

convolution
- Parameters: kernel_size, stride, pad, dilation, bias_term, num_output, group
- Compiled to: conv2d (group = 1) / depthwise-conv2d (group = input channel)
- Attributes: kernel, stride, pad, pad_mode (FLOOR), dilation
- DPU implementation: If group == input channel, the convolution would be compiled to the Depthwise-Convolution Engine; if group == 1, it would be mapped to the Convolution Engine; otherwise, it would be mapped to the CPU.

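The group semantics above can be sketched in a few lines of numpy. The `conv2d` below is an illustrative reference implementation (stride 1, no padding), not the compiler's kernel: with group = 1 every output channel sees all input channels, while with group = input channel each output channel sees exactly one.

```python
import numpy as np

def conv2d(x, w, group=1):
    """Minimal stride-1, no-padding 2-D convolution with groups (a sketch).

    x: (C_in, H, W); w: (C_out, C_in // group, kH, kW).
    group == 1    -> standard convolution (Convolution Engine case)
    group == C_in -> depthwise convolution (Depthwise-Convolution Engine case)
    """
    c_in, h, wd = x.shape
    c_out, c_per_g, kh, kw = w.shape
    assert c_in % group == 0 and c_out % group == 0
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for oc in range(c_out):
        g = oc // (c_out // group)                   # group of this output channel
        ics = list(range(g * c_per_g, (g + 1) * c_per_g))
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[ics, i:i + kh, j:j + kw]   # only this group's channels
                out[oc, i, j] = np.sum(patch * w[oc])
    return out
```

In the depthwise case each kernel has a single input channel, which is what lets the hardware process channels independently.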
deconvolution
- Parameters: kernel_size, stride, pad, dilation, bias_term, num_output, group
- Compiled to: transposed-conv2d (group = 1) / depthwise-transposed-conv2d (group = input channel)
- Attributes: kernel, stride, pad, pad_mode (FLOOR), dilation
- DPU implementation: If group == input channel, the deconvolution would be compiled to the Depthwise-Convolution Engine; if group == 1, it would be mapped to the Convolution Engine; otherwise, it would be mapped to the CPU.

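For reference, the spatial output size of a transposed convolution follows the standard formula below (a sketch assuming zero output padding, which the attributes above do not list):

```python
def deconv_out_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Standard transposed-conv2d output size (output_padding assumed 0)."""
    return (in_size - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1
```

For example, a 4-wide input with a 3-wide kernel and stride 2 produces a 9-wide output.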
innerproduct
- Parameters: bias_term, num_output
- Compiled to: conv2d / matmul
- Attributes: transpose_a, transpose_b
- DPU implementation: The inner-product would be transformed to matmul, and the matmul in turn to conv2d compiled to the Convolution Engine. If the inner-product fails to be transformed, it would be implemented on the CPU.

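The transformation chain works because a fully-connected layer is the same computation as a 1x1 convolution over a 1x1 feature map whose channels hold the flattened input. A minimal numpy sketch (function names are illustrative, not the compiler's internals):

```python
import numpy as np

def innerproduct(x, w):
    """Fully-connected layer: x (C,), w (num_output, C)."""
    return w @ x

def as_conv2d_1x1(x, w):
    """Same computation expressed as a 1x1 conv over a 1x1 'image'."""
    xc = x.reshape(-1, 1, 1)                      # (C, 1, 1) feature map
    wk = w.reshape(w.shape[0], w.shape[1], 1, 1)  # (num_output, C, 1, 1) kernels
    return np.einsum('ocij,cij->o', wk, xc)       # one dot product per output channel
```

Both forms produce the same num_output-length vector, which is why the compiler can route the layer to the Convolution Engine.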
scale
- Parameters: bias_term
- Compiled to: depthwise-conv2d / scale
- DPU implementation: The scale would be transformed to depthwise-convolution where possible; otherwise, it would be mapped to the CPU.

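The scale-to-depthwise transformation is possible because a per-channel scale plus bias is exactly a depthwise convolution with a 1x1 kernel. A sketch (illustrative names):

```python
import numpy as np

def scale_layer(x, s, b):
    """Per-channel scale and bias: x (C, H, W), s (C,), b (C,)."""
    return x * s[:, None, None] + b[:, None, None]

def scale_as_depthwise(x, s, b):
    """Same result as a group == C depthwise conv2d with 1x1 kernels."""
    k = s.reshape(-1, 1, 1, 1)        # (C, 1, 1, 1): one 1x1 kernel per channel
    out = np.zeros_like(x, dtype=float)
    for c in range(x.shape[0]):       # each group has exactly one channel
        out[c] = x[c] * k[c, 0, 0, 0] + b[c]
    return out
```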
pooling
- Parameters: kernel_size, stride, global_pooling, pad, pool_method
- Compiled to: maxpool2d (pool_method = 0) / avgpool2d (pool_method = 1)
- Attributes: kernel_size, stride, global, pad, pad_mode (CEIL), count_include_pad (true), count_include_invalid (false)
- DPU implementation: Pooling Engine.

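Note the pad_mode difference from the convolution rows: pooling rounds the output size up (CEIL), so a partial window at the border still produces an output element, while FLOOR drops it. A small sketch of the output-size rule:

```python
import math

def pool_out_size(in_size, kernel, stride, pad, pad_mode="CEIL"):
    """Output spatial size of a pooling window under CEIL vs FLOOR rounding."""
    span = in_size + 2 * pad - kernel
    if pad_mode == "CEIL":
        return math.ceil(span / stride) + 1   # partial border window kept
    return math.floor(span / stride) + 1      # partial border window dropped
```

For an 8-wide input with a 3-wide kernel and stride 2, CEIL gives 4 outputs and FLOOR gives 3.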
eltwise
- Parameters: coeff = 1, operation = SUM
- Compiled to: add
- DPU implementation: Element-wise Add Engine.

concat
- Parameters: axis
- Compiled to: concat
- Attributes: axis
- DPU implementation: Xilinx reduces the overhead of concat through special read/write strategies and careful on-chip memory allocation.

relu
- Parameters: negative_slope
- Compiled to: relu / leakyrelu
- Attributes: alpha
- DPU implementation: Activations would be fused into adjacent operations such as convolution, add, etc.

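The negative_slope parameter carries over directly as the alpha attribute; a relu with negative_slope = 0 and a leakyrelu are the same function at that point. A one-line sketch:

```python
import numpy as np

def leakyrelu(x, alpha):
    """negative_slope in the source layer maps to alpha here; alpha = 0 is plain relu."""
    return np.where(x >= 0, x, alpha * x)
```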
relu6
- Compiled to: relu6

fixneuron
- Parameters: bit_width, quantize_pos
- Compiled to: fix
- Attributes: bit_width, fix_point, if_signed, round_mode
- DPU implementation: It would be divided into float2fix and fix2float during compilation; the float2fix and fix2float operations would then be fused with adjacent operations into coarse-grained operations.

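A sketch of what the float2fix / fix2float pair computes, under the assumption of round-half-away-from-zero (the actual behavior is governed by the round_mode attribute):

```python
import numpy as np

def float2fix(x, bit_width, fix_point, if_signed=True):
    """Quantize onto a fixed-point grid with step 2**-fix_point, then saturate."""
    step = 2.0 ** -fix_point
    q = np.floor(np.abs(x) / step + 0.5) * np.sign(x)  # round half away from zero
    if if_signed:
        lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    else:
        lo, hi = 0, 2 ** bit_width - 1
    return np.clip(q, lo, hi)

def fix2float(q, fix_point):
    """Map fixed-point integers back to floats."""
    return q * 2.0 ** -fix_point
```

With bit_width = 8 and fix_point = 4 the grid step is 1/16, so 0.3 quantizes to 5 steps and dequantizes to 0.3125.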
reshape
- Parameters: shape
- Compiled to: reshape
- Attributes: shape
- DPU implementation: These are shape-related operations; in most cases they would be removed or transformed into reshape, which does not affect the on-chip data layout. Otherwise, they would be compiled to the CPU.

permute
- Parameters: order
- Compiled to: reshape / transpose
- Attributes: order

flatten
- Parameters: axis, end_axis
- Compiled to: reshape / flatten
- Attributes: start_axis, end_axis

reorg
- Parameters: strides, reverse
- Compiled to: reorg
- Attributes: strides, reverse
- DPU implementation: If the reorg meets the hardware requirements, it would be mapped to DPU implementations.

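Reorg is a space-to-depth rearrangement: an s x s spatial block is folded into s*s channels. The axis ordering below follows the common Darknet-style reorg and is an assumption; the hardware may use a different interleaving:

```python
import numpy as np

def reorg(x, stride):
    """Space-to-depth sketch: (C, H, W) -> (C * s * s, H // s, W // s)."""
    c, h, w = x.shape
    s = stride
    assert h % s == 0 and w % s == 0
    x = x.reshape(c, h // s, s, w // s, s)
    x = x.transpose(2, 4, 0, 1, 3)               # block offsets become channels
    return x.reshape(c * s * s, h // s, w // s)
```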
deephiresize
- Parameters: scale, mode
- Compiled to: resize
- Attributes: size, mode, align_corners = false, half_pixel_centers = false
- DPU implementation: If the mode of the resize is 'BILINEAR', the cases align_corners = false, half_pixel_centers = false with size = 2, 4, or 8, and align_corners = false, half_pixel_centers = true with size = 2 or 4, can be transformed to DPU implementations (pad + depthwise-transposed-conv2d). If the mode of the resize is 'NEAREST' and the size is an integer, the resize would be mapped to DPU implementations.

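The DPU-supported 'NEAREST' case (integer size factor) reduces to repeating each pixel along both spatial axes, assuming align_corners and half_pixel_centers are both false:

```python
import numpy as np

def resize_nearest(x, size):
    """Integer-factor nearest-neighbour upsampling: x (C, H, W), size an integer."""
    assert int(size) == size and size >= 1
    return np.repeat(np.repeat(x, size, axis=1), size, axis=2)
```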
gstiling
- Parameters: strides, reverse
- Compiled to: gstiling
- Attributes: stride, reverse
- DPU implementation: If the strides of gstiling are integers, it may be mapped to special DPU read/write instructions.

slice
- Parameters: axis, slice_point
- Compiled to: strided_slice
- Attributes: begin, end, strides
- DPU implementation: It would only be compiled into a CPU implementation.

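The parameter mapping here is mechanical: the slice_point cut positions on one axis become a set of (begin, end, strides) triples, one strided_slice per output. A sketch with an illustrative helper name:

```python
def caffe_style_slice_to_strided_slices(shape, axis, slice_points):
    """Turn axis + slice_point cuts into (begin, end, strides) triples,
    one per output slice. Illustrative only, not a compiler API."""
    bounds = [0] + list(slice_points) + [shape[axis]]
    triples = []
    for b, e in zip(bounds, bounds[1:]):
        begin = [0] * len(shape)
        end = list(shape)
        strides = [1] * len(shape)      # plain slicing: unit stride everywhere
        begin[axis], end[axis] = b, e
        triples.append((begin, end, strides))
    return triples
```

Slicing a (1, 6, 4, 4) tensor at points 2 and 5 on axis 1 yields three strided_slice ops covering channels [0, 2), [2, 5), and [5, 6).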
priorbox
- Parameters: min_sizes, max_sizes, aspect_ratio, flip, clip, variance, step, offset
- Compiled to: priorbox
- Attributes: min_sizes, max_sizes, aspect_ratio, flip, clip, variance, step, offset

softmax
- Parameters: axis
- Compiled to: softmax
- Attributes: axis