| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| placeholder / inputlayer* | shape | data | shape | Allocate memory for input data. |
| | | | data_type | |
| const | | const | data | Allocate memory for const data. |
| | | | shape | |
| | | | data_type | |
| conv2d | filter | conv2d | kernel | Convolution Engine. |
| | strides | | stride | |
| | | | pad([0, 0, 0, 0]) | |
| | padding | | pad_mode(SAME or VALID) | |
| | dilations | | dilation | |
| conv2d* | kernel_size | conv2d | kernel | Convolution Engine. |
| | strides | | stride | |
| | padding | | pad([0, 0, 0, 0]) | |
| | dilation_rate | | dilation | |
| | use_bias | | | |
| | group | | group | |
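To make the pad_mode(SAME or VALID) attribute concrete, here is a small sketch of how the two modes change the spatial output size of a convolution. `out_size` is a hypothetical helper, and the formulas follow the usual TensorFlow shape convention; it is illustrative only, not part of the toolchain.

```python
import math

# Illustrative output-size calculation for the SAME / VALID pad modes.
# The formulas follow the standard TensorFlow convention for
# convolution output shapes; "out_size" is a hypothetical helper name.
def out_size(in_size, kernel, stride, dilation, pad_mode):
    eff_k = dilation * (kernel - 1) + 1  # effective kernel after dilation
    if pad_mode == "SAME":
        # SAME pads so every input position produces an output
        return math.ceil(in_size / stride)
    # VALID uses only positions where the kernel fits entirely
    return math.ceil((in_size - eff_k + 1) / stride)

print(out_size(224, 3, 2, 1, "SAME"))   # 112
print(out_size(224, 3, 2, 1, "VALID"))  # 111
```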
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| depthwiseconv2dnative | filter | depthwise-conv2d | kernel | Depthwise-Convolution Engine. |
| | strides | | stride | |
| | explicit_paddings | | pad | |
| | padding | | pad_mode(SAME or VALID) | |
| | dilations | | dilation | |
| conv2dbackpropinput / conv2dtranspose* | filter | transposed-conv2d | kernel | Convolution Engine. |
| | strides | | stride | |
| | | | pad([0, 0, 0, 0]) | |
| | padding | | pad_mode(SAME or VALID) | |
| | dilations | | dilation | |
| spacetobatchnd + conv2d + batchtospacend | block_shape | conv2d | dilation | spacetobatchnd, conv2d, and batchtospacend are mapped to the Convolution Engine when the specific requirements set by Xilinx are met. |
| | padding | | pad | |
| | filter | | kernel | |
| | strides | | stride | |
| | padding | | pad_mode(SAME) | |
| | dilations | | dilation | |
| | block_shape | | | |
| | crops | | | |
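The spacetobatchnd + conv2d + batchtospacend pattern works because it is mathematically equivalent to a dilated convolution. The following 1-D NumPy sketch demonstrates the identity; it is purely illustrative (the compiler operates on the real NHWC pattern), and all names here are hypothetical.

```python
import numpy as np

# 1-D sketch of why space_to_batch -> stride-1 conv -> batch_to_space
# equals a dilated convolution ("VALID" boundaries, dilation = block size).
def dilated_corr1d(x, w, d):
    # Dilated cross-correlation: y[n] = sum_k w[k] * x[n + k*d]
    K = len(w)
    return np.array([sum(w[k] * x[n + k * d] for k in range(K))
                     for n in range(len(x) - (K - 1) * d)])

x = np.arange(12.0) ** 2
w = np.array([1.0, -2.0, 1.0])
b = 2                                    # block size = dilation rate

subs = [x[i::b] for i in range(b)]       # space_to_batch: split into b phases
outs = [np.correlate(s, w, mode="valid") for s in subs]  # plain stride-1 conv
y = np.empty(sum(len(o) for o in outs))
for i, o in enumerate(outs):             # batch_to_space: interleave results
    y[i::b] = o

assert np.allclose(y, dilated_corr1d(x, w, b))
```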
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| matmul / dense* | transpose_a | conv2d / matmul | transpose_a | The matmul is transformed to a conv2d operation when the equivalent conv2d meets the hardware requirements and can be mapped to the DPU. |
| | transpose_b | | transpose_b | |
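The matmul-to-conv2d transformation above rests on a simple equivalence: a dense layer is a 1x1 convolution applied to a 1x1 spatial map. A minimal NumPy sketch (NumPy stands in for the toolchain; shapes follow NHWC):

```python
import numpy as np

# A dense layer (matmul) computed two ways: directly, and as a 1x1
# convolution over a 1x1 feature map. Illustrative only.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))            # 4 feature vectors
w = rng.standard_normal((64, 10))           # dense weights (in, out)

dense_out = x @ w                           # matmul / dense

x_nhwc = x.reshape(4, 1, 1, 64)             # N=4, H=W=1, C=64
conv_out = np.einsum("nhwc,co->nhwo", x_nhwc, w).reshape(4, 10)

assert np.allclose(dense_out, conv_out)
```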
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| maxpool / maxpooling2d* / globalmaxpool2d* | ksize | maxpool2d | kernel | Pooling Engine. The attribute global is set to true when the original pooling operator requires global reduction. |
| | strides | | stride | |
| | | | pad([0, 0, 0, 0]) | |
| | padding | | pad_mode(SAME or VALID) | |
| | | | global | |
| avgpool / averagepooling2d* / globalaveragepooling2d* | pool_size | avgpool2d | kernel | Pooling Engine. The attribute global is set to true when the original pooling operator requires global reduction. |
| | strides | | stride | |
| | | | pad([0, 0, 0, 0]) | |
| | padding | | pad_mode(SAME or VALID) | |
| | | | count_include_pad (false) | |
| | | | count_include_invalid (true) | |
| | | | global | |
| mean | axis | avgpool / reduction_mean | axis | A mean operation is transformed to avgpool if the equivalent avgpool meets the hardware requirements and can be mapped to the DPU. |
| | keep_dims | | keep_dims | |
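The mean-to-avgpool transformation relies on the fact that reducing the mean over the spatial axes of an NHWC tensor is exactly a global average pool whose kernel covers the whole feature map. A purely illustrative NumPy sketch:

```python
import numpy as np

# reduce_mean over H, W (with keep_dims) vs. a global average pool
# whose kernel spans the entire spatial extent.
x = np.random.default_rng(1).standard_normal((2, 8, 8, 16))  # NHWC

mean_out = x.mean(axis=(1, 2), keepdims=True)        # mean, keep_dims=True
pool_out = x.reshape(2, 64, 16).mean(axis=1).reshape(2, 1, 1, 16)

assert np.allclose(mean_out, pool_out)
```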
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| relu | | relu | | Activations are fused into adjacent operations such as convolution, add, etc. |
| relu6 | | relu6 | | |
| leakyrelu | alpha | leaky_relu | alpha | |
| fixneuron / quantizelayer* | bit_width | fix | bit_width | It is divided into float2fix and fix2float during compilation; the float2fix and fix2float operations are then fused with adjacent operations into coarse-grained operations. |
| | quantize_pos | | fix_point | |
| | | | if_signed | |
| | | | round_mode | |
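To illustrate what the float2fix / fix2float pair does, here is a hypothetical NumPy sketch. The parameter names mirror the XIR attributes (bit_width, fix_point), but the rounding and saturation behaviour shown is an assumption for illustration, not the documented DPU round_mode.

```python
import numpy as np

# Hypothetical fixed-point quantize / dequantize pair. Rounding uses
# NumPy's round-half-to-even and values saturate to the signed range;
# the real DPU round_mode may differ.
def float2fix(x, bit_width=8, fix_point=4):
    lo, hi = -2 ** (bit_width - 1), 2 ** (bit_width - 1) - 1
    return np.clip(np.round(x * 2.0 ** fix_point), lo, hi)

def fix2float(q, fix_point=4):
    return q * 2.0 ** (-fix_point)

x = np.array([0.30, -1.07, 100.0])
q = float2fix(x)                 # signed 8-bit, 4 fractional bits
assert np.allclose(fix2float(q), [0.3125, -1.0625, 7.9375])
```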
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| identity | | identity | | The identity operation is removed. |
| add, addv2 | | add | | If the add is an element-wise add, it is mapped to the DPU Element-wise Add Engine; if it is a channel-wise add, Xilinx searches for opportunities to fuse it with adjacent operations such as convolutions. |
| mul | | mul | | A mul can be mapped to the Depthwise-Convolution Engine if one of its inputs is constant. If its two inputs have the same shape, it may be mapped to the Misc Engine as an element-wise multiplication. If the mul is part of a special combination of operators, it can be fused into that combination. Otherwise it is mapped to the CPU. |
| concatv2 / concatenate* | axis | concat | axis | Xilinx reduces the overhead resulting from concat through special reading and writing strategies and careful allocation of the on-chip memory. |
| pad / zeropadding2d* | paddings | pad | paddings | The compiler first tries to fuse "CONSTANT" padding into adjacent operations, e.g. convolution and pooling. If there is no such operator, the pad can still be mapped to the DPU when the padding dimension equals 4 and meets the hardware requirements. "SYMMETRIC" padding is mapped to the DPU, but "REFLECT" padding is not supported by the DPU. |
| | mode | | mode | |
| | | | constant_values | |
| shape | | shape | | The shape operation is removed. |
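The "SYMMETRIC" versus "REFLECT" distinction above can be seen with `np.pad` on a 1-D array (NumPy's mode names happen to match TensorFlow's): SYMMETRIC repeats the edge element, while REFLECT mirrors around it without repeating it.

```python
import numpy as np

# SYMMETRIC repeats the edge value; REFLECT mirrors around it.
x = np.array([1, 2, 3])
sym = np.pad(x, 1, mode="symmetric")   # -> [1, 1, 2, 3, 3]
ref = np.pad(x, 1, mode="reflect")     # -> [2, 1, 2, 3, 2]

assert sym.tolist() == [1, 1, 2, 3, 3]
assert ref.tolist() == [2, 1, 2, 3, 2]
```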
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| stridedslice | begin | strided_slice | begin | If these are shape-related operations, they are removed during compilation. If they are components of a coarse-grained operation, they are fused with adjacent operations. Otherwise, they are compiled into CPU implementations. |
| | end | | end | |
| | strides | | strides | |
| pack | axis | stack | axis | |
| neg | | neg | | |
| realdiv | | div | | |
| sub | | sub | | |
| prod | axis | reduction_product | axis | |
| | keep_dims | | keep_dims | |
| sum | axis | reduction_sum | axis | |
| | keep_dims | | keep_dims | |
| max | axis | reduction_max | axis | |
| | keep_dims | | keep_dims | |
| resizebilinear | size | resize | size | If the mode of the resize is 'BILINEAR' with align_corners=false, half_pixel_centers=false and size = 2, 4, 8, or with align_corners=false, half_pixel_centers=true and size = 2, 4, the resize can be transformed to DPU implementations (pad + depthwise-transposed conv2d). If the mode of the resize is 'NEAREST' and the size is an integer, the resize is mapped to DPU implementations. |
| | align_corners | | align_corners | |
| | half_pixel_centers | | half_pixel_centers | |
| | | | mode="BILINEAR" | |
| resizenearestneighbor | size | resize | size | |
| | align_corners | | align_corners | |
| | half_pixel_centers | | half_pixel_centers | |
| | | | mode="NEAREST" | |
| upsample2d / upsampling2d* | size | resize | scale | |
| | | | align_corners | |
| | | | half_pixel_centers | |
| | interpolation | | mode | |
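The NEAREST-with-integer-scale case that the table says maps to the DPU is easy to picture: nearest-neighbour upsampling of an NHWC tensor is just a repeat along the H and W axes. An illustrative NumPy sketch:

```python
import numpy as np

# Nearest-neighbour upsampling with integer scale 2, as a repeat
# along the spatial axes of an NHWC tensor.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]]).reshape(1, 2, 2, 1)      # N=1, H=W=2, C=1
up = np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)  # scale = 2

assert up.shape == (1, 4, 4, 1)
assert up[0, :, :, 0].tolist() == [[1, 1, 2, 2],
                                   [1, 1, 2, 2],
                                   [3, 3, 4, 4],
                                   [3, 3, 4, 4]]
```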
| TensorFlow OP type | Attributes | XIR OP type | Attributes | DPU implementation |
|---|---|---|---|---|
| reshape | shape | reshape | shape | These are transformed to the reshape operation in some cases; otherwise they are mapped to the CPU. |
| reshape* | target_shape | reshape | | |
| transpose | perm | transpose | order | |
| squeeze | axis | squeeze | axis | |
| exp | | exp | | These are only compiled into CPU implementations. |
| softmax | axis | softmax | axis | |
| sigmoid | | sigmoid | | |
| square + rsqrt + maximum | | l2_normalize | axis | output = x / sqrt(max(sum(x ^ 2), epsilon)) is fused into an l2_normalize in XIR. |
| | | | epsilon | |
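The fusion rule in the last row recognises the square + rsqrt + maximum pattern computing output = x / sqrt(max(sum(x^2), epsilon)) as a single l2_normalize. A NumPy sketch of that formula, with axis=-1 and an assumed epsilon:

```python
import numpy as np

# The pattern that XIR fuses into l2_normalize:
# output = x / sqrt(max(sum(x^2), epsilon))
def l2_normalize(x, axis=-1, epsilon=1e-12):
    sq_sum = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(sq_sum, epsilon))

out = l2_normalize(np.array([[3.0, 4.0]]))
assert np.allclose(out, [[0.6, 0.8]])   # unit-norm vector
```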
- The TensorFlow OPs listed above are supported in XIR. All of them have CPU implementations in the tool-chain.
- Operators marked with * come from TensorFlow 2.x.