placeholder / inputlayer* |
shape |
data |
shape |
Allocate memory for input
data. |
data_type |
const |
|
const |
data |
Allocate memory for const
data. |
|
shape |
|
data_type |
conv2d |
filter |
conv2d |
kernel |
Convolution Engine. |
strides |
|
stride |
|
|
pad([0, 0, 0, 0]) |
padding |
|
pad_mode(SAME or VALID) |
dilations |
|
dilation |
conv2d* |
kernel_size |
conv2d |
kernel |
strides |
stride |
padding |
pad([0, 0, 0, 0]) |
dilation_rate |
dilation |
use_bias |
|
group |
group |
depthwiseconv2dnative |
filter |
depthwise-conv2d |
kernel |
Depthwise-Convolution
Engine. |
strides |
stride |
explicit_paddings |
pad |
padding |
pad_mode(SAME or VALID) |
dilations |
dilation |
conv2dbackpropinput /
conv2dtranspose* |
filter |
transposed-conv2d |
kernel |
Convolution Engine. |
strides |
stride |
|
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
dilations |
dilation |
spacetobatchnd + conv2d +
batchtospacend |
block_shape |
conv2d |
dilation |
The spacetobatchnd, conv2d, and
batchtospacend sequence would be mapped to the Convolution Engine when
the specific requirements set by Xilinx are met. |
padding |
pad |
filter |
kernel |
strides |
stride |
padding |
pad_mode(SAME) |
dilations |
dilation |
block_shape |
|
crops |
|
matmul / dense* |
transpose_a |
conv2d / matmul |
transpose_a |
The matmul would be transformed to a
conv2d operation if the equivalent conv2d meets the hardware
requirements and can be mapped to the DPU. |
transpose_b |
transpose_b |
maxpool / maxpooling2d* |
ksize |
maxpool2d |
kernel |
Pooling Engine. |
strides |
stride |
|
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
avgpool / averagepooling2d* /
globalaveragepooling2d* |
pool_size |
avgpool2d |
kernel |
Pooling Engine. |
strides |
stride |
|
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
|
count_include_pad (false) |
|
count_include_invalid (true) |
mean |
axis |
avgpool / reduction_mean |
axis |
The mean operation would be transformed
to avgpool if the equivalent avgpool meets the hardware requirements
and can be mapped to the DPU. |
keep_dims |
keep_dims |
relu |
|
relu |
|
Activations would be fused to
adjacent operations such as convolution, add, etc. |
relu6 |
|
relu6 |
|
leakyrelu |
alpha |
leaky_relu |
alpha |
fixneuron /
quantizelayer* |
bit_width |
fix |
bit_width |
It would be divided into float2fix
and fix2float during compilation, then the float2fix and fix2float
operations would be fused with adjacent operations into
coarse-grained operations. |
quantize_pos |
fix_point |
|
if_signed |
|
round_mode |
identity |
|
identity |
|
Identity would be removed. |
add, addv2 |
|
add |
|
If the add is an element-wise add, the add would
be mapped to the DPU Element-wise Add Engine. If the add is a
channel-wise add, Xilinx searches for opportunities to fuse the add
with adjacent operations such as convolutions. |
concatv2 / concatenate* |
axis |
concat |
axis |
Xilinx reduces the overhead resulting from the
concat by using special read and write strategies and by carefully
allocating the on-chip memory. |
pad / zeropadding2d* |
paddings |
pad |
paddings |
"CONSTANT" padding would be fused into
adjacent operations. "SYMMETRIC" padding would be mapped to DPU
instructions. "REFLECT" padding is not supported by the DPU
yet. |
mode |
mode |
|
constant_values |
shape |
|
shape |
|
The shape operation would be removed. |
stridedslice |
begin |
stridedslice |
begin |
If they are shape-related
operations, they would be removed during compilation. If they are
components of a coarse-grained operation, they would be fused with
adjacent operations. Otherwise, they would be compiled into CPU
implementations. |
end |
end |
strides |
strides |
pack |
axis |
stack |
axis |
neg |
|
neg |
|
mul |
|
mul |
|
realdiv |
|
div |
|
sub |
|
sub |
|
prod |
axis |
reduction_product |
axis |
keep_dims |
keep_dims |
sum |
axis |
reduction_sum |
axis |
keep_dims |
keep_dims |
max |
axis |
reduction_max |
axis |
keep_dims |
keep_dims |
resizebilinear |
size |
resize |
size |
If the mode of the resize is
'BILINEAR', two cases can be transformed to DPU implementations
(pad + depthwise-transposed conv2d): align_corners = false,
half_pixel_centers = false, size = 2, 4, 8; and align_corners = false,
half_pixel_centers = true, size = 2, 4. If the mode of the resize is
'NEAREST' and the size is an integer, the resize would be mapped to
DPU implementations. |
align_corners |
align_corners |
half_pixel_centers |
half_pixel_centers |
|
mode="BILINEAR" |
resizenearestneighbor |
size |
resize |
size |
align_corners |
align_corners |
half_pixel_centers |
half_pixel_centers |
|
mode="NEAREST" |
upsample2d/upsampling2d* |
size |
resize |
scale |
|
align_corners |
|
half_pixel_centers |
interpolation |
mode |
reshape |
shape |
reshape |
shape |
They would be transformed to the
reshape operation in some cases. Otherwise they would be mapped to
CPU. |
reshape* |
target_shape |
transpose |
perm |
transpose |
order |
squeeze |
axis |
squeeze |
axis |
exp |
|
exp |
|
They would only be compiled into
CPU implementations. |
softmax |
axis |
softmax |
axis |
sigmoid |
|
sigmoid |
|
square+ rsqrt+ maximum |
|
l2_normalize |
axis |
output = x / sqrt(max(sum(x ^ 2),
epsilon)) would be fused into an l2_normalize operation in XIR. |
|
epsilon |
- The OPs in TensorFlow listed above are
supported in XIR. All of them have CPU implementations in
the tool-chain.
- Operators marked with * apply to TensorFlow
versions later than 2.0.
|
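The l2_normalize row above gives the fused formula as output = x / sqrt(max(sum(x ^ 2), epsilon)). The following is a minimal plain-Python sketch of that arithmetic, not the XIR or DPU implementation. It assumes the reduction runs over a single flat list of values (the table's axis attribute is not modeled), and the function name `l2_normalize` and the default `epsilon=1e-12` are illustrative choices rather than values taken from the table.

```python
import math


def l2_normalize(x, epsilon=1e-12):
    """Return x / sqrt(max(sum(x ** 2), epsilon)) for a flat list of floats.

    Assumption: the sum runs over the whole list; a real operator would
    reduce along the axis attribute instead.
    """
    sum_sq = sum(v * v for v in x)            # square + sum
    denom = math.sqrt(max(sum_sq, epsilon))   # maximum + sqrt (rsqrt inverted)
    return [v / denom for v in x]


# A 3-4-5 triangle gives a vector of unit length.
normalized = l2_normalize([3.0, 4.0])

# epsilon keeps the denominator non-zero for an all-zero input.
zeros = l2_normalize([0.0, 0.0])
```

The max with epsilon is what makes the all-zero case safe: the denominator becomes sqrt(epsilon) instead of zero, so the output stays at zero rather than raising a division error.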