placeholder / inputlayer* |
shape |
data |
shape |
Allocate memory for input
data. |
data_type |
const |
const |
data |
Allocate memory for const
data. |
shape |
data_type |
conv2d |
filter |
conv2d |
kernel |
Convolution Engine. |
strides |
stride |
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
dilations |
dilation |
conv2d* |
kernel_size |
conv2d |
kernel |
strides |
stride |
padding |
pad([0, 0, 0, 0]) |
dilation_rate |
dilation |
use_bias |
group |
group |
depthwiseconv2dnative |
filter |
depthwise-conv2d |
kernel |
Engine. |
strides |
stride |
explicit_paddings |
pad |
padding |
pad_mode(SAME or VALID) |
dilations |
dilation |
conv2dbackpropinput /
conv2dtranspose* |
filter |
transposed-conv2d |
kernel |
Convolution Engine. |
strides |
stride |
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
dilations |
dilation |
spacetobatchnd + conv2d +
batchtospacend |
block_shape |
conv2d |
dilation |
Spacetobatch, Conv2d, and Batchtospace are mapped to the
Convolution Engine when the specific requirements set by AMD are met. |
padding |
pad |
filter |
kernel |
strides |
stride |
padding |
pad_mode(SAME) |
dilations |
dilation |
block_shape |
crops |
matmul / dense* |
transpose_a |
conv2d / matmul |
transpose_a |
The matmul is transformed to a
conv2d operation if the equivalent conv2d meets the hardware
requirements and can be mapped to the DPU. |
transpose_b |
transpose_b |
maxpool / maxpooling2d* / globalmaxpool2d* |
ksize |
maxpool2d |
kernel |
Pooling Engine. The global attribute is set to true when
the original pooling operator requires global reduction. |
strides |
stride |
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
global |
avgpool / averagepooling2d* /
globalaveragepooling2d* |
pool_size |
avgpool2d |
kernel |
Pooling Engine. The global attribute is set to true when
the original pooling operator requires global
reduction. |
strides |
stride |
pad([0, 0, 0, 0]) |
padding |
pad_mode(SAME or VALID) |
count_include_pad (false) |
count_include_invalid (true) |
global |
mean |
axis |
avgpool / reduction_mean |
axis |
The mean operation is transformed
to avgpool if the equivalent avgpool meets the hardware requirements
and can be mapped to the DPU. |
keep_dims |
keep_dims |
relu |
relu |
Activations are fused into adjacent operations such
as convolution. |
relu6 |
relu6 |
leakyrelu |
alpha |
leaky_relu |
alpha |
fixneuron /
quantizelayer* |
bit_width |
fix |
bit_width |
It is divided into float2fix
and fix2float during compilation. The float2fix and fix2float
operations are then fused with adjacent operations into
coarse-grained operations. |
quantize_pos |
fix_point |
if_signed |
round_mode |
identity |
identity |
Identity would be removed. |
add, addv2 |
add |
If the add is an element-wise add, it is mapped to the DPU
Element-wise Add Engine. If the add is a channel-wise add, AMD searches for opportunities to
fuse it with adjacent operations such as convolutions. |
mul |
mul |
Mul can be mapped to the Depthwise-Convolution Engine if one of its inputs
is constant. If its two inputs have the same shape, it can be
mapped to the Misc Engine as an element-wise multiplication. A
mul that is part of a special operator combination can be
fused into that combination. Otherwise, it is
mapped to the CPU. |
concatv2 / concatenate* |
axis |
concat |
axis |
AMD reduces the overhead
resulting from the concat by using special read or write strategies
and carefully allocating the on-chip memory. |
pad / zeropadding2d* |
paddings |
pad |
paddings |
The compiler first tries to fuse "CONSTANT" padding into
adjacent operations, for example, convolution and pooling. If no
such operator exists, the padding can still be mapped to the DPU when the padding
dimension equals four and meets the hardware requirements.
"SYMMETRIC" padding is mapped to the DPU. The DPU does not
support "REFLECT" padding. |
mode |
mode |
constant_values |
shape |
shape |
The shape operation would be removed. |
stridedslice |
begin |
strided_slice |
begin |
If they are shape-related operations, they would be
removed during compilation. If they are components of a
coarse-grained operation, they would be fused with adjacent
operations. Otherwise, they would be compiled into CPU
implementations. |
end |
end |
strides |
strides |
pack |
axis |
stack |
axis |
neg |
neg |
realdiv |
div |
sub |
sub |
prod |
axis |
reduction_product |
axis |
keep_dims |
keep_dims |
sum |
axis |
reduction_sum |
axis |
keep_dims |
keep_dims |
max |
axis |
reduction_max |
axis |
keep_dims |
keep_dims |
resizebilinear |
size |
resize |
size |
If the resize mode is 'BILINEAR', the following cases can be
transformed to DPU implementations (pad + depthwise-transposed
conv2d): align_corners=false, half_pixel_centers=false, size = 2, 4, 8;
and align_corners=false, half_pixel_centers=true, size = 2, 4.
If the resize mode is 'NEAREST' and the size is an integer,
the resize is mapped to DPU implementations. |
align_corners |
align_corners |
half_pixel_centers |
half_pixel_centers |
mode="BILINEAR" |
resizenearestneighbor |
size |
resize |
size |
align_corners |
align_corners |
half_pixel_centers |
half_pixel_centers |
mode="NEAREST" |
upsample2d / upsampling2d* |
size |
resize |
scale |
align_corners |
half_pixel_centers |
interpolation |
mode |
reshape |
shape |
reshape |
shape |
They would be transformed to the reshape operation in
some cases. Otherwise, they would be mapped to the CPU. |
reshape* |
target_shape |
transpose |
perm |
transpose |
order |
squeeze |
axis |
squeeze |
axis |
exp |
exp |
They would only be compiled into
CPU implementations. |
softmax |
axis |
softmax |
axis |
sigmoid |
sigmoid |
square + rsqrt + maximum |
l2_normalize |
axis |
output = x / sqrt(max(sum(x ^ 2),
epsilon)) is fused into an l2_normalize in XIR. |
epsilon |
- The TensorFlow operators listed above are supported in XIR. All of them
have CPU implementations in the tool chain.
- Operators marked with * apply to TensorFlow versions > 2.0.
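
Two of the rewrites described in the table can be checked numerically on small inputs. The following is a minimal pure-Python sketch, not the compiler's implementation: the function names `reduce_mean_hw`, `avgpool2d`, and `l2_normalize` are hypothetical, tensors are assumed to be NHWC nested lists, and the default `epsilon` value is an assumption. It illustrates why a mean over the height and width axes with `keep_dims` set is equivalent to an avgpool whose kernel covers the whole feature map, and it evaluates the `l2_normalize` formula from the table.

```python
import math


def reduce_mean_hw(x):
    """Mean over axes (1, 2) of an NHWC nested list, with keep_dims=True."""
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [
        [[[sum(x[b][i][j][k] for i in range(h) for j in range(w)) / (h * w)
           for k in range(c)]]]
        for b in range(n)
    ]


def avgpool2d(x, kernel, stride):
    """Average pooling on an NHWC nested list with VALID padding."""
    kh, kw = kernel
    sh, sw = stride
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_h = (h - kh) // sh + 1
    out_w = (w - kw) // sw + 1
    out = []
    for b in range(n):
        rows = []
        for oi in range(out_h):
            cols = []
            for oj in range(out_w):
                # Average each channel over the kh x kw window.
                cols.append([
                    sum(x[b][oi * sh + di][oj * sw + dj][k]
                        for di in range(kh) for dj in range(kw)) / (kh * kw)
                    for k in range(c)
                ])
            rows.append(cols)
        out.append(rows)
    return out


def l2_normalize(x, epsilon=1e-12):
    """output = x / sqrt(max(sum(x ^ 2), epsilon)) for a flat list x.

    The default epsilon is an assumption made for this sketch.
    """
    denom = math.sqrt(max(sum(v * v for v in x), epsilon))
    return [v / denom for v in x]


# One 2x2 feature map with two channels (N=1, H=2, W=2, C=2).
feature_map = [[[[1.0, 10.0], [2.0, 20.0]],
                [[3.0, 30.0], [4.0, 40.0]]]]

mean_result = reduce_mean_hw(feature_map)
pool_result = avgpool2d(feature_map, kernel=(2, 2), stride=(1, 1))
normalized = l2_normalize([3.0, 4.0])
```

Here `mean_result` and `pool_result` hold the same values because the pooling window spans the full 2x2 map, which is the case the table describes as global reduction. Whether a particular mean is actually mapped to the DPU still depends on the hardware requirements mentioned in the table, which this sketch does not model.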