| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| Parameter / tensor / zeros | data | const | data | Allocates memory for input data. |
| | | | shape | |
| | | | data_type | |
| Conv2d | in_channels | conv2d (groups = 1) / depthwise-conv2d (groups = input channels) | | If groups == input channels, the convolution is compiled to the Depthwise-Convolution Engine. If groups == 1, the convolution is mapped to the Convolution Engine. Otherwise, it is mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
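The groups rule above can be sketched in plain Python. This is a minimal illustration of the dispatch described in the notes column, together with the standard PyTorch Conv2d weight layout; the helper names are illustrative, not part of any toolchain API.

```python
def dpu_target(groups: int, in_channels: int) -> str:
    # Dispatch rule from the table: groups == in_channels goes to the
    # Depthwise-Convolution Engine, groups == 1 to the Convolution Engine,
    # and any other grouping falls back to the CPU.
    if groups == in_channels:
        return "Depthwise-Convolution Engine"
    if groups == 1:
        return "Convolution Engine"
    return "CPU"

def conv2d_weight_shape(out_channels, in_channels, groups, kh, kw):
    # PyTorch Conv2d weight layout: (out_channels, in_channels // groups, kH, kW).
    assert in_channels % groups == 0 and out_channels % groups == 0
    return (out_channels, in_channels // groups, kh, kw)

assert dpu_target(1, 32) == "Convolution Engine"
assert dpu_target(32, 32) == "Depthwise-Convolution Engine"
assert dpu_target(4, 32) == "CPU"
# A depthwise convolution has one input channel per filter:
assert conv2d_weight_shape(32, 32, 32, 3, 3) == (32, 1, 3, 3)
```

Note that grouped convolutions with 1 < groups < in_channels (e.g. ResNeXt-style groups) are the case that lands on the CPU.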
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| ConvTranspose2d | in_channels | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channels) | | If groups == input channels, the convolution is compiled to the Depthwise-Convolution Engine. If groups == 1, the convolution is mapped to the Convolution Engine. Otherwise, it is mapped to the CPU. The DPU does not yet support output_padding, so if output_padding is not all zeros, the operator is assigned to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | output_padding | | output_padding | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
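The output_padding restriction is worth a quick arithmetic sketch. In PyTorch, output_padding exists because a strided forward convolution maps several input sizes to the same output size, and the transposed convolution needs a hint to pick one back; the size formulas below are the standard PyTorch ones (dilation = 1), with illustrative function names.

```python
def conv2d_out_size(n, kernel, stride, padding):
    # Forward Conv2d output size for one spatial dimension (floor mode).
    return (n + 2 * padding - kernel) // stride + 1

def conv_transpose2d_out_size(n, kernel, stride, padding, output_padding):
    # ConvTranspose2d output size for one spatial dimension (dilation = 1):
    #   out = (n - 1) * stride - 2 * padding + kernel + output_padding
    return (n - 1) * stride - 2 * padding + kernel + output_padding

# A stride-2 convolution maps both size 7 and size 8 down to 4 ...
assert conv2d_out_size(7, 3, 2, 1) == 4
assert conv2d_out_size(8, 3, 2, 1) == 4
# ... so output_padding selects which size the transposed conv restores.
assert conv_transpose2d_out_size(4, 3, 2, 1, 0) == 7
assert conv_transpose2d_out_size(4, 3, 2, 1, 1) == 8
```

Per the table, the second case (output_padding = 1) is exactly the one that currently falls back to the CPU.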
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| matmul | transpose_a | conv2d / matmul | | The matmul is transformed to conv2d and compiled to the Convolution Engine. If the matmul fails to be transformed, it is implemented on the CPU. |
| | transpose_b | | | |
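The matmul-to-conv2d transformation rests on a simple equivalence: a matrix product is a 1×1 convolution when each row of the left operand is treated as one spatial position with K channels and each column of the right operand as a 1×1 kernel. A minimal pure-Python check of that equivalence (nested lists stand in for tensors; names are illustrative):

```python
def matmul(a, b):
    # plain (M x K) @ (K x N) on nested lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conv1x1(x, w):
    # 1x1 convolution: x is [Cin][H][W], w is [Cout][Cin]
    return [[[sum(w[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(len(x[0][0]))] for i in range(len(x[0]))]
            for o in range(len(w))]

a = [[1, 2], [3, 4], [5, 6]]       # M = 3, K = 2
b = [[7, 8, 9], [10, 11, 12]]      # K = 2, N = 3

# Rows of `a` become spatial positions of a K-channel 1x1-high feature map,
# columns of `b` become 1x1 kernels.
x = [[[a[i][k] for i in range(3)]] for k in range(2)]   # [K][1][M]
w = [[b[k][n] for k in range(2)] for n in range(3)]     # [N][K]

mm = matmul(a, b)
cv = conv1x1(x, w)
assert all(cv[n][0][i] == mm[i][n] for i in range(3) for n in range(3))
```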
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| MaxPool2d / AdaptiveMaxPool2d | kernel_size | maxpool2d | kernel | Pooling Engine. |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | output_size (adaptive) | | global | |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size | avgpool2d | kernel | Pooling Engine. |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | count_include_pad | | count_include_pad | |
| | | | count_include_invalid (true) | |
| | output_size (adaptive) | | global | |
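Two of the attribute mappings above are easy to misread, so here is a minimal sketch of both: the adaptive variants with output_size = 1 reduce to a global pool (the `global` attribute), and count_include_pad controls whether padded zeros enter the averaging divisor. Function names are illustrative.

```python
def global_avg_pool(channel):
    # AdaptiveAvgPool2d(output_size=1) on one channel: mean over all H*W values.
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def avg_pool_window(in_bounds, n_padded, count_include_pad):
    # One pooling window overlapping a zero-padded border. Padded positions
    # contribute zero to the sum; count_include_pad decides whether they
    # also count toward the divisor.
    divisor = len(in_bounds) + n_padded if count_include_pad else len(in_bounds)
    return sum(in_bounds) / divisor

c = [[1.0, 2.0], [3.0, 6.0]]
assert global_avg_pool(c) == 3.0
assert avg_pool_window([4.0, 8.0], 2, True) == 3.0    # divide by 4
assert avg_pool_window([4.0, 8.0], 2, False) == 6.0   # divide by 2
```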
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| ReLU | | relu | | Activations are fused into adjacent operations such as convolutions. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 / Hardtanh | min_val = 0 | relu6 | | |
| | max_val = 6 | | | |
| Hardsigmoid | | hard-sigmoid | | |
| Hardswish | | hardswish | | |
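The relu6 mapping for Hardtanh is not an approximation: with min_val = 0 and max_val = 6 the two functions coincide exactly, and the hard activations are the standard piecewise-linear PyTorch definitions. A small sketch (scalar versions, illustrative names):

```python
def relu6(x):
    return min(max(x, 0.0), 6.0)

def hardtanh(x, min_val, max_val):
    return min(max(x, min_val), max_val)

def hardsigmoid(x):
    # PyTorch Hardsigmoid: relu6(x + 3) / 6
    return relu6(x + 3.0) / 6.0

def hardswish(x):
    # PyTorch Hardswish: x * relu6(x + 3) / 6
    return x * hardsigmoid(x)

# Hardtanh(min_val=0, max_val=6) coincides with relu6, which is why the
# table maps both to the same compiled op.
assert all(hardtanh(x, 0.0, 6.0) == relu6(x) for x in (-2.0, 0.0, 3.5, 7.0))
assert hardsigmoid(0.0) == 0.5
assert hardswish(3.0) == 3.0 and hardswish(-3.0) == 0.0
```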
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| ConstantPad2d / ZeroPad2d | padding | pad | paddings | The compiler first tries to fuse "CONSTANT" padding into adjacent operations, such as convolutions and pooling. If no such operation exists, the pad can still be mapped to the DPU when the padding dimension equals four and meets the hardware requirements. |
| | value = 0 | | constant_values | |
| | | | mode ("CONSTANT") | |
| add | | add | | An element-wise add is mapped to the DPU Element-wise Add Engine. For a channel-wise add, the compiler searches for opportunities to fuse it with adjacent operations such as convolutions. If these operators are shape-related, they are removed during compilation; if they are components of a coarse-grained operation, they are fused with adjacent operations. Otherwise, they are compiled into CPU implementations. |
| sub / rsub | | sub | | See the note for add. |
| mul | | mul | | A mul can be mapped to the Depthwise-Convolution Engine if one of its inputs is constant. If its two inputs have the same shape, it may be mapped to the Misc Engine as an element-wise multiplication. A mul that is part of a special operator combination can be fused into that combination. Otherwise, it is mapped to the CPU. |
| neg | | neg | | See the note for add. |
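The element-wise versus channel-wise distinction for add is purely a shape question: same-shape operands are added value-for-value, while a per-channel operand is broadcast over the spatial dimensions (the pattern the compiler tries to fuse into an adjacent convolution as a bias). A minimal sketch on nested [C][H][W] lists, with illustrative names:

```python
def add_elementwise(a, b):
    # Same-shape [C][H][W] tensors: the case mapped to the
    # Element-wise Add Engine.
    return [[[a[c][i][j] + b[c][i][j] for j in range(len(a[0][0]))]
             for i in range(len(a[0]))] for c in range(len(a))]

def add_channelwise(x, bias):
    # One constant per channel, broadcast over H x W: the case the compiler
    # tries to fuse into an adjacent convolution.
    return [[[x[c][i][j] + bias[c] for j in range(len(x[0][0]))]
             for i in range(len(x[0]))] for c in range(len(x))]

x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]        # C=2, H=2, W=2
assert add_elementwise(x, x)[0][1][0] == 6       # 3 + 3
assert add_channelwise(x, [10, 20])[1][0][0] == 25   # 5 + 20
```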
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| sum | dim | reduction_sum | axis | |
| | keepdim | | keep_dims | |
| max | dim | reduction_max | axis | |
| | keepdim | | keep_dims | |
| mean | dim | reduction_mean | axis | |
| | keepdim | | keep_dims | |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size | resize | size | With mode 'BILINEAR', the resize can be transformed to a DPU implementation (pad + depthwise-transposed-conv2d) when align_corners = false, half_pixel_centers = false, and size = 2, 4, or 8, or when align_corners = false, half_pixel_centers = true, and size = 2 or 4. With mode 'NEAREST', the resize is mapped to a DPU implementation when the size is an integer. |
| | scale_factor | | | |
| | mode | | mode | |
| | align_corners | | align_corners | |
| | | | half_pixel_centers = !align_corners | |
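The integer-factor 'NEAREST' case that the table maps to the DPU is simple to picture: each input pixel is replicated into a scale × scale block. A minimal pure-Python sketch on one channel (illustrative name):

```python
def resize_nearest(channel, scale):
    # Integer-factor 'NEAREST' resize: each input pixel becomes a
    # scale x scale block in the output.
    h, w = len(channel), len(channel[0])
    return [[channel[i // scale][j // scale] for j in range(w * scale)]
            for i in range(h * scale)]

assert resize_nearest([[1, 2], [3, 4]], 2) == [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
```

The 'BILINEAR' cases listed above are the ones the compiler can rewrite as a fixed pad plus depthwise-transposed-conv2d; other scale factors or corner/center settings fall back to the CPU.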
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| transpose | dim0 | transpose | order | These operations are transformed to the reshape operation in some cases. In addition, the compiler searches for opportunities to fuse dimension transformations into special load or save instructions of adjacent operations to reduce overhead. Otherwise, they are mapped to the CPU. |
| | dim1 | | | |
| permute | dims | | | |
| view / reshape | size | reshape | shape | |
| flatten | start_dim | reshape / flatten | start_axis | |
| | end_dim | | end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
| cat | dim | concat | axis | The overhead introduced by concat is reduced through special reading and writing strategies and careful on-chip memory allocation. |
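The reshape family above (view, flatten, squeeze) all lower to the same compiled op because none of them moves data: flatten(start_dim=1), for instance, is just reshape(N, -1). A minimal sketch on nested lists (illustrative name):

```python
def flatten_from_dim1(batch):
    # flatten(start_dim=1): each sample in a nested [N][...] list becomes one
    # flat row, i.e. the same result as reshape(N, -1).
    def walk(t):
        if isinstance(t, list):
            return [v for sub in t for v in walk(sub)]
        return [t]
    return [walk(sample) for sample in batch]

batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # N = 2, each sample 2x2
assert flatten_from_dim1(batch) == [[1, 2, 3, 4], [5, 6, 7, 8]]
```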
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| aten::slice* | dim | strided_slice | | If the strided_slice is shape-related or is a component of a coarse-grained operation, it is removed. Otherwise, the strided_slice is compiled into a CPU implementation. |
| | start | | begin | |
| | end | | end | |
| | step | | strides | |
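The begin/end/strides attributes of the compiled op carry the same semantics as Python's slice arguments along one axis, which can be sketched in a few lines (illustrative name, non-negative arguments assumed):

```python
def strided_slice_1d(x, begin, end, strides):
    # One-axis begin/end/strides semantics, matching x[begin:end:strides]
    # for non-negative arguments.
    out, i = [], begin
    while i < end:
        out.append(x[i])
        i += strides
    return out

data = list(range(10))
assert strided_slice_1d(data, 1, 7, 2) == data[1:7:2] == [1, 3, 5]
```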
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| BatchNorm2d | eps | depthwise-conv2d / scale | epsilon | If the batch_norm is quantized and can be transformed into an equivalent depthwise-conv2d, it is transformed, and the compiler searches for opportunities to map it to a DPU implementation. Otherwise, the batch_norm is executed on the CPU. |
| | | | axis | |
| | | | moving_mean | |
| | | | moving_var | |
| | | | gamma | |
| | | | beta | |
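The depthwise-conv2d equivalence holds because inference-time batch norm is an affine map per channel: gamma·(x − mean)/√(var + eps) + beta folds into a single scale and shift, i.e. a 1×1 depthwise convolution with weight `scale` and bias `shift`. A scalar per-channel sketch (illustrative names):

```python
import math

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    # Inference-time BatchNorm on one channel value.
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

def fold_batch_norm(mean, var, gamma, beta, eps=1e-5):
    # Equivalent per-channel scale/shift: the weight and bias of a 1x1
    # depthwise convolution.
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift

scale, shift = fold_batch_norm(0.5, 4.0, 2.0, -1.0)
x = 3.0
assert abs(batch_norm(x, 0.5, 4.0, 2.0, -1.0) - (scale * x + shift)) < 1e-9
```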
| PyTorch Op | PyTorch Attributes | Compiled Op | Compiled Op Attributes | Notes |
|---|---|---|---|---|
| softmax | dim | softmax | axis | These operations are compiled only into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
| PixelShuffle | upscale_factor | pixel_shuffle | scale | Transformed to tile when the input is a convolution. |
| | | | upscale = True | |
| PixelUnshuffle | downscale_factor | pixel_shuffle | scale | |
| | | | upscale = False | |

\* If a tensor slice in PyTorch is written in Python slicing syntax, it is transformed into `aten::slice`.
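The pixel_shuffle rearrangement that both PixelShuffle and PixelUnshuffle compile to is a pure data movement: with upscale factor r, a [C·r²][H][W] input becomes [C][H·r][W·r], interleaving groups of r² channels into spatial blocks. A minimal sketch of the PyTorch ordering on nested lists (illustrative name):

```python
def pixel_shuffle(x, r):
    # PyTorch PixelShuffle: [C*r*r][H][W] -> [C][H*r][W*r], where
    #   out[c][h][w] = in[c*r*r + (h % r)*r + (w % r)][h // r][w // r]
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    return [[[x[c * r * r + (i % r) * r + (j % r)][i // r][j // r]
              for j in range(w * r)] for i in range(h * r)]
            for c in range(c_out)]

# Four 1x1 channels interleave into one 2x2 channel.
assert pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2) == [[[1, 2], [3, 4]]]
```

PixelUnshuffle is the inverse permutation, which is why the table maps it to the same compiled op with upscale = False.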