| PyTorch operator | PyTorch attributes | XIR operator | XIR attributes | DPU implementation notes |
|---|---|---|---|---|
| Parameter / tensor / zeros | data | const | data | Allocates memory for input data. |
| | | | shape | |
| | | | data_type | |
| Conv2d | in_channels | conv2d (groups = 1) / depthwise-conv2d (groups = input channels) | | If groups equals the number of input channels, the convolution is compiled to the Depthwise-Convolution Engine. If groups == 1, it is mapped to the Convolution Engine. Otherwise, it is mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| ConvTranspose2d | in_channels | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channels) | | Same rules as Conv2d: groups == input channels compiles to the Depthwise-Convolution Engine, groups == 1 maps to the Convolution Engine, and any other value falls back to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| matmul | | conv2d / matmul | transpose_a | The matmul is transformed to conv2d and compiled to the Convolution Engine. If the transformation fails, the matmul is implemented on the CPU. |
| | | | transpose_b | |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size | maxpool2d | kernel | Mapped to the Pooling Engine. |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | output_size (adaptive) | | global | |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size | avgpool2d | kernel | Mapped to the Pooling Engine. |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | count_include_pad | | count_include_pad | |
| | | | count_include_invalid (true) | |
| | output_size (adaptive) | | global | |
| ReLU | | relu | | Activations are fused into adjacent operations such as convolution and add. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0 | relu6 | | Only a Hardtanh with min_val = 0 and max_val = 6 (equivalent to ReLU6) maps to relu6. |
| | max_val = 6 | | | |
| Hardsigmoid | | hard-sigmoid | | |
| Hardswish | | hardswish | | |
| ConstantPad2d / ZeroPad2d | padding | pad | paddings | "CONSTANT" padding is fused with adjacent operations. |
| | value = 0 | | mode ("CONSTANT") | |
| add | | add | | An element-wise add is mapped to the DPU Element-wise Add Engine; for a channel-wise add, the compiler searches for opportunities to fuse it with adjacent operations such as convolutions. For these arithmetic operations in general: if they are shape-related, they are removed during compilation; if they are components of a coarse-grained operation, they are fused with adjacent operations; otherwise, they are compiled into CPU implementations. |
| sub / rsub | | sub | | |
| mul | | mul | | |
| neg | | neg | | |
| sum | dim | reduction_sum | axis | |
| | keepdim | | keep_dims | |
| max | dim | reduction_max | axis | |
| | keepdim | | keep_dims | |
| mean | dim | reduction_mean | axis | |
| | keepdim | | keep_dims | |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size | resize | size | A 'BILINEAR' resize with align_corners = false, half_pixel_centers = false, and size = 2, 4, or 8, or with align_corners = false, half_pixel_centers = true, and size = 2 or 4, can be transformed to a DPU implementation (pad + depthwise-transposed-conv2d). A 'NEAREST' resize whose sizes are integers is mapped to a DPU implementation. |
| | scale_factor | | | |
| | mode | | mode | |
| | align_corners | | align_corners | |
| | | | half_pixel_centers = !align_corners | |
| transpose | dim0 | transpose | order | These operations are transformed to the reshape operation in some cases. The compiler also searches for opportunities to fuse dimension-transformation operations into special load or save instructions of adjacent operations to reduce overhead. Otherwise, they are mapped to the CPU. |
| | dim1 | | | |
| permute | dims | transpose | order | |
| view / reshape | size | reshape | shape | |
| flatten | start_dim | reshape / flatten | start_axis | |
| | end_dim | | end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
| cat | dim | concat | axis | The overhead of the concat is reduced through special reading or writing strategies and careful on-chip memory allocation. |
| aten::slice* | dim | strided_slice | | If the strided_slice is shape-related or is a component of a coarse-grained operation, it is removed. Otherwise, it is compiled into a CPU implementation. |
| | start | | begin | |
| | end | | end | |
| | step | | strides | |
| BatchNorm2d | eps | depthwise-conv2d / scale | epsilon | If the batch_norm is quantized and can be equivalently transformed to a depthwise-conv2d, it is transformed, and the compiler searches for opportunities to map it to a DPU implementation. Otherwise, the batch_norm is executed on the CPU. |
| | | | axis | |
| | | | moving_mean | |
| | | | moving_var | |
| | | | gamma | |
| | | | beta | |
| softmax | dim | softmax | axis | These operations are only compiled into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
| PixelShuffle | upscale_factor | pixel_shuffle | scale | These operations are transformed to tile if their input is a convolution. |
| | | | upscale = True | |
| PixelUnshuffle | downscale_factor | pixel_shuffle | scale | |
| | | | upscale = False | |
\* If a tensor slice in PyTorch is written using Python slicing syntax, it is transformed into aten::slice.
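The examples below make several of the mappings in the table concrete. First, the groups rule for Conv2d / ConvTranspose2d; the channel counts here are arbitrary illustrations, not compiler requirements:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)

# groups == 1: mapped to the Convolution Engine.
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1, groups=1)

# groups == number of input channels: compiled to the Depthwise-Convolution Engine.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)

# Any other grouping (e.g. groups == 4): falls back to the CPU.
grouped = nn.Conv2d(32, 64, kernel_size=3, padding=1, groups=4)

for conv in (standard, depthwise, grouped):
    print(conv(x).shape)
```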
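The matmul row relies on the standard equivalence between a matrix product and a 1x1 convolution. The sketch below demonstrates that equivalence in plain PyTorch; it illustrates the idea, not the compiler's actual transformation pass:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16)   # a batch of feature vectors
w = torch.randn(16, 32)  # weights for x @ w

y_matmul = x @ w

# The same computation expressed as a 1x1 convolution over a 1x1 "image":
x_img = x.view(1, 16, 1, 1)           # N, C_in, H, W
kernel = w.t().reshape(32, 16, 1, 1)  # C_out, C_in, kH, kW
y_conv = F.conv2d(x_img, kernel).view(1, 32)

assert torch.allclose(y_matmul, y_conv, atol=1e-5)
```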
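The output_size (adaptive) → global rows reflect that an adaptive pooling layer with a 1x1 output is a global pooling. A quick check, with an assumed 7x7 input purely for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 7, 7)
adaptive = nn.AdaptiveAvgPool2d(output_size=1)  # global average pooling
explicit = nn.AvgPool2d(kernel_size=7)          # equivalent for a fixed 7x7 input
assert torch.allclose(adaptive(x), explicit(x))
```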
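The Hardtanh row maps to relu6 precisely because clamping to [0, 6] is what ReLU6 computes; other min_val/max_val combinations do not qualify:

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.0, 9.0, steps=25)
# Hardtanh(0, 6) clamps to [0, 6], which is exactly ReLU6.
assert torch.equal(nn.Hardtanh(min_val=0.0, max_val=6.0)(x), nn.ReLU6()(x))
```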
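For the add row, one reading of the element-wise vs. channel-wise distinction is whether the operands have identical shapes or one operand broadcasts per channel; this is an interpretation, not wording from the table:

```python
import torch

a = torch.randn(1, 16, 8, 8)
b = torch.randn(1, 16, 8, 8)
per_channel = torch.randn(1, 16, 1, 1)

elementwise = a + b            # identical shapes: DPU Element-wise Add Engine
channelwise = a + per_channel  # per-channel broadcast: candidate for fusion
                               # with an adjacent convolution
```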
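The resize row's DPU-friendly configurations can be expressed directly when authoring the model; the shapes below are arbitrary examples:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)

# 'NEAREST' with integer scale factors: mapped to a DPU implementation.
up_nearest = F.interpolate(x, scale_factor=2, mode='nearest')

# 'BILINEAR' with align_corners=False and scale 2, 4, or 8: a candidate for
# the pad + depthwise-transposed-conv2d DPU implementation.
up_bilinear = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
```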
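One case where a transpose/permute can be "transformed to the reshape operation": when the permutation only moves size-1 dimensions, the underlying data layout is unchanged. A small check of that claim:

```python
import torch

x = torch.randn(1, 8, 1, 4)
# Moving a size-1 dimension does not reorder the underlying data,
# so this permute is equivalent to a reshape.
y = x.permute(0, 2, 1, 3)  # shape (1, 1, 8, 4)
assert torch.equal(y, x.reshape(1, 1, 8, 4))
```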
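The footnote on aten::slice can be observed directly with TorchScript: Python slice syntax lowers to aten::slice nodes, which the compiler then treats as strided_slice:

```python
import torch

def crop(x):
    # Python slicing; TorchScript lowers this to aten::slice nodes.
    return x[:, :, 2:14:2, :]

print(torch.jit.script(crop).graph)  # look for aten::slice in the printed graph
```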
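The BatchNorm2d → depthwise-conv2d row rests on the fact that inference-time batch norm is a per-channel affine map, i.e. a 1x1 depthwise convolution. A sketch of that equivalence (again an illustration, not the compiler's pass, and ignoring quantization):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8).eval()  # inference mode: uses running statistics
x = torch.randn(1, 8, 4, 4)

# y = gamma * (x - moving_mean) / sqrt(moving_var + eps) + beta, per channel.
scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
shift = bn.bias - bn.running_mean * scale

dw = nn.Conv2d(8, 8, kernel_size=1, groups=8)
with torch.no_grad():
    dw.weight.copy_(scale.view(8, 1, 1, 1))
    dw.bias.copy_(shift)

assert torch.allclose(bn(x), dw(x), atol=1e-5)
```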
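Finally, the pixel_shuffle rows: PixelShuffle rearranges a (N, C*r^2, H, W) tensor to (N, C, H*r, W*r), and PixelUnshuffle inverts it, which is what the shared upscale attribute encodes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)
ps = nn.PixelShuffle(upscale_factor=2)      # XIR pixel_shuffle, upscale = True
pu = nn.PixelUnshuffle(downscale_factor=2)  # XIR pixel_shuffle, upscale = False

print(ps(x).shape)      # torch.Size([1, 4, 16, 16])
print(pu(ps(x)).shape)  # torch.Size([1, 16, 8, 8])
```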