| PyTorch | Parameters | XIR | XIR Attributes | DPU Implementations |
|---|---|---|---|---|
| Parameter | data | const | data | Allocate memory for input data. |
| | | | shape | |
| | | | data_type | |
| Conv2d | in_channels | conv2d (groups = 1) / depthwise-conv2d (groups = input channel) | | If groups == input channel, the convolution would be compiled into the Depthwise-Convolution Engine. If groups == 1, the convolution would be mapped to the Convolution Engine. Otherwise, it would be mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| ConvTranspose2d | in_channels | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channel) | | If groups == input channel, the convolution would be compiled into the Depthwise-Convolution Engine. If groups == 1, the convolution would be mapped to the Convolution Engine. Otherwise, it would be mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| matmul | | conv2d / matmul | transpose_a | The matmul would be transformed to conv2d and compiled to the Convolution Engine. If the matmul fails to be transformed, it would be implemented by the CPU. |
| | | | transpose_b | |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size | maxpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | output_size (adaptive) | | global | |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size | avgpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | count_include_pad | | count_include_pad | |
| | | | count_include_invalid (true) | |
| | output_size (adaptive) | | global | |
| ReLU | | relu | | Activations would be fused with adjacent operations such as convolution, add, etc. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0 | relu6 | | |
| | max_val = 6 | | | |
| ConstantPad2d / ZeroPad2d | padding | pad | paddings | "CONSTANT" padding would be fused with adjacent operations. |
| | value = 0 | | mode ("CONSTANT") | |
| add | | add | | If the add is an element-wise add, it would be mapped to the DPU Element-wise Add Engine. If the add is a channel-wise add, the compiler searches for opportunities to fuse it with adjacent operations such as convolutions. If these operations are shape-related, they would be removed during compilation. If they are components of a coarse-grained operation, they would be fused with adjacent operations. Otherwise, they would be compiled into CPU implementations. |
| sub / rsub | | sub | | |
| mul | | mul | | |
| max | dim | reduction_max | axis | |
| | keepdim | | keep_dims | |
| mean | dim | reduction_mean | axis | |
| | keepdim | | keep_dims | |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size | resize | size | If the mode of the resize is 'BILINEAR', the cases align_corners = false, half_pixel_centers = false, size = 2, 4, 8 and align_corners = false, half_pixel_centers = true, size = 2, 4 can be transformed to DPU implementations (pad + depthwise-transposed-conv2d). If the mode of the resize is 'NEAREST' and the sizes are integers, the resize would be mapped to DPU implementations. |
| | scale_factor | | | |
| | mode | | mode | |
| | align_corners | | align_corners | |
| | | | half_pixel_centers = !align_corners | |
| transpose | dim0 | transpose | order | These operations would be transformed to the reshape operation in some cases. Additionally, the compiler searches for opportunities to fuse the dimension-transformation operations into special load/save instructions of adjacent operations to reduce the overhead. Otherwise, they would be mapped to the CPU. |
| | dim1 | | | |
| permute | dims | transpose | order | |
| view | size | reshape | shape | |
| flatten | start_dim | reshape / flatten | start_axis | |
| | end_dim | | end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
| cat | dim | concat | axis | Reduce the overhead resulting from the concat by special reading or writing strategies and by allocating the on-chip memory carefully. |
| aten::slice* | dim | strided_slice | | If the strided_slice is shape-related or is a component of a coarse-grained operation, it would be removed. Otherwise, the strided_slice would be compiled into CPU implementations. |
| | start | | begin | |
| | end | | end | |
| | step | | strides | |
| BatchNorm2d | eps | depthwise-conv2d / batchnorm | epsilon | If the batch_norm is quantized and can be transformed into an equivalent depthwise-conv2d, it would be transformed to depthwise-conv2d and the compiler would search for compilation opportunities to map the batch_norm into DPU implementations. Otherwise, the batch_norm would be executed by the CPU. |
| | | | axis | |
| | | | moving_mean | |
| | | | moving_var | |
| | | | gamma | |
| | | | beta | |
| softmax | dim | softmax | axis | These operations would only be compiled into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
- If the slice of a tensor in PyTorch is written in Python slicing syntax, it is transformed into aten::slice.
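
As a quick illustration of the mapping above, the following sketch builds a small PyTorch block using only configurations that the table maps to DPU engines: a standard convolution (groups = 1), a depthwise convolution (groups = input channel), BatchNorm2d, Hardtanh(0, 6) as relu6, max pooling, and a Python tensor slice that becomes aten::slice / strided_slice. The module name, channel count, and shapes are illustrative assumptions, not part of any official API.

```python
import torch
import torch.nn as nn


class DpuFriendlyBlock(nn.Module):
    """Hypothetical block composed only of operator configurations that,
    per the table above, map to DPU engines."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # groups = 1 -> conv2d, mapped to the Convolution Engine
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                              padding=1, groups=1, bias=False)
        # Quantized BatchNorm2d may be folded into a depthwise-conv2d
        self.bn = nn.BatchNorm2d(channels)
        # Hardtanh(min_val=0, max_val=6) corresponds to relu6 and can be
        # fused with the adjacent convolution
        self.act = nn.Hardtanh(min_val=0.0, max_val=6.0)
        # groups = in_channels -> depthwise-conv2d, Depthwise-Convolution Engine
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                            padding=1, groups=channels, bias=False)
        # maxpool2d -> Pooling Engine
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn(self.conv(x)))
        x = self.pool(self.dw(x))
        # Python slicing syntax becomes aten::slice, lowered to strided_slice
        # (removed if shape-related, otherwise compiled to CPU)
        return x[:, :, 0:4, 0:4]


if __name__ == "__main__":
    block = DpuFriendlyBlock()
    out = block(torch.randn(1, 32, 16, 16))
    print(out.shape)  # torch.Size([1, 32, 4, 4])
```

Whether each operator actually lands on a DPU engine still depends on the quantization and compilation flow; the sketch only mirrors the groups, activation, and pooling conditions stated in the table.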