Operators Supported by PyTorch - 1.4.1 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2021-12-13
Version: 1.4.1 English
Table 1. Operators Supported by PyTorch
| PyTorch API | PyTorch Attributes | XIR OP name | XIR Attributes | DPU Implementation |
| --- | --- | --- | --- | --- |
| Parameter | data | const | data | Allocate memory for input data. |
| | | | shape | |
| | | | data_type | |
| Conv2d | in_channels | conv2d (groups = 1) / depthwise-conv2d (groups = input channel) | | If groups == input channel, the convolution is compiled to the Depthwise-Convolution Engine. If groups == 1, the convolution is mapped to the Convolution Engine. Otherwise, it is mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| ConvTranspose2d | in_channels | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channel) | | If groups == input channel, the convolution is compiled to the Depthwise-Convolution Engine. If groups == 1, the convolution is mapped to the Convolution Engine. Otherwise, it is mapped to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| matmul | | conv2d / matmul | transpose_a | The matmul is transformed to conv2d and compiled to the Convolution Engine. If the transformation fails, the matmul is implemented by the CPU. |
| | | | transpose_b | |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size | maxpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | output_size (adaptive) | | global | |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size | avgpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | count_include_pad | | count_include_pad | |
| | | | count_include_invalid (true) | |
| | output_size (adaptive) | | global | |
| ReLU | | relu | | Activations are fused into adjacent operations such as convolution or add. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0 | | | |
| | max_val = 6 | | | |
| ConstantPad2d / ZeroPad2d | padding | pad | paddings | "CONSTANT" padding is fused into adjacent operations. |
| | value = 0 | | mode ("CONSTANT") | |
| add | | add | | If the add is an element-wise add, it is mapped to the DPU Element-wise Add Engine. If the add is a channel-wise add, the compiler searches for opportunities to fuse it with adjacent operations such as convolutions. If these operations are shape-related, they are removed during compilation. If they are components of a coarse-grained operation, they are fused with adjacent operations. Otherwise, they are compiled into CPU implementations. |
| sub / rsub | | sub | | |
| mul | | mul | | |
| max | dim | reduction_max | axis | |
| | keepdim | | keep_dims | |
| mean | dim | reduction_mean | axis | |
| | keepdim | | keep_dims | |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size | resize | size | If the resize mode is 'BILINEAR', two cases can be transformed to DPU implementations (pad + depthwise-transposed conv2d): align_corners = false, half_pixel_centers = false, size = 2, 4, 8; and align_corners = false, half_pixel_centers = true, size = 2, 4. If the resize mode is 'NEAREST' and the sizes are integers, the resize is mapped to DPU implementations. |
| | scale_factor | | | |
| | mode | | mode | |
| | align_corners | | align_corners | |
| | | | half_pixel_centers = !align_corners | |
| transpose | dim0 | transpose | order | These operations are transformed to the reshape operation in some cases. Additionally, the compiler searches for opportunities to fuse the dimension transformation operations into special load/save instructions of adjacent operations to reduce the overhead. Otherwise, they are mapped to the CPU. |
| | dim1 | | | |
| permute | dims | | | |
| view | size | reshape | shape | |
| flatten | start_dim | reshape / flatten | start_axis | |
| | end_dim | | end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
| cat | dim | concat | axis | The compiler reduces the overhead of the concat through special read and write strategies and careful allocation of on-chip memory. |
| aten::slice* | dim | strided_slice | | If the strided_slice is shape-related or is a component of a coarse-grained operation, it is removed. Otherwise, the strided_slice is compiled into a CPU implementation. |
| | start | | begin | |
| | end | | end | |
| | step | | strides | |
| BatchNorm2d | eps | depthwise-conv2d / batchnorm | epsilon | If the batch_norm is quantized and can be transformed to an equivalent depthwise-conv2d, it is transformed to depthwise-conv2d and the compiler searches for opportunities to map the batch_norm into DPU implementations. Otherwise, the batch_norm is executed by the CPU. |
| | | | axis | |
| | | | moving_mean | |
| | | | moving_var | |
| | | | gamma | |
| | | | beta | |
| softmax | dim | softmax | axis | These operations are compiled only into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
  1. If a tensor slice in PyTorch is written in Python slicing syntax, it is transformed into aten::slice.
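
The groups rule stated in the Conv2d and ConvTranspose2d rows of Table 1 can be sketched as a small lookup. This is an illustrative sketch only, not part of the Vitis AI toolchain: the function name `conv_target` is hypothetical, and the sketch ignores every other constraint the compiler may apply (kernel size, stride, and so on). The table does not say which rule wins when in_channels == 1 and groups == 1, so the sketch assumes the sentence order of the table.

```python
def conv_target(groups: int, in_channels: int) -> str:
    """Return the target that Table 1 names for a Conv2d/ConvTranspose2d.

    Hypothetical helper for illustration; it only encodes the groups rule
    from the table and is not an API of the Vitis AI compiler.
    """
    # Rule 1 in the table: groups == input channel -> depthwise engine.
    # Assumption: this check comes first, following the table's wording.
    if groups == in_channels:
        return "Depthwise-Convolution Engine"
    # Rule 2 in the table: groups == 1 -> regular convolution engine.
    if groups == 1:
        return "Convolution Engine"
    # Any other grouping falls back to the CPU.
    return "CPU"


if __name__ == "__main__":
    for groups, in_channels in [(1, 32), (32, 32), (4, 32)]:
        print(groups, in_channels, "->", conv_target(groups, in_channels))
```

For example, a layer built with in_channels = 32 and groups = 4 matches neither rule, so by the table it would be expected to run on the CPU rather than on the DPU.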