The output buffer length is calculated using the formula shown in the OUT_BUFFER_LEN table: ceil((TP_F_LEN + TP_G_LEN - 1), LANES) for FULL mode, ceil(TP_F_LEN, LANES) for SAME mode, and ceil((TP_F_LEN - TP_G_LEN + 1), LANES) for VALID mode.
Here, LANES is the number of parallel data lanes available in the AIE hardware. Rounding up to a multiple of LANES sizes the buffer to hold a whole number of vectors, which allows efficient vector processing.
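The ceil function used here rounds its first argument up to the next multiple of the second, where / denotes integer (truncating) division:
ceil(a, b) = ((a + b - 1) / b) * b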
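The integer division is essential: with true division the expression simply evaluates to a + b - 1 and no rounding happens. A minimal Python sketch of the formula follows; the name `ceil_to_multiple` is hypothetical and not part of the library.

```python
def ceil_to_multiple(a: int, b: int) -> int:
    """Round a up to the next multiple of b, i.e. ceil(a, b) with integer division."""
    if b <= 0:
        raise ValueError("b must be a positive lane count")
    # // is Python's floor division; for non-negative operands it matches truncation.
    return ((a + b - 1) // b) * b


print(ceil_to_multiple(95, 16))  # 96
# With true division the rounding is lost: 110 / 16 * 16 == 110.0
print((95 + 16 - 1) / 16 * 16)   # 110.0
```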
| TP_COMPUTE_MODE | MODE NAME | OUT_BUFFER_LEN |
|---|---|---|
| 0 | FULL | ceil((TP_F_LEN + TP_G_LEN - 1), LANES) |
| 1 | SAME | ceil(TP_F_LEN, LANES) |
| 2 | VALID | ceil((TP_F_LEN - TP_G_LEN + 1), LANES) |
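The table maps directly onto a small function: compute the number of valid samples for the chosen TP_COMPUTE_MODE, then round up to a multiple of LANES. This is a minimal Python sketch; `out_data_len`, `out_buffer_len`, and the FULL/SAME/VALID constants are hypothetical names for illustration, not the library's API.

```python
# Mode values follow the TP_COMPUTE_MODE column of the table above.
FULL, SAME, VALID = 0, 1, 2


def ceil_to_multiple(a: int, b: int) -> int:
    # Integer round-up of a to the next multiple of b.
    return ((a + b - 1) // b) * b


def out_data_len(f_len: int, g_len: int, compute_mode: int) -> int:
    """Number of valid output samples produced in each compute mode."""
    if compute_mode == FULL:
        return f_len + g_len - 1
    if compute_mode == SAME:
        return f_len
    if compute_mode == VALID:
        return f_len - g_len + 1
    raise ValueError(f"unknown compute mode {compute_mode}")


def out_buffer_len(f_len: int, g_len: int, compute_mode: int, lanes: int) -> int:
    """Output buffer length: valid samples rounded up to a multiple of lanes."""
    return ceil_to_multiple(out_data_len(f_len, g_len, compute_mode), lanes)
```

For the example configuration at the end of this section (F_LEN = 64, G_LEN = 32, LANES = 16), this gives 96, 64, and 48 for FULL, SAME, and VALID respectively.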
Where:
- TP_F_LEN is the length of the input F vector.
- TP_G_LEN is the length of the input G vector.
- LANES is the number of parallel data lanes available in the AIE hardware, which depends on the data type combination used and on the AIE architecture. See the LANES table below.
| InputF Data Type | InputG Data Type | Output Data Type | AIE-1 Lanes | AIE-ML Lanes | AIE-MLv2 Lanes |
|---|---|---|---|---|---|
| int8 | int8 | int16 | 0 | 32 | 64 |
| int16 | int16 | int32 | 16 | 16 | 32 |
| int32 | int16 | int32 | 8 | 16 | 32 |
| cint16 | int16 | cint16 | 8 | 16 | 32 |
| cint16 | int16 | cint32 | 8 | 16 | 32 |
| cint16 | int32 | cint32 | 8 | 16 | 16 |
| cint16 | cint16 | cint32 | 8 | 16 | 16 |
| cint32 | int16 | cint32 | 4 | 16 | 16 |
| cint32 | cint16 | cint32 | 4 | 8 | 16 |
| float | float | float | 8 | 32 | 16 |
| cfloat | float | cfloat | 4 | 0 | 0 |
| cfloat | cfloat | cfloat | 4 | 0 | 0 |
| bfloat16 | bfloat16 | float | 0 | 16 | 64 |
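The LANES table can be captured as a lookup keyed on the full (F, G, output) type triple, since cint16 x int16 appears with two different output types. In this Python sketch, `LANES_TABLE` and `lanes_for` are hypothetical names, and treating a 0 entry as an unsupported combination is an assumption.

```python
# Hypothetical lookup transcribed from the LANES table above.
# Each value is (AIE-1, AIE-ML, AIE-MLv2) lane counts.
LANES_TABLE = {
    ("int8", "int8", "int16"): (0, 32, 64),
    ("int16", "int16", "int32"): (16, 16, 32),
    ("int32", "int16", "int32"): (8, 16, 32),
    ("cint16", "int16", "cint16"): (8, 16, 32),
    ("cint16", "int16", "cint32"): (8, 16, 32),
    ("cint16", "int32", "cint32"): (8, 16, 16),
    ("cint16", "cint16", "cint32"): (8, 16, 16),
    ("cint32", "int16", "cint32"): (4, 16, 16),
    ("cint32", "cint16", "cint32"): (4, 8, 16),
    ("float", "float", "float"): (8, 32, 16),
    ("cfloat", "float", "cfloat"): (4, 0, 0),
    ("cfloat", "cfloat", "cfloat"): (4, 0, 0),
    ("bfloat16", "bfloat16", "float"): (0, 16, 64),
}
ARCH_COLUMN = {"AIE-1": 0, "AIE-ML": 1, "AIE-MLv2": 2}


def lanes_for(in_f: str, in_g: str, out: str, arch: str) -> int:
    """Return LANES for a data type combination on a given architecture."""
    lanes = LANES_TABLE[(in_f, in_g, out)][ARCH_COLUMN[arch]]
    if lanes == 0:
        # Assumption: a 0 entry means the combination is not supported on this architecture.
        raise ValueError(f"{in_f} x {in_g} -> {out} has no lane count listed for {arch}")
    return lanes
```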
Note: Refer to UG1603 for the number of lanes supported by sliding multiplication.
Example Config:
- Data_F: int16, Data_G: int16, Data_Out: int32
- Func_Type = 1 (conv)
- compute_mode = 0 (FULL), 1 (SAME), 2 (VALID)
- F_LEN = 64, G_LEN = 32
- in_F[F_LEN] = [1, 2, 3, ..., 64]
- in_G[G_LEN] = [1, 2, 3, ..., 32]
- LANES = 16 for the int16 x int16 data combination (AIE-1 and AIE-ML columns of the LANES table)

FULL Mode:
- OUT_DATA_LEN = (TP_F_LEN + TP_G_LEN - 1) --> (64 + 32 - 1) --> 95
- Output_Buffer_len = ceil(95, 16) --> (((95 + 16 - 1) / 16) * 16) --> ((110 / 16) * 16) --> (6 * 16) --> 96

Therefore, the output buffer has 95 valid output samples and 1 zero sample.

SAME Mode:
- OUT_DATA_LEN = TP_F_LEN --> 64
- Output_Buffer_len = ceil(64, 16) --> (((64 + 16 - 1) / 16) * 16) --> ((79 / 16) * 16) --> (4 * 16) --> 64

Therefore, the output buffer has 64 valid output samples and no zero samples.

VALID Mode:
- OUT_DATA_LEN = (TP_F_LEN - TP_G_LEN + 1) --> (64 - 32 + 1) --> 33
- Output_Buffer_len = ceil(33, 16) --> (((33 + 16 - 1) / 16) * 16) --> ((48 / 16) * 16) --> (3 * 16) --> 48

Therefore, the output buffer has 33 valid output samples and 15 zero samples.
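To see where the zero samples in the example come from, the FULL and VALID cases can be reproduced with a plain reference convolution followed by zero-padding to a multiple of LANES. This is a minimal Python sketch, not the AIE kernel: `conv_full` and `pad_to_lanes` are hypothetical helpers, the VALID slice uses the standard definition of valid convolution, and placing the zero samples after the valid ones is an assumption. SAME mode is omitted because this section does not state which 64 samples of the full result it keeps.

```python
def conv_full(f, g):
    """Direct (non-vectorized) convolution: out[n] = sum of f[i] * g[j] over i + j == n."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out


def pad_to_lanes(samples, lanes):
    """Append zeros (assumed placement) so the length becomes the next multiple of lanes."""
    buf_len = ((len(samples) + lanes - 1) // lanes) * lanes
    return samples + [0] * (buf_len - len(samples))


F_LEN, G_LEN = 64, 32
LANES = 16  # int16 x int16 -> int32, AIE-1 / AIE-ML columns of the LANES table
in_f = list(range(1, F_LEN + 1))  # [1, 2, ..., 64]
in_g = list(range(1, G_LEN + 1))  # [1, 2, ..., 32]

# FULL mode: every overlap position, F_LEN + G_LEN - 1 samples.
full = conv_full(in_f, in_g)
out_buffer = pad_to_lanes(full, LANES)

# VALID mode: only positions where G fully overlaps F,
# i.e. full[G_LEN - 1 : F_LEN], which is F_LEN - G_LEN + 1 samples.
valid = full[G_LEN - 1:F_LEN]
valid_buffer = pad_to_lanes(valid, LANES)

print(len(full), len(out_buffer) - len(full))      # 95 1
print(len(valid), len(valid_buffer) - len(valid))  # 33 15
```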