XMC THROUGHPUT_FACTOR - 2024.2 English

Vitis Model Composer User Guide (UG1483)

Document ID: UG1483
Release Date: 2024-11-13
Version: 2024.2 English
The Model Composer THROUGHPUT_FACTOR pragma provides control over the throughput of an xmcImportFunction block. Add the THROUGHPUT_FACTOR pragma to your function header file, along with the SUPPORTS_STREAMING pragma, as shown in the following example:
#pragma XMC THROUGHPUT_FACTOR TF_param: 1,2,4
#pragma XMC SUPPORTS_STREAMING
template<int ROWS, int COLS, int TF_param>
void DilationWrap(const uint8_t src[ROWS][COLS], uint8_t dst[ROWS][COLS]);
The syntax of the pragma as shown in the prior example is:
#pragma XMC THROUGHPUT_FACTOR TF_param: 1,2,4

Where:

  • TF_param must be a template parameter of type int, as in the example above.
  • Specifying the supported throughput factors is optional but recommended. In the example above, 1,2,4 lists the throughput factors the function supports; the values must be positive integers, and the list must include 1. If you do not specify the throughput factors, TF_param is assumed to be valid for any positive throughput factor up to 16, the upper limit supported by Model Composer.
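The rules for the factor list can be restated as a small compile-time check. This is a minimal sketch, not a Model Composer API: `isValidFactorList`, `factorIsSupportedNoList`, and `kMaxThroughputFactor` are hypothetical names, and applying the limit of 16 to explicitly listed values (not only to the omitted-list case) is an assumption.

```cpp
#include <array>
#include <cstddef>

// Upper limit on the throughput factor cited in this section.
// Assumption: the same limit also applies to explicitly listed values.
constexpr int kMaxThroughputFactor = 16;

// Hypothetical helper (not part of Model Composer): returns true if a
// THROUGHPUT_FACTOR value list follows the stated rules, i.e. every value is
// a positive integer (assumed <= 16) and the list contains 1.
template <std::size_t N>
constexpr bool isValidFactorList(const std::array<int, N>& factors) {
    bool hasOne = false;
    for (std::size_t i = 0; i < N; ++i) {
        if (factors[i] < 1 || factors[i] > kMaxThroughputFactor) {
            return false;  // out of the supported range
        }
        if (factors[i] == 1) {
            hasOne = true;
        }
    }
    return hasOne;  // the list must include 1
}

// Hypothetical helper: when no list is given, TF_param is assumed valid for
// any positive factor up to the upper limit.
constexpr bool factorIsSupportedNoList(int tf) {
    return tf >= 1 && tf <= kMaxThroughputFactor;
}

// The list from "#pragma XMC THROUGHPUT_FACTOR TF_param: 1,2,4" passes;
// a list that omits 1 does not.
static_assert(isValidFactorList(std::array<int, 3>{1, 2, 4}), "1,2,4 is valid");
static_assert(!isValidFactorList(std::array<int, 2>{2, 4}), "list must include 1");
```

Doing the check with `constexpr` and `static_assert` means a bad list is reported at compile time, which mirrors where a pragma error would surface.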
As discussed in Controlling the Throughput of the Implementation, you specify the throughput factor for the model in the Model Composer Hub block. Choose a Hub block throughput factor that equals, or divides evenly into, one of the THROUGHPUT_FACTOR values of the xmcImportFunction block.
Important: If the throughput factor of the Hub block does not match, or does not divide evenly into, any of the THROUGHPUT_FACTOR values specified for the xmcImportFunction block, the throughput of the block function is reduced to 1.
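The compatibility rule between the Hub block and the imported block can be sketched as a predicate. This is an illustrative model of the rule as stated here, not Model Composer's implementation; `hubFactorIsCompatible` is a hypothetical name, and the sketch deliberately does not guess which factor the block uses when the check passes.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper modeling the rule described above: a Hub block
// throughput factor is compatible with an imported block if it equals or
// divides evenly into at least one value in the block's THROUGHPUT_FACTOR
// list. If this returns false, the block function runs with throughput 1.
template <std::size_t N>
constexpr bool hubFactorIsCompatible(int hubTF,
                                     const std::array<int, N>& blockFactors) {
    if (hubTF < 1) {
        return false;  // throughput factors are positive integers
    }
    for (std::size_t i = 0; i < N; ++i) {
        if (blockFactors[i] % hubTF == 0) {
            return true;  // hubTF divides evenly into this listed value
        }
    }
    return false;
}

// With TF_param: 1,2,4 on the block, a Hub factor of 2 is compatible,
// while 8 is not (it divides none of 1, 2, or 4).
static_assert(hubFactorIsCompatible(2, std::array<int, 3>{1, 2, 4}), "2 divides 4");
static_assert(!hubFactorIsCompatible(8, std::array<int, 3>{1, 2, 4}), "8 divides none");
```

Because every valid list must include 1, a Hub factor of 1 is always compatible under this rule.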

Note the following requirements:

  • The THROUGHPUT_FACTOR pragma must be used on template functions.
  • The THROUGHPUT_FACTOR pragma must be used together with the SUPPORTS_STREAMING pragma.
  • Only one THROUGHPUT_FACTOR pragma can be specified for an xmcImportFunction block.
  • The block function will be called with actual arguments that have cyclic ARRAY_RESHAPE directives with factor=TF (see example below). For more information on the ARRAY_RESHAPE pragma, refer to HLS Pragmas in the Vitis Unified Software Platform Acceleration Development Reference Guide (UG1702).
  • The read accesses from a non-scalar input argument of the function should be compliant with the requirements for streaming, and AMD Vitis™ HLS should be able to combine groups of TF reads into 1 read of the reshaped array.
  • The write accesses into a non-scalar output argument of the function should be compliant with the requirements for streaming, and AMD Vitis™ HLS should be able to combine groups of TF writes into 1 write of the reshaped array.
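The read and write requirements above rely on how a cyclic ARRAY_RESHAPE with factor=TF lays out an array: element i ends up in word i / TF of the reshaped array, at lane i % TF, so each word holds TF consecutive elements. The following software model (plain C++, not HLS code; `Slot`, `cyclicReshapeSlot`, and `allGroupsHitOneWord` are hypothetical names) checks that a loop indexing `arr[k0 * TF + k1]` for `k1 = 0..TF-1` touches exactly one reshaped word per outer iteration, which is the access pattern that lets the TF accesses be combined into one.

```cpp
#include <cstddef>

// Position of one original element inside a cyclically reshaped array.
struct Slot {
    std::size_t word;  // index into the reshaped (wider) array
    std::size_t lane;  // position of the element inside that word
};

// Model of ARRAY_RESHAPE cyclic factor=tf: element i -> word i / tf, lane i % tf.
constexpr Slot cyclicReshapeSlot(std::size_t i, std::size_t tf) {
    return Slot{i / tf, i % tf};
}

// True if the tf accesses arr[k0 * tf + k1] (k1 = 0..tf-1) all land in
// word k0, one per lane, i.e. they form a single reshaped-word access.
constexpr bool groupHitsOneWord(std::size_t k0, std::size_t tf) {
    for (std::size_t k1 = 0; k1 < tf; ++k1) {
        const Slot s = cyclicReshapeSlot(k0 * tf + k1, tf);
        if (s.word != k0 || s.lane != k1) {
            return false;
        }
    }
    return true;
}

// Checks every outer iteration of a loop over an n-element array.
// Assumes tf divides n, as 240 / TF does for TF in {1, 2, 4, 8, 16}.
constexpr bool allGroupsHitOneWord(std::size_t n, std::size_t tf) {
    if (tf == 0 || n % tf != 0) {
        return false;
    }
    for (std::size_t k0 = 0; k0 < n / tf; ++k0) {
        if (!groupHitsOneWord(k0, tf)) {
            return false;
        }
    }
    return true;
}

static_assert(allGroupsHitOneWord(240, 16), "240-element array, TF = 16");
```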
The following is an example function specifying both SUPPORTS_STREAMING and THROUGHPUT_FACTOR pragmas:
#include <stdint.h>
 
#pragma XMC THROUGHPUT_FACTOR TF: 1, 2, 4, 8, 16
#pragma XMC SUPPORTS_STREAMING
template<int TF>
void mac(const int32_t In1[240], const int32_t In2[240], const int32_t In3[240],
         int32_t Out1 [240])
{
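    // Reshape each array cyclically by TF: every word of the reshaped array
    // holds TF consecutive elements, so one access moves TF values.
    #pragma HLS ARRAY_RESHAPE variable=In1 cyclic factor=TF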
    #pragma HLS ARRAY_RESHAPE variable=In2 cyclic factor=TF
    #pragma HLS ARRAY_RESHAPE variable=In3 cyclic factor=TF
    #pragma HLS ARRAY_RESHAPE variable=Out1 cyclic factor=TF
 
    // One pipelined iteration per reshaped word: 240 / TF iterations in total.
    for (uint32_t k0 = 0; k0 < 240 / TF; ++k0) {
        #pragma HLS pipeline II=1
        int32_t Product_in2m[TF];
        int32_t Sum_in2m[TF];
        int32_t Product_in1m[TF];
        int32_t Sum_outm[TF];
        // Read the TF elements of word k0 from In2, In3, and In1.
        for (uint32_t k1 = 0; k1 < TF; ++k1) {
            Product_in2m[k1] = In2[(k0 * TF + k1)];
        }
        for (uint32_t k1 = 0; k1 < TF; ++k1) {
            Sum_in2m[k1] = In3[(k0 * TF + k1)];
        }
        for (uint32_t k1 = 0; k1 < TF; ++k1) {
            Product_in1m[k1] = In1[(k0 * TF + k1)];
        }
        // Compute Out1 = In1 * In2 + In3 for each of the TF lanes.
        for (uint32_t k1 = 0; k1 < TF; ++k1) {
            int32_t Product_in2s;
            int32_t Sum_in2s;
            int32_t Product_in1s;
            int32_t Product_outs;
            int32_t Sum_outs;
            Product_in2s = Product_in2m[k1];
            Sum_in2s = Sum_in2m[k1];
            Product_in1s = Product_in1m[k1];
            Product_outs = Product_in1s * Product_in2s;
            Sum_outs = Product_outs + Sum_in2s;
            Sum_outm[k1] = Sum_outs;
        }
        // Write the TF results back as word k0 of Out1.
        for (uint32_t k1 = 0; k1 < TF; ++k1) {
            Out1[(k0 * TF + k1)] = Sum_outm[k1];
        }
    }
}
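Because the block function may be instantiated with any factor in its list, a host-side C++ check that the grouped loop computes the same result as a plain scalar loop for every listed factor can catch indexing mistakes before synthesis. The sketch below re-expresses the loop grouping of the mac example without HLS pragmas; `macReference`, `macGrouped`, `groupedMatchesReference`, and `checkListedFactors` are hypothetical names, and the input data is arbitrary test data chosen small enough that int32_t arithmetic cannot overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Array length used by the mac example.
constexpr std::size_t kLen = 240;

// Scalar golden model: Out1[i] = In1[i] * In2[i] + In3[i].
void macReference(const int32_t In1[kLen], const int32_t In2[kLen],
                  const int32_t In3[kLen], int32_t Out1[kLen]) {
    for (std::size_t i = 0; i < kLen; ++i) {
        Out1[i] = In1[i] * In2[i] + In3[i];
    }
}

// Same computation with the mac example's loop grouping: the outer loop
// walks reshaped words, the inner loop handles the TF lanes of one word.
template <int TF>
void macGrouped(const int32_t In1[kLen], const int32_t In2[kLen],
                const int32_t In3[kLen], int32_t Out1[kLen]) {
    constexpr std::size_t tf = TF;
    static_assert(TF > 0 && kLen % tf == 0, "TF must divide the array length");
    for (std::size_t k0 = 0; k0 < kLen / tf; ++k0) {
        for (std::size_t k1 = 0; k1 < tf; ++k1) {
            const std::size_t i = k0 * tf + k1;
            Out1[i] = In1[i] * In2[i] + In3[i];
        }
    }
}

// Returns true if macGrouped<TF> matches the scalar model on fixed inputs.
template <int TF>
bool groupedMatchesReference() {
    int32_t a[kLen], b[kLen], c[kLen];
    int32_t ref[kLen] = {};
    int32_t out[kLen] = {};
    for (std::size_t i = 0; i < kLen; ++i) {
        a[i] = static_cast<int32_t>(i);
        b[i] = static_cast<int32_t>(i % 7) - 3;
        c[i] = static_cast<int32_t>(kLen - i);
    }
    macReference(a, b, c, ref);
    macGrouped<TF>(a, b, c, out);
    for (std::size_t i = 0; i < kLen; ++i) {
        if (out[i] != ref[i]) {
            return false;
        }
    }
    return true;
}

// Example host-side check covering every factor listed in the mac pragma.
inline void checkListedFactors() {
    assert(groupedMatchesReference<1>());
    assert(groupedMatchesReference<2>());
    assert(groupedMatchesReference<4>());
    assert(groupedMatchesReference<8>());
    assert(groupedMatchesReference<16>());
}
```

Standard C++ compilers ignore unrecognized pragmas such as `#pragma HLS` and `#pragma XMC` (typically with a warning), so the same comparison could also be pointed at the mac template itself in a host build.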