Where possible, illegal values for template parameters, and illegal combinations of template parameter values, are detected at compilation time. When an illegal configuration is detected, compilation fails with an error message indicating the constraint in question.
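Compile-time checks of this kind map naturally onto C++ `static_assert`. The sketch below is hypothetical: `fft_config_sketch`, the local `cint16`/`cint32`/`cfloat` stand-ins and the specific constraints are illustrative assumptions, not the library's actual code or rules. It shows an illegal value and an illegal combination of values being rejected during compilation. It deliberately contains no check on array-wide resource use.

```cpp
#include <cstdint>
#include <type_traits>

// Local stand-ins for the AIE complex types (assumption: in a real design these
// come from the AIE headers; they exist here only so the sketch compiles alone).
struct cint16 { std::int16_t real, imag; };
struct cint32 { std::int32_t real, imag; };
struct cfloat { float real, imag; };

// Hypothetical configuration holder. Parameter names follow the documentation;
// the constraints are illustrative, not the library's actual rules.
template <typename TT_DATA, unsigned TP_POINT_SIZE, unsigned TP_PARALLEL_POWER>
struct fft_config_sketch {
    // Illegal value: in this sketch only the three listed sample types are accepted.
    static_assert(std::is_same<TT_DATA, cint16>::value ||
                  std::is_same<TT_DATA, cint32>::value ||
                  std::is_same<TT_DATA, cfloat>::value,
                  "ERROR: TT_DATA must be cint16, cint32 or cfloat");

    // Illegal value: the point size must be a nonzero power of two.
    static_assert(TP_POINT_SIZE != 0 && (TP_POINT_SIZE & (TP_POINT_SIZE - 1)) == 0,
                  "ERROR: TP_POINT_SIZE must be a power of 2");

    // Illegal combination (hypothetical minimum): each of the
    // 2^TP_PARALLEL_POWER subframes must still hold at least 16 points.
    static_assert((TP_POINT_SIZE >> TP_PARALLEL_POWER) >= 16,
                  "ERROR: TP_POINT_SIZE too small for this TP_PARALLEL_POWER");

    // Points handled by each subframe kernel.
    static constexpr unsigned kSubframePoints = TP_POINT_SIZE >> TP_PARALLEL_POWER;

    // Nothing here can know how much of the device the rest of the design
    // uses, so resource over-use is left for the AIE tools to report.
};
```

For instance, `fft_config_sketch<cint16, 1000, 0>` fails to compile with the power-of-two message, while `fft_config_sketch<cint16, 65536, 4>` compiles and gives 4096-point subframes.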
However, no attempt is made to detect configurations that are simply too large for the available resources, for two reasons: the library element cannot know how much of the device is used by the user code, and resource limits vary by device. In these cases compilation will still likely fail, but the failure comes from the AIE tools detecting the over-use of a resource.
For example, an FFT with TT_DATA=cint16 can be supported up to TP_POINT_SIZE=65536 using TP_PARALLEL_POWER=4.
A similarly configured FFT with TT_DATA=cint32 will not compile, because the per-tile memory use, which is constant and predictable, is exceeded. This condition is detected and an error is issued.
An FFT with TT_DATA=cint32 and TP_PARALLEL_POWER=5 should, in theory, be possible to implement, but it would use 192 tiles directly and the memory of many other tiles, so it is likely to exceed the capacity of the AIE array. However, the available capacity cannot easily be determined, so no error check is applied in this case.
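The figure of 192 tiles is consistent with a simple count, under an assumption not stated in the text above: that a parallel FFT uses 2^P subframe FFT kernels plus P combiner stages of 2^P kernels each, with one kernel per tile. The hypothetical helper `directTilesSketch` applies that model. It counts only the tiles used directly, not the neighboring tiles whose memory is borrowed.

```cpp
// Hypothetical tile-count model (assumption): 2^P subframe FFT kernels plus
// P combiner stages of 2^P kernels each, one kernel per tile.
constexpr unsigned directTilesSketch(unsigned parallelPower) {
    return (1u << parallelPower) * (parallelPower + 1u);
}

// With this model TP_PARALLEL_POWER=5 gives 32 * 6 = 192 tiles,
// which matches the figure quoted in the text.
static_assert(directTilesSketch(5) == 192, "model should reproduce 192 tiles");
```

Under the same model, TP_PARALLEL_POWER=4 would use 16 * 5 = 80 tiles directly.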
The largest point size that can be supported in a single kernel is limited by the availability of data memory. Because iobuffer connections default to double buffering for maximum throughput, the choice of TP_API (iobuffer or streams) affects the maximum point size: with iobuffers the limit is reached at a lower TP_POINT_SIZE than with streams. The following table shows the maximum point size possible for a single kernel for each value of TT_DATA and TP_API.
| TT_DATA | Max Point Size, TP_API=0 (iobuffer I/O) | Max Point Size, TP_API=1 (stream I/O) |
|---|---|---|
| cint16 | 2048 | 4096 |
| cint32 | 2048 | 4096 |
| cfloat | 2048 | 2048 |
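The table can be read as a simple lookup. In the sketch below, the enum names `DataType` and `Api` and the functions `maxKernelPointSize` and `fitsSingleKernel` are hypothetical helpers, not part of the library. Only the numeric limits come from the table.

```cpp
// Hypothetical enums; Api values mirror TP_API=0 (iobuffer) and TP_API=1 (stream).
enum class DataType { cint16, cint32, cfloat };
enum class Api { iobuffer = 0, stream = 1 };

// Maximum single-kernel point size, transcribed from the table above.
constexpr unsigned maxKernelPointSize(DataType t, Api api) {
    return api == Api::iobuffer ? 2048u                           // double-buffered I/O
         : (t == DataType::cfloat ? 2048u : 4096u);               // stream I/O
}

// True if a given TP_POINT_SIZE fits in one kernel for this type and API.
constexpr bool fitsSingleKernel(DataType t, Api api, unsigned pointSize) {
    return pointSize <= maxKernelPointSize(t, api);
}
```

For example, `fitsSingleKernel(DataType::cint16, Api::iobuffer, 4096)` is false while the stream variant is true, reflecting the cost of double buffering described above.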
The maximum point size supported per kernel places a practical limit on the maximum point size supported when using TP_PARALLEL_POWER>1, because the largest devices currently available support a maximum TP_PARALLEL_POWER of 4. The largest possible FFT can therefore be found by multiplying the values in the table by 2^4. For example, the largest practical FFT with stream I/O and cint16 data is 4096 << 4 = 65536. However, the extensive use of neighboring tile RAM makes placement a challenge for the mapper, so 32768 may be a practical upper limit for cint32.
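The multiplication by 2^4 is a left shift of the single-kernel limit. The sketch below assumes only the cap of 4 stated in the text; `kMaxParallelPower` and `theoreticalMaxPointSize` are hypothetical names used for illustration.

```cpp
// Largest TP_PARALLEL_POWER supported on current devices, as stated in the text.
constexpr unsigned kMaxParallelPower = 4;

// Theoretical upper bound: the single-kernel maximum multiplied by 2^TP_PARALLEL_POWER.
constexpr unsigned theoreticalMaxPointSize(unsigned perKernelMax) {
    return perKernelMax << kMaxParallelPower;
}

static_assert(theoreticalMaxPointSize(4096) == 65536, "stream I/O, cint16");
static_assert(theoreticalMaxPointSize(2048) == 32768, "iobuffer I/O, any listed type");
```

This gives only an upper bound. As noted above, placement pressure from borrowed neighboring tile RAM may hold cint32 to 32768 in practice.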