Description
This message reports the kernel port through which the largest share of the total host data was transferred to the kernel.
Explanation
In some cases, more data is transferred than necessary, because each successive execution of an algorithm requires only a small amount of additional data compared with the previous call.
For example, with a 3x3 convolution matrix, a brute-force data transfer communicates 9 values for every output location computed. When an image is being processed, however, communicating a single new value per output location is sufficient if line buffers (internal memory banks) are deployed in the implementation.
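The difference between the two approaches can be illustrated with a minimal host-side C++ sketch. It is not accelerator code and does not use any tool-specific API; the names `ConvResult`, `conv3x3_brute`, and `conv3x3_linebuf` are hypothetical, and the `input_reads` counter merely stands in for data crossing the kernel port. Both functions compute the same 3x3 convolution over the valid region of a row-major image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Filtered image plus the number of input values fetched (a stand-in for
// data transferred through the kernel port). All names are illustrative.
struct ConvResult {
    std::vector<int> out;         // (rows-2) x (cols-2) outputs, row-major
    std::size_t input_reads = 0;  // values read from the input image
};

// Brute force: all 9 neighbours are fetched again for every output location.
ConvResult conv3x3_brute(const std::vector<int>& img, std::size_t rows,
                         std::size_t cols, const int k[3][3]) {
    assert(img.size() == rows * cols);
    ConvResult r;
    for (std::size_t y = 0; y + 2 < rows; ++y) {
        for (std::size_t x = 0; x + 2 < cols; ++x) {
            int acc = 0;
            for (std::size_t i = 0; i < 3; ++i) {
                for (std::size_t j = 0; j < 3; ++j) {
                    acc += k[i][j] * img[(y + i) * cols + (x + j)];
                    ++r.input_reads;
                }
            }
            r.out.push_back(acc);
        }
    }
    return r;
}

// Line-buffered: each pixel is fetched once. Two line buffers keep the two
// previous rows, and a 3x3 window slides along the current row.
ConvResult conv3x3_linebuf(const std::vector<int>& img, std::size_t rows,
                           std::size_t cols, const int k[3][3]) {
    assert(img.size() == rows * cols);
    ConvResult r;
    std::vector<int> line0(cols, 0);  // row y-2
    std::vector<int> line1(cols, 0);  // row y-1
    int win[3][3] = {};
    for (std::size_t y = 0; y < rows; ++y) {
        for (std::size_t x = 0; x < cols; ++x) {
            const int px = img[y * cols + x];  // the only fetch per pixel
            ++r.input_reads;
            // Shift the window one column to the left.
            for (std::size_t i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            // Fill the new right-hand column from the buffers and the input.
            win[0][2] = line0[x];
            win[1][2] = line1[x];
            win[2][2] = px;
            // Age the line buffers for the next row.
            line0[x] = line1[x];
            line1[x] = px;
            // The window is fully valid once 3 rows and 3 columns were seen.
            if (y >= 2 && x >= 2) {
                int acc = 0;
                for (std::size_t i = 0; i < 3; ++i)
                    for (std::size_t j = 0; j < 3; ++j)
                        acc += k[i][j] * win[i][j];
                r.out.push_back(acc);
            }
        }
    }
    return r;
}
```

For a 4x5 image, the brute-force version fetches 6 x 9 = 54 values while the line-buffered version fetches 20, one per pixel, and both produce identical outputs. On an actual accelerator the line buffers and window would typically map to on-chip memory and registers; that mapping is outside the scope of this sketch.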
To identify such situations, this message indicates which port accounts for how much of the total amount of data being transferred. It is up to the programmer to ensure that the port is not unnecessarily re-transferring data that could be stored between algorithm invocations.
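The arithmetic behind such a report can be sketched as follows. This is only an illustration of computing one port's share of the total; it is not the tool's implementation, and the types `PortTransfer` and `PortShare`, the function `dominant_port`, and the per-port byte counts are all assumptions made for the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of how many bytes went through one kernel port.
struct PortTransfer {
    std::string port;
    std::uint64_t bytes;
};

// Hypothetical result: the busiest port and its share of the total, in percent.
struct PortShare {
    std::string port;
    double percent;
};

// Returns the port that carried the most data and its percentage of the
// total. Returns an empty name and 0.0 if there is no data at all.
PortShare dominant_port(const std::vector<PortTransfer>& ports) {
    std::uint64_t total = 0;
    const PortTransfer* best = nullptr;
    for (const auto& p : ports) {
        total += p.bytes;
        if (best == nullptr || p.bytes > best->bytes) best = &p;
    }
    if (best == nullptr || total == 0) return {"", 0.0};
    return {best->port,
            100.0 * static_cast<double>(best->bytes) / static_cast<double>(total)};
}
```

A port that carries a large share of the total is not necessarily a problem; it is a hint to check whether the same values are being sent through it repeatedly.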
Recommendation
Understanding the algorithm being implemented is key to achieving an optimized implementation on the accelerator. This is especially true with respect to the interface requirements of the algorithm. If the same data is transferred multiple times through the interfaces, consider alternative implementations that use temporary storage on the accelerator to reduce data transfer.