By default, the linker builds a single hardware instance from a kernel. If the host program executes the same kernel multiple times, for example to process successive blocks of data, those executions must run sequentially on the single hardware instance, which can limit overall application performance. However, you can customize the kernel linking stage to instantiate multiple hardware compute units (CUs) from a single kernel. This can improve performance because the host program can now make multiple overlapping kernel calls, executing kernels concurrently on separate compute units.
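As a sketch of what overlapping execution can look like on the host side, the following uses the XRT native C++ API. The xclbin path, buffer objects, and CU instance names (vadd_X, vadd_Y) are illustrative assumptions, and buffer setup is omitted; this is not runnable as-is without a built xclbin and a supported device.

```cpp
// Sketch only: assumes a built vadd.xclbin containing CU instances
// vadd_X and vadd_Y, and pre-allocated buffer objects in_a, in_b,
// in_c, in_d, out_x, out_y (setup omitted for brevity).
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>

auto device = xrt::device(0);
auto uuid   = device.load_xclbin("vadd.xclbin");

// Open a handle to each compute unit by its instance name.
auto cu_x = xrt::kernel(device, uuid, "vadd:{vadd_X}");
auto cu_y = xrt::kernel(device, uuid, "vadd:{vadd_Y}");

// Launching a kernel returns a run object immediately, so the
// two executions can overlap in hardware.
auto run_x = cu_x(in_a, in_b, out_x, size);
auto run_y = cu_y(in_c, in_d, out_y, size);

run_x.wait();
run_y.wait();
```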
Multiple CUs of a kernel can be created by using the connectivity.nk option in the v++ config
file during linking. Edit a config file to include the needed options, and specify it in
the v++ command line with the --config option, as described in v++ Command.
For example, for the vadd kernel, two hardware
instances can be implemented in the config file as follows:
[connectivity]
#nk=<kernel_name>:<number>:<cu_name>,<cu_name>...
nk=vadd:2
Where:
- <kernel_name> - Specifies the name of the kernel to instantiate multiple times.
- <number> - The number of kernel instances, or CUs, to implement in hardware.
- <cu_name>,<cu_name>... - Specifies the instance names for the specified number of instances. This is optional; when it is not specified, the CU names default to <kernel_name>_1, <kernel_name>_2, and so on. Notice that the delimiter between kernel instance names is a comma.
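For instance, a config fragment contrasting default and explicit instance naming might look like the following; the names vadd_A and vadd_B are illustrative assumptions (use only one nk line per kernel):

```
[connectivity]
# Two CUs with default instance names vadd_1 and vadd_2:
nk=vadd:2
# Alternatively, two CUs with explicit instance names:
#nk=vadd:2:vadd_A,vadd_B
```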
After editing the config file, specify it on the v++
command line:
v++ --config vadd_config.cfg ...
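A fuller link invocation might look like the following sketch; the platform name, kernel object file, and output file name are placeholders, not values from this document:

```
v++ --link --target hw \
    --platform <platform_name> \
    --config vadd_config.cfg \
    --output vadd.xclbin \
    vadd.xo
```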
After the build completes, you can use the xclbinutil command to examine the contents of the
xclbin file. Refer to xclbinutil Utility. For example, the following config file creates three CUs of the vadd kernel, named vadd_X, vadd_Y, and vadd_Z, in the
xclbin binary file:
[connectivity]
nk=vadd:3:vadd_X,vadd_Y,vadd_Z
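To confirm the compute units were created, you could then inspect the generated xclbin; the output file name here is an assumption:

```
xclbinutil --info --input vadd.xclbin
```

The reported IP layout should list the three CU instances by name.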