By default, the linker builds a single hardware instance from a kernel. If the host program will execute the same kernel multiple times, due to data processing requirements for instance, then it must execute the kernel on the hardware accelerator in a sequential manner. This can impact overall application performance. However, you can customize the kernel linking stage to instantiate multiple hardware compute units (CUs) from a single kernel. This can improve performance as the host program can now make multiple overlapping kernel calls, executing kernels concurrently by running separate compute units.
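As a rough illustration of the potential gain, the following sketch estimates the total run time of a batch of identical kernel calls for a given number of CUs. It is an idealized model, not a measurement: the function name ideal_total_time and its parameters are hypothetical, and it assumes that every call takes the same time, that calls are spread evenly across CUs, and that data transfer and scheduling overhead are negligible.

```python
import math


def ideal_total_time(num_calls, num_cus, time_per_call):
    """Idealized wall-clock time for num_calls identical kernel calls.

    Assumes each CU runs one call at a time and that transfer and
    scheduling overheads are zero (a simplification for illustration).
    """
    if num_calls < 0 or num_cus < 1:
        raise ValueError("num_calls must be >= 0 and num_cus must be >= 1")
    # Each "round" keeps up to num_cus calls in flight at once.
    rounds = math.ceil(num_calls / num_cus)
    return rounds * time_per_call


print(ideal_total_time(8, 1, 2.0))  # single CU, calls run back to back: 16.0
print(ideal_total_time(8, 2, 2.0))  # two CUs, calls overlap in pairs: 8.0
```

In practice the speedup is usually smaller than this estimate, because memory bandwidth and host-side overhead are shared between the CUs.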
Multiple CUs of a kernel can be created by using the connectivity.nk option in the v++ config file during linking. Edit a config file to include the needed options, and specify it in the v++ command line with the --config option, as described in v++ Command.
For example, for the vadd kernel, two hardware instances can be implemented in the config file as follows:
[connectivity]
#nk=<kernel_name>:<number>:<cu_name>,<cu_name>...
nk=vadd:2
Where:
- <kernel_name>: Specifies the name of the kernel to instantiate multiple times.
- <number>: The number of kernel instances, or CUs, to implement in hardware.
- <cu_name>,<cu_name>...: Specifies the instance names for the specified number of instances. This is optional; when it is not specified, the CU names default to <kernel_name>_1, <kernel_name>_2, and so on (for example, vadd_1 and vadd_2). Note that the delimiter between instance names is a comma.
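The nk syntax described above can be made concrete with a small helper that expands an nk value into the list of CU names it describes. This is only an illustrative sketch and is not part of the v++ tool: the function name expand_nk is hypothetical, the default naming pattern (kernel name plus a 1-based index) is an assumption modeled on the kernel_1 style default, and the check that the number of names matches <number> is likewise an assumption about how the option is validated.

```python
def expand_nk(value):
    """Expand an nk value such as 'vadd:3:vadd_X,vadd_Y,vadd_Z' into CU names.

    Hypothetical helper for illustration; v++ performs its own parsing.
    """
    parts = value.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(
            f"expected <kernel_name>:<number>[:<cu_name>,...], got {value!r}"
        )
    kernel_name, number = parts[0], int(parts[1])
    if number < 1:
        raise ValueError("the number of CUs must be at least 1")

    if len(parts) == 3:
        # Explicit instance names are separated by commas.
        names = [name.strip() for name in parts[2].split(",")]
        # Assumption: one name is expected per requested instance.
        if len(names) != number:
            raise ValueError(f"expected {number} CU names, got {len(names)}")
        return names

    # Assumed default naming: <kernel_name>_1, <kernel_name>_2, ...
    return [f"{kernel_name}_{i}" for i in range(1, number + 1)]


print(expand_nk("vadd:2"))                         # ['vadd_1', 'vadd_2']
print(expand_nk("vadd:3:vadd_X,vadd_Y,vadd_Z"))    # ['vadd_X', 'vadd_Y', 'vadd_Z']
```

A helper like this can be handy in build scripts that need to know the CU names in advance, for example to generate further connectivity options for each instance.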
Then the config file is specified on the v++ command line:
v++ --config vadd_config.cfg ...
After the design is built, you can use the xclbinutil command to examine the contents of the xclbin file. Refer to xclbinutil Utility.
The following example results in three CUs of the vadd kernel, named vadd_X, vadd_Y, and vadd_Z, in the xclbin binary file:
[connectivity]
nk=vadd:3:vadd_X,vadd_Y,vadd_Z