CU Cluster and Multi-Card Support - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

If a host machine has only one accelerator card installed, VSC tries to use that card, and a fatal error occurs if the card does not match the platform that the design was compiled for. On a host machine with multiple cards installed, VSC by default picks the first card that exactly matches the platform the design was compiled for. The host code can override this default in two ways:

  • Set the environment variable XILINX_SC_CARD to the desired <cardIndex>.
  • Call the following API from the host code before making any other calls.
    VPP_ACC call: my_acc::add_card(<cardIndex>)
Tip: With multiple cards installed, if there is any mismatch between the names of the installed platforms and the platform the design was compiled for, VSC will not identify a default card. In that case, you must specify the card index as shown above. For example, platform xilinx_u2_gen3x4_xdma_gc_base_2 will not match a design compiled for xilinx_u2_gen3x4_xdma_gc_2_202110_1, even though the platforms are compatible.
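The selection rule described above can be modeled in a few lines of standard C++. This is only a sketch of the documented behavior, not VSC code: the function pick_card, its parameters, and the idea of passing the installed platform names as a vector are all hypothetical, and the sketch models only the environment variable override, not the add_card() call.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the default card selection; not part of the VSC API.
// installed[i] is the platform name reported by the card with index i.
// env_override stands in for the value of XILINX_SC_CARD (nullptr if unset).
std::optional<int> pick_card(const std::vector<std::string>& installed,
                             const std::string& compiled_for,
                             const char* env_override) {
    // An explicit card index overrides the default selection.
    if (env_override != nullptr) {
        return std::atoi(env_override);
    }
    // With a single card, that card is always tried; a platform mismatch
    // would surface later as a fatal error, which this sketch does not model.
    if (installed.size() == 1) {
        return 0;
    }
    // With several cards, only an exact platform-name match qualifies.
    for (std::size_t i = 0; i < installed.size(); ++i) {
        if (installed[i] == compiled_for) {
            return static_cast<int>(i);
        }
    }
    // No exact match: no default card, so the index must be given explicitly.
    return std::nullopt;
}
```

In a real host program, the third argument would correspond to std::getenv("XILINX_SC_CARD"). An empty result from the sketch corresponds to the situation in the tip, where compatible but differently named platforms leave VSC without a default card.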

As shown in the sysc_multi_card example in Supported Platforms and Startup Examples, if the host has identical accelerator cards installed, you can use multiple cards to run your VSC accelerator. This is supported in a mode in which all CUs of a given card run as a separate compute cluster, as explained below.

The separate compute cluster mode is useful for improving performance in scenarios such as the U2 card with a local SmartSSD. The example code shown below creates CU clusters and assigns a card to each of them. The user code can then select data based on the index i, so that the subsequent compute() job automatically uses card-i because the selected data-i resides on the same SSD.

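VPP_CC* cuCluster = new VPP_CC[ncards];
// Create one CU cluster per card.
for (int i = 0; i < ncards; ++i) {
    my_acc::add_card(cuCluster[i], i);
}
// Start the send and receive loops of each cluster.
for (int i = 0; i < ncards; ++i) {
    my_acc::send_while(
        [=]() -> bool {
            ... // data-i selection
        }, cuCluster[i]);
    my_acc::receive_all_in_order(
        [=]() {
            ...
        }, cuCluster[i]);
}
// Wait for every cluster to finish.
for (int i = 0; i < ncards; ++i) {
    my_acc::join(cuCluster[i]);
}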