Multi-Process Access Control
The Xilinx Runtime (XRT) APIs provide multi-process support for controlling the AI Engine array and graphs. Both the array and individual graphs can be opened in one of three access modes:
- Exclusive Mode: Provides full access to the AI Engine array or graph.
- Primary Mode: Provides full access to the AI Engine array or graph.
- Shared Mode: Provides only non-destructive access to the AI Engine array or graph.
In summary, the three modes offer different levels of access control for multiple processes:
- Exclusive Mode: A single process has full control.
- Primary Mode: A single process has full control; other processes can read the state without modifying it.
- Shared Mode: Multiple processes can read, but no modifications are allowed.
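The summary above can be expressed as a small permission check. This is a hypothetical model, not part of XRT: `Mode`, `Op`, and `is_allowed()` are illustrative names, and the split of operations into destructive and non-destructive ones (with an RTP read as the non-destructive example) follows the description of shared mode in this section.

```cpp
#include <cassert>

// Hypothetical model of the three access modes (illustrative, not XRT API).
enum class Mode { exclusive, primary, shared };

// Illustrative operations, grouped by whether they change AI Engine state.
enum class Op {
    read_rtp,    // non-destructive: reads a run-time parameter
    update_rtp,  // destructive: writes a run-time parameter
    run,         // destructive: starts graph execution
    end          // destructive: ends graph execution
};

constexpr bool is_destructive(Op op) { return op != Op::read_rtp; }

// Exclusive and primary owners have full control; shared users may only
// perform non-destructive operations.
constexpr bool is_allowed(Mode mode, Op op) {
    return mode != Mode::shared || !is_destructive(op);
}

static_assert(is_allowed(Mode::shared, Op::read_rtp), "shared mode may read");
static_assert(!is_allowed(Mode::shared, Op::update_rtp), "shared mode may not write");
```

Exclusive and primary mode grant the same operations in this model; what separates them is who else may open the resource, which the restrictions later in this section cover.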
The XRT C++ API extends the xrt::aie::device class in xrt/xrt_aie.h so that a device can be opened with an access mode:
- xrt::aie::access_mode::exclusive (Exclusive mode)
- xrt::aie::access_mode::primary (Primary mode)
- xrt::aie::access_mode::shared (Shared mode)
The XRT C++ API extends the xrt::graph class in xrt/xrt_graph.h so that a graph can be opened with an access mode:
- xrt::graph::access_mode::exclusive (Exclusive mode)
- xrt::graph::access_mode::primary (Primary mode)
- xrt::graph::access_mode::shared (Shared mode)
When no access mode is specified, both xrt::aie::device and xrt::graph are opened in primary mode.
Graphs must be closed before the array is closed. After all open AI Engine arrays and graphs are closed, they can be reopened, subject to the restrictions listed below.
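In C++ host code, this closing order can follow from scope, because local objects are destroyed in the reverse order of their declaration; the sample code in this section also declares the device before the graph. The sketch below uses hypothetical stand-in types (`DeviceHandle`, `GraphHandle`) in place of `xrt::aie::device` and `xrt::graph`, and assumes that destroying a handle closes it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the stand-in handles are closed.
std::vector<std::string> close_log;

// Hypothetical stand-in for xrt::aie::device (not an XRT type).
struct DeviceHandle {
    ~DeviceHandle() { close_log.push_back("device"); }
};

// Hypothetical stand-in for xrt::graph, opened on a device.
struct GraphHandle {
    explicit GraphHandle(const DeviceHandle&) {}
    ~GraphHandle() { close_log.push_back("graph"); }
};

// Declaring the device first means it is destroyed last, so the graph
// is always closed before the array.
void open_and_close() {
    DeviceHandle device;        // opened first, closed last
    GraphHandle graph{device};  // opened second, closed first
}
```

Relying on declaration order avoids explicit close calls, and the order also holds if an exception unwinds the scope.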
The following restrictions apply to AI Engine array multi-process support.
- Only one process can open the AI Engine array in exclusive mode. After it is opened in exclusive mode, the array cannot be opened again in any mode by the same or other processes.
- Only one process can open the AI Engine array in primary mode. After it is opened in primary mode, the array cannot be opened again in exclusive or primary mode, but it can still be opened in shared mode.
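The array rules can be modeled as admission checks on a per-array record. This is hypothetical bookkeeping (`ArrayAccess` and `try_open()` are illustrative names), not the XRT implementation. It also assumes that an exclusive open is refused while any other opener exists, which the restrictions above do not state explicitly.

```cpp
#include <cassert>

enum class Mode { exclusive, primary, shared };

// Hypothetical admission bookkeeping for one AI Engine array (not XRT API).
class ArrayAccess {
public:
    // Returns true if an open in the given mode would be admitted.
    bool try_open(Mode mode) {
        switch (mode) {
        case Mode::exclusive:
            // Assumption: exclusive requires that nobody else holds the array.
            if (exclusive_ || primary_ || shared_count_ > 0) return false;
            exclusive_ = true;
            return true;
        case Mode::primary:
            // At most one primary owner, and never alongside an exclusive one.
            if (exclusive_ || primary_) return false;
            primary_ = true;
            return true;
        case Mode::shared:
            // Any number of shared users, unless the array is held exclusively.
            if (exclusive_) return false;
            ++shared_count_;
            return true;
        }
        return false;
    }

private:
    bool exclusive_ = false;
    bool primary_ = false;
    int shared_count_ = 0;
};
```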
The following restrictions apply to AI Engine graph multi-process support.
- A graph cannot be opened multiple times within a single process.
- If an AI Engine graph is opened in exclusive mode, it cannot be opened again in any mode.
- Multiple processes can open a graph in shared mode, but only one process can open a graph in primary mode.
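The graph rules differ from the array rules mainly in the per-process check. The following hypothetical model tracks each graph by name and records which illustrative process IDs hold it open; the names are not XRT API, and the same assumption about exclusive opens applies.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class Mode { exclusive, primary, shared };

// Hypothetical per-graph admission bookkeeping (not XRT API).
class GraphAccess {
public:
    // pid is an illustrative process identifier for the opener.
    bool try_open(const std::string& graph, int pid, Mode mode) {
        State& s = graphs_[graph];
        if (s.openers.count(pid)) return false;  // no second open in one process
        if (s.exclusive) return false;           // exclusive blocks every later open
        // Assumption: exclusive also requires that nobody else holds the graph.
        if (mode == Mode::exclusive && !s.openers.empty()) return false;
        if (mode == Mode::primary && s.primary) return false;  // one primary only
        if (mode == Mode::exclusive) s.exclusive = true;
        if (mode == Mode::primary) s.primary = true;
        s.openers.insert(pid);
        return true;
    }

private:
    struct State {
        bool exclusive = false;
        bool primary = false;
        std::set<int> openers;  // processes currently holding the graph open
    };
    std::map<std::string, State> graphs_;
};
```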
Multi-Thread Access Control
It is recommended that multiple threads follow the same access model as multiple processes. However, because AI Engine device handles and graph handles can be shared between threads, it is also legal for several threads to use the same device handle or graph handle. The host application is then responsible for synchronizing the AI Engine array state and graph state between threads, especially when multiple threads act as the exclusive or primary owner of the AI Engine array or graph.
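When threads share a single handle, a mutex is one straightforward way for the host application to serialize state changes. `FakeGraph` and `SynchronizedGraph` are hypothetical stand-ins rather than XRT types; the sketch only shows the application-side locking pattern and makes no claim about XRT's internal thread safety.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared graph handle; update() models an RTP
// write that changes graph state (not an XRT type).
struct FakeGraph {
    std::vector<int> rtp = std::vector<int>(16, 0);
    void update(const std::vector<int>& v) { rtp = v; }
    std::vector<int> read() const { return rtp; }
};

// The application serializes every access to the shared handle.
class SynchronizedGraph {
public:
    void update(const std::vector<int>& v) {
        std::lock_guard<std::mutex> lock(m_);
        graph_.update(v);
    }
    std::vector<int> read() {
        std::lock_guard<std::mutex> lock(m_);
        return graph_.read();
    }

private:
    std::mutex m_;
    FakeGraph graph_;
};

// One thread writes whole value sets while another reads concurrently.
// Returns true if every read observed one complete value set.
bool run_two_threads(SynchronizedGraph& g) {
    const std::vector<int> a(16, 1), b(16, -1);
    bool consistent = true;  // written only by the reader thread
    std::thread writer([&] {
        for (int i = 0; i < 1000; ++i) g.update(i % 2 ? a : b);
    });
    std::thread reader([&] {
        for (int i = 0; i < 1000; ++i) {
            std::vector<int> v = g.read();
            for (int x : v)
                if (x != v[0]) consistent = false;  // mixed values: torn read
        }
    });
    writer.join();
    reader.join();
    return consistent;
}
```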
Use Cases
The following figure shows multi-process and multi-thread use cases based on the access mode restrictions, illustrating how different processes can access the AI Engine devices and graphs.
Use Case 1:
- Process 1 opens xrt::aie::device or xrt::graph in exclusive mode.
- Process 2 cannot open them, because exclusive mode grants full control to Process 1. No other process can access the resource.
Use Case 2:
- Process 1 opens xrt::aie::device or xrt::graph in primary mode.
- Process 2 can open them in shared mode only.
- In shared mode, Process 2 can only perform non-destructive operations on the AI Engine, such as an asynchronous RTP read. Non-destructive operations are those that do not change the state of the AI Engine.
Use Case 3:
- Thread 1 opens xrt::aie::device or xrt::graph without any mode restrictions.
- Both Thread 1 and Thread 2 can operate on the AI Engine.
- However, you must synchronize the two threads' accesses to avoid conflicts between them.
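The following sample code demonstrates Use Case 2. For graph gr[i], a child process opens the AI Engine device and graph in the default primary mode and updates a run-time parameter (RTP), while a grandchild process opens the same device and graph in shared mode and only reads back an RTP. The parent process runs the PL kernels that feed and drain the graph and checks the output.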
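#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>
#include <sys/wait.h>
#include "adf/adf_api/XRTConfig.h"
#include "experimental/xrt_aie.h"
#include "experimental/xrt_graph.h"
#include "experimental/xrt_kernel.h"

// GRAPH_NUM (the number of graph instances gr[0] .. gr[GRAPH_NUM-1] in the
// design) must be passed on the compiler command line, e.g. -DGRAPH_NUM=2.
#ifndef GRAPH_NUM
#error "GRAPH_NUM must be defined on the compiler command line"
#endif

// 8192 output samples match 32 iterations of graph::run (two runs of 16)
#define OUTPUT_SIZE 8192
// RTP values written before the first and the second graph run
int value1[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
int value2[16] = {-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16};
using namespace adf;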
// Runs in a child process of main(): owns graph gr[id] in the default primary
// mode and forks a grandchild that opens the same graph in shared mode.
int run(int argc, char* argv[], int id) {
    std::cout << "Child process " << id << " start" << std::endl;
    if (argc != 2) {
        std::cout << "Usage: " << argv[0] << " <xclbin>" << std::endl;
        return EXIT_FAILURE;
    }
    char* xclbinFilename = argv[1];
    std::string graph_name = std::string("gr[") + std::to_string(id) + "]";
    std::string rtp_inout_name = graph_name + ".k.inout[0]";
    int value_readback[16] = {0};
    if (fork() == 0) { // grandchild process: shared (non-destructive) access
        xrt::aie::device device{0, xrt::aie::device::access_mode::shared};
        auto uuid = device.load_xclbin(xclbinFilename);
        xrt::graph graph{device, uuid, graph_name, xrt::graph::access_mode::shared};
        // Reading an RTP does not change graph state, so shared mode allows it
        graph.read(rtp_inout_name, value_readback);
        std::cout << "Add value read back are:";
        for (int i = 0; i < 16; i++) {
            std::cout << value_readback[i] << ",\t";
        }
        std::cout << std::endl;
        std::cout << "child child process exit" << std::endl;
        exit(0);
    }
    // Child process: no mode given, so device and graph use primary mode
    xrt::aie::device device{0};
    auto uuid = device.load_xclbin(xclbinFilename);
    xrt::graph graph{device, uuid, graph_name};
    std::string rtp_in_name = graph_name + ".k.in[1]";
    // First run: write the RTP, then run 16 iterations
    graph.update(rtp_in_name, value1);
    graph.run(16);
    graph.wait(0); // wait(0) blocks until the graph is done
    std::cout << "Graph wait done" << std::endl;
    // Second run with new RTP values
    graph.update(rtp_in_name, value2);
    graph.run(16);
    while (wait(NULL) > 0) { // wait for the grandchild process to exit
    }
    graph.wait(0); // wait(0) blocks until the graph is done
    std::cout << "Child process:" << id << " done" << std::endl;
    return 0;
}
int main(int argc, char* argv[])
{
    try {
        if (argc != 2) {
            std::cout << "Usage: " << argv[0] << " <xclbin>" << std::endl;
            return EXIT_FAILURE;
        }
        char* xclbinFilename = argv[1];
        int overall = 0; // becomes 1 if any graph fails verification
        // Graphs are handled one at a time: fork the child for gr[i], drive its
        // PL kernels from this (parent) process, then wait for the child to exit.
        for (int i = 0; i < GRAPH_NUM; i++) {
            if (fork() == 0) { // child process: owns graph gr[i]
                auto match = run(argc, argv, i);
                std::cout << "TEST child " << i << (match ? " FAILED" : " PASSED") << "\n";
                return (match ? EXIT_FAILURE : EXIT_SUCCESS);
            }
            // Parent process: open the device and load the xclbin
            size_t output_size_in_bytes = OUTPUT_SIZE * sizeof(int);
            auto device = xrt::device(0); // device index 0
            auto uuid = device.load_xclbin(xclbinFilename);
            // s2mm and data_generator kernel handles for this graph instance
            std::string s2mm_kernel_name = "s2mm:{s2mm_" + std::to_string(i + 1) + "}";
            xrt::kernel s2mm = xrt::kernel(device, uuid, s2mm_kernel_name.data());
            std::string data_generator_kernel_name = "data_generator:{data_generator_" + std::to_string(i + 1) + "}";
            xrt::kernel data_generator = xrt::kernel(device, uuid, data_generator_kernel_name.data());
            // Output buffer in the memory bank connected to s2mm argument 0
            auto out_bo = xrt::bo(device, output_size_in_bytes, s2mm.group_id(0));
            auto host_out = out_bo.map<int*>();
            auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE); // start s2mm first
            auto data_generator_run = data_generator(nullptr, OUTPUT_SIZE);
            // Wait for s2mm to finish, then copy the results to the host
            std::cout << "Waiting s2mm to complete" << std::endl;
            auto state = s2mm_run.wait();
            std::cout << "s2mm completed with status(" << state << ")" << std::endl;
            out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
            // Each output must equal a running counter plus the RTP value:
            // value1 for the first half of the output, value2 for the second
            int match = 0;
            int counter = 0;
            for (int n = 0; n < OUTPUT_SIZE / 2 / 16; n++) {
                for (int j = 0; j < 16; j++) {
                    if (host_out[n * 16 + j] != counter + value1[j]) {
                        std::cout << "ERROR: num=" << n * 16 + j << " out=" << host_out[n * 16 + j] << std::endl;
                        match = 1;
                        break;
                    }
                    counter++;
                }
            }
            for (int n = OUTPUT_SIZE / 2 / 16; n < OUTPUT_SIZE / 16; n++) {
                for (int j = 0; j < 16; j++) {
                    if (host_out[n * 16 + j] != counter + value2[j]) {
                        std::cout << "ERROR: num=" << n * 16 + j << " out=" << host_out[n * 16 + j] << std::endl;
                        match = 1;
                        break;
                    }
                    counter++;
                }
            }
            std::cout << "TEST " << i << (match ? " FAILED" : " PASSED") << "\n";
            while (wait(NULL) > 0) { // wait for the child process forked above
            }
            overall |= match;
        }
        std::cout << "all done" << std::endl;
        return (overall ? EXIT_FAILURE : EXIT_SUCCESS);
    }
    catch (std::exception const& e) {
        std::cout << "Exception: " << e.what() << "\n";
        std::cout << "FAILED TEST\n";
        return 1;
    }
}