The workshop consists of the following three parts:
- A vector of Worker: Each Worker instance manages one AMD Alveo™ card, along with its device buffers and their host-mapped pinned buffers. It handles 1) migration of input data from the pinned memory to the device memory, 2) kernel argument setup and kernel invocation, 3) migration of metadata from the device memory to the pinned memory, and 4) migration of the result data from the device memory to the pinned memory, based on the metadata.
- One MemCoppier, from host to pinned: uses eight threads to perform the memcpy tasks from the input host memory to the host-mapped pinned buffers.
- One MemCoppier, from pinned to host: uses eight threads to perform the memcpy tasks from the host-mapped pinned buffers to the output host memory.
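The idea behind each MemCoppier, splitting one large copy across eight threads, can be sketched as follows. This is a minimal illustration, not the library's implementation: `parallel_memcpy` is a hypothetical name, and for simplicity it spawns threads per call, whereas a real copier would more likely keep a persistent pool of eight threads and a task queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: splits one memcpy into `num_threads` contiguous chunks
// and copies them concurrently. For host-to-pinned copies, `dst` would be a
// host-mapped pinned buffer; for pinned-to-host copies, `src` would be.
void parallel_memcpy(void* dst, const void* src, std::size_t bytes,
                     unsigned num_threads = 8) {
    assert(num_threads > 0);
    auto* d = static_cast<char*>(dst);
    const auto* s = static_cast<const char*>(src);
    // Round up so that every byte falls into some chunk.
    const std::size_t chunk = (bytes + num_threads - 1) / num_threads;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(bytes, t * chunk);
        const std::size_t end = std::min(bytes, begin + chunk);
        if (begin == end) break;  // no data left for this or any later thread
        threads.emplace_back([=] { std::memcpy(d + begin, s + begin, end - begin); });
    }
    for (auto& th : threads) th.join();  // the copy is complete once all chunks finish
}
```

Because the chunks are disjoint, the threads need no synchronization beyond the final `join`.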
The workshop’s constructor finds all cards with the desired shell and loads them with the provided xclbin files. It then creates one worker per card, so there are as many workers as there are cards to manage. The OpenCL™ context, program, kernel, and command queue are released only when the release function is called.
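The constructor's behavior (filter cards by shell, create one worker per selected card, and defer cleanup to an explicit release call) can be sketched like this. All names here (`DeviceInfo`, `Worker`, `Workshop`, `release`, `num_workers`) are hypothetical stand-ins, and xclbin loading and OpenCL object creation are reduced to comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a card as seen by the host.
struct DeviceInfo {
    int id;
    std::string shell;  // shell (platform) name reported by the card
};

// Hypothetical stand-in for a Worker: one per selected card.
struct Worker {
    int device_id;
    bool released = false;
    // A real worker would free its OpenCL context, program, kernel, and queue here.
    void release() { released = true; }
};

class Workshop {
public:
    Workshop(const std::vector<DeviceInfo>& devices, const std::string& desired_shell) {
        for (const auto& dev : devices) {
            if (dev.shell != desired_shell) continue;  // skip cards with another shell
            // A real implementation would load the xclbin onto this card here.
            workers_.push_back(Worker{dev.id});
        }
    }
    // Resources are released only on an explicit call, not in a destructor.
    void release() {
        for (auto& w : workers_) w.release();
    }
    std::size_t num_workers() const { return workers_.size(); }
    const Worker& worker(std::size_t i) const {
        assert(i < workers_.size());
        return workers_[i];
    }

private:
    std::vector<Worker> workers_;
};
```

Keeping cleanup in an explicit call lets the caller reuse the loaded cards across many joins without reprogramming them.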
The workshop supports performing joins on multiple cards, with asynchronous input and output. See the L3/tests/gqe/join case for an example of how to notify the readiness of each input section and how to wait for the readiness of each output section.
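One common way to express per-section readiness is a one-shot promise/future pair per section. The sketch below uses that pattern; it is an assumption for illustration, and `SectionReadiness`, `notify`, `wait`, and `ready` are hypothetical names rather than the workshop's actual interface, which the L3/tests/gqe/join case demonstrates.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical readiness tracker: one promise/shared_future pair per section.
// The producer calls notify(i) once section i is in place; consumers call
// wait(i) to block until it is ready.
class SectionReadiness {
public:
    explicit SectionReadiness(std::size_t n) : promises_(n) {
        for (auto& p : promises_) futures_.push_back(p.get_future().share());
    }
    void notify(std::size_t i) {
        assert(!ready(i) && "section notified twice");
        promises_.at(i).set_value();
    }
    void wait(std::size_t i) const { futures_.at(i).wait(); }
    // Non-blocking check: true once notify(i) has been called.
    bool ready(std::size_t i) const {
        return futures_.at(i).wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

private:
    std::vector<std::promise<void>> promises_;
    std::vector<std::shared_future<void>> futures_;
};
```

With one tracker for input sections and another for output sections, the caller can feed input as it becomes available and consume each output section as soon as it is signaled, rather than waiting for the whole join.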
The workshop supports two solutions for Join: solution 1 works like Joiner’s solution 1, and solution 2 works like Joiner’s solution 2. No standalone solution 0 is provided because solution 1 covers it. The workshop handles task distribution among workers, so this is transparent to the caller.
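To make "transparent task distribution" concrete, here is one possible policy: plain round-robin assignment of input sections to workers. This is an assumed stand-in for illustration only. The text does not say which policy the workshop uses, and `distribute_sections` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical round-robin plan: section s goes to worker s % num_workers.
// The caller only supplies sections; it never chooses a card.
std::vector<std::vector<std::size_t>> distribute_sections(std::size_t num_sections,
                                                          std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<std::vector<std::size_t>> plan(num_workers);
    for (std::size_t s = 0; s < num_sections; ++s) plan[s % num_workers].push_back(s);
    return plan;
}
```

Round-robin keeps the sketch simple. A real scheduler might instead hand the next section to whichever worker becomes idle first, which balances load better when sections differ in size.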