Overview
Simulation and emulation with external traffic generators are run by launching the simulator or emulator and the traffic generator (TG) at the same time, in parallel. TGs can be written in either Python or C++, using the multi-threading capabilities of these two languages.
Writing a Traffic Generator in Python
# Mandatory
import os, sys
import multiprocessing as mp
import threading
import struct
from xilinx_xtlm import ipc_axis_master_util
from xilinx_xtlm import ipc_axis_slave_util
from xilinx_xtlm import xtlm_ipc
# Optional, just for ease of use
import numpy as np
import logging
mm2s_util = ipc_axis_master_util("DataIn1")
self.s2mm_util = ipc_axis_slave_util("DataOut1")
The function handling the port (mm2s in the example below) should be launched as a separate process:
tx = mp.Process(target=mm2s)
tx.start()
When the function has finished, the process should be joined:
tx.join()
This is a blocking call that waits for the function to return before the process ends.
Processes exchange data through pipes. The parent process declares the pipe, and the two ends communicate using the send and recv functions:
parent0, child0 = mp.Pipe()
child0.send(Tx_data)
Rx_data = parent0.recv()
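The process and pipe steps above can be put together as a minimal sketch. It uses only the Python standard library; the helper names producer and drain, the None end-of-data sentinel, and the sample frames are illustrative assumptions, not part of the Xilinx API.

```python
import multiprocessing as mp


def producer(conn, frames):
    """Send each frame through the pipe, then a None sentinel (an assumed convention)."""
    for frame in frames:
        conn.send(frame)
    conn.send(None)  # tells the receiver that no more data is coming
    conn.close()


def drain(conn):
    """Receive frames from the pipe until the None sentinel arrives."""
    received = []
    while True:
        item = conn.recv()
        if item is None:
            break
        received.append(item)
    return received


def main():
    # The parent declares the pipe and hands one end to the child process.
    parent0, child0 = mp.Pipe()
    frames = [bytes([i] * 4) for i in range(3)]
    tx = mp.Process(target=producer, args=(child0, frames))
    tx.start()
    rx_data = drain(parent0)  # blocks until the child has sent everything
    tx.join()                 # blocks until the child process has exited
    print(rx_data)


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__"` guard keeps the sketch safe on platforms where multiprocessing starts child processes by re-importing the main module.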
self.s2mm_util = ipc_axis_slave_util("DataOut1")
The variable payload is a structure that contains the following fields:
- data_length is the number of bytes of data.
- data is the data itself.
- tlast is the TLAST flag, which is set to true or false.
payload = xtlm_ipc.axi_stream_packet()
Then, set the values of the different fields, and send the payload to the AI Engine array using the b_transport method:
mm2s_util.b_transport(payload)
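As a rough sketch of the "set the values of the different fields" step, the helper below keeps data_length consistent with data. It assumes the three fields are plain assignable attributes, which the text implies but does not show. The helper name fill_packet is hypothetical, and a SimpleNamespace stands in for xtlm_ipc.axi_stream_packet() so the sketch runs without the Xilinx tools installed.

```python
from types import SimpleNamespace


def fill_packet(packet, data: bytes, tlast: bool):
    """Set the three payload fields on a packet-like object (hypothetical helper)."""
    packet.data = data
    packet.data_length = len(data)  # number of bytes, kept in step with data
    packet.tlast = tlast
    return packet


# Stand-in object; in a real traffic generator this would be the packet
# returned by xtlm_ipc.axi_stream_packet() (assumption: same attribute names).
demo = fill_packet(SimpleNamespace(), bytes(range(8)), tlast=True)
print(demo.data_length, demo.tlast)  # 8 True
```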
Formatting Data with Traffic Generators in Python
To emulate AXI4-Stream transactions, AXI traffic generators require the payload data to be broken into appropriately sized bursts. For example, sending 128 bytes with a PLIO width of 32 bits (4 bytes) requires 128 bytes / 4 bytes = 32 AXI4-Stream transactions. Converting between byte arrays and AXI transactions can be handled in Python.
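The arithmetic above can be sketched as a small helper that slices a byte array into one chunk per transaction. The function name split_into_bursts is illustrative, and asserting TLAST only on the final transaction is an assumption of this sketch rather than a requirement stated in the text.

```python
def split_into_bursts(payload: bytes, plio_width_bits: int = 32):
    """Split payload into (chunk, tlast) pairs, one per AXI4-Stream transaction."""
    bytes_per_transaction = plio_width_bits // 8
    if len(payload) % bytes_per_transaction != 0:
        raise ValueError("payload length must be a multiple of the PLIO width")
    chunks = [payload[i:i + bytes_per_transaction]
              for i in range(0, len(payload), bytes_per_transaction)]
    # Assumption: TLAST is asserted only on the final transaction of the payload.
    return [(chunk, index == len(chunks) - 1) for index, chunk in enumerate(chunks)]


bursts = split_into_bursts(bytes(128), plio_width_bits=32)
print(len(bursts))  # 32
```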
The Python struct library provides a mechanism to convert between Python and C data types. Specifically, the struct.pack and struct.unpack functions pack and unpack byte arrays according to a format string argument. The following table shows format strings for common C data types and PLIO widths.
For more information, see: https://docs.python.org/3/library/struct.html
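Data Type | PLIO Width | Python Code Snippet |
---|---|---|
cfloat | PLIO32 | N/A |
cfloat | PLIO64 | rVec = np.real(data) |
cfloat | PLIO128 | rVec = np.real(data) |
cint16 | PLIO32 | rVec = np.real(data).astype(np.int16) |
cint16 | PLIO64 | rVec = np.real(data).astype(np.int16) |
cint16 | PLIO128 | rVec = np.real(data).astype(np.int16) |
int8 | PLIO32 | intvec = np.real(data).astype(np.int8) |
int8 | PLIO64 | intvec = np.real(data).astype(np.int8) |
int8 | PLIO128 | intvec = np.real(data).astype(np.int8) |
int32 | PLIO32 | intvec = np.real(data).astype(np.int32) |
int32 | PLIO64 | intvec = np.real(data).astype(np.int32) |
int32 | PLIO128 | intvec = np.real(data).astype(np.int32) |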
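To illustrate how struct.pack and struct.unpack turn samples into byte arrays and back, here is a minimal sketch using only the standard library. The helper names are illustrative. Little-endian byte order and the real-then-imaginary interleaving for cint16 are assumptions of this sketch and should be checked against the kernel's expected layout.

```python
import struct


def pack_int32(samples):
    """Pack a list of ints as little-endian 32-bit values (byte order is an assumption)."""
    return struct.pack("<%di" % len(samples), *samples)


def pack_cint16(samples):
    """Interleave real and imaginary parts as little-endian int16 pairs (assumed layout)."""
    flat = []
    for s in samples:
        flat.extend((int(s.real), int(s.imag)))
    return struct.pack("<%dh" % len(flat), *flat)


def unpack_cint16(buf):
    """Inverse of pack_cint16: rebuild complex samples from a byte array."""
    flat = struct.unpack("<%dh" % (len(buf) // 2), buf)
    return [complex(re, im) for re, im in zip(flat[0::2], flat[1::2])]


words = pack_int32([1, -2, 300])
iq = pack_cint16([1 + 2j, -3 + 4j])
print(len(words), len(iq))  # 12 8
print(unpack_cint16(iq))
```

The resulting byte arrays can then be sliced into transactions that match the PLIO width.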
Writing a Traffic Generator in C++
# Libraries directories
PROTO_PATH=$(XILINX_VIVADO)/data/simmodels/xsim/2022.1/lnx64/6.2.0/ext/protobuf/
IPC_XTLM= $(XILINX_VIVADO)/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/src/
IPC_XTLM_INC= $(XILINX_VIVADO)/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/inc/
LOCAL_IPC= $(IPC_XTLM)../
LD_LIBRARY_PATH:=$(XILINX_VIVADO)/data/simmodels/xsim/2022.1/lnx64/6.2.0/ext/protobuf/:$(XILINX_VIVADO)/lib/lnx64.o/Default:$(XILINX_VIVADO)/lib/lnx64.o/:$(LD_LIBRARY_PATH)
# Kernel directories
PLKERNELS_DIR := ../../pl_kernels
PLKERNELS := $(PLKERNELS_DIR)/polar_clip.cpp
PLHEADERS := $(PLKERNELS_DIR)/polar_clip.hpp $(PLKERNELS_DIR)/s2mm.hpp $(PLKERNELS_DIR)/mm2s.hpp
# XTLM source files
IPC_SRC := $(LOCAL_IPC)/src/axis/*.cpp $(LOCAL_IPC)/src/common/*.cpp $(LOCAL_IPC)/src/common/*.cc
# Compiler/linker flags
INC_FLAGS := -I$(LOCAL_IPC)/inc -I$(LOCAL_IPC)/inc/axis/ -I$(LOCAL_IPC)/inc/common/ -I$(PROTO_PATH)/include/ -I$(PLKERNELS_DIR) -I$(XILINX_HLS)/include
LIB_FLAGS := -L$(PROTO_PATH)/ -lprotobuf -L$(XILINX_VIVADO)/lib/lnx64.o/ -lrdizlib -L$(GCC)/../../lib64/ -lstdc++ -lpthread
# Compilation
compile: main.cpp $(PLHEADERS) $(PLKERNELS)
	$(GCC) -g main.cpp $(PLKERNELS) $(IPC_SRC) $(INC_FLAGS) $(LIB_FLAGS) -o chain
The headers needed to use these libraries are:
// For the traffic generator
#include "xtlm_ipc.h"
#include <thread>
// Standard headers used by the traffic generator code
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
// Transmitter traffic generator
using b_init_socket = xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::BLOCKING>;
// Receiver traffic generator
using b_targ_socket = xtlm_ipc::axis_target_socket_util<xtlm_ipc::BLOCKING>;
class mm2s
{
    std::thread m_thread;
    std::unique_ptr<b_init_socket> m_socket_ptr;
    int count;

    // Runs in its own thread: opens the socket and sends 512 samples
    void sock_data_handler()
    {
        m_socket_ptr = std::make_unique<b_init_socket>(m_sock_name);
        std::vector<char> data_to_send;
        while (count < 512)
        {
            // Create the data to send to the AI Engine array (vector of bytes)
            data_to_send = ...;
            // transport(data, tlast): TLAST is set on the last sample of each 128-sample frame
            m_socket_ptr->transport(data_to_send, count % 128 == 127);
            count++;
        }
    }

protected:
    // Name of the socket
    const std::string m_sock_name;

public:
    // Initializers are listed in member declaration order
    mm2s(const std::string& sock_name) :
        m_socket_ptr(nullptr), count(0), m_sock_name(sock_name)
    {}

    // Starts the transmission in a separate thread
    void run()
    {
        m_thread = std::thread(&mm2s::sock_data_handler, this);
    }

    // This function allows the user to check for the end of the transmission
    int dataTransferred()
    {
        return count;
    }

    // The destructor waits for the thread to finish
    virtual ~mm2s()
    {
        std::cout << this->m_sock_name << " before join " << std::endl;
        if (m_thread.joinable())
            m_thread.join();
        std::cout << this->m_sock_name << " after join " << std::endl;
    }
};
The main function is very simple, as it is meant only to start the various components of the traffic generator, inserting delays between them so that the system has time to initialize:
int main(int argc, char *argv[])
{
    mm2s chain_1_mm2s("DataIn1");
    polar_clip chain_1_pc("clip_in", "clip_out");
    s2mm chain_1_s2mm("DataOut1");

    using namespace std::chrono_literals;

    chain_1_mm2s.run();
    std::cout << "Started mm2s " << std::endl;
    std::this_thread::sleep_for(500ms);

    chain_1_pc.run();
    std::cout << "Started polar_clip " << std::endl;
    std::this_thread::sleep_for(400ms);

    chain_1_s2mm.run();
    std::cout << "Started s2mm " << std::endl;

    // Waits for the end of the simulation (1024 samples received from S2MM block)
    while (chain_1_s2mm.dataTransferred() != 1024)
    {
        // Waits 2s before retesting
        std::this_thread::sleep_for(2s);
    }
    return 0;
}
The advantage of the C++ traffic generator is that you can use and test your HLS kernels as soon as they are written, without first synthesizing them into a .xo file. This lets you add realism and flexibility to your simulations without having to rebuild the .xclbin file.