The PL master kernels are the dlbf_data, dlbf_coeffs, ulbf_data, and ulbf_coeffs kernels. The dlbf_data PL kernel stores the reference input data matrices for the downlink subgraph in the AI Engine graph. The dlbf_coeffs PL kernel stores the reference input coefficients for the downlink subgraph. The ulbf_data PL kernel stores the input data for the uplink subgraph. The ulbf_coeffs PL kernel stores the input coefficient data for the uplink subgraph.
Open the Vivado projects for these PL kernels and review their source code. They are all composed of the same modules: an AXI BRAM Controller IP, a control status register (CSR) module, a clock domain crossing (CDC) module, and multiple data master modules. The data master modules are initialized with reference input data and coefficients from *_hex.mem files in the data/ folder.
The *_hex.mem files are generated by a Python script that converts decimal data in *.txt files to hexadecimal. For example:
#Decimal data in dlbf_cin00.txt
-1893 3687 -6157 -1324
#Hexadecimal conversion in dlbf_cin00_hex.mem
fad4e7f30e67f89b
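The leftmost decimal value (-1893) is converted to the rightmost hexadecimal value (f89b). Each decimal value becomes a 16-bit two's-complement hexadecimal group, and the groups are written in reverse order.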
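The conversion can be sketched in a few lines of Python. This is a minimal illustration and not the tutorial's actual script: the function name `words_to_hex_line` and the assumption of 16-bit samples are introduced here, inferred from the four-digit hex groups in the example.

```python
def words_to_hex_line(decimal_line: str, bits: int = 16) -> str:
    """Pack signed decimal samples from one text line into one hex word.

    Assumption: each sample is `bits` wide (16 here), and the first sample
    on the line lands in the least significant (rightmost) hex group.
    """
    mask = (1 << bits) - 1
    digits = bits // 4
    samples = [int(token) for token in decimal_line.split()]
    # Masking gives the two's-complement encoding of negative values.
    # Reversing puts the first decimal value in the rightmost hex group.
    return "".join(f"{sample & mask:0{digits}x}" for sample in reversed(samples))


if __name__ == "__main__":
    print(words_to_hex_line("-1893 3687 -6157 -1324"))  # fad4e7f30e67f89b
```

Running it on the line from dlbf_cin00.txt reproduces the word shown in dlbf_cin00_hex.mem.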
Below is a block diagram of how data in the PL master kernels is requested by CIPS and sent to the AI Engine.
Each PL master kernel connects to one of the 16 AXI4-Lite PL interfaces on the custom platform built in Module 01 (Creating a Custom Platform). Through this connection, the CIPS block sends AXI control signals to data master modules and receives AXI status signals.
AXI BRAM Controller: Writes control signals to the CSR module and reads status signals from the CSR module at 100 MHz.
Control Status Register (CSR) Module: A register interface that the AXI BRAM controller uses to access data masters. Below is the control and status register map for one data master module.
Control and Status Register Address Map
| Register Space Offset | Bits and Name | R/W? | Description |
|---|---|---|---|
| 0x0 | [31:0] ID | R | 32-bit ID register. |
| 0x4 | [0] RESET | W | 1: assert, 0: deassert. Also assigned to the |
| 0x4 | [4] GO | W | 1: start PL traffic, 0: stop PL traffic. Also assigned to the |
| 0x8 | [11:0] BLOCK_SIZE | W | Sets the block size of the stream frame. Block size is the number of 64-bit TDATA packets to send to the AI Engine. TLAST is asserted once every <BLOCK_SIZE> cycles. Also assigned to the |
| 0xC | [11:0] NITER | W | Sets the number of iterations of the data. The number of iterations is the number of <BLOCK_SIZE> data chunks to send to the AI Engine. If this is set to 0, data is transmitted to the AI Engine forever. Also assigned to the |
| 0x10 | [15:0] ROLLOVER_ADDR | W | When BRAM addresses reach this rollover address, they reset to address 0. In this design, the rollover address is set to the address of four <BLOCK_SIZE> chunks of data (that is, 4*<BLOCK_SIZE>). Also assigned to the |
| 0x20 | [0] MASTER_DONE | R | When this status register becomes 1, the data master is done sending data to the AI Engine. Also assigned to the |
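To make the address map concrete, the sketch below encodes the offsets and bit positions from the table as Python constants and builds the list of register writes that would configure one data master. The offsets and bit fields come from the table. Everything else is an assumption made for illustration: the helper names `config_writes` and `is_done`, the `base` address parameter, the order of the writes, and the example values.

```python
# Register offsets and bit positions, copied from the address map.
REG_ID = 0x0
REG_CTRL = 0x4            # holds both RESET (bit 0) and GO (bit 4)
REG_BLOCK_SIZE = 0x8      # [11:0]
REG_NITER = 0xC           # [11:0]
REG_ROLLOVER_ADDR = 0x10  # [15:0]
REG_MASTER_DONE = 0x20    # [0]

CTRL_RESET = 1 << 0
CTRL_GO = 1 << 4


def config_writes(block_size: int, niter: int, base: int = 0):
    """Return (address, value) pairs that configure and start one data master.

    Hypothetical helper: `base` and the write ordering are assumptions,
    not taken from the design's host code.
    """
    if not 0 < block_size < (1 << 12):
        raise ValueError("BLOCK_SIZE must fit in 12 bits and be non-zero")
    if not 0 <= niter < (1 << 12):
        raise ValueError("NITER must fit in 12 bits")
    rollover = 4 * block_size  # per the table: four BLOCK_SIZE chunks
    return [
        (base + REG_CTRL, CTRL_RESET),          # assert reset
        (base + REG_CTRL, 0),                   # deassert reset
        (base + REG_BLOCK_SIZE, block_size),
        (base + REG_NITER, niter),              # 0 means transmit forever
        (base + REG_ROLLOVER_ADDR, rollover),
        (base + REG_CTRL, CTRL_GO),             # start PL traffic
    ]


def is_done(master_done_word: int) -> bool:
    """Decode bit 0 of a word read from the MASTER_DONE register."""
    return bool(master_done_word & 1)


if __name__ == "__main__":
    # Illustrative values only; the design's real settings are not shown here.
    for address, value in config_writes(block_size=8, niter=2, base=0x1000):
        print(f"write 0x{address:04x} <- 0x{value:x}")
```

Because RESET and GO share offset 0x4, a single write to that register sets both bits at once, which is why the sketch writes them in separate steps.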
The CSR Module RTL definitions are located here:
dlbf_data/hdl/dlbf_data_csr_cntrl.v
dlbf_coeffs/hdl/dlbf_coeffs_csr_cntrl.v
ulbf_data/hdl/ulbf_data_csr_cntrl.v
ulbf_coeffs/hdl/ulbf_coeffs_csr_cntrl.v
Clock Domain Crossing (CDC) Module: Synchronizes the control and status signals between the CSR module and the data master modules. The CSR module runs at 100 MHz and the data master modules operate at 400 MHz, so the CDC module converts the 100 MHz control signals from CIPS to 400 MHz signals and converts the 400 MHz status signals from the data master modules back to 100 MHz signals for CIPS.
Data Master Modules: Contain BRAM instances that store input data sent to the AI Engine. They are initialized by data/*_hex.mem files with input data. There are four data master modules in the dlbf_data and dlbf_coeffs PL kernels. There are eight data master modules in the ulbf_data and ulbf_coeffs PL kernels.
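The way BLOCK_SIZE, NITER, and ROLLOVER_ADDR interact can be pictured with a small behavioral model. This is a sketch of the register descriptions, not a translation of the RTL. It assumes that one BRAM address holds one 64-bit word, that TLAST marks the last word of each block, and that the address wraps to 0 instead of ever reading the rollover address itself. The generator name `data_master_beats` is hypothetical.

```python
from itertools import count, islice


def data_master_beats(block_size: int, niter: int, rollover_addr: int):
    """Yield (bram_addr, tlast) for every 64-bit word sent to the AI Engine.

    Assumptions (not confirmed by the RTL): addresses count 64-bit words,
    TLAST is high on the final word of each block, and the address returns
    to 0 when it would otherwise equal `rollover_addr`.
    """
    if block_size <= 0 or rollover_addr <= 0:
        raise ValueError("block_size and rollover_addr must be positive")
    # NITER == 0 means the data master transmits forever.
    total = None if niter == 0 else block_size * niter
    addr = 0
    for beat in count():
        if total is not None and beat >= total:
            return
        tlast = (beat + 1) % block_size == 0
        yield addr, tlast
        addr += 1
        if addr == rollover_addr:
            addr = 0


if __name__ == "__main__":
    # Three blocks of two words each, with a four-word rollover window.
    for addr, tlast in data_master_beats(block_size=2, niter=3, rollover_addr=4):
        print(addr, int(tlast))
    # With NITER = 0 the stream never ends, so take a finite slice.
    print(list(islice(data_master_beats(2, 0, 4), 5)))
```

With the rollover set to 4*BLOCK_SIZE as in this design, the model cycles through four blocks of stored data before repeating.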