The implementation is shown in the following figure:
The kernel performs the following steps:
- Set URAM: Load the original cids of the graph and scan the vertices to set flags in URAM. If a vertex's cid appears for the first time, its flag in URAM is written as true. Otherwise, the flag is written as false.
- Lookup HBM: Look up HBM to get each new cid that has already been written successfully, and put it into a stream. If the new cid has not been written yet, the lookup is placed in a waiting buffer. The buffer is a first-in, first-out circular cache that is read again at regular intervals until the lookup succeeds.
- Update HBM: Scan the stream to get each new cid and write it back to HBM.
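
The three steps above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the kernel source: the function name `renumberSketch`, the `writeLatency` parameter, and the rule that new cids are assigned in order of first appearance are all hypothetical. Plain vectors stand in for URAM and HBM, and a fixed write delay is assumed so that the waiting buffer is actually exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical software model of the three steps; not the kernel's real API.
// Assumes every cid lies in [0, numCid) and that new cids are handed out in
// order of first appearance.
std::vector<int32_t> renumberSketch(const std::vector<int32_t>& oldCid,
                                    int32_t numCid,
                                    std::size_t writeLatency = 2) {
    const std::size_t n = oldCid.size();

    // Step 1 (Set URAM): flag each vertex whose cid appears for the first time.
    std::vector<bool> seen(numCid, false);
    std::vector<bool> isFirst(n, false);
    for (std::size_t v = 0; v < n; ++v) {
        assert(oldCid[v] >= 0 && oldCid[v] < numCid);
        if (!seen[oldCid[v]]) {
            seen[oldCid[v]] = true;
            isFirst[v] = true;
        }
    }

    struct PendingWrite { std::size_t readyAt; int32_t cid; int32_t id; };
    std::vector<int32_t> table(numCid, -1); // stands in for HBM; -1 = not written
    std::vector<int32_t> newCid(n, -1);
    std::deque<PendingWrite> writes;        // stream feeding the update step
    std::deque<std::size_t> waiting;        // FIFO of vertices whose lookup missed
    int32_t nextId = 0;

    std::size_t v = 0;
    for (std::size_t cycle = 0; v < n || !waiting.empty() || !writes.empty(); ++cycle) {
        // Step 3 (Update HBM): commit writes whose assumed latency has elapsed.
        while (!writes.empty() && writes.front().readyAt <= cycle) {
            table[writes.front().cid] = writes.front().id;
            writes.pop_front();
        }
        // Step 2 (Lookup HBM), part a: retry the oldest waiting vertex.
        if (!waiting.empty()) {
            std::size_t w = waiting.front();
            waiting.pop_front();
            if (table[oldCid[w]] >= 0) newCid[w] = table[oldCid[w]];
            else waiting.push_back(w);      // still not written; go around again
        }
        // Step 2 (Lookup HBM), part b: process the next vertex in scan order.
        if (v < n) {
            int32_t c = oldCid[v];
            if (isFirst[v]) {
                newCid[v] = nextId;
                writes.push_back({cycle + writeLatency, c, nextId});
                ++nextId;
            } else if (table[c] >= 0) {
                newCid[v] = table[c];
            } else {
                waiting.push_back(v);       // new cid not visible yet
            }
            ++v;
        }
    }
    return newCid;
}
```

For example, with original cids `{5, 3, 5, 7, 3, 5}` this model returns `{0, 1, 0, 2, 1, 0}`. With `{4, 4, 4, 1}` and `writeLatency = 3`, the second and third vertices pass through the waiting buffer before they resolve to `0`.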