The detailed algorithm implementation is illustrated below:
As shown in the figures above, each PE is directly connected to three AXI ports that supply the offset, index, and weight inputs (CSR-format data). The data must be partitioned on the host side. The internal functions of the PE search for and match indices to compute the similarity between the reference vertex and every other vertex. The sparse similarity kernel also contains an insertion sort module, which returns the top K similarity values. The maximum value of K is a template parameter and can be changed by rebuilding the xclbin. The default value of top K is 32.
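The data flow described above can be sketched in plain C++ as a software model. This is only an illustration under stated assumptions, not the kernel's HLS source: the names `CsrGraph`, `sparseCosine`, and `topKSimilar` are hypothetical, cosine similarity is assumed as the measure, column indices are assumed to be sorted within each row, and the host-side partitioning across PEs is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CSR container mirroring the three inputs of a PE.
struct CsrGraph {
    std::vector<uint32_t> offsets; // numVertices + 1 entries
    std::vector<uint32_t> indices; // assumed sorted ascending within each row
    std::vector<float> weights;    // one weight per index
};

// Cosine similarity of rows a and b (assumed measure), computed by
// walking both sorted index lists and matching equal indices.
float sparseCosine(const CsrGraph& g, uint32_t a, uint32_t b) {
    uint32_t i = g.offsets[a], iEnd = g.offsets[a + 1];
    uint32_t j = g.offsets[b], jEnd = g.offsets[b + 1];
    float normA = 0.0f, normB = 0.0f, dot = 0.0f;
    for (uint32_t k = i; k < iEnd; ++k) normA += g.weights[k] * g.weights[k];
    for (uint32_t k = j; k < jEnd; ++k) normB += g.weights[k] * g.weights[k];
    while (i < iEnd && j < jEnd) {
        if (g.indices[i] == g.indices[j]) {
            dot += g.weights[i] * g.weights[j]; // matched index
            ++i;
            ++j;
        } else if (g.indices[i] < g.indices[j]) {
            ++i;
        } else {
            ++j;
        }
    }
    if (normA == 0.0f || normB == 0.0f) return 0.0f;
    return dot / std::sqrt(normA * normB);
}

// Insertion-sort style top-K selection. K is a compile-time parameter,
// analogous to the template parameter fixed when the xclbin is built.
template <std::size_t K = 32>
std::vector<std::pair<uint32_t, float>> topKSimilar(const CsrGraph& g, uint32_t ref) {
    assert(!g.offsets.empty());
    const uint32_t n = static_cast<uint32_t>(g.offsets.size() - 1);
    std::vector<std::pair<uint32_t, float>> best; // kept in descending order
    for (uint32_t v = 0; v < n; ++v) {
        if (v == ref) continue;
        const float s = sparseCosine(g, ref, v);
        std::size_t pos = best.size();
        while (pos > 0 && best[pos - 1].second < s) --pos; // find insert slot
        if (pos < K) {
            best.insert(best.begin() + pos, std::make_pair(v, s));
            if (best.size() > K) best.pop_back(); // drop the smallest entry
        }
    }
    return best;
}
```

Calling `topKSimilar<32>(g, ref)` returns up to 32 `(vertex, similarity)` pairs in descending order of similarity. In this model, changing K means recompiling with a different template argument, which loosely corresponds to rebuilding the xclbin for the hardware kernel.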