Look at the following code that computes the document score.
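    // Compute the score of every document against the user profile.
    // n indexes the flattened word stream across all documents.
    for (unsigned int doc = 0, n = 0; doc < total_num_docs; doc++) {
        profile_score[doc] = 0.0;
        unsigned int size = doc_sizes[doc];
        for (unsigned i = 0; i < size; i++, n++) {
            // Only words flagged by the hash stage contribute to the score.
            if (inh_flags[n]) {
                unsigned curr_entry = input_doc_words[n];
                unsigned frequency = curr_entry & 0x00ff;  // low 8 bits
                unsigned word_id = curr_entry >> 8;        // remaining upper bits
                // Random access: the index depends on the document content.
                profile_score[doc] += profile_weights[word_id] * (unsigned long)frequency;
            }
        }
    }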
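To make the entry layout and the running index `n` concrete, here is a small, self-contained C++ sketch of the same loop. It assumes `unsigned` is 32 bits (so a word ID occupies the upper 24 bits of an entry) and that weights and scores are 64-bit unsigned integers. The helpers `pack_entry`, `entry_frequency`, `entry_word_id`, and `compute_scores` are hypothetical names introduced for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::uint32_t;
using std::uint64_t;

// Hypothetical helper: pack a 24-bit word ID and an 8-bit frequency into one
// 32-bit entry, matching the layout the scoring loop decodes
// (frequency in bits 0-7, word ID in bits 8-31).
inline uint32_t pack_entry(uint32_t word_id, uint32_t frequency) {
    assert(word_id < (1u << 24) && frequency < (1u << 8));
    return (word_id << 8) | frequency;
}

inline uint32_t entry_frequency(uint32_t entry) { return entry & 0x00ffu; }
inline uint32_t entry_word_id(uint32_t entry) { return entry >> 8; }

// Sketch of the host-side scoring loop using std::vector instead of raw arrays.
// n indexes the flattened word stream, so it keeps advancing across documents.
std::vector<uint64_t> compute_scores(const std::vector<uint32_t>& doc_sizes,
                                     const std::vector<uint32_t>& input_doc_words,
                                     const std::vector<bool>& inh_flags,
                                     const std::vector<uint64_t>& profile_weights) {
    std::vector<uint64_t> profile_score(doc_sizes.size(), 0);
    std::size_t n = 0;
    for (std::size_t doc = 0; doc < doc_sizes.size(); ++doc) {
        for (uint32_t i = 0; i < doc_sizes[doc]; ++i, ++n) {
            if (!inh_flags[n]) continue;  // word not flagged: no contribution
            uint32_t entry = input_doc_words[n];
            // One data-dependent (random) read of profile_weights per flagged word.
            profile_score[doc] += profile_weights[entry_word_id(entry)] *
                                  static_cast<uint64_t>(entry_frequency(entry));
        }
    }
    return profile_score;
}
```

For example, with two documents of sizes 2 and 1, a weight of 10 for word 3, and only the first and third words flagged, the first document scores 10 × 2 = 20 (its second word is skipped) and the second scores 10 × 1 = 10. Note how the index into `profile_weights` comes straight from the document data, which is what makes the access pattern unpredictable.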
Computing the score requires, for each flagged word, one memory access to `profile_weights`, one multiplication, and one accumulation. The memory accesses are random because they depend on `word_id` and therefore on the content of each document. The `profile_weights` array is 128 MB, so it must be stored in the DDR memory connected to the FPGA, and non-sequential accesses to DDR are a major performance bottleneck. Because the accesses to `profile_weights` are random, implementing this function on the FPGA would not provide much performance benefit, and because the function takes only about 11% of the total running time, you can keep it on the host CPU.

Based on this analysis, it is only beneficial to accelerate the Compute Output Flags from the Hash section on the FPGA. The execution of the Compute Document Score section can be kept on the host CPU.
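Two quick calculations support this partitioning decision. The first assumes `unsigned` is 32 bits and that `profile_weights` holds one 64-bit weight for every possible word ID (an assumption that is consistent with the 128 MB figure but not stated in the code). The second applies Amdahl's law with p = 0.11 as the fraction of running time spent in scoring and s as the speedup of that part alone:

```latex
\begin{aligned}
\text{entries} &= 2^{32-8} = 2^{24} = 16{,}777{,}216,\\
\text{size} &= 2^{24} \times 8\ \text{B} = 2^{27}\ \text{B} = 128\ \text{MB},\\
S_{\max} &= \lim_{s \to \infty} \frac{1}{(1-p) + p/s} = \frac{1}{1-p} = \frac{1}{0.89} \approx 1.12 .
\end{aligned}
```

Even an infinitely fast FPGA implementation of the scoring step would therefore improve end-to-end time by at most about 12%, and with random DDR accesses the realistic gain would likely be smaller still.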
Close the file editor.