Evaluate the Second “for” Loop in the runOnCPU Function: “Profile Compute Score” Functionality - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2022-12-01
Version
2022.2 English
  1. Look at the following code that computes the document score.

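    // n is a running index over the words of all documents
    for (unsigned int doc = 0, n = 0; doc < total_num_docs; doc++)
    {
      profile_score[doc] = 0.0;
      unsigned int size = doc_sizes[doc];

      for (unsigned i = 0; i < size; i++, n++)
      {
        // Score only the words flagged by the hash step
        if (inh_flags[n])
        {
          unsigned curr_entry = input_doc_words[n];
          unsigned frequency = curr_entry & 0x00ff;  // low 8 bits: word frequency
          unsigned word_id = curr_entry >> 8;        // remaining upper bits: word ID
          // Data-dependent (random) read of profile_weights
          profile_score[doc] += profile_weights[word_id] * (unsigned long)frequency;
        }
      }
    }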
    
    • For each flagged word, computing the score requires one memory access to profile_weights, one multiplication, and one accumulation.

    • The accesses to profile_weights are random because they are indexed by word_id and therefore depend on the content of each document.

    • The profile_weights array is 128 MB and must be stored in the DDR memory connected to the FPGA. Non-sequential accesses to DDR are a major performance bottleneck. Because accesses to the profile_weights array are random, implementing this function on the FPGA would not provide much performance benefit. Because the function also takes only about 11% of the total running time, you can keep it on the host CPU.

      Based on this analysis, only the Compute Output Flags from the Hash section is worth accelerating on the FPGA. The Compute Document Score section can remain on the host CPU.

  2. Close the file editor.
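
The analysis in step 1 can be sketched as a small, self-contained C++ example. This is not the tutorial's source: `DecodedWord`, `decode_entry`, `weight_table_bytes`, and `compute_scores` are hypothetical names, and the 64-bit weight and score types are an assumption. The sketch also assumes one weight per possible 24-bit word ID (the bits left in a 32-bit entry after the 8-bit frequency), which at 8 bytes per weight is consistent with the 128 MB figure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the two fields packed into one 32-bit entry.
struct DecodedWord {
    std::uint32_t word_id;    // upper 24 bits
    std::uint32_t frequency;  // lower 8 bits
};

// Unpack one entry the same way the loop in step 1 does.
inline DecodedWord decode_entry(std::uint32_t entry) {
    return { entry >> 8, entry & 0xFFu };
}

// Bytes needed for a table with one weight per possible 24-bit word ID.
constexpr std::size_t weight_table_bytes(std::size_t bytes_per_weight) {
    return (std::size_t{1} << 24) * bytes_per_weight;
}

// Assumption: 8-byte weights. 2^24 entries * 8 bytes = 128 MB.
static_assert(weight_table_bytes(sizeof(std::uint64_t)) == 128u * 1024u * 1024u,
              "weight table should be 128 MB under the stated assumptions");

// Compute one score per document. The read of profile_weights is indexed by
// word_id, so its address depends on document content (a random access).
std::vector<std::uint64_t> compute_scores(
    const std::vector<unsigned int>& doc_sizes,
    const std::vector<std::uint32_t>& input_doc_words,
    const std::vector<char>& inh_flags,
    const std::vector<std::uint64_t>& profile_weights)
{
    assert(input_doc_words.size() == inh_flags.size());

    std::vector<std::uint64_t> profile_score(doc_sizes.size(), 0);
    std::size_t n = 0;  // running index over the words of all documents

    for (std::size_t doc = 0; doc < doc_sizes.size(); ++doc) {
        for (unsigned int i = 0; i < doc_sizes[doc]; ++i, ++n) {
            if (!inh_flags[n]) {
                continue;  // word was not flagged by the hash step
            }
            const DecodedWord w = decode_entry(input_doc_words[n]);
            profile_score[doc] += profile_weights[w.word_id] * w.frequency;
        }
    }
    return profile_score;
}
```

For example, with two documents of sizes 2 and 1, entries `(3 << 8) | 2`, `(1 << 8) | 5`, and `(2 << 8) | 1`, flags `1, 0, 1`, and a toy weight table `{0, 10, 20, 30}`, the function returns scores 60 and 20: the unflagged second word is skipped, and each remaining word contributes its weight times its frequency.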