Start by measuring the runtime and throughput of the current application on your existing
platform to identify its performance bottlenecks. These
performance numbers should be generated for the entire application (end-to-end) as well
as for each major function in the application. The most effective way to obtain them is to run the
application with profiling tools such as Valgrind (with its Callgrind tool) and GNU gprof.
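Valgrind and gprof operate on compiled binaries, so their invocation is not repeated here. As a self-contained illustration of the same kind of data (call counts plus self and cumulative time, for the whole run and for each function), the following minimal sketch swaps in Python's built-in `cProfile` and `pstats` modules. The application and its `filter_stage`, `reduce_stage`, and `application` functions are hypothetical stand-ins, not part of any real workload.

```python
import cProfile
import pstats


def filter_stage(data):
    # Hypothetical compute-heavy "major function": a naive 3-tap moving average.
    return [sum(data[i:i + 3]) / 3 for i in range(len(data) - 2)]


def reduce_stage(data):
    # Hypothetical lightweight "major function".
    return sum(data)


def application(n):
    # Hypothetical end-to-end application built from the two stages above.
    data = [float(i % 97) for i in range(n)]
    total = 0.0
    for _ in range(20):
        total += reduce_stage(filter_stage(data))
    return total


def profile_application(n=20_000):
    # Profile one end-to-end run; the resulting Stats object holds, for every
    # function called, its call count, self time, and cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    result = application(n)
    profiler.disable()
    return result, pstats.Stats(profiler)


if __name__ == "__main__":
    _, stats = profile_application()
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
```

The `application` row gives the end-to-end time, while the `filter_stage` and `reduce_stage` rows give the per-function numbers that serve as the baseline.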
The profiling data generated by these tools shows the call graph, including the number of calls
to each function and the execution time spent in it. These numbers provide the baseline for most
of the subsequent analysis. The functions that consume the most execution time
are good candidates for offloading to, and acceleration on, FPGAs.
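Selecting candidates from a flat profile can be sketched as ranking functions by self time and keeping those whose share of the total exceeds a threshold. This is a minimal sketch under stated assumptions: the `FunctionProfile` record, the 10% `min_share` threshold, and the sample numbers are all hypothetical and are not output from a real profiling run. Self time (time spent in the function itself, excluding its callees) is used so that a caller is not credited with work done by the functions it calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionProfile:
    # Hypothetical record for one row of a flat profile.
    name: str
    calls: int
    self_seconds: float  # time in the function itself, excluding callees


def rank_offload_candidates(profile, min_share=0.10):
    """Return (name, share) pairs, largest self time first, keeping only
    functions whose share of total self time is at least min_share."""
    total = sum(p.self_seconds for p in profile)
    if total <= 0:
        return []
    ranked = sorted(profile, key=lambda p: p.self_seconds, reverse=True)
    return [
        (p.name, p.self_seconds / total)
        for p in ranked
        if p.self_seconds / total >= min_share
    ]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        FunctionProfile("setup", 1, 0.2),
        FunctionProfile("compute_kernel", 1000, 8.0),
        FunctionProfile("write_output", 1, 0.3),
        FunctionProfile("read_input", 1, 1.5),
    ]
    for name, share in rank_offload_candidates(sample):
        print(f"{name}: {share:.0%} of total self time")
```

In practice the `self_seconds` values would be taken from a profiler report, such as the flat profile that gprof prints.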