Start by measuring the runtime and throughput of the current application on your
existing platform to identify its bottlenecks. Generate these performance numbers for
the entire application (end-to-end) as well as for each major function in the
application. The most effective way to do this is to run
the application under profiling tools such as Valgrind (with its Callgrind tool) and
GNU gprof. The profiling data these tools generate show the call graph, the number of
calls to each function, and the time spent in each function. These numbers
provide the baseline for most of the subsequent analysis. The functions that consume
the most execution time are good candidates for offloading to FPGAs for acceleration.
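The ranking step can be illustrated with a small, self-contained sketch. It swaps in Python's built-in `cProfile` and `pstats` modules as a stand-in for gprof or Callgrind; the workflow is the same: profile the whole run end-to-end, collect per-function call counts and self time, and sort by self time to find offload candidates. The workload functions `filter_stage`, `checksum_stage`, and `run_application` and the helper `rank_hotspots` are hypothetical, and `Stats.get_stats_profile()` requires Python 3.9 or later.

```python
import cProfile
import pstats


def filter_stage(samples):
    # Hypothetical hot kernel: a 3-tap moving average. The output list is
    # preallocated so the loop makes no Python calls and its whole cost is
    # charged to this function's self time.
    out = [0.0] * (len(samples) - 2)
    for i in range(1, len(samples) - 1):
        out[i - 1] = (samples[i - 1] + samples[i] + samples[i + 1]) / 3.0
    return out


def checksum_stage(samples):
    # Hypothetical lightweight stage.
    return sum(samples)


def run_application(n, passes):
    # Hypothetical end-to-end application built from the two stages.
    samples = list(map(float, range(n)))
    result = 0.0
    for _ in range(passes):
        result += checksum_stage(filter_stage(samples))
    return result


def rank_hotspots(func, *args):
    """Profile func(*args) end to end and return (total_seconds, ranking).

    ranking is a list of (name, ncalls, self_seconds, share_of_total) tuples
    sorted by self time, most expensive first, similar to a gprof flat profile.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    report = pstats.Stats(profiler).get_stats_profile()  # Python 3.9+
    total = report.total_tt
    ranking = []
    for name, fp in report.func_profiles.items():
        # fp.tottime is self time; fp.ncalls is a string such as "10".
        share = fp.tottime / total if total > 0 else 0.0
        ranking.append((name, fp.ncalls, fp.tottime, share))
    ranking.sort(key=lambda row: row[2], reverse=True)
    return total, ranking


if __name__ == "__main__":
    total, ranking = rank_hotspots(run_application, 50_000, 10)
    print(f"end-to-end profiled time: {total:.3f} s")
    for name, ncalls, self_s, share in ranking[:5]:
        print(f"{share:6.1%}  {self_s:8.3f} s  {ncalls:>6} calls  {name}")
```

For a C or C++ application, the equivalent flat profile comes from building with `gcc -pg`, running the program to produce `gmon.out`, and then running `gprof <binary> gmon.out`, or from `valgrind --tool=callgrind <binary>` followed by `callgrind_annotate` on the resulting `callgrind.out.<pid>` file.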