A good place to start is to use uProfPcm to get an overall understanding of an application's performance issues. The data it generates covers the whole application; there is no per-function data. The data can be either cumulative over the whole execution or a time series with timestamps.
This data shows whether the code is frontend bound or backend bound and, if it is backend bound, whether it is core bound or memory bound, among other categories. uProfPcm also reports CPU, L3 cache, and memory utilization, as well as the vector instruction mix.
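As a rough illustration of how these categories are read, the Python sketch below takes level-1 top-down fractions of pipeline (dispatch) slots (frontend bound, bad speculation, backend bound, retiring), picks the largest non-retiring category, and, when backend bound dominates, splits it into memory bound versus core bound. The class, field, and function names are hypothetical, and the values are assumed to be copied by hand from the uProfPcm report; this is not part of uProf.

```python
from dataclasses import dataclass


@dataclass
class PipelineUtilization:
    """Top-down fractions of dispatch slots, each in [0, 1].

    Hypothetical container: fill it from the percentages shown in the
    uProfPcm pipeline utilization report (divide percentages by 100).
    """
    frontend_bound: float
    bad_speculation: float
    backend_bound: float
    retiring: float
    # Level-2 split of backend_bound; the two should add up to backend_bound.
    backend_memory_bound: float = 0.0
    backend_core_bound: float = 0.0


def dominant_bottleneck(p: PipelineUtilization, tolerance: float = 0.05) -> str:
    """Name the largest non-retiring level-1 category (hypothetical helper)."""
    total = p.frontend_bound + p.bad_speculation + p.backend_bound + p.retiring
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"level-1 fractions sum to {total:.2f}, expected about 1.0")

    stalls = {
        "frontend bound": p.frontend_bound,
        "bad speculation": p.bad_speculation,
        "backend bound": p.backend_bound,
    }
    category = max(stalls, key=stalls.get)
    if category == "backend bound":
        # Drill down one level: is the backend waiting on memory or on execution resources?
        if p.backend_memory_bound >= p.backend_core_bound:
            return "backend bound (memory bound)"
        return "backend bound (core bound)"
    return category


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = PipelineUtilization(0.10, 0.05, 0.60, 0.25,
                                 backend_memory_bound=0.45,
                                 backend_core_bound=0.15)
    print(dominant_bottleneck(sample))
```

A large retiring fraction is generally a good sign, since those slots do useful work; the sketch only reports where the remaining slots are lost.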
A good analysis of overall performance can be obtained with the following command, which collects data from all cores (-a), adds timestamps to the time-series report in the .csv file (-s), collects xGMI data (--collect-xgmi), generates an HTML report (--html), and writes the output to pcm_data.csv (-o):

AMDuProfPcm -a -s --html --collect-xgmi -o pcm_data.csv <executable> <input parameters>
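With timestamped (-s) output, it is often useful to reduce a time series to a couple of numbers, for example to spot whether a metric peaks well above its average during some phase of the run. The Python sketch below computes the mean and peak of one metric column. It assumes the series is a comma-separated table with one header row and one row per sample, possibly preceded by preamble lines; the real layout of pcm_data.csv may differ, and the column names used here (such as `IPC`) are hypothetical placeholders.

```python
import csv
from statistics import mean


def summarize_metric(csv_text: str, column: str) -> tuple[float, float]:
    """Return (mean, peak) of one numeric column of a time-series CSV.

    Assumption (not taken from the uProf documentation): the series is a
    table with one header row and one row per sample, possibly preceded
    by preamble lines. Blank or non-numeric cells are skipped.
    """
    lines = csv_text.splitlines()
    # Skip preamble lines until the header row that names the requested column.
    start = next(
        (i for i, line in enumerate(lines)
         if column in next(csv.reader([line]), [])),
        None,
    )
    if start is None:
        raise ValueError(f"column {column!r} not found")

    values = []
    for row in csv.DictReader(lines[start:]):
        cell = (row.get(column) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # blank or non-numeric cell
    if not values:
        raise ValueError(f"no numeric samples for column {column!r}")
    return mean(values), max(values)


if __name__ == "__main__":
    # Hypothetical sample data; real AMDuProfPcm output will use its own columns.
    sample = "preamble line\nTimestamp,IPC\n00:00:01,1.2\n00:00:02,0.8\n"
    print(summarize_metric(sample, "IPC"))
```

The header search is a heuristic: if a preamble line happens to contain the same column name, adjust `start` by hand.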
Figure 1. uProfPcm HTML Pipeline Utilization Output
Additional Reading (uProf User Guide)