Depending on whether the issue is related to functionality or performance, different debug strategies can be adopted. For functional debug, the methods described above can be applied depending on the application type, the OS in use, and whether the debug is at the application or kernel level. For performance debug, if the application is in the control path, you can profile functions with a built-in timer (the triple timer counter (TTC) or the global counter of the Cortex-A72 processor) to check whether the function that performs the accelerator callback slows down at some point. If it does, further debug can proceed from the software point of view.
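As an illustration of function profiling, the sketch below times each call of the function under test and keeps the call count, average, maximum, and the number of calls that exceeded a threshold. It assumes a Linux user-space application and uses `clock_gettime(CLOCK_MONOTONIC, ...)` as a convenient stand-in for reading the TTC or the Cortex-A72 global counter directly; a bare-metal application would read those timer registers instead. The names `accel_callback`, `prof_stats`, `profile_call`, `prof_avg_ns`, and `prof_report` are hypothetical, and `accel_callback` is only a busy-loop placeholder for the real callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (Linux user space). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical placeholder for the function that performs the accelerator
 * callback; replace it with the real call under investigation. */
void accel_callback(void)
{
    volatile uint32_t sink = 0;
    for (uint32_t i = 0; i < 10000; i++)
        sink += i;
}

/* Running statistics for one profiled function. */
typedef struct {
    uint64_t calls;
    uint64_t total_ns;
    uint64_t max_ns;
    uint64_t slow_calls; /* calls whose duration exceeded the threshold */
} prof_stats;

/* Time one call of fn and fold the measured duration into s. */
void profile_call(prof_stats *s, void (*fn)(void), uint64_t threshold_ns)
{
    uint64_t t0 = now_ns();
    fn();
    uint64_t dt = now_ns() - t0;

    s->calls++;
    s->total_ns += dt;
    if (dt > s->max_ns)
        s->max_ns = dt;
    if (dt > threshold_ns)
        s->slow_calls++;
}

/* Average call duration in nanoseconds (0 if nothing was profiled). */
uint64_t prof_avg_ns(const prof_stats *s)
{
    return s->calls ? s->total_ns / s->calls : 0;
}

/* Print a one-line summary of the collected statistics. */
void prof_report(const char *label, const prof_stats *s)
{
    printf("%s: calls=%llu avg=%llu ns max=%llu ns slow=%llu\n", label,
           (unsigned long long)s->calls,
           (unsigned long long)prof_avg_ns(s),
           (unsigned long long)s->max_ns,
           (unsigned long long)s->slow_calls);
}
```

A growing `slow_calls` count, or a maximum far above the average, points at intermittent slowdowns in the callback path. The threshold is application specific and must be chosen from the expected callback latency.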
If the software application is in the data path, you can use similar profiling to check whether buffer allocation adds overhead that delays the call to the accelerator function. Typically, an accelerator that requires a large contiguous buffer gets its memory from the contiguous memory allocator (CMA). If that contiguous memory is also used by other accelerators, the allocation might take longer. In such scenarios, a static allocation might help. Another strategy for deciding whether a performance issue lies in hardware or software is to count the number of interrupts in a specific time window. If the number of interrupts is smaller than expected, further debug can be done from the hardware point of view.
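One way to count interrupts in a time window on Linux is to sample `/proc/interrupts` before and after the window and take the difference. The sketch below assumes the usual layout of that file (an `IRQ:` label, then one counter per CPU, then the interrupt-controller name and the device name) and sums the per-CPU counters. Matching the device by substring and the helper names `irq_line_total`, `read_irq_count`, and `irq_delta_in_window` are simplifications introduced here for illustration; a production tool would match the IRQ number or the exact name column.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sum the per-CPU counters on one /proc/interrupts line.
 * Assumed layout: "<irq>: <cpu0> <cpu1> ... <chip> ... <device name>". */
uint64_t irq_line_total(const char *line)
{
    const char *p = strchr(line, ':');
    uint64_t total = 0;
    if (p == NULL)
        return 0;
    p++;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break; /* first non-numeric token ends the per-CPU counters */
        char *end;
        total += strtoull(p, &end, 10);
        p = end;
    }
    return total;
}

/* Find the first line that contains `name` and store its total count.
 * Returns 0 on success, -1 if the file cannot be read or nothing matches. */
int read_irq_count(const char *path, const char *name, uint64_t *count)
{
    FILE *f = fopen(path, "r");
    char line[1024];
    int rc = -1;
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strchr(line, ':') != NULL && strstr(line, name) != NULL) {
            *count = irq_line_total(line);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}

/* Number of interrupts for `name` raised during a window of window_s seconds. */
int irq_delta_in_window(const char *name, unsigned window_s, uint64_t *delta)
{
    uint64_t before, after;
    if (read_irq_count("/proc/interrupts", name, &before) != 0)
        return -1;
    sleep(window_s);
    if (read_irq_count("/proc/interrupts", name, &after) != 0)
        return -1;
    *delta = after - before;
    return 0;
}
```

For example, `irq_delta_in_window("my-accel", 1, &d)` (with `my-accel` standing in for whatever name the accelerator's driver registers) yields the interrupts seen in one second, which can then be compared with the count the workload should produce in that window.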