In this section…
1 introductory reference module, in which we run the CPU version of the algorithm in ./cpu_src
4 Alveo U50 modules, located under the ./docs directory
Instructions are in the local readme files for each module
Introduction — CPU Run: The C++ implementation of the algorithm
Run a non-accelerated C++ version of the Cholesky algorithm
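For orientation, a minimal sketch of what a non-accelerated Cholesky factorization can look like in C++. It is not the code from ./cpu_src: the function name `cholesky_lower` and the flat row-major storage are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from ./cpu_src): computes the lower-triangular
// factor L of a symmetric positive-definite matrix A, so that A = L * L^T.
// Matrices are stored row-major in flat vectors of size n * n.
std::vector<double> cholesky_lower(const std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<double> l(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: L[j][j] = sqrt(A[j][j] - sum over k<j of L[j][k]^2)
        double diag = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) {
            diag -= l[j * n + k] * l[j * n + k];
        }
        assert(diag > 0.0);  // fails if A is not positive definite
        l[j * n + j] = std::sqrt(diag);

        // Entries below the diagonal in column j
        for (std::size_t i = j + 1; i < n; ++i) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) {
                sum -= l[i * n + k] * l[j * n + k];
            }
            l[i * n + j] = sum / l[j * n + j];
        }
    }
    return l;
}
```

The nested loops with their running sums are the part an accelerated version would typically target.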
Module 1: Set up the design and establish a performance baseline
Understand the host OpenCL APIs that help connect to the kernel implemented on an AMD device
Verify results through emulation both at the software level (sw_emu) and the hardware level (hw_emu)
Evaluate the performance by visualizing the timeline trace with Vitis Analyzer
Launch Vitis HLS to review the kernel optimizations
Module 2: This version of the code explicitly applies the PIPELINE and INTERFACE directives
Learn about these pragmas and their impact on designs
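As a rough illustration of where these two pragmas sit in kernel source, here is a small sketch. The kernel `scale_kernel`, its ports, and the bundle names `gmem0` and `gmem1` are hypothetical and are not the tutorial's Cholesky kernel. A standard C++ compiler ignores the `HLS` pragmas, so the function also runs as ordinary software.

```cpp
// Hypothetical kernel: names and port settings are illustrative only.
extern "C" void scale_kernel(const double* in, double* out, int n) {
// INTERFACE: map the pointer arguments to AXI4 memory-mapped ports and the
// scalar argument and return/control signals to an AXI4-Lite register interface.
#pragma HLS INTERFACE m_axi port = in bundle = gmem0 offset = slave
#pragma HLS INTERFACE m_axi port = out bundle = gmem1 offset = slave
#pragma HLS INTERFACE s_axilite port = n
#pragma HLS INTERFACE s_axilite port = return

    for (int i = 0; i < n; ++i) {
// PIPELINE: ask the tool to start a new loop iteration every clock cycle
// (initiation interval of 1) when dependencies and resources allow it.
#pragma HLS PIPELINE II = 1
        out[i] = 2.0 * in[i];
    }
}
```

Whether the requested initiation interval is actually achieved is reported by Vitis HLS after synthesis.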
Module 3: Change double data types to float
Run hardware emulation and then Vitis Analyzer and Vitis HLS
Measure the impact on performance and on the physical resources required to implement the design
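One way to make this kind of data type switch easy to try is to write the arithmetic against a single type parameter. The sketch below is an assumption about how that could be organized, not the tutorial's source: the function `dot` is hypothetical and only stands in for the multiply-accumulate loops found in a Cholesky kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical templated routine: T plays the role of the kernel's data type,
// so the same code can be instantiated with double or with float.
template <typename T>
T dot(const std::vector<T>& x, const std::vector<T>& y) {
    assert(x.size() == y.size());
    T acc = T(0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i] * y[i];  // multiply-accumulate in precision T
    }
    return acc;
}
```

Calling `dot<double>` and `dot<float>` on the same input gives a quick software check of how much precision is traded away before looking at the resource and timing reports.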
Module 4: Return to double and apply the task parallelism pragma to improve results
Re-arrange the code to enable the task parallelism optimization (the DATAFLOW pragma)
Evaluate the performance improvement with Vitis Analyzer
Use Vitis HLS to confirm the new micro-architecture created by dataflow
Generate the binary (xclbin) to program the card and measure the actual performance
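To show what re-arranging code for task parallelism can mean, here is a minimal sketch with three stages connected by intermediate buffers. The names `load`, `compute`, `store`, and `dataflow_kernel`, as well as the fixed size `N`, are hypothetical and are not the tutorial's Cholesky code. A standard C++ compiler ignores the pragma and runs the stages one after another.

```cpp
const int N = 64;  // illustrative fixed problem size

// Stage 1: copy input data into a local buffer.
static void load(const double* in, double buf[N]) {
    for (int i = 0; i < N; ++i) buf[i] = in[i];
}

// Stage 2: placeholder computation (squares each element).
static void compute(const double a[N], double b[N]) {
    for (int i = 0; i < N; ++i) b[i] = a[i] * a[i];
}

// Stage 3: copy results back to the output.
static void store(const double buf[N], double* out) {
    for (int i = 0; i < N; ++i) out[i] = buf[i];
}

extern "C" void dataflow_kernel(const double* in, double* out) {
// DATAFLOW: ask the tool to let the three stages below overlap in time.
// Each intermediate buffer has exactly one producer and one consumer.
#pragma HLS DATAFLOW
    double stage1[N];
    double stage2[N];
    load(in, stage1);
    compute(stage1, stage2);
    store(stage2, out);
}
```

Splitting the work into functions that each read one buffer and write another is the kind of re-arrangement that lets the tool build a pipeline of tasks.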
Copyright © 2020–2023 Advanced Micro Devices, Inc