In this step, the HLS tool runs the CSYNTH, VIVADO_SYN, and VIVADO_IMPL flows to generate the IP file.
- Build and run one of the following using the U200 platform:

      make run PLATFORM=xilinx_u200_gen3x16_xdma_2_202110_1.xpfm VIVADO_IMPL=1

      # PLATFORM is case-insensitive and supports awk regex.
      # Alternatively, the FPGA part can be specified via XPART. When XPART is set, PLATFORM will be ignored.
      make run XPART=xcu200-fsgd2104-2-e VIVADO_IMPL=1

Example output:

    Implementation tool: Xilinx Vivado v.2022.1
    ...
    #=== Post-Implementation Resource usage ===
    SLICE: 0
    LUT: 7945
    FF: 8073
    DSP: 12
    BRAM: 5
    URAM: 0
    LATCH: 0
    SRL: 678
    CLB: 1746
    #=== Final timing ===
    CP required: 3.330
    CP achieved post-synthesis: 3.605
    CP achieved post-implementation: 3.347
    Timing not met

The report shows "Timing not met", which means the Vivado implementation process cannot achieve the target frequency (300 MHz, set in run_hls.tcl). Because this module is the bottleneck of the entire JPEG decoding architecture, the final JPEG decoder is likely to work at 270 to 280 MHz. This is a common situation for complex HLS designs. This tutorial does not discuss solutions to timing problems, but in most cases there is still room to improve the frequency.
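To relate the clock periods in the report to frequencies, the following is a minimal Python sketch. It assumes the `CP` values are clock periods in nanoseconds, which is consistent with a 3.330 ns requirement for a 300 MHz target. The function names `period_to_mhz` and `slack_ns` are illustrative and not part of any Xilinx tool.

```python
# Clock periods copied from the example report above.
# ASSUMPTION: the "CP" values are in nanoseconds.
CP_REQUIRED_NS = 3.330
CP_POST_SYNTH_NS = 3.605
CP_POST_IMPL_NS = 3.347


def period_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def slack_ns(required_ns: float, achieved_ns: float) -> float:
    """Return timing slack; a negative value means timing is not met."""
    return required_ns - achieved_ns


if __name__ == "__main__":
    rows = [
        ("required", CP_REQUIRED_NS),
        ("post-synthesis", CP_POST_SYNTH_NS),
        ("post-implementation", CP_POST_IMPL_NS),
    ]
    for label, cp in rows:
        print(f"{label:>20}: {cp:.3f} ns -> {period_to_mhz(cp):.1f} MHz")

    slack = slack_ns(CP_REQUIRED_NS, CP_POST_IMPL_NS)
    status = "met" if slack >= 0 else "not met"
    print(f"post-implementation slack: {slack:+.3f} ns (timing {status})")
```

With these numbers, the post-synthesis period corresponds to roughly 277 MHz and the post-implementation period to roughly 299 MHz, so the implementation result misses the 3.330 ns requirement by a small margin.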
Based on the above results, we can make some estimates of the throughput:
- The design can process up to 270 million Huffman symbols per second
- Assuming a compression ratio of 4 to 8 for a JPEG image, the final output rate will be up to 1 to 2 GB of YUV data per second
- If the inverse quantization and inverse DCT modules need to match the throughput of the Huffman decoder, they should recover 4 to 8 pixels per cycle
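The estimates above can be reproduced with a short calculation. This is only a sketch of one way to arrive at such figures: it assumes that each Huffman symbol corresponds to roughly one byte of compressed input and that one output byte corresponds to one sample. Neither assumption is stated in this tutorial, and `BYTES_PER_SYMBOL` is a hypothetical parameter introduced for illustration.

```python
# From the text: up to 270 million Huffman symbols per second.
SYMBOLS_PER_SECOND = 270e6

# ASSUMPTION (not from the tutorial): one symbol ~ one byte of compressed data.
BYTES_PER_SYMBOL = 1.0


def output_bytes_per_second(compression_ratio: float,
                            symbols_per_second: float = SYMBOLS_PER_SECOND,
                            bytes_per_symbol: float = BYTES_PER_SYMBOL) -> float:
    """Estimate the decoded YUV output rate in bytes per second."""
    return symbols_per_second * bytes_per_symbol * compression_ratio


def output_bytes_per_cycle(compression_ratio: float,
                           bytes_per_symbol: float = BYTES_PER_SYMBOL) -> float:
    """Estimate decoded bytes per cycle, assuming one symbol per cycle."""
    return bytes_per_symbol * compression_ratio


if __name__ == "__main__":
    for ratio in (4, 8):
        gb_per_s = output_bytes_per_second(ratio) / 1e9
        per_cycle = output_bytes_per_cycle(ratio)
        print(f"ratio {ratio}: {gb_per_s:.2f} GB/s, {per_cycle:.0f} bytes per cycle")
```

Under these assumptions, a compression ratio of 4 to 8 gives about 1.08 to 2.16 GB/s of output and 4 to 8 output bytes per cycle, which matches the ranges listed above. A different bytes-per-symbol figure would scale both results proportionally.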
Compared with synthesis alone, using Export gives more accurate performance and resource consumption figures. Users usually do not need to run Export for each design iteration, but it is recommended to run Export periodically to confirm that the performance and area of the design meet the requirements.