(3) Synthesis:

Vitis Libraries

Release Date: 2024-08-06
Version: 2024.1 English
  1. Build and run one of the following using the U200 platform:
make run PLATFORM=xilinx_u200_gen3x16_xdma_2_202110_1.xpfm CSYNTH=1

# PLATFORM is case-insensitive and supports awk regex.

# Alternatively, the FPGA part can be specified via XPART. When XPART is set, PLATFORM is ignored.

make run XPART=xcu200-fsgd2104-2-e CSYNTH=1
  2. Quickly reset the top-level function so that users can focus on a few functions of interest:
vi run_hls.tcl

# Update the "set_top kernel_parser_decoder" line, for example to "set_top Huffman_decoder". The top name is the name of a function in the design code.
set_top kernel_parser_decoder --> set_top Huffman_decoder

Then rerun the CSYNTH command. This allows users to analyze the performance bottlenecks of the “Huffman_decoder” function, or to run rapid synthesis and simulation, without any source code modification.
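The same edit can be scripted rather than made by hand in vi. The following is a minimal sketch, assuming run_hls.tcl contains a single line of the form "set_top <name>"; the helper name retarget_top is hypothetical and is not part of the library.

```shell
# Hypothetical helper: switch the set_top target in an HLS Tcl script.
# Usage: retarget_top <tcl_file> <old_top> <new_top>
retarget_top() {
    local file="$1" old="$2" new="$3"
    # -i.bak edits in place and keeps the original as <tcl_file>.bak,
    # so the previous top-level function can be restored later.
    sed -i.bak "s/^\([[:space:]]*set_top[[:space:]][[:space:]]*\)${old}[[:space:]]*\$/\1${new}/" "$file"
}

# Example (run from the directory that holds run_hls.tcl):
#   retarget_top run_hls.tcl kernel_parser_decoder Huffman_decoder
#   make run XPART=xcu200-fsgd2104-2-e CSYNTH=1
```

Keeping the .bak copy makes it easy to return to kernel_parser_decoder once the function of interest has been analyzed.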

Example Synthesis output:

Vitis HLS - High-Level Synthesis from C, C++ and OpenCL v2022.1 (64-bit)
...

INFO: [HLS 200-1510] Running: set_top kernel_parser_decoder
INFO: [HLS 200-1510] Running: open_solution -reset solution1
...

INFO: [VHDL 208-304] Generating VHDL RTL for kernel_parser_decoder.
INFO: [VLOG 209-307] Generating Verilog RTL for kernel_parser_decoder.
INFO: [HLS 200-790] **** Loop Constraint Status: All loop constraints were NOT satisfied.
INFO: [HLS 200-789] **** Estimated Fmax: 271.96 MHz
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 65.56 seconds. CPU system time: 4.61 seconds. Elapsed time: 73.87 seconds; current allocated memory: 448.000 MB.
INFO: [HLS 200-112] Total CPU user time: 71.64 seconds. Total CPU system time: 6.21 seconds. Total elapsed time: 80.36 seconds; peak allocated memory: 1.195 GB.

The loop constraints may not be satisfied because the loop target is set to 300 MHz in run_hls.tcl. Different HLS tool versions may also report a different “Estimated Fmax”.
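To see how far the estimate is from the goal, it can help to convert both frequencies to clock periods (period in ns = 1000 / frequency in MHz). The sketch below uses the 300 MHz goal and the 271.96 MHz estimate from the example log; the helper name mhz_to_ns is hypothetical.

```shell
# Convert a clock frequency in MHz to a clock period in ns.
mhz_to_ns() {
    LC_ALL=C awk -v f="$1" 'BEGIN { printf "%.3f\n", 1000 / f }'
}

target_ns=$(mhz_to_ns 300)        # 300 MHz goal set in run_hls.tcl
estimated_ns=$(mhz_to_ns 271.96)  # "Estimated Fmax" from the example log

echo "target period:    ${target_ns} ns"
echo "estimated period: ${estimated_ns} ns"
```

For these two values the periods are 3.333 ns and 3.677 ns, so the estimated critical path is roughly 0.34 ns longer than the target allows.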

  3. Check the unsatisfied path

Read the CSYNTH report and grep for “critical path”, as shown below:

INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'Huffman_decoder_Pipeline_DECODE_LOOP'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'DECODE_LOOP'.
INFO: [HLS 200-1470] Pipelining result : Target II = 1, Final II = 1, Depth = 4, loop 'DECODE_LOOP'
WARNING: [HLS 200-1016] The critical path in module 'Huffman_decoder_Pipeline_DECODE_LOOP' consists of the following:
   'add' operation ('add_ln503', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:503) [582]  (0.705 ns)
   'shl' operation ('shl_ln503', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:503) [584]  (0 ns)
   'icmp' operation ('icmp_ln503', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:503) [585]  (0.859 ns)
   'and' operation ('and_ln503', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:503) [591]  (0 ns)
   'select' operation ('select_ln503', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:503) [592]  (0 ns)
   'select' operation ('block_tmp', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:498) [593]  (0.243 ns)
   'add' operation ('block', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) [599]  (0.785 ns)
   multiplexor before 'phi' operation ('block') with incoming values : ('lastDC_load', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('block', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) [628]  (0.387 ns)
   'phi' operation ('block') with incoming values : ('lastDC_load', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('block', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) [628]  (0 ns)
   multiplexor before 'phi' operation ('empty_304', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) with incoming values : ('lastDC_load', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('block', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('lastDC_load_1') [632]  (0.387 ns)
   'phi' operation ('empty_304', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) with incoming values : ('lastDC_load', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('block', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:516) ('lastDC_load_1') [632]  (0 ns)
   'select' operation ('select_ln549_2', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:549) [641]  (0.243 ns)
   'store' operation ('lastDC_write_ln592', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:592) of variable 'select_ln549_2', Vitis_Libraries/codec/L1/src/XAcc_jpegdecoder.cpp:549 on local variable 'op' [651]  (0.453 ns)
...
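The search and a quick tally of the listed delays can be sketched as follows. The log file name vitis_hls.log and both helper names are assumptions made for illustration; point the first helper at whichever file holds the CSYNTH output in your flow.

```shell
# Print each critical-path warning with the lines that follow it.
# $1: log file (assumed default: vitis_hls.log)
# $2: number of trailing lines to show (default: 20)
show_critical_paths() {
    grep -n -A "${2:-20}" 'The critical path in module' "${1:-vitis_hls.log}"
}

# Read critical-path text on stdin and add up every "(x ns)" delay.
sum_path_delays() {
    grep -o '([0-9.]* ns)' |
        LC_ALL=C awk '{ gsub(/[()]/, "", $1); total += $1 }
                      END { printf "%.3f\n", total }'
}

# Example:
#   show_critical_paths vitis_hls.log 20
#   show_critical_paths vitis_hls.log 20 | sum_path_delays
```

Applied to the DECODE_LOOP listing above, the per-operation delays add up to 4.062 ns, which is longer than the 3.333 ns period of a 300 MHz clock. The tool's own timing estimate may include factors not shown in this listing, so treat the sum as a rough indication only.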

Then check the report for this loop with the command “vi test.prj/solution1/syn/report/Huffman_decoder_Pipeline_DECODE_LOOP_csynth.rpt”, and open the GUI at the same time.

In the Schedule Viewer of the GUI, users can check the details of the circuit:

../_images/L2jpegdec-6.PNG

Comparing the two above shows that timing is not met because the bit widths of the shift register and comparator are large. No better optimization method is available for this situation. Users can reduce the bit width of this circuit according to their needs to improve the timing. This change may also reduce bandwidth, so a trade-off between width and frequency is needed to achieve the best performance.
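The trade-off can be put into numbers with a rough model: if a datapath moves a fixed number of bits every clock cycle, its throughput in MB/s is width_bits × f_MHz / 8. The sketch below is illustrative only. The 32-bit and 16-bit widths are hypothetical and are not taken from the jpegdecoder source, and the helper name is an assumption.

```shell
# Throughput in MB/s for a datapath that moves width_bits per cycle at f_mhz,
# assuming one transfer every cycle: (width_bits / 8) bytes * f_mhz * 1e6 / 1e6.
throughput_mb_per_s() {
    LC_ALL=C awk -v w="$1" -v f="$2" 'BEGIN { printf "%.1f\n", w * f / 8 }'
}

# Hypothetical comparison: a wider path at the estimated Fmax from the example
# log versus a narrower path that is assumed to reach the 300 MHz goal.
echo "32-bit @ 271.96 MHz: $(throughput_mb_per_s 32 271.96) MB/s"
echo "16-bit @ 300 MHz:    $(throughput_mb_per_s 16 300) MB/s"
```

Under these assumed numbers the wider path still delivers more data per second despite the lower clock, which is why the width should be reduced only as far as the application's needs permit.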