Notes and Suggestions

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID: AM004
Release Date: 2022-09-11
Revision: 1.2.1 English
  • Implement small multiplies (for example, 4 × 4 multiplies) and small bit-width adders and counters using the CLB logic LUTs and carry chain. If the design has a large number of small add operations and/or counters, use SIMD mode to implement them in DSP58 instead. Compared with an implementation in interconnect logic, this saves a factor of 2x in area and power when the input registers are also folded into DSP58 for the SIMD mode functions.
  • Always sign-extend the input operands when implementing smaller bit-width functions. For lower power in the programmable logic (PL), place the operands in the MSBs and tie the unused LSBs to ground (GND).
  • When cascading multiple DSP58s, match the number of pipeline stages on the different signal paths.
  • Implement a count-up-by-one counter within the DSP58 using the CARRYIN input. A count-by-N or variable-bit counter can use the C or A:B inputs.
  • DSP58 counters can be used to implement control logic that runs at maximum speed.
  • Use SRL16s/SRL32s in the CLB and block RAM to store filter coefficients, or to act as a register file or memory elements in conjunction with DSP58. The bit pitch of the DSP58 inputs is designed to match that of the CLB and block RAM.
  • The block RAM can also be used as a fast finite state machine to drive the control logic for the DSP design.
  • DSP58 can also be used with a processor, such as a MicroBlaze™ or PicoBlaze™ processor, for hardware acceleration of processor functions.
  • Use a pipeline register at the output of an SRL16 or block RAM before connecting it to the input of DSP58. This ensures the best performance of input operands feeding DSP58.
  • The register at the output of the SRL16 in DSP58 has a reset pin and a clock-enable pin. To reset the SRL16, shift zeros into it for 16 clock cycles while holding the reset of the output register High. This capability is particularly useful in implementing filters where the SRL16s are used to store the data inputs.
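
The SRL16 reset sequence in the last item can be illustrated with a small behavioral model. The following Python sketch is not vendor code and does not describe the actual hardware primitives: the class name `Srl16WithOutputRegister`, the helper `flush_srl16`, and the choice that reset takes priority over clock enable are all assumptions made for illustration. It models a 16-deep shift register with no reset of its own, followed by an output register that has a synchronous reset and a clock enable.

```python
from collections import deque


class Srl16WithOutputRegister:
    """Hypothetical behavioral model: a 16-deep shift register (no reset)
    feeding an output register with synchronous reset and clock enable."""

    DEPTH = 16

    def __init__(self, initial=None):
        contents = list(initial) if initial is not None else [0] * self.DEPTH
        if len(contents) != self.DEPTH:
            raise ValueError("SRL16 model needs exactly 16 initial values")
        # Index 0 holds the newest sample, index 15 the oldest.
        self.taps = deque(contents, maxlen=self.DEPTH)
        self.q = 0  # output register value

    def clock(self, data_in, reset=False, ce=True):
        """Advance one clock edge and return the registered output.

        In this model the shift register always shifts; `ce` only gates
        the output register. Reset overriding `ce` is an assumption.
        """
        oldest = self.taps[-1]         # last tap, sampled before the shift
        self.taps.appendleft(data_in)  # shift in; the oldest sample drops off
        if reset:
            self.q = 0
        elif ce:
            self.q = oldest
        return self.q


def flush_srl16(model, cycles=Srl16WithOutputRegister.DEPTH):
    """Shift zeros in for `cycles` clocks while holding the output reset High."""
    return [model.clock(0, reset=True) for _ in range(cycles)]


if __name__ == "__main__":
    srl = Srl16WithOutputRegister(initial=range(1, 17))  # stale filter data
    outputs = flush_srl16(srl)
    print("outputs during flush:", outputs)
    print("contents after flush:", list(srl.taps))
```

In this model, flushing for fewer than 16 cycles leaves stale samples in the shift register, and those samples reach the output once the reset is released. Holding the output register in reset during the flush keeps the old contents from propagating downstream while the zeros are shifted in.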