Versal ACAP DSP Engine Architecture Manual (AM004)

1.2.1 English

Binary division can be implemented in DSP58 with either a shift-and-subtract or a multiply-and-subtract method. DSP58 includes a shifter, a multiplier, and an adder/subtracter unit, which together are sufficient to implement binary division. The division by subtraction and division by multiplication algorithms are shown in the following sections. The algorithms assume:

  1. N>D
  2. N and D are both positive

If either N or D is negative, apply the same algorithm to the absolute values of N and D and make the appropriate sign change in the result. In the algorithms, N is the number to be divided (the dividend) and D is the divisor. Q and R are the quotient and the remainder, respectively.
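The sign handling described above can be sketched in Python as follows. This is a minimal illustration, not part of the reference design: the function name signed_divide is hypothetical, Python's built-in divmod stands in for the unsigned divider, and the sign convention (quotient truncated toward zero, remainder taking the sign of N) is an assumption, because the text only calls for "the appropriate sign change".

```python
from typing import Tuple


def signed_divide(n: int, d: int) -> Tuple[int, int]:
    """Divide signed integers by running an unsigned divide on |n| and |d|.

    Assumed convention (not stated in the source): the quotient truncates
    toward zero and the remainder takes the sign of the dividend n.
    """
    if d == 0:
        raise ZeroDivisionError("divisor D must be nonzero")
    # Stand-in for the unsigned divider; both operands are non-negative here.
    q_mag, r_mag = divmod(abs(n), abs(d))
    # The quotient is negative when exactly one of the operands is negative.
    q = -q_mag if (n < 0) != (d < 0) else q_mag
    # The remainder follows the dividend's sign so that n == d * q + r holds.
    r = -r_mag if n < 0 else r_mag
    return q, r
```

For example, under this convention signed_divide(-200, 7) returns (-28, -4), and 7 × (-28) + (-4) = -200.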

Dividing with Subtraction

If N is an 8-bit integer and D is not more than 8 bits wide, N = D × Q + R, where 0 ≤ R < D.

  1. Assign the value 00000000 to the 8-bit register R.
  2. Shift the R register one bit to the left and fill in the LSB with N[8-n].
  3. Calculate R – D.
  4. Set R and set Q.
    • If R – D is zero or positive, set Q[8-n] to 1 and R = R – D.
    • If R – D is negative, set Q[8-n] to 0 and leave R unchanged.
  5. Repeat steps 2 to 4, filling in R[0] each time with N[8-n], where n is the number of the iteration. Q[8-n] is filled each time in Step 4. The range of n is 1 to 8.

After the eighth iteration, Q[7:0] contains the quotient, and R[7:0] contains the remainder.
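The shift-and-subtract steps can be modeled in software as shown below. This is a behavioral sketch only, not the reference design and not a description of how the DSP58 is configured. The function name divide_shift_subtract and the width parameter are hypothetical, and Python integers are unbounded, so the model does not capture register widths (the shifted value of R can briefly need one bit more than D).

```python
from typing import Tuple


def divide_shift_subtract(n: int, d: int, width: int = 8) -> Tuple[int, int]:
    """Bit-serial unsigned division; returns (quotient, remainder)."""
    if not 0 <= n < (1 << width):
        raise ValueError("N must fit in the given width")
    if not 0 < d < (1 << width):
        raise ValueError("D must be nonzero and fit in the given width")
    q = 0
    r = 0                                  # step 1: clear R
    for i in range(1, width + 1):          # i is the iteration number n
        bit = width - i                    # bit index 8-n for width = 8
        r = (r << 1) | ((n >> bit) & 1)    # step 2: shift R left, bring in N[8-n]
        diff = r - d                       # step 3: calculate R - D
        if diff >= 0:                      # step 4: zero or positive result
            q |= 1 << bit                  #   set Q[8-n] to 1
            r = diff                       #   R = R - D
        # otherwise Q[8-n] stays 0 and R is unchanged
    return q, r
```

As a worked case, divide_shift_subtract(200, 7) returns (28, 4), which matches 200 = 7 × 28 + 4.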

Dividing with Multiplication

The multiply and subtract method consists of rewriting the division N/D as N = D × Q + R. The quotient is calculated one bit at a time, starting from the MSB, using the following steps for an 8-bit N/D.

  1. Set Q[8-n] to 1 and the bits to the right of Q[8-n] to 0. Bits to the left of Q[8-n] keep the values set in earlier iterations.
  2. Calculate D×Q.
  3. Calculate N – (D×Q).
    • If the result of step 3 is zero or positive, N ≥ (D×Q), set Q[8-n] to 1.
    • If the result of step 3 is negative, N < (D×Q), set Q[8-n] to 0.
  4. Repeat steps 1 to 3 for each iteration n, where the range of n is 1 to 8.

After the eighth iteration, Q[7:0] contains the quotient and N – (D×Q) contains the remainder. To map to DSP58, N is applied to the C input, D is applied to the B input, and Q (the whole bus) is applied to the A input. The initial value of Q[8-n] is set at the A input, and after the eighth iteration the output register P contains the remainder.

Both of these division implementations are possible in one DSP58, and the latency is eight clock cycles for the fully combinational case. The latency increases if registers are used in the DSP.
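The multiply-and-subtract steps can be modeled in the same way. The sketch below is a behavioral illustration under stated assumptions, not the reference design: the function name divide_multiply_subtract and the width parameter are hypothetical, and the local names a and p merely echo the A input and P output mentioned above, with P modeled as C – (A × B). The model returns the last non-negative value of p as the remainder, which is one way of keeping N – (D×Q) for the accepted quotient when the final trial bit is rejected.

```python
from typing import Tuple


def divide_multiply_subtract(n: int, d: int, width: int = 8) -> Tuple[int, int]:
    """Trial-quotient unsigned division; returns (quotient, remainder)."""
    if not 0 <= n < (1 << width):
        raise ValueError("N must fit in the given width")
    if not 0 < d < (1 << width):
        raise ValueError("D must be nonzero and fit in the given width")
    q = 0    # accepted quotient bits so far
    r = n    # N - D*Q for the accepted quotient (Q = 0 at the start)
    for i in range(1, width + 1):   # i is the iteration number n
        bit = width - i             # bit index 8-n for width = 8
        a = q | (1 << bit)          # step 1: trial Q with Q[8-n] = 1, lower bits 0
        p = n - d * a               # steps 2 and 3: N - (D x Q)
        if p >= 0:                  # zero or positive: keep Q[8-n] = 1
            q = a
            r = p
        # negative: Q[8-n] stays 0 and the previous remainder is kept
    return q, r
```

For example, divide_multiply_subtract(200, 7) returns (28, 4), the same result as the shift-and-subtract method.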

The reference design files associated with the two division use cases are available in the division/division_sub and division/division_mult directories in the design archive file, am004-versal-dsp-engine.zip.