In a conventional DDS (sometimes known as a Numerically Controlled Oscillator), a phase increment value is added to a phase accumulator on each cycle. The value of the phase accumulator is effectively the phase part of a unit vector in polar form. This unit vector is then converted to cartesian form by looking up sin and cos values in a table of precomputed values. These cartesian values are then output.
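The conventional scheme above can be sketched in a few lines of Python. This is only an illustration: the 32-bit accumulator width, the 1024-entry table, and the names `scalar_dds`, `ACC_BITS`, and `TABLE_BITS` are assumptions made for the example, not details of the library.

```python
import math

ACC_BITS = 32      # assumed accumulator width; 2**32 represents one full turn
TABLE_BITS = 10    # assumed lookup-table address width (1024 entries)

# Precomputed (cos, sin) pairs covering one full turn of the unit circle.
TABLE = [
    (math.cos(2 * math.pi * i / (1 << TABLE_BITS)),
     math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
    for i in range(1 << TABLE_BITS)
]

def scalar_dds(phase_increment, num_samples, initial_phase=0):
    """Return num_samples (cos, sin) pairs, one per simulated clock cycle."""
    acc = initial_phase
    out = []
    for _ in range(num_samples):
        # The top TABLE_BITS bits of the accumulator address the table.
        index = acc >> (ACC_BITS - TABLE_BITS)
        out.append(TABLE[index])
        # The accumulator wraps modulo 2**ACC_BITS, which is one full turn.
        acc = (acc + phase_increment) & ((1 << ACC_BITS) - 1)
    return out
```

For example, `scalar_dds(1 << 30, 5)` steps a quarter turn per cycle, so the fifth sample lands back on the first.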
The AIE is a vector processor in which multiple data samples are operated upon each cycle. This is often referred to as Super Sample Rate, where the data rate exceeds the clock rate. The AIE DSP Library DDS is Super Sample Rate for maximal performance. However, the operation to convert phase to sin/cos values is a scalar operation, not a vector operation. The performance per kernel is limited by the number of data samples that can be written out per cycle. This number differs according to data type size; call it N. The phase accumulator is therefore advanced by N*phase_increment each cycle, and the resulting phase value is converted to sin/cos values as described earlier. To turn this single result into N output samples, the scalar cartesian value is multiplied by a vector of precomputed cartesian offset values to give the output values for samples 0, 1, 2, ..., N-1.
The precomputation occurs at construction time. The vector of offset values is created by a series of polar to cartesian lookups using 0, phase_increment*1, phase_increment*2, ..., phase_increment*(N-1).
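The split between construction-time precomputation and the per-cycle multiply can be sketched as follows. This is a floating-point model written for illustration only; the class name `SsrDds`, the helper `phase_to_unit_vector`, and the 32-bit accumulator width are assumptions, and the real kernel works in fixed point on vector registers.

```python
import cmath
import math

ACC_BITS = 32  # assumed accumulator width; 2**32 represents one full turn

def phase_to_unit_vector(phase):
    """Stand-in for the scalar sin/cos lookup: phase word -> complex unit vector."""
    return cmath.exp(2j * math.pi * phase / (1 << ACC_BITS))

class SsrDds:
    def __init__(self, phase_increment, lanes):
        self.lanes = lanes  # N in the text
        # The accumulator advances by N * phase_increment each cycle.
        self.step = (phase_increment * lanes) % (1 << ACC_BITS)
        self.acc = 0
        # Construction time: one lookup per lane at 0, inc, 2*inc, ..., (N-1)*inc.
        self.offsets = [
            phase_to_unit_vector((phase_increment * k) % (1 << ACC_BITS))
            for k in range(lanes)
        ]

    def cycle(self):
        """One clock cycle: a single scalar lookup, then an N-wide multiply."""
        base = phase_to_unit_vector(self.acc)
        self.acc = (self.acc + self.step) % (1 << ACC_BITS)
        # Multiplying unit vectors adds their phases, giving samples 0..N-1.
        return [base * off for off in self.offsets]
```

Calling `cycle()` repeatedly yields N samples at a time. In this floating-point model they agree with a sample-by-sample lookup to within rounding error, because multiplying two unit vectors adds their phases.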
Note that the cartesian values returned by the hardware lookup are scaled to use the full range of int16, so -1 becomes -32768, but +1 is saturated to +32767. In addition, after the looked-up cartesian value for a cycle is multiplied at runtime by the precomputed vector, the result is scaled down and rounded, which leads to slightly different values than if the lookup had been used directly for each output value. In other words, the DDS output is not bit-accurate to the sin/cos lookup intrinsic.
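The effect of saturation and post-multiply rounding can be illustrated with a small integer model. The helpers `to_int16`, `lookup`, and `cmul_q15` are hypothetical, and the rounding mode (add half, then shift right by 15) is an assumption chosen for the example; the hardware may round differently.

```python
import math

def to_int16(x):
    """Scale a value in [-1, 1] by 2**15, round, and saturate to the int16 range."""
    return max(-32768, min(32767, round(x * 32768)))

def lookup(angle):
    """Hypothetical stand-in for the hardware lookup: (cos, sin) as int16 values."""
    return to_int16(math.cos(angle)), to_int16(math.sin(angle))

def cmul_q15(a, b):
    """Complex multiply of two (re, im) int16 pairs, scaled back down by 15 bits."""
    re = a[0] * b[0] - a[1] * b[1]
    im = a[0] * b[1] + a[1] * b[0]

    def shift_round(v):
        # Assumed rounding: add half an LSB, shift right, then saturate.
        return max(-32768, min(32767, (v + (1 << 14)) >> 15))

    return shift_round(re), shift_round(im)
```

Under these assumptions, `lookup(0.0)` returns `(32767, 0)` because +1 saturates, while `cmul_q15(lookup(0.0), lookup(0.0))` returns `(32766, 0)`. That is a one-LSB difference of the kind described above.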
DDS Implementation shows the construction-time creation of a vector of offsets, then the runtime use of this vector to create multiple outputs from a single sin/cos lookup each cycle.