AI Engine Development
See Vitis™ Development Environment on xilinx.com See Vitis™ AI Development Environment on xilinx.comMUltiple SIgnal Classification (MUSIC) Algorithm on AI Engine
Version: Vitis 2024.1
Table of Contents
Introduction
This tutorial implements the Multiple Signal Classification (MUSIC) Algorithm [1] on the AI Engine. MUSIC is a popular algorithm for Direction of Arrival (DOA) estimation in antenna array systems. This tutorial assumes a system model with an 8-element uniform linear array (ULA). The algorithm chosen for AI Engine implementation operates directly on 128 x 8 data snapshots using a QR-Factorization followed by a Singular Value Decomposition (SVD) to obtain the basis vectors of the noise subspace. The QRD algorithm uses the well known modified Gram-Schmidt approach. The SVD algorithm adopts a one-sided Jacobi rotation approach with a fixed set of four iterations.
This tutorial implements the MUSIC algorithm fully in the AI Engine and validates its performance in real time hardware running on the VC1902 device (-2M speed grade) on the VCK190 evaluation board. A Hardware-in-the-Loop (HIL) demonstrator connects a host computer running MATLAB over TCP/IP to the VCK190 board that delivers buffered array snapshots to the board and receives DOA estimates back in real time to demonstrate a best effort DOA estimation throughput rate of 1 MHz. The following table summarizes the system parameters for this tutorial.
This MUSIC tutorial was co-developed by AMD and our third party partner, Fidus Systems.
Parameter | Value | Units |
---|---|---|
Uniform Linear Array | $N=8 $ | elements |
Array element spacing | $d=\lambda/2$ | m |
Data snapshot | 128 x 8 | samples |
Target throughput | 1 | $\mu s$ |
Subspace algorithm | QRD + SVD | n/a |
One-sided Jacobi SVD | 4 | iterations |
MUSIC spectrum resolution | 256 | bins |
MUSIC spectrum sweep range | 9 to 248 | bins |
Minimum DOA separation | 6 | deg |
System Model
The system model consists of:
A uniform linear array (ULA) with $N$ equally $d$-spaced antenna elements and,
A set of $S$ sources emitting or echoing narrow-band independent signals $\textbf{x}_1,\ldots,\textbf{x}_S$.
The direction of arrival (in azimuth) of these signals at the ULA are $\theta_1,\ldots,\theta_S$.
The signal $\textbf{a}(t)=[a_1(t),\ldots,a_N(t)]$ received by the ULA at time $t$ can be expressed in matrix form as $\textbf{a}=\textbf{D}\times\textbf{x}+\textbf{w}$ where $\mathbf{D}=[d_{\theta_1},\ldots,d_{\theta_S}]^T$ and $\textbf{d}_{\theta_k}$ is the $k$-th steering vector for the ULA. The vector $\textbf{w}$ is uncorrelated white Gaussian noise. The data vectors $\textbf{a}(t)$ may be collected into a $128 \times 8$ snapshot matrix $\textbf{A}$ to collect one sample from each of $8$ elements of the array obtained over $128$ consecutive time instants. The following diagram shows the ULA receiving scenario.
Subspace Algorithm
MUSIC performs DOA estimation using a subspace approach involving a peak search across the noise subspace of the array. The critical first step requires identification of the basis vectors of this noise subspace. Several approaches are possible. Eigenspace methods are popular. This tutorial adopts an algorithm based on QR-Decomposition and SVD as these algorithms can be implemented efficiently on the AI Engine. Data flow is simplified because this approach operates directly on the snapshot matrix $\textbf{A}$.
The figure below demonstrates the overall concept. The snapshot matrix $\textbf{A}$ is “tall and skinny” with dimensions $128\times 8$. The basis vectors of the noise subspace $\textbf{V}_r$ may be computed from a two step procedure. First, a QR-Decomposition of the snapshot matrix $\textbf{A}=\textbf{Q}\textbf{R}$ produces the $\textbf{R}$ matrix with upper triangular portion $\textbf{R}_r$. The $\textbf{Q}$ matrix may be discarded. Second, the SVD of $\textbf{R}_r=\textbf{U}_r\textbf{S}_r\textbf{V}_r^\dagger$ provides a basis for the desired noise subsplace by selecting the appropriate columns of $\textbf{V}_r^\dagger$ based on identifying the noise subspace singular values from $\textbf{S}_r$. This subspace identification may be performed simply by extracting the $S$ smallest singular values if $S$ is known, or the # of active signals may be identified online using simple (ie. “thresholding”) or more advanced (i,e., “information theoretic”) techniques. This tutorial assumes $S$ is known.
MUSIC Spectrum Estimation
Once identified, MUSIC uses the columns of the noise subspace $\textbf{V}_r$ to identify incident signal directions orthogonal to that subspace. This involves sweeping the steering vector (as a function of the DOA) against these noise subspace basis vectors to compute the strength of the so-called MUSIC pseudo-spectrum given in the following equation. The index $P$ represents the index of the first noise singular value (assuming they have been sorted in descending order). Also $\textbf{V}_r(j)$ denotes the $j$-th column of $\textbf{V}_r$. Once again, $P$ may be assumed known or estimated using various techniques. The $\times$ operator below represents an inner product. The $\textbf{s}(k)$ represents the steering vector which is a function of the array manifold and the presumed direction of arrival.
MUSIC may solve the above equation either directly by looking for the peaks of the pseudo-spectrum that occur when the steering vector becomes orthogonal to the noise subspace. This occurs in the preceding equation when its denominator goes to zero. Computing these peaks requires a costly division operator. Instead, the denominator may be inspected instead for its nulls. This generally gives a very similar result but requires no division operator. This tutorial uses the latter approach to reduce the compute workload.
MATLAB Model
MATLAB models of MUSIC validate the algorithmic approach taken and provide a means to synthesize I/O data for the AI Engine implementation. The system model may be found in the <path-to-repo>/matlab/System
folder. The system parameters shown in the code below may be configured in the Configuration/testCfg.m
file. After editing this file (or going with its default settings), the MUSIC model may be run using the testMusic.m
script from the <path-to-repo>/matlab/System
folder.
Running the MATLAB model produces the console and figure output shown below. Some statistics are given for the accuracy of the Gram-Schmidt QRD used by the tutorial versus the built-in qr()
MATLAB function, along with the difference between the MATLAB svd()
versus the one-sided Jacobi algorithm with four iterations used by the tutorial. The figure plots the peaks of the MUSIC pseudo-spectrum along with the nulls of its corresponding denominator which are identified instead by the approach adopted in the tutorial. In addition to the console and figure output, the MATLAB model generates a detailed dump of the system signals in the top-level data
folder.
AI Engine Subgraph Designs
The full MUSIC algorithm on AI Engine is built from a data flow graph containing six different subgraphs: IO Adapter, QRD, SVD, DOA, Scanner, and Finder. This section provides an overview of each one of these subgraphs.
IO Adapter Subgraph
The IO Adapter subgraph delivers the $\textbf{A}$ matrix from the input PLIO to the QRD subgraph. Buffers are used for I/O for all downstream MUSIC subgraphs. All of these subgraphs may use a single I/O buffer read over a high bandwidth memory interface, moving from tile to tile in a linear fashion. This is clarified in more detail below. No bandwidth limitations are encountered downstream due to this use of the 256-bit AI Engine memory interface. However, the design must be fed by two PLIO streams @ 32-bits for 1250 MHz to achieve a 1 $\mu s$ throughput overall. The IO Adapter subgraph sinks two input PLIO streams and combines them into a single output buffer containing the input $\textbf{A}$ matrix to be processed by the first QRD subgraph. Two streams are required because the $128\times 8$ elements of $\textbf{A}$ cannot be transferred over a single PLIO in 1 $\mu s$. The following block diagram shows the AI Engine physical array view for the IO Adapter subgraph.
QRD Subgraph
The following MATLAB code shows the QRD algorithm adopted in this tutorial. It contains an inner loop and an outer loop. Its loop structure has been modified for more efficient AI Engine implementation. The outer loop updates of R(kk,kk)
and Q(:,kk)
have been moved from after to before the inner loop. Also, the inner loop indices now run from kk+1
to C
instead of from 1
to kk-1
. This admits an easier to implement form of software pipelining in which a single AI Engine tile may be assigned to compute a single iteration of each outer loop body. For example, the first tile computes the R(1,1)
and Q(:,1)
updates and then performs all inner loop computations corresponding to kk=1
. This tile will then pass all values of R
and Q
to the next tile in the chain. The second tile will accept these inputs but only update R(2,2)
and Q(:,2)
and all inner loop computations corresponding to kk=2
. This approach requires a total of $8$ tiles to process the $128\times 8$ snapshot matrix $\textbf{A}$ for MUSIC.
function [Q,R] = qrd_mgssr_hw_model(A)
[R,C] = size(A);
Q = single(A);
R = single(zeros(C,C));
% Here, 'kk' now represents the tile
% --> Each tile performs its outer loop and then performs the inner loop iterations for all
% columns that follow
for kk = 1 : C
R(kk,kk) = norm(Q(:,kk));
Q(:,kk) = Q(:,kk) * (1.0/R(kk,kk));
for ii = kk+1 : C
R(kk,ii) = transpose(conj(Q(:,kk))) * Q(:,ii);
Q(:,ii) = Q(:,ii) - R(kk,ii) * Q(:,kk);
end
end
end
It turns out this $8$-tile solution does not provide sufficient compute capacity to achieve the 1 $\mu s$ throughput objective of the tutorial. Additional throughput can be achieved by partitioning each inner loop body to its own tile. Similarly, each outer loop body may be partitioned to its own tile. The algorithm exhibits $C$ outer loop iterations, and $C-k$ inner loop body iterations for each outer loop $k$. It follows the total # of tiles required is $C + C(C-1)/2$. For the $8$ columns here, this equals $8+8\times7/2=36$ tiles.
The following diagram shows the AI Engine floorplan for this $36$-tile solution. Here, no attempt has been made to floorplan the design; the tools elect by default to simply use the second row for buffers. These could be co-located in the first row for many of the tiles; indeed this occurs in the final floorplan shown below.
The following diagram shows additional details of the AI Engine QRD $norm()$ kernel code. The code is partitioned into three separate workloads:
The “Initialization” code accepts the $R$ and $Q$ inputs from the previous tile and initializes $R$ to zero for the first tile.
The “QRD Norm” code computes the $norm()$ required by the QRD outer loop body, updates the appropriate $Q$ column.
The “Output” code delivers the updated $R$ and $Q$ values to the following tile. The $Q$ is not returned by the last tile.
Note that kernels with indices $0,8,15,21,26,30,33,35$ perform the outer loop $norm()$ operations whereas the remaining tiles compute the inner loop bodies. Only the upper triangular portion of the $R$ matrix is updated as it is passed through the AI Engine pipeline.