SVD Subgraph - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2024-12-06
Version
2024.2 English

The following MATLAB code shows the SVD algorithm adopted in this tutorial. It contains three nested loops. The outer-most loop performs identical “iterations”. This tutorial performs a fixed set of $N_I=4$ iterations per SVD. The inner two loops admit a structure similar to the QRD analyzed above except all compute workloads are contained in the inner most loop only; there is no workload in the outer of these two loops. The inner loop workload involves computing a $2\times2$ Jacobi rotation matrix Rot and then applying that matrix to both V and W.

function [U,S,V] = svd_one_sided_jacobi( A, max_iter )
   if     (nargin == 1)  max_iter = 4;
   elseif (nargin ~= 2) error('Incorrect signature'); 
   end
   
   % Perform one-sided Jacobi for MUSIC application (no need to compute 'U' in principle):
   [m,n] = size(A);
   V = single(eye(n));
   W = single(A);

   for iter = 1 : max_iter
      for p = 1 : n-1
        for q = p+1 : n
           Rot = compute_rotation( W(:,p), W(:,q) );
           V(:,[p,q]) = V(:,[p,q]) * Rot;
           W(:,[p,q]) = W(:,[p,q]) * Rot;
        end
      end
   end
   
   % Compute singular values and 'U':
   U = single(zeros(m,n));
   S = single(zeros(n,n));
   for ii = 1 : n
     S(ii,ii) = sqrt(W(:,ii)'*W(:,ii));
     U(:,ii) = W(:,ii)/S(ii,ii);
   end
end

For completeness, the following MATLAB code defines the compute workload for the one-sided Jacobi rotation. This is defined to render the two vectors Xv and Yv orthogonal. This workload contains some vectorizable dot product operations along with some \(sqrt()\), \(inv()\), and squaring operations. These may be mapped to the AI Engine vector data path or can leverage the non-linear hardware accelerator on the scalar data path.

function [res] = calc_ei_2t(x,y)
   R = sqrt(x^2+y^2);
   res = complex(x/R,y/R);
end

function [res] = calc_ei_t(sin_2t,cos_2t)
   R = sqrt((1+cos_2t)^2 + sin_2t^2);
   res = complex((1+cos_2t)/R, sin_2t/R);
end

function [Rot] = calc_rot( eit, c, s )
   Rot = [      eit *complex(0,s),       eit*complex(c,0);
           conj(eit)*complex(-c,0), conj(eit)*complex(0,-s) ];
end

function [Rot] = compute_rotation( Xv, Yv )
   Hpp = real(transpose(Xv)*conj(Xv));
   Hqq = real(transpose(Yv)*conj(Yv));
   tmp = transpose(Xv)*conj(Yv);
   Hrr = real(tmp);
   Hjj = imag(tmp);
   
   ei_2t1 = calc_ei_2t(Hjj,Hrr);
   ei_t1  = calc_ei_t(imag(ei_2t1),real(ei_2t1));
   
   tx = 0.5*(Hqq-Hpp);
   ty = Hrr * imag(ei_2t1) + Hjj * real(ei_2t1);
   
   ei_2t2 = calc_ei_2t(tx,ty);
   ei_t2  = calc_ei_t(imag(ei_2t2), real(ei_2t2));
   
   Rot = calc_rot( ei_t1, real(ei_t2), imag(ei_t2) );
end

The SVD may be parallelized across multiple AI Engine tiles in a manner very similar to the QRD. The most aggressive scheme assigns a single AI Engine tile to each inner-most loop body; in essence the system of three nested loops is flattened fully. It turns out this scheme is overkill for the throughput target of 1 $\mu s\(. Instead, it is possible to partition three inner loop body workloads to each AI Engine tile and still meet the requirement. This saves considerable resources; only \)38$ tiles are required in total for the SVD.

The screenshot below shows the kernel object creation in the adf::graph of the SVD graph implementation in svd_graph.h. The code comments indicate which indices \((p,q)\) are assigned to each tile. Each tile is assigned three inner loop workloads. The last tile is only assigned two inner loop workloads, but it also performs the final workload to compute the singular values required by the MUSIC algorithm for identifying the noise subspace basis vectors.

figure

The following diagram shows the AI Engine physical array view for the SVD subgraph. Once again, the data flow graph has a linear structure similar to the QRD graph, although less memory resources are required because the $\textbf{U}\(, \)\textbf{S}\(, and \)\textbf{V}$ matrices are all $8\times 8$ in this case.

figure