Implementation - 2023.2 English

The PCA with N components of an m-by-n matrix A is computed as follows:

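Calculate the covariance matrix of A:

\[\Sigma = \frac{1}{m-1}(A-\bar{A})^T(A-\bar{A})\]

\[\bar{A} = \frac{1}{m}\sum_{i=1}^{m}A_i\]

where \(A_i\) is the \(i\)-th row of A, so \(\bar{A}\) is the 1-by-n row vector of column means that is subtracted from every row of A.

Solve the n-by-n covariance matrix for its n eigen-vectors, which form the columns of the n-by-n matrix \(V\), and its n eigen-values (\(D\)).
Sort the eigen-values from largest to smallest, then select the top \(N\) eigen-values and their corresponding eigen-vectors.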
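The steps above can be sketched in Python with NumPy. This is a minimal reference sketch and not the library's implementation. The function name `pca_eig` is hypothetical. Rows of A are assumed to be observations and columns to be variables, and the covariance is normalized by the number of observations minus one, which is the usual sample covariance. `numpy.linalg.eigh` stands in for whatever eigen-solver the library uses internally.

```python
import numpy as np


def pca_eig(A, N):
    """Hypothetical reference sketch of the PCA steps (not the library's API).

    A is m-by-n with m observations (rows) of n variables (columns).
    Returns the top-N eigen-values in descending order and the matching
    eigen-vectors as the columns of an n-by-N matrix.
    """
    m = A.shape[0]
    # Covariance: subtract the column means from every row, normalize by m - 1.
    A_bar = A.mean(axis=0)                   # length-n vector of column means
    centered = A - A_bar                     # broadcasts across the m rows
    sigma = centered.T @ centered / (m - 1)  # n-by-n, symmetric
    # Eigen-decomposition: eigh suits symmetric matrices and returns the
    # eigen-values in ascending order, with eigen-vectors as columns of V.
    D, V = np.linalg.eigh(sigma)
    # Sort from largest to smallest and keep the top N.
    order = np.argsort(D)[::-1][:N]
    return D[order], V[:, order]


rng = np.random.default_rng(0)
A = rng.normal(size=(50, 4))  # 50 observations of 4 variables
values, vectors = pca_eig(A, N=2)
print(values)
print(vectors)
```

The sketch uses `eigh` instead of the general `numpy.linalg.eig` because the covariance matrix is symmetric. `eigh` therefore returns real eigen-values and orthonormal eigen-vectors.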

Once the process is complete, the library provides the following outputs:

ExplainedVariance: a vector of length N containing the selected eigen-values, sorted from largest to smallest.
Components: the N eigen-vectors of the covariance matrix that correspond to the selected eigen-values.
LoadingsMatrix: the loadings matrix represents the weights associated with each original variable when calculating the principal components. It can be computed as follows:

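\[\mathrm{Loadings} = \mathrm{Components}\cdot\operatorname{diag}\left(\sqrt{\mathrm{ExplainedVariance}}\right)\]

That is, each eigen-vector in Components is scaled by the square root of its corresponding eigen-value.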
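The loadings formula amounts to a per-eigen-vector scaling, which can be sketched in NumPy as shown below. This sketch assumes that Components is stored as an n-by-N array with one eigen-vector per column. The helper name `loadings` and the small input arrays are illustrative only.

```python
import numpy as np


def loadings(components, explained_variance):
    """Hypothetical helper: scale each eigen-vector by sqrt of its eigen-value.

    components: n-by-N array, one eigen-vector per column (assumed layout).
    explained_variance: length-N array of the matching eigen-values.
    """
    # Broadcasting the length-N vector across rows multiplies column j by
    # sqrt(explained_variance[j]), i.e. components @ diag(sqrt(ev)).
    return components * np.sqrt(explained_variance)


components = np.array([[0.8, -0.6],
                       [0.6, 0.8]])
explained_variance = np.array([4.0, 1.0])
L = loadings(components, explained_variance)
print(L)  # rows [1.6, -0.6] and [1.2, 0.8]
```

Broadcasting gives the same result as multiplying by the diagonal matrix, but it avoids forming the N-by-N diagonal matrix explicitly.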

Note

Because the sign of an eigen-vector is arbitrary and depends on the implementation, the loadings matrix could otherwise contain sign-inverted values in a non-deterministic way. To avoid this, we use the same convention as MATLAB: the first element of each eigen-vector must be positive, and any eigen-vector whose first element is negative is multiplied by \(-1\).
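The sign convention in the note can be sketched in NumPy as shown below. The function name `fix_signs` is hypothetical, and eigen-vectors are assumed to be stored as columns. Negating an eigen-vector leaves it an eigen-vector for the same eigen-value, so this normalization loses no information. Scaling by the square root of an eigen-value never changes a sign, so applying the normalization before computing the loadings also makes the signs of the loadings deterministic.

```python
import numpy as np


def fix_signs(vectors):
    """Flip each eigen-vector (column) whose first element is negative.

    Sketch of the sign convention in the note; assumes one eigen-vector per
    column. Columns whose first element is zero are left unchanged.
    """
    signs = np.where(vectors[0, :] < 0, -1.0, 1.0)  # one sign per column
    return vectors * signs


V = np.array([[-0.8, 0.6],
              [-0.6, -0.8]])
V_fixed = fix_signs(V)
print(V_fixed)  # rows [0.8, 0.6] and [0.6, -0.8]
```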

Below is a diagram of the internal implementation of PCA:

Architectural diagram of Principal Component Analysis implementation