The PCA of N components of an m-by-n matrix A is given by the following process:
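- Calculate the covariance matrix of A, where each of the m rows \(A_k\) is one observation and the mean row \(\bar{A}\) is subtracted from every row:
\[\Sigma = \frac{1}{m-1}(A-\bar{A})^T(A-\bar{A})\]
\[\bar{A} = \frac{1}{m}\sum_{k=1}^{m}A_k\]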
- Solve the n-by-n covariance matrix for its n eigen-vectors (the columns of the n-by-n matrix \(V\)) and its n eigen-values (\(D\))
- Sort the eigen-values from largest to smallest and then select the top \(N\) eigen-values and their corresponding eigen-vectors.
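The steps above can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: the function name `pca_top_components` is hypothetical, and it assumes the rows of A are observations and the columns are variables, so the covariance is normalised by the number of rows minus one.

```python
import numpy as np

def pca_top_components(A, N):
    """Return the N largest eigen-values and their eigen-vectors.

    Assumption: A is m-by-n with m observations (rows) of n variables (columns).
    """
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    centered = A - A.mean(axis=0)            # subtract the mean row
    sigma = centered.T @ centered / (m - 1)  # n-by-n covariance matrix
    # eigh is intended for symmetric matrices; eigen-values come back ascending.
    D, V = np.linalg.eigh(sigma)
    order = np.argsort(D)[::-1][:N]          # indices of the N largest eigen-values
    return D[order], V[:, order]             # eigen-vectors are the columns of V

# Small hypothetical data set: 4 observations of 2 variables.
A = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
explained_variance, components = pca_top_components(A, 1)
```

For this data the first variable carries most of the variance, so the single selected eigen-vector points along the first axis, up to sign.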
Once the process is completed, there are several outputs available from the library:
- ExplainedVariance: A vector of length N containing the selected eigen-values, sorted from largest to smallest.
- Components: The N eigen-vectors of the covariance matrix associated with the selected eigen-values.
- LoadingsMatrix: The loadings matrix represents the weights associated with each original variable when calculating the principal components. Each eigen-vector is scaled by the square root of its eigen-value:
\[Loadings=Components*\sqrt{ExplainedVariance^T}\]
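As a rough illustration of the loadings formula, the NumPy snippet below scales each eigen-vector by the square root of its eigen-value. The numeric values are made up for the example, and storing eigen-vectors as columns is an assumption rather than a statement about the library's memory layout.

```python
import numpy as np

# Hypothetical values: two orthonormal eigen-vectors stored as columns,
# together with their eigen-values.
components = np.array([[0.6, -0.8],
                       [0.8,  0.6]])
explained_variance = np.array([4.0, 1.0])

# Broadcasting multiplies column j by sqrt(explained_variance[j]).
loadings = components * np.sqrt(explained_variance)
```

When every component is kept (N equal to the number of variables), `loadings @ loadings.T` reproduces the covariance matrix, which is one way to sanity-check the result.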
Note
Eigen-vectors are only defined up to sign, and the sign actually returned depends on the implementation, so calculations of the loadings matrix could return inverted values in a non-deterministic way. To avoid that, use the same convention as MATLAB, where the first element of each eigen-vector must be positive; if it is not, the whole vector is multiplied by \(-1\).
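The sign convention described in the note could be applied as in the sketch below. The helper name `fix_signs` is hypothetical, eigen-vectors are assumed to be stored as columns, and a vector whose first element is exactly zero is left unchanged, which is a choice made for this example only.

```python
import numpy as np

def fix_signs(components):
    """Flip every column (eigen-vector) whose first element is negative."""
    components = np.asarray(components, dtype=float)
    # -1 for columns with a negative first element, +1 otherwise.
    signs = np.where(components[0, :] < 0, -1.0, 1.0)
    return components * signs

# Hypothetical eigen-vectors: the first column starts with a negative value.
V = np.array([[-0.6,  0.8],
              [-0.8, -0.6]])
V_fixed = fix_signs(V)
```

Only the first column is flipped here; the second already starts with a positive element and is returned as is.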
Below is a diagram of the internal implementation of PCA: