The available Python options are detailed in the aoclda.factorization.PCA() class constructor.
The following options can be set using da_options_set_?:
| Option Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| pca method | string | \(s=\) covariance | Compute PCA based on the covariance or correlation matrix, or via a direct SVD of the data. | \(s=\) correlation, covariance, or svd. |
| degrees of freedom | string | \(s=\) unbiased | Whether to use biased or unbiased estimators for standard deviations and variances. | \(s=\) biased, or unbiased. |
| store u | integer | \(i=0\) | Whether or not to store the matrix U from the SVD. | \(0 \le i \le 1\) |
| n_components | integer | \(i=1\) | Number of principal components to compute. If 0, then all components will be kept. | \(0 \le i\) |
| svd solver | string | \(s=\) auto | Which LAPACK routine to use for the underlying singular value decomposition. | \(s=\) auto, gesdd, gesvd, gesvdx, or syevd. |
| check data | string | \(s=\) no | Check input data for NaNs prior to performing computation. | \(s=\) no, or yes. |
| storage order | string | \(s=\) column-major | Whether data is supplied and returned in row- or column-major order. | \(s=\) c, column-major, f, fortran, or row-major. |
| whiten | integer | \(i=0\) | Whether or not to whiten the data when transforming. | \(0 \le i \le 1\) |
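The distinction between the covariance and correlation settings of pca method can be illustrated with plain NumPy (this is a sketch of the underlying mathematics, not a call into the library API): correlation-based PCA is covariance-based PCA applied to columns that have additionally been scaled to unit variance, which matters when the columns have very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Two strongly correlated columns on very different scales.
X = np.column_stack([x, 100.0 * (x + 0.1 * rng.normal(size=100))])

# Covariance-based PCA: centre the columns only.
Xc = X - X.mean(axis=0)
cov_components = np.linalg.svd(Xc, full_matrices=False)[2]

# Correlation-based PCA: centre AND scale to unit variance,
# i.e. covariance-based PCA on standardized data.
Xs = Xc / Xc.std(axis=0, ddof=1)
corr_components = np.linalg.svd(Xs, full_matrices=False)[2]

# Unscaled: the first component is dominated by the large-variance column.
print(np.abs(cov_components[0]))
# Standardized: both columns contribute roughly equally.
print(np.abs(corr_components[0]))
```

Here the first covariance-based loading is almost entirely aligned with the large-scale column, whereas the correlation-based loadings weight the two columns nearly equally.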
If the pca method option is set to svd then no standardization is performed. This option should be used if the input data is already standardized or if an explicit singular value decomposition is required. Note, however, that if the columns of the data matrix are not mean-centered, then the computed variance and total_variance will be meaningless.
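The effect of skipping mean-centering can be seen in a small NumPy sketch (illustrative only, not the library API): variances derived from the singular values of an uncentered matrix mostly measure the column means rather than the spread of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Columns with true variance ~1 but means far from zero.
X = rng.normal(loc=50.0, scale=1.0, size=(200, 3))

def component_variances(A):
    # "Variance" per component from singular values, sigma_i^2 / (n - 1);
    # this is only meaningful if the columns of A are mean-centred.
    s = np.linalg.svd(A, full_matrices=False, compute_uv=False)
    return s**2 / (A.shape[0] - 1)

uncentred = component_variances(X)
centred = component_variances(X - X.mean(axis=0))

# Without centring, the leading "variance" is dominated by the mean
# offset (order n * 50^2 / (n - 1)), dwarfing the true variances of ~1.
print(uncentred[0])
print(centred)
```

This is why the svd method should only be used on data that is already standardized (or when the reported variances are not needed).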
If a full decomposition is required (so that all principal components are found) then svd solver should be set to gesdd. The LAPACK routines DGESDD or SGESDD (for double and single precision data respectively) will then be used. This choice offers the best performance, while maintaining high accuracy. Note that if internal heuristics determine that it is useful, a QR decomposition may be performed prior to the SVD.
If svd solver is set to syevd then the SVD will be found by explicitly forming the covariance or correlation matrix and using the LAPACK routines DSYEVD or SSYEVD to perform an eigendecomposition. This is very fast for tall, thin data matrices, but for wider matrices it requires a lot of memory. The method is also more susceptible to ill-conditioning, so it must be used with care. It is incompatible with the store u option.
svd solver should only be set to gesvd (so that the LAPACK routines DGESVD or SGESVD are used) if there is insufficient memory for the workspace requirements of gesdd, or if gesdd encounters convergence issues. If only one or two principal components are required then, depending on your data matrix, gesvdx may be faster (so that the LAPACK routines DGESVDX or SGESVDX are used).
If svd solver is set to auto, then DGESDD or SGESDD will be used unless internal heuristics determine that the eigendecomposition approach is more appropriate.
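The equivalence underlying the syevd-style route can be checked in NumPy (an illustrative sketch, not the library API): the eigenvalues of the explicitly formed covariance matrix coincide with the squared singular values of the centred data divided by \(n-1\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # tall, thin data matrix
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# SVD route: singular values of the centred data.
s = np.linalg.svd(Xc, compute_uv=False)

# Eigendecomposition route (what a syevd-style solver does):
# form the covariance matrix explicitly and diagonalize it.
cov = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # sort into descending order

# The two agree: eigvals == s**2 / (n - 1).
same = np.allclose(eigvals, s**2 / (n - 1))
print(same)
```

Forming the covariance matrix squares the condition number of the problem, which is the source of the ill-conditioning concern noted above.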
If store u is set to 1, then the matrix \(U\) from the SVD will be stored and used to ensure deterministic results in the signs of the principal components. Note that there may be a small performance penalty in setting this option, and it cannot be used if svd solver is set to syevd.
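The sign ambiguity that this option resolves can be demonstrated in NumPy: an SVD is only unique up to simultaneous sign flips of matched columns of \(U\) and rows of \(V^T\). One common convention, used here purely for illustration (the exact rule the library applies is not specified in this text), flips each pair so that the largest-magnitude entry in the corresponding column of \(U\) is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Flip each (column of U, row of Vt) pair so that the entry of U
# with the largest magnitude in that column is positive.
max_rows = np.argmax(np.abs(U), axis=0)
signs = np.sign(U[max_rows, np.arange(U.shape[1])])
U_fixed = U * signs
Vt_fixed = Vt * signs[:, np.newaxis]

# The sign-fixed factors still reproduce the centred data exactly,
# since the paired flips cancel: (U D_s) diag(s) (D_s Vt) = U diag(s) Vt.
ok = np.allclose(U_fixed * s @ Vt_fixed, Xc)
print(ok)
```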
If whiten is set to 1, then the data is whitened upon transformation: each principal component score is divided by its corresponding singular value and multiplied by a dimensional scaling factor, so that the transformed data (specifically, the data used to fit the PCA) has an identity covariance matrix.
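Whitening can be sketched in NumPy (illustrative only; \(\sqrt{n-1}\) is used here as the dimensional factor for the demonstration, and is an assumption rather than a statement of the library's exact scaling):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated columns, so the raw covariance is far from the identity.
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n = Xc.shape[0]

# Plain scores: Xc @ Vt.T == U * s. Whitening divides each score column
# by its singular value and rescales by sqrt(n - 1), so the scores of
# the fitted data have identity covariance.
whitened = (Xc @ Vt.T) / s * np.sqrt(n - 1)

cov = whitened.T @ whitened / (n - 1)
identity = np.allclose(cov, np.eye(3))
print(identity)
```

After whitening, every component carries equal variance, which is useful when downstream methods assume isotropic inputs.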