- aoclda.basic_stats.mean(X, axis='col')#
Arithmetic mean of a data matrix along the specified axis.
For a dataset \(\{x_1, ..., x_n\}\), the arithmetic mean, \(\bar{x}\), is defined as
\[\bar{x}=\frac{1}{n}\sum_{i=1}^{n} x_i.\]
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which means are calculated.
- Returns:
Calculated means.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
- aoclda.basic_stats.harmonic_mean(X, axis='col')#
Harmonic mean of a data matrix along the specified axis.
For a dataset \(\{x_1, ..., x_n\}\), the harmonic mean, \(\bar{x}_{harm}\), is defined as
\[\bar{x}_{harm}=\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.\]
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which means are calculated.
- Returns:
Calculated harmonic means.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
- aoclda.basic_stats.geometric_mean(X, axis='col')#
Geometric mean of a data matrix along the specified axis.
For a dataset \(\{x_1, ..., x_n\}\), the geometric mean, \(\bar{x}_{geom}\), is defined as
\[\bar{x}_{geom} = \left(\prod_{i=1}^n x_i\right)^{\frac{1}{n}} \equiv \exp\left(\frac{1}{n}\sum_{i=1}^n\ln x_i\right).\]
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which means are calculated.
- Returns:
Calculated geometric means.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
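The three mean definitions above can be sketched for a single 1D dataset in plain Python. This is only an illustration of the formulas, not the aoclda implementation; the helper names are hypothetical and the functions operate on one list rather than on a matrix axis.

```python
import math

def arithmetic_mean(xs):
    """(1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    """n divided by the sum of reciprocals; every x_i must be nonzero."""
    return len(xs) / sum(1.0 / x for x in xs)

def geometric_mean(xs):
    """exp of the mean log, equivalent to the n-th root of the product.

    The log form avoids overflow in the product; every x_i must be positive.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

data = [1.0, 2.0, 4.0]
am = arithmetic_mean(data)
hm = harmonic_mean(data)
gm = geometric_mean(data)
print(am, gm, hm)
```

For positive data the three values always satisfy harmonic mean ≤ geometric mean ≤ arithmetic mean, which makes a convenient sanity check.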
- aoclda.basic_stats.variance(X, dof=0, axis='col')#
Variance of a data matrix along the specified axis.
For a dataset \(\{x_1, ..., x_n\}\), the variance, \(s^2\), is defined as
\[s^2 = \frac{1}{\text{dof}}\sum_{i=1}^n(x_i-\bar{x})^2,\]
where dof is the number of degrees of freedom. Setting \(\text{dof} = n\) gives the sample variance, whereas setting \(\text{dof}=n-1\) gives an unbiased estimate of the population variance.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
dof (int, optional) – number of degrees of freedom used to compute the variance:
If dof < 0, the degrees of freedom will be set to the number of observations, where the number of observations is n_samples for column-wise variances, n_features for row-wise variances and n_samples \(\times\) n_features for the overall variance.
If dof = 0, the degrees of freedom will be set to the number of observations - 1.
If dof > 0, the degrees of freedom will be set to the specified value.
axis (str, optional) – The axis over which variances are calculated.
- Returns:
Calculated variances.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
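The dof convention described above can be illustrated with a small pure-Python sketch. The names resolve_dof and variance_1d are hypothetical helpers for a single 1D dataset and are not part of aoclda.

```python
def resolve_dof(dof, n_obs):
    """Map the dof argument to the divisor actually used."""
    if dof < 0:
        return n_obs          # sample variance
    if dof == 0:
        return n_obs - 1      # unbiased estimate of the population variance
    return dof                # user-specified divisor

def variance_1d(xs, dof=0):
    """Sum of squared deviations from the mean, divided by the resolved dof."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / resolve_dof(dof, n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_1d(data, dof=-1))  # divides by n = 8
print(variance_1d(data))          # divides by n - 1 = 7
print(variance_1d(data, dof=4))   # divides by 4
```

Here the mean is 5 and the sum of squared deviations is 32, so the three calls divide 32 by 8, 7 and 4 respectively.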
- aoclda.basic_stats.skewness(X, axis='col')#
Skewness of a data matrix along the specified axis.
The skewness is computed as the Fisher-Pearson coefficient of skewness (that is, with the central moments scaled by the number of observations, see cite:t:da_kozw2000).
For a dataset \(\{x_1, ..., x_n\}\), the skewness, \(g_1\), is defined as
\[g_1 = \frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^3} {\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right]^{3/2}}.\]
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which skewnesses are calculated.
- Returns:
Calculated skewnesses.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
- aoclda.basic_stats.kurtosis(X, axis='col')#
Kurtosis of a data matrix along the specified axis.
The kurtosis is computed using Fisher’s coefficient of excess kurtosis (that is, with the central moments scaled by the number of observations and 3 subtracted to ensure normally distributed data gives a value of 0, see cite:t:da_kozw2000).
For a dataset \(\{x_1, ..., x_n\}\), the kurtosis, \(g_2\), is defined as
\[g_2 = \frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^4} {\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right]^{2}}-3.\]
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which kurtoses are calculated.
- Returns:
Calculated kurtoses.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
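Both \(g_1\) and \(g_2\) are ratios of central moments scaled by \(n\). A minimal sketch of the two formulas for one 1D dataset follows; the function names are hypothetical and the code is not the library's implementation.

```python
def central_moment(xs, k):
    """k-th central moment, scaled by the number of observations n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

def skewness_1d(xs):
    """g_1 = m_3 / m_2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis_1d(xs):
    """g_2 = m_4 / m_2^2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 5.0]

print(skewness_1d(symmetric))         # symmetric data has zero skewness
print(skewness_1d(right_tailed))      # a long right tail gives a positive value
print(excess_kurtosis_1d(symmetric))
```

Both functions divide by the second central moment, so they are undefined for constant data.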
- aoclda.basic_stats.moment(X, k, mean=None, axis='col')#
Central moment of a data matrix along the specified axis.
For a dataset \(\{x_1, ..., x_n\}\), the \(k\)-th central moment, \(m_k\), is defined as
\[m_k=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^k.\]
Here, the moments are scaled by the number of observations along the specified axis. The function gives you the option of supplying precomputed means (via the argument mean) about which the moments are computed. Otherwise it will compute the means itself.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
k (int) – the order of the moment to be computed.
mean (array-like, optional) – 1D array with precomputed means.
axis (str, optional) – axis over which moments are calculated.
- Returns:
Calculated moments.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
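The effect of the optional precomputed means can be sketched for the column-wise case. The helper column_moments is hypothetical; it treats X as a list of rows and mirrors the formula above rather than the library's internals.

```python
def column_moments(X, k, mean=None):
    """k-th moment of each column about `mean` (computed if not supplied)."""
    n_samples = len(X)
    n_features = len(X[0])
    if mean is None:
        # No precomputed means: use the column means
        mean = [sum(row[j] for row in X) / n_samples for j in range(n_features)]
    return [
        sum((row[j] - mean[j]) ** k for row in X) / n_samples
        for j in range(n_features)
    ]

X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]

central = column_moments(X, 2)                       # about the column means
about_zero = column_moments(X, 2, mean=[0.0, 0.0])   # about supplied values
print(central, about_zero)
```

Supplying means avoids recomputing them when they are already known, and supplying zeros gives moments about the origin instead of central moments.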
- aoclda.basic_stats.quantile(X, q, method='linear', axis='col')#
Selected quantile of a data matrix along the specified axis.
Computes the q-th quantiles of a data matrix along the specified axis. Note that there are multiple ways to define quantiles. The available quantile types correspond to the 9 different quantile types commonly used (see Hyndman and Fan [1996] for further details). These can be specified using the method parameter. In each case a number \(h\) is computed, corresponding to the approximate location in the data array of the required quantile q.
Note
Methods 'inverted_cdf', 'averaged_inverted_cdf' and 'closest_observation' give discontinuous results.
Method 'median_unbiased' is recommended if the sample distribution function is unknown.
Method 'normal_unbiased' is recommended if the sample distribution function is known to be normal.
Method 'closest_observation', in contrast to NumPy, R and SAS, rounds to the nearest order statistic and NOT the nearest even order statistic.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
q (float) – the quantile required, must lie in the interval [0,1].
method (str, optional) – specifies the method used to compute the quantiles:
If method = 'inverted_cdf', \(h=n\times q\), return \(\texttt{x[i]}\) where \(i = \lceil h \rceil\).
If method = 'averaged_inverted_cdf', \(h=n\times q + 0.5\), return \((\texttt{x[i]}+\texttt{x[j]})/2\) where \(i = \lceil h-1/2 \rceil\) and \(j = \lfloor h+1/2 \rfloor\).
If method = 'closest_observation', \(h=n\times q - 0.5\), return \(\texttt{x[i]}\) where \(i = \lfloor h \rceil\) is the nearest integer to \(h\).
If method = 'interpolated_inverted_cdf', \(h=n\times q\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
If method = 'hazen', \(h=n\times q + 0.5\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
If method = 'weibull', \(h=(n + 1)\times q\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
If method = 'linear', \(h=(n - 1)\times q + 1\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
If method = 'median_unbiased', \(h=(n + 1/3)\times q + 1/3\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
If method = 'normal_unbiased', \(h=(n + 1/4)\times q + 3/8\), return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\) where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
axis (str, optional) – The axis over which quantiles are calculated.
- Returns:
Calculated quantiles.
- Return type:
numpy.ndarray. Depending on axis, the array can have shape (n_samples, ), (n_features, ) or (1, ).
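The six interpolating methods share one return formula and differ only in how \(h\) is computed. The sketch below illustrates this for a 1D dataset. Two assumptions are made that the text above does not state: x is the sorted data indexed from 1, and \(h\) is clamped to the range [1, n] so that the indices stay valid. The names H_FORMULAS and quantile_1d are hypothetical.

```python
import math

# h as a function of (n, q) for the interpolating methods listed above
H_FORMULAS = {
    "interpolated_inverted_cdf": lambda n, q: n * q,
    "hazen": lambda n, q: n * q + 0.5,
    "weibull": lambda n, q: (n + 1) * q,
    "linear": lambda n, q: (n - 1) * q + 1,
    "median_unbiased": lambda n, q: (n + 1 / 3) * q + 1 / 3,
    "normal_unbiased": lambda n, q: (n + 1 / 4) * q + 3 / 8,
}

def quantile_1d(xs, q, method="linear"):
    x = sorted(xs)
    n = len(x)
    h = H_FORMULAS[method](n, q)
    h = min(max(h, 1.0), float(n))   # assumption: clamp h to [1, n]
    i = math.floor(h)
    j = math.ceil(h)
    # The formulas are 1-based, Python lists are 0-based
    return x[i - 1] + (h - i) * (x[j - 1] - x[i - 1])

data = [7.0, 1.0, 3.0, 5.0]
for name in ("linear", "weibull", "hazen"):
    print(name, quantile_1d(data, 0.25, method=name))
```

On this small dataset the three methods return different lower quartiles, which shows why the choice of method matters most for short data arrays.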
- aoclda.basic_stats.five_point_summary(X, axis='col')#
Summary statistics of a data matrix along the specified axis.
Computes the maximum, minimum, median and upper/lower hinges of a data array along the specified axis.
Note
On large datasets, this function is more efficient than calling quantile() five times because it uses partly sorted arrays after each stage.
The 'weibull' definition of quantiles is used to calculate the statistics.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
axis (str, optional) – axis over which summary is calculated.
- Returns:
Tuple with calculated minimum, lower hinge, median, upper hinge and maximum, respectively.
- Return type:
tuple of numpy.ndarray. Depending on axis, each numpy.ndarray can have shape (n_samples, ), (n_features, ) or (1, ).
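A rough sketch of a five-point summary for a 1D dataset is given below. It assumes that the lower and upper hinges are the 'weibull' quantiles at q = 0.25 and q = 0.75, and it clamps \(h\) to [1, n]; neither detail is confirmed by the text above. For simplicity it sorts the data fully once, rather than reusing partly sorted arrays as the library is described as doing.

```python
import math

def weibull_quantile(sorted_x, q):
    """'weibull' quantile, h = (n + 1) * q, on already sorted data."""
    n = len(sorted_x)
    h = min(max((n + 1) * q, 1.0), float(n))   # assumption: clamp h to [1, n]
    i, j = math.floor(h), math.ceil(h)
    return sorted_x[i - 1] + (h - i) * (sorted_x[j - 1] - sorted_x[i - 1])

def five_point_summary_1d(xs):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    x = sorted(xs)   # sort once and reuse for all five statistics
    return (
        x[0],
        weibull_quantile(x, 0.25),
        weibull_quantile(x, 0.5),
        weibull_quantile(x, 0.75),
        x[-1],
    )

data = [9.0, 1.0, 5.0, 3.0, 7.0, 11.0, 13.0]
summary = five_point_summary_1d(data)
print(summary)
```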
- aoclda.basic_stats.standardize(X, shift=None, scale=None, dof=0, reverse=False, inplace=False, axis='col')#
Standardize a data matrix along the specified axis.
This function can be called in various different ways:
If the arrays shift and scale are both None, then the means and standard deviations will be computed along the appropriate axis and will be used to shift and scale the data.
If the arrays shift and scale are both supplied, then the data matrix X will be shifted (by subtracting the values in shift) then scaled (by dividing by the values in scale) along the selected axis.
If one of the arrays shift or scale is None then it will be ignored and only the other will be used (so that the data is only shifted or only scaled).
In each case, if a 0 scaling factor is encountered then it will not be used.
An additional computational mode is available by setting reverse = True. In this case the standardization is reversed, so that the data matrix is multiplied by the values in scale before adding the values in shift. This enables users to undo the standardization after the data has been used in another computation.
Note
The inplace functionality will only work if the supplied X array is F-contiguous.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
shift (array-like, optional) – 1D array of values used for shifting the data.
scale (array-like, optional) – 1D array of values used for scaling the data.
dof (int, optional) – number of degrees of freedom used to compute standard deviations:
If dof < 0, the degrees of freedom will be set to the number of observations in the specified axis.
If dof = 0, the degrees of freedom will be set to the number of observations - 1.
If dof > 0, the degrees of freedom will be set to the specified value.
reverse (bool, optional) – determines whether or not the standardization proceeds in reverse:
If reverse = False, the data matrix will be shifted (by subtracting the values in shift) then scaled (by dividing by the values in scale).
If reverse = True, the data matrix will be scaled (by multiplying by the values in scale) then shifted (by adding the values in shift).
inplace (bool, optional) – determines whether the standardization is done without a copy
axis (str, optional) – axis over which matrix is standardized.
- Returns:
Standardized matrix
- Return type:
numpy.ndarray of shape (n_samples, n_features)
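The forward and reverse modes can be sketched column-wise in plain Python. The helpers column_shift_scale and standardize_cols are hypothetical and work on a list of rows. Treating a zero scaling factor as 1 is one possible reading of "it will not be used" and is an assumption of this sketch.

```python
import math

def column_shift_scale(X, dof=0):
    """Column means and standard deviations, using the dof convention above."""
    n, p = len(X), len(X[0])
    d = n if dof < 0 else (n - 1 if dof == 0 else dof)
    shift = [sum(row[j] for row in X) / n for j in range(p)]
    scale = [math.sqrt(sum((row[j] - shift[j]) ** 2 for row in X) / d)
             for j in range(p)]
    return shift, scale

def standardize_cols(X, shift=None, scale=None, dof=0, reverse=False):
    p = len(X[0])
    if shift is None and scale is None:
        shift, scale = column_shift_scale(X, dof)
    shift = [0.0] * p if shift is None else shift   # no shift supplied: skip it
    scale = [1.0] * p if scale is None else scale   # no scale supplied: skip it
    scale = [s if s != 0 else 1.0 for s in scale]   # assumption: ignore 0 factors
    if reverse:
        return [[row[j] * scale[j] + shift[j] for j in range(p)] for row in X]
    return [[(row[j] - shift[j]) / scale[j] for j in range(p)] for row in X]

X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
shift, scale = column_shift_scale(X)           # second column is constant
Z = standardize_cols(X, shift, scale)
restored = standardize_cols(Z, shift, scale, reverse=True)
print(Z, restored)
```

Keeping the shift and scale arrays from the forward pass is what allows the reverse pass to recover the original data.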
- aoclda.basic_stats.covariance_matrix(X, dof=0)#
Covariance matrix of a data matrix, with the rows treated as observations and the columns treated as variables.
For a dataset \(X = [\textbf{x}_1, \dots, \textbf{x}_{n_{\text{cols}}}]^T\) with column means \(\{\bar{x}_1, \dots, \bar{x}_{n_{\text{cols}}}\}\) the \((i,j)\) element of the covariance matrix is given by the covariance between \(\textbf{x}_i\) and \(\textbf{x}_j\):
\[\text{cov}(i,j) = \frac{1}{\text{dof}}(\textbf{x}_i- \bar{x}_i)\cdot(\textbf{x}_j-\bar{x}_j),\]
where dof is the number of degrees of freedom. Setting dof to the number of observations (n_samples, the number of rows) gives the sample covariances, whereas setting dof to the number of observations - 1 gives unbiased estimates of the population covariances. The argument dof is used to specify the number of degrees of freedom.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
dof (int, optional) – number of degrees of freedom used to compute covariances:
If dof < 0, the degrees of freedom will be set to the number of observations.
If dof = 0, the degrees of freedom will be set to the number of observations - 1.
If dof > 0, the degrees of freedom will be set to the specified value.
assume_centered (bool, optional) – If False, centers the input matrix by subtracting the column means. If True, assumes that the input data is already centered (mean = 0) and skips the centering step for computational efficiency. Default: False.
- Returns:
Covariance matrix
- Return type:
numpy.ndarray of shape (n_features, n_features)
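The covariance formula can be sketched directly, with rows as observations and columns as variables. The helper covariance_matrix_rows is hypothetical, always centers the data, and is not the library's implementation.

```python
def covariance_matrix_rows(X, dof=0):
    """(n_features x n_features) covariance matrix of a list of rows."""
    n, p = len(X), len(X[0])
    d = n if dof < 0 else (n - 1 if dof == 0 else dof)
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    # Entry (i, j) is the dot product of centered columns i and j over dof
    return [
        [sum(c[i] * c[j] for c in centered) / d for j in range(p)]
        for i in range(p)
    ]

X = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 9.0]]

cov_unbiased = covariance_matrix_rows(X)          # divides by n - 1 = 2
cov_sample = covariance_matrix_rows(X, dof=-1)    # divides by n = 3
print(cov_unbiased)
print(cov_sample)
```

The result is symmetric, and its diagonal holds the column variances computed with the same dof.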
- aoclda.basic_stats.correlation_matrix(X)#
Correlation matrix of a data matrix, with the rows treated as observations and the columns treated as variables.
For a dataset \(X = [\textbf{x}_1, \dots, \textbf{x}_{n_{\text{cols}}}]^T\) with column means \(\{\bar{x}_1, \dots, \bar{x}_{n_{\text{cols}}}\}\) and column standard deviations \(\{\sigma_1, \dots, \sigma_{n_{\text{cols}}}\}\) the \((i,j)\) element of the correlation matrix is given by the correlation between \(\textbf{x}_i\) and \(\textbf{x}_j\):
\[\text{corr}(i,j) = \frac{\text{cov}(i,j)}{\sigma_i\sigma_j}.\]
Note that the values in the correlation matrix are independent of the number of degrees of freedom used to compute the standard deviations and covariances.
- Parameters:
X (array-like) – data matrix of shape (n_samples, n_features).
- Returns:
Correlation matrix
- Return type:
numpy.ndarray of shape (n_features, n_features)
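The independence from the degrees of freedom can be seen by expanding the definition, assuming the standard deviations \(\sigma_i\) are computed with the same dof as the covariances:
\[\text{corr}(i,j) = \frac{\frac{1}{\text{dof}}(\textbf{x}_i-\bar{x}_i)\cdot(\textbf{x}_j-\bar{x}_j)}{\sqrt{\frac{1}{\text{dof}}\|\textbf{x}_i-\bar{x}_i\|^2}\,\sqrt{\frac{1}{\text{dof}}\|\textbf{x}_j-\bar{x}_j\|^2}} = \frac{(\textbf{x}_i-\bar{x}_i)\cdot(\textbf{x}_j-\bar{x}_j)}{\|\textbf{x}_i-\bar{x}_i\|\,\|\textbf{x}_j-\bar{x}_j\|}.\]
The factor \(1/\text{dof}\) in the numerator cancels against the two factors of \(1/\sqrt{\text{dof}}\) in the denominator.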
-
da_status da_mean_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *mean)#
-
da_status da_mean_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *mean)#
Arithmetic mean of a data matrix.
For a dataset \(\{x_1, \dots, x_n\}\), the arithmetic mean, \(\bar{x}\), is defined as
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i. \]
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether means are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
mean – [out] the array which will hold the computed means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X or mean is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
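The roles of order and ldx can be illustrated with a small Python sketch of the index arithmetic. It assumes the usual BLAS/LAPACK-style convention, in which ldx is the stride between consecutive columns (column-major) or consecutive rows (row-major) of a flat buffer. The helper names are hypothetical, and the sketch is written in Python for illustration rather than as a call into the C API.

```python
def flat_index(i, j, ldx, order):
    """Position of zero-based element (i, j) in the flat buffer."""
    if order == "column_major":
        return i + j * ldx   # entries of one column are adjacent
    if order == "row_major":
        return i * ldx + j   # entries of one row are adjacent
    raise ValueError("unknown order: " + order)

def column_means(buf, n_rows, n_cols, ldx, order):
    """Column means of an n_rows x n_cols matrix stored with stride ldx."""
    min_ldx = n_rows if order == "column_major" else n_cols
    if ldx < min_ldx:
        raise ValueError("invalid leading dimension")
    return [
        sum(buf[flat_index(i, j, ldx, order)] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]

PAD = 999.0  # padding entries that must never be read

# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored two ways
col_major = [1.0, 4.0, PAD, 2.0, 5.0, PAD, 3.0, 6.0, PAD]   # ldx = 3 > n_rows
row_major = [1.0, 2.0, 3.0, PAD, 4.0, 5.0, 6.0, PAD]        # ldx = 4 > n_cols

means_cm = column_means(col_major, 2, 3, 3, "column_major")
means_rm = column_means(row_major, 2, 3, 4, "row_major")
print(means_cm, means_rm)
```

Both layouts give the same column means, and the padding values are skipped because ldx, not n_rows or n_cols, is used as the stride.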
-
da_status da_geometric_mean_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *geometric_mean)#
-
da_status da_geometric_mean_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *geometric_mean)#
Geometric mean of a data matrix.
For a dataset \(\{x_1, \dots, x_n\}\), the geometric mean, \(\bar{x}_{geom}\), is defined as
\[ \bar{x}_{geom} = \left(\prod_{i=1}^n x_i\right)^{\frac{1}{n}} \equiv \exp\left(\frac{1}{n}\sum_{i=1}^n\ln x_i\right). \]
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether geometric means are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
geometric_mean – [out] the array which will hold the computed geometric means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X or geometric_mean is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_negative_data - X contains negative data. The geometric mean is not defined.
-
da_status da_harmonic_mean_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *harmonic_mean)#
-
da_status da_harmonic_mean_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *harmonic_mean)#
Harmonic mean of a data matrix.
For a dataset \(\{x_1, \dots, x_n\}\), the harmonic mean, \(\bar{x}_{harm}\), is defined as
\[ \bar{x}_{harm} = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}}. \]
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether harmonic means are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
harmonic_mean – [out] the array which will hold the computed harmonic means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X or harmonic_mean is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
-
da_status da_variance_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, da_int dof, float *mean, float *variance)#
-
da_status da_variance_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, da_int dof, double *mean, double *variance)#
Arithmetic mean and variance of a data matrix.
For a dataset \(\{x_1, \dots, x_n\}\), the variance, \(s^2\), is defined as
\[ s^2 = \frac{1}{\text{dof}}\sum_{i=1}^n(x_i-\bar{x})^2, \]
where dof is the number of degrees of freedom. Setting \(\text{dof} = n \) gives the sample variance, whereas setting \(\text{dof} = n -1 \) gives an unbiased estimate of the population variance. The argument dof is used to specify the number of degrees of freedom.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether statistics are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
dof – [in] the number of degrees of freedom used to compute the variance:
dof < 0 - the degrees of freedom will be set to the number of observations, where the number of observations is n_rows for column-wise variances, n_cols for row-wise variances and n_rows \(\times\) n_cols for the overall variance.
dof = 0 - the degrees of freedom will be set to the number of observations - 1.
dof > 0 - the degrees of freedom will be set to the specified value.
mean – [out] the array which will hold the computed means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
variance – [out] the array which will hold the computed variances. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X, mean or variance is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
-
da_status da_skewness_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *mean, float *variance, float *skewness)#
-
da_status da_skewness_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *mean, double *variance, double *skewness)#
Arithmetic mean, variance and skewness of a data matrix.
The skewness is computed as the Fisher-Pearson coefficient of skewness (that is, with the central moments scaled by the number of observations, see cite:t:da_kozw2000).
Thus, for a dataset \(\{x_1, \dots, x_n\}\), the skewness, \(g_1\), is defined as
\[ g_1 = \frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right]^{3/2}}. \]
The number of degrees of freedom used to compute the variance is given by the number of observations, where the number of observations is n_rows for column-wise variances, n_cols for row-wise variances and n_rows \(\times\) n_cols for the overall variance.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether statistics are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
mean – [out] the array which will hold the computed means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
variance – [out] the array which will hold the computed variances. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
skewness – [out] the array which will hold the computed skewnesses. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X, mean, variance or skewness is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
-
da_status da_kurtosis_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *mean, float *variance, float *kurtosis)#
-
da_status da_kurtosis_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *mean, double *variance, double *kurtosis)#
Arithmetic mean, variance and kurtosis of a data matrix.
The kurtosis is computed using Fisher’s coefficient of excess kurtosis (that is, with the central moments scaled by the number of observations and 3 subtracted to ensure normally distributed data gives a value of 0, see cite:t:da_kozw2000).
Thus, for a dataset \(\{x_1, \dots, x_n\}\), the kurtosis, \(g_2\), is defined as
\[ g_2 = \frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^4}{\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right]^{2}}-3. \]
The number of degrees of freedom used to compute the variance is given by the number of observations, where the number of observations is n_rows for column-wise variances, n_cols for row-wise variances and n_rows \(\times\) n_cols for the overall variance.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether statistics are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
mean – [out] the array which will hold the computed means. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
variance – [out] the array which will hold the computed variances. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
kurtosis – [out] the array which will hold the computed kurtoses. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X, mean, variance or kurtosis is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
-
da_status da_moment_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, da_int k, da_int use_precomputed_mean, float *mean, float *moment)#
-
da_status da_moment_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, da_int k, da_int use_precomputed_mean, double *mean, double *moment)#
Central moment of a data matrix.
For a dataset \(\{x_1, \dots, x_n\}\), the kth central moment, \(m_k\), is defined as
\[ m_k=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^k. \]
Here, the moments are scaled by the number of observations: n_rows for column-wise moments, n_cols for row-wise moments and n_rows \(\times\) n_cols for the overall moment. The function gives you the option of supplying precomputed means about which the moments are computed. Otherwise it will compute the means and return them along with the moments.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether moments are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
k – [in] the order of the moment to be computed. Constraint: k \(>\) 0.
use_precomputed_mean – [in] if nonzero, then means supplied by the calling program will be used. Otherwise means will be computed internally and returned to the calling program.
mean – [inout] the array which will hold the computed means. If use_precomputed_mean is zero then this array need not be set on entry. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
moment – [out] the array which will hold the computed moments. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X or mean is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_invalid_input - \(k < 0\).
-
da_status da_quantile_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float q, float *quantile, da_quantile_type quantile_type)#
-
da_status da_quantile_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double q, double *quantile, da_quantile_type quantile_type)#
Selected quantile of a data matrix.
Computes the qth quantiles of a data array along the specified axis. Note that there are multiple ways to define quantiles. These are specified using the da_quantile_type enum.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether quantiles are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
q – [in] the quantile required. Constraint: q must lie in the interval [0,1].
quantile – [out] the array which will hold the computed quantiles. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
quantile_type – [in] specifies the method used to compute the quantiles.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the arrays X or quantile is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_invalid_input - q is not in the interval \([0,1]\).
da_status_memory_error - a memory allocation error occurred.
-
da_status da_five_point_summary_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *minimum, float *lower_hinge, float *median, float *upper_hinge, float *maximum)#
-
da_status da_five_point_summary_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *minimum, double *lower_hinge, double *median, double *upper_hinge, double *maximum)#
Summary statistics of a data matrix.
Computes the maximum, minimum, median and upper/lower hinges of a data array along the specified axis.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether statistics are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
minimum – [out] the array which will hold the computed minima. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
lower_hinge – [out] the array which will hold the computed lower hinges. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
median – [out] the array which will hold the computed medians. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
upper_hinge – [out] the array which will hold the computed upper hinges. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
maximum – [out] the array which will hold the computed maxima. If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - one of the array arguments is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_memory_error - a memory allocation error occurred.
-
da_status da_standardize_s(da_order order, da_axis axis, da_int n_rows, da_int n_cols, float *X, da_int ldx, da_int dof, da_int mode, float *shift, float *scale)#
-
da_status da_standardize_d(da_order order, da_axis axis, da_int n_rows, da_int n_cols, double *X, da_int ldx, da_int dof, da_int mode, double *shift, double *scale)#
Standardize a data matrix.
This routine can be called in several ways.
If the arrays shift and scale are both null, then the means and standard deviations will be computed along the appropriate axis and will be used to shift and scale the data.
If the arrays shift and scale are both supplied, then the data matrix X will be shifted (by subtracting the values in shift) then scaled (by dividing by the values in scale) along the selected axis.
If one of the arrays shift or scale contains only zeros, then the means, or the standard deviations about the supplied means, will be computed as appropriate and stored in that array before being used to standardize the data.
If one of the arrays shift or scale is null then it will be ignored and only the other will be used (so that the data is only shifted or only scaled).
In each case, if a scaling factor of 0 is encountered then it will not be used.
An additional computational mode is available by setting mode = 1. In this case the standardization is reversed, so that the data matrix is multiplied by the values in scale before adding the values in shift. This enables users to undo the standardization after the data has been used in another computation.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X is stored in row-major order or column-major order.
axis – [in] a da_axis enumerated type, specifying whether statistics are computed by row, by column, or overall.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
dof – [in] the number of degrees of freedom used to compute standard deviations:
dof < 0 - the degrees of freedom will be set to the number of observations, where the number of observations is n_rows for the column-wise computation, n_cols for the row-wise computation and n_rows \(\times\) n_cols for the overall computation.
dof = 0 - the degrees of freedom will be set to the number of observations minus 1.
dof > 0 - the degrees of freedom will be set to the specified value.
mode – [in] determines whether or not the standardization proceeds in reverse:
mode = 0 - the data matrix will be shifted (by subtracting the values in shift) then scaled (by dividing by the values in scale).
mode = 1 - the data matrix will be scaled (by multiplying by the values in scale) then shifted (by adding the values in shift).
shift – [in] the array of values for shifting the data. Can be null (see above). If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
scale – [in] the array of values for scaling the data. Can be null (see above). If axis = da_axis_col the array must be at least of size n_cols. If axis = da_axis_row the array must be at least of size n_rows. If axis = da_axis_all the array must be at least of size 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_input - mode is neither 0 nor 1.
da_status_invalid_leading_dimension - the constraint on ldx was violated.
da_status_invalid_pointer - X is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
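The forward and reverse modes described above can be sketched for a single row or column as follows. This is a plain Python illustration of the arithmetic only, not the library's implementation; the function name standardize, its scalar shift and scale arguments, and the choice to treat a zero scaling factor as "leave unscaled" are assumptions made for this example.

```python
def standardize(values, shift, scale, mode=0):
    """Standardize one row or column of data (illustrative sketch).

    mode=0: subtract shift, then divide by scale.
    mode=1: multiply by scale, then add shift (undoes mode=0).
    """
    if mode not in (0, 1):
        raise ValueError("mode must be 0 or 1")
    # Assumption: a scaling factor of 0 is not used, i.e. the data is
    # left unscaled, which is equivalent to a factor of 1.
    s = scale if scale != 0 else 1.0
    if mode == 0:
        return [(v - shift) / s for v in values]
    return [v * s + shift for v in values]


column = [2.0, 4.0, 6.0]
mean = sum(column) / len(column)
# Standard deviation with dof = number of observations minus 1.
std = (sum((v - mean) ** 2 for v in column) / (len(column) - 1)) ** 0.5

z = standardize(column, mean, std, mode=0)    # standardized data
restored = standardize(z, mean, std, mode=1)  # original data recovered
```

Keeping the shift and scale values used in the forward pass is what makes the reverse pass possible, so a caller would typically retain both arrays between the two calls.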
-
da_status da_covariance_matrix_s(da_order order, da_int n_rows, da_int n_cols, const float *X, da_int ldx, da_int dof, float *cov, da_int ldcov, da_int assume_centered)#
-
da_status da_covariance_matrix_d(da_order order, da_int n_rows, da_int n_cols, const double *X, da_int ldx, da_int dof, double *cov, da_int ldcov, da_int assume_centered)#
Covariance matrix of a data matrix, with the rows treated as observations and the columns treated as variables.
For a dataset \(X = [\textbf{x}_1, \dots, \textbf{x}_{n_{\text{cols}}}]^T\) with column means \(\{\bar{x}_1, \dots, \bar{x}_{n_{\text{cols}}}\}\), the \((i, j)\) element of the covariance matrix is given by the covariance between \(\textbf{x}_i\) and \(\textbf{x}_j\):
\[ \text{cov}(i,j) = \frac{1}{\text{dof}}(\textbf{x}_i-\bar{x}_i)\cdot(\textbf{x}_j-\bar{x}_j), \]
where dof is the number of degrees of freedom. Since the rows are the observations, setting \(\text{dof} = n_{\text{rows}}\) gives the sample covariances, whereas setting \(\text{dof} = n_{\text{rows}} - 1\) gives unbiased estimates of the population covariances. The argument dof is used to specify the number of degrees of freedom.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X and cov are stored in row-major order or column-major order.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
dof – [in] the number of degrees of freedom used to compute the covariances:
dof < 0 - the degrees of freedom will be set to n_rows.
dof = 0 - the degrees of freedom will be set to n_rows - 1.
dof > 0 - the degrees of freedom will be set to the specified value.
cov – [out] the array which will hold the n_cols \(\times\) n_cols covariance matrix. The matrix will be returned with the same storage order as the input data.
ldcov – [in] the leading dimension of the covariance matrix. Constraint: ldcov \(\ge\) n_cols.
assume_centered – [in] if equal to 1, assumes the input matrix X is already mean-centered and skips the centering step for computational efficiency. If equal to 0, centers the data by subtracting column means. Accepted values: 0 and 1.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - one of the constraints on ldx or ldcov was violated.
da_status_invalid_pointer - one of the arrays X or cov is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_memory_error - a memory allocation error occurred.
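As a reference for the covariance formula and the dof convention above, the following plain Python sketch computes the covariance matrix of a small dataset held as a list of rows. It is an illustration under stated assumptions rather than the library's implementation: the names resolve_dof and covariance_matrix are hypothetical, the data is a nested list instead of a flat array, and no guard is included for a resulting dof of 0 (for example a single observation with dof = 0).

```python
def resolve_dof(dof, n_obs):
    """Map the dof argument to the divisor, following the convention above."""
    if dof < 0:
        return n_obs          # sample covariances
    if dof == 0:
        return n_obs - 1      # unbiased estimates
    return dof                # user-specified value


def covariance_matrix(rows, dof=0, assume_centered=False):
    """Covariance matrix with rows as observations, columns as variables."""
    n_rows, n_cols = len(rows), len(rows[0])
    divisor = resolve_dof(dof, n_rows)
    if assume_centered:
        # Skip centering: the column means are taken to be zero.
        means = [0.0] * n_cols
    else:
        means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    cov = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        for j in range(n_cols):
            cov[i][j] = sum((r[i] - means[i]) * (r[j] - means[j])
                            for r in rows) / divisor
    return cov


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov_unbiased = covariance_matrix(X, dof=0)   # divides by n_rows - 1 = 2
cov_sample = covariance_matrix(X, dof=-1)    # divides by n_rows = 3
```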
-
da_status da_correlation_matrix_s(da_order order, da_int n_rows, da_int n_cols, const float *X, da_int ldx, float *corr, da_int ldcorr)#
-
da_status da_correlation_matrix_d(da_order order, da_int n_rows, da_int n_cols, const double *X, da_int ldx, double *corr, da_int ldcorr)#
Correlation matrix of a data matrix, with the rows treated as observations and the columns treated as variables.
For a dataset \(X = [\textbf{x}_1, \dots, \textbf{x}_{n_{\text{cols}}}]^T\) with column means \(\{\bar{x}_1, \dots, \bar{x}_{n_{\text{cols}}}\}\) and column standard deviations \(\{\sigma_1, \dots, \sigma_{n_{\text{cols}}}\}\), the \((i, j)\) element of the correlation matrix is given by the correlation between \(\textbf{x}_i\) and \(\textbf{x}_j\):
\[ \text{corr}(i,j) = \frac{\text{cov}(i,j)}{\sigma_i\sigma_j}. \]
Note that the values in the correlation matrix are independent of the number of degrees of freedom used to compute the standard deviations and covariances.
- Parameters:
order – [in] a da_order enumerated type, specifying whether X and corr are stored in row-major order or column-major order.
n_rows – [in] the number of rows in the data matrix. Constraint: n_rows \(\ge 1\).
n_cols – [in] the number of columns in the data matrix. Constraint: n_cols \(\ge 1\).
X – [in] the n_rows \(\times\) n_cols data matrix.
ldx – [in] the leading dimension of the data matrix. Constraint: ldx \(\ge\) n_rows if order = column_major, or ldx \(\ge\) n_cols if order = row_major.
corr – [out] the array which will hold the n_cols \(\times\) n_cols correlation matrix. Must be of size at least n_cols \(\times\) ldcorr. The matrix will be returned with the same storage order as the input data.
ldcorr – [in] the leading dimension of the correlation matrix. Constraint: ldcorr \(\ge\) n_cols.
- Returns:
da_status. The function returns:
da_status_success - the operation was successfully completed.
da_status_invalid_leading_dimension - one of the constraints on ldx or ldcorr was violated.
da_status_invalid_pointer - one of the arrays X or corr is null.
da_status_invalid_array_dimension - either n_rows \(< 1\) or n_cols \(< 1\).
da_status_memory_error - a memory allocation error occurred.
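The claim that the correlation matrix does not depend on the number of degrees of freedom follows because the factor 1/dof appears once in cov(i, j) and once in the product of the two standard deviations, so it cancels. The plain Python sketch below checks this numerically. The function correlation_matrix and its explicit dof argument are hypothetical and exist only for this illustration (the library routine takes no dof argument), and columns with zero variance are not handled.

```python
import math


def correlation_matrix(rows, dof):
    """Correlation matrix computed via covariances with a given divisor."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]

    def cov(i, j):
        return sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / dof

    # Standard deviations use the same divisor as the covariances.
    sigma = [math.sqrt(cov(j, j)) for j in range(n_cols)]
    return [[cov(i, j) / (sigma[i] * sigma[j]) for j in range(n_cols)]
            for i in range(n_cols)]


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
corr_n = correlation_matrix(X, dof=len(X))             # divisor n_rows
corr_n_minus_1 = correlation_matrix(X, dof=len(X) - 1)  # divisor n_rows - 1
```

Both calls give the same off-diagonal correlation for this dataset, up to floating-point rounding.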
-
enum da_axis_#
Defines whether to compute statistical quantities by row, by column or for the whole data matrix.
Values:
-
enumerator da_axis_col#
Compute statistics column-wise.
-
enumerator da_axis_row#
Compute statistics row-wise.
-
enumerator da_axis_all#
Compute statistics for the whole data matrix.
-
typedef enum da_quantile_type_ da_quantile_type#
Alias for the da_quantile_type_ enum.
-
enum da_quantile_type_#
Defines the method used to compute quantiles in da_quantile_s and da_quantile_d.
The available quantile types correspond to the 9 different quantile types commonly used (see Hyndman and Fan (1996) for further details). It is recommended to use type 6 or type 7 as a default.
Notes about the available types:
Types 1, 2 and 3 give discontinuous results.
Type 8 is recommended if the sample distribution function is unknown.
Type 9 is recommended if the sample distribution function is known to be normal.
In each case a number \(h\) is computed, corresponding to the approximate location in the data array of the required quantile, q \(\in [0,1]\). Then the quantile is computed as follows:
Values:
-
enumerator da_quantile_type_1#
\(h=n \times q\); return \(\texttt{x[i]}\), where \(i = \lceil h \rceil\).
-
enumerator da_quantile_type_2#
\(h=n \times q+0.5\); return \((\texttt{x[i]}+\texttt{x[j]})/2\), where \(i = \lceil h-1/2\rceil\) and \(j = \lfloor h+1/2\rfloor\).
-
enumerator da_quantile_type_3#
\(h=n \times q-0.5\); return \(\texttt{x[i]}\), where \(i\) is the nearest integer to \(h\).
-
enumerator da_quantile_type_4#
\(h=n \times q\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
-
enumerator da_quantile_type_5#
\(h=n \times q+0.5\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
-
enumerator da_quantile_type_6#
\(h=(n+1) \times q\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
-
enumerator da_quantile_type_7#
\(h=(n-1) \times q+1\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
-
enumerator da_quantile_type_8#
\(h=(n+1/3) \times q + 1/3\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
-
enumerator da_quantile_type_9#
\(h=(n+1/4) \times q + 3/8\); return \(\texttt{x[i]} + (h-\lfloor h \rfloor)(\texttt{x[j]}-\texttt{x[i]})\), where \(i = \lfloor h\rfloor\) and \(j = \lceil h \rceil\).
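The continuous quantile types (4 to 9) share the same interpolation step and differ only in how \(h\) is computed, which the plain Python sketch below makes explicit. It is an illustration under stated assumptions, not the library's implementation: the function name quantile and the table H_FORMULAS are hypothetical, x is taken to be the data sorted in ascending order with 1-based indices as in the formulas, and \(h\) is clamped to the interval [1, n] so that both indices stay inside the data.

```python
import math

# h as a function of the sample size n and the quantile q, for types 4 to 9.
H_FORMULAS = {
    4: lambda n, q: n * q,
    5: lambda n, q: n * q + 0.5,
    6: lambda n, q: (n + 1) * q,
    7: lambda n, q: (n - 1) * q + 1,
    8: lambda n, q: (n + 1 / 3) * q + 1 / 3,
    9: lambda n, q: (n + 1 / 4) * q + 3 / 8,
}


def quantile(data, q, qtype=7):
    """Quantile of a 1-D dataset for the continuous types 4 to 9."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in the interval [0, 1]")
    x = sorted(data)
    n = len(x)
    h = H_FORMULAS[qtype](n, q)
    # Assumption: clamp h so that floor(h) and ceil(h) are valid positions.
    h = min(max(h, 1.0), float(n))
    i, j = math.floor(h), math.ceil(h)
    # The formulas use 1-based positions; Python lists are 0-based.
    lower, upper = x[i - 1], x[j - 1]
    return lower + (h - i) * (upper - lower)


data = [5, 1, 4, 2, 3]
median_type7 = quantile(data, 0.5, qtype=7)   # h = 3, no interpolation
lower_type6 = quantile(data, 0.25, qtype=6)   # h = 1.5, halfway between 1 and 2
```

Types 1 to 3 are omitted from the sketch because they select or average order statistics instead of interpolating linearly.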