The available Python options are detailed in the aoclda.decision_tree.decision_tree() and
aoclda.decision_forest.decision_forest() class constructors.
The following options can be set using da_options_set_?:
| Option name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| category split strategy | string | \(s=\) ordered | How to split categorical features: split one category from all others, or treat the categories as ordered. | \(s=\) one-vs-all, or ordered. |
| maximum bins | integer | \(i=256\) | Maximum number of bins in histograms. | \(2 \le i \le 65535\) |
| histogram | string | \(s=\) no | Choose whether to use histograms constructed from the data matrix X. | \(s=\) no, or yes. |
| feature threshold | real | \(r=1e-05\) | Minimum difference in feature value required for splitting. | \(0 \le r\) |
| storage order | string | \(s=\) column-major | Whether data is supplied and returned in row- or column-major order. | \(s=\) c, column-major, f, fortran, or row-major. |
| check data | string | \(s=\) no | Check input data for NaNs prior to performing computation. | \(s=\) no, or yes. |
| minimum split score | real | \(r=1e-05\) | Minimum score needed for a node to be considered for splitting. | \(0 \le r \le 1\) |
| maximum features | integer | \(i=0\) | Set the number of features to consider when ‘features selection’ is set to ‘custom’. 0 means take all the features. | \(0 \le i\) |
| number of trees | integer | \(i=100\) | Set the number of trees to compute. | \(1 \le i\) |
| seed | integer | \(i=-1\) | Set the seed for the random number generator. If the value is -1, a seed is generated automatically, and the results are then not reproducible. | \(-1 \le i\) |
| node minimum samples | integer | \(i=2\) | Minimum number of samples to consider a node for splitting. | \(1 \le i\) |
| maximum depth | integer | \(i=29\) | Set the maximum depth of trees. | \(0 \le i \le 29\) |
| scoring function | string | \(s=\) gini | Select the scoring function to use. | \(s=\) cross-entropy, entropy, gini, misclass, misclassification, or misclassification-error. |
| minimum impurity decrease | real | \(r=0\) | Minimum score improvement needed to consider a split from the parent node. | \(0 \le r\) |
| block size | integer | \(i=256\) | Set the size of the blocks for parallel computations. | \(1 \le i \le 2147483647\) |
| features selection | string | \(s=\) sqrt | Select how many features to use for each split. ‘custom’ reads the ‘maximum features’ option, ‘proportion’ reads the ‘proportion features’ option. ‘all’, ‘sqrt’ and ‘log2’ select respectively all, the square root or the base-2 logarithm of the total number of features. | \(s=\) all, custom, log2, proportion, or sqrt. |
| bootstrap | string | \(s=\) yes | Select whether to bootstrap the samples in the trees. | \(s=\) no, or yes. |
| bootstrap samples factor | real | \(r=1\) | Proportion of samples to draw from the data set to build each tree if ‘bootstrap’ is set to ‘yes’. | \(0 < r \le 1\) |
| proportion features | real | \(r=0.1\) | Set the proportion of features to consider when ‘features selection’ is set to ‘proportion’. | \(0 < r \le 1\) |
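
The interaction between ‘features selection’, ‘maximum features’ and ‘proportion features’, and the integer ranges listed above, can be sketched in plain Python. This is an illustrative sketch only: `integer_option_is_valid` and `features_per_split` are hypothetical helpers, not part of aoclda, and the rounding used here (truncation, then clamping to between 1 and the number of features) is an assumption rather than the library's documented behaviour.

```python
import math

# Hypothetical helpers for illustration only; they are NOT part of aoclda.

# (lower bound, upper bound or None) for some integer options, copied from the table.
INTEGER_RANGES = {
    "maximum bins": (2, 65535),
    "maximum features": (0, None),
    "number of trees": (1, None),
    "seed": (-1, None),
    "node minimum samples": (1, None),
    "maximum depth": (0, 29),
}

FEATURE_SELECTIONS = {"all", "custom", "log2", "proportion", "sqrt"}


def integer_option_is_valid(name, value):
    """Return True if value lies in the tabulated range for the integer option."""
    lower, upper = INTEGER_RANGES[name]
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value >= lower and (upper is None or value <= upper)


def features_per_split(n_features, selection="sqrt",
                       maximum_features=0, proportion_features=0.1):
    """Resolve 'features selection' to a feature count for each split.

    Assumption: fractional counts are truncated and the result is clamped
    to the interval [1, n_features]. The library may round differently.
    """
    if selection not in FEATURE_SELECTIONS:
        raise ValueError(f"unknown features selection: {selection!r}")
    if selection == "all":
        count = n_features
    elif selection == "sqrt":
        count = math.isqrt(n_features)
    elif selection == "log2":
        count = int(math.log2(n_features))
    elif selection == "custom":
        # 0 means "take all the features", as stated in the table.
        count = maximum_features if maximum_features > 0 else n_features
    else:  # "proportion"
        count = int(proportion_features * n_features)
    return max(1, min(count, n_features))


if __name__ == "__main__":
    for mode in sorted(FEATURE_SELECTIONS):
        print(mode, features_per_split(16, mode, maximum_features=5,
                                       proportion_features=0.5))
```

Under these assumptions, a data set with 16 features and the default ‘sqrt’ setting would consider 4 features at each split, while ‘custom’ with ‘maximum features’ left at 0 would consider all 16.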