DBSCAN - 5.2 English - 68552

AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English
class aoclda.clustering.DBSCAN(min_samples=5, metric='euclidean', algorithm='brute', leaf_size=30, eps=0.5, power=2.0, check_data=False)

DBSCAN clustering.

Partition a data matrix into clusters using DBSCAN clustering.

Parameters:
  • min_samples (int, optional) – Minimum number of neighborhood samples for a sample point to be considered a core point. Default = 5.

  • metric (str, optional) – The distance metric used to compare sample points. Available metrics are ‘euclidean’, ‘l2’, ‘sqeuclidean’ (squared Euclidean distances), ‘manhattan’, ‘l1’, ‘cityblock’, ‘cosine’, or ‘minkowski’. Default = ‘euclidean’.

  • algorithm (str, optional) – The algorithm used to compute the clusters. Available options are ‘auto’, ‘ball_tree’, ‘brute’ and ‘kd_tree’. k-d trees are likely to be fastest for lower-dimensional datasets, and ball trees may be preferred when data is not aligned along coordinate axes. Tree-based algorithms cannot be used with the cosine distance, the squared Euclidean distance, or the Minkowski distance with power less than 1.0. Default = ‘auto’.

  • leaf_size (int, optional) – Leaf size for the k-d tree algorithm. Default = 30.

  • eps (float, optional) – Maximum distance between two samples for them to be considered in each other’s neighborhood. Default = 0.5.

  • power (float, optional) – Power used in computing the Minkowski metric. Default = 2.0.

  • check_data (bool, optional) – Whether to check the data for NaNs. Default = False.
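To illustrate how the ‘minkowski’ metric depends on power, the following is a minimal plain-Python sketch. It is not the library's implementation, and the helper name minkowski is hypothetical. With power = 2.0 the Minkowski distance reduces to the Euclidean distance, and with power = 1.0 to the Manhattan distance.

```python
def minkowski(x, y, power=2.0):
    """Minkowski distance between two equal-length sequences (illustrative helper)."""
    return sum(abs(a - b) ** power for a, b in zip(x, y)) ** (1.0 / power)


x, y = (0.0, 0.0), (3.0, 4.0)
euclidean = minkowski(x, y, power=2.0)  # (3^2 + 4^2)^(1/2)
manhattan = minkowski(x, y, power=1.0)  # |3| + |4|
print(euclidean, manhattan)
```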

property core_sample_indices

The indices of the core samples in the data matrix.

Type:

numpy.ndarray of shape (n_core_samples, )
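As a rough illustration of what a core sample is, the sketch below finds core samples by brute force in plain Python, using the Euclidean distance. It is not how the library computes them. It assumes that a point counts as its own neighbor and that the neighborhood test is distance <= eps; both are assumptions, and the function name is hypothetical.

```python
import math


def find_core_samples(points, eps=0.5, min_samples=5):
    """Return indices of points with at least min_samples neighbors within eps.

    Assumptions: each point is counted in its own neighborhood, and a
    neighbor is any point at Euclidean distance <= eps.
    """
    core = []
    for i, p in enumerate(points):
        n_neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if n_neighbors >= min_samples:
            core.append(i)
    return core


# Four tightly grouped points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
core = find_core_samples(points, eps=0.5, min_samples=4)
print(core)
```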

fit(A)

Computes DBSCAN clusters for the supplied data matrix.

Parameters:

A (array-like) – The data matrix with which to compute the DBSCAN clusters. It has shape (n_samples, n_features).

Returns:

Returns the instance itself.

Return type:

self (object)
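A minimal usage sketch based on the constructor, fit and the properties documented on this page. The data values are hypothetical, and the import is guarded so that the snippet does nothing when aoclda or NumPy is not installed.

```python
try:
    import numpy as np
    from aoclda.clustering import DBSCAN
except ImportError:  # aoclda (or NumPy) is not available in this environment
    DBSCAN = None

labels = None
if DBSCAN is not None:
    # Two tight groups of three points plus one distant point (hypothetical data).
    A = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                  [20.0, 20.0]])
    db = DBSCAN(eps=0.5, min_samples=3)
    db.fit(A)                       # A has shape (n_samples, n_features)
    labels = db.labels              # one label per sample; -1 marks noise
    print(db.n_clusters, db.n_core_samples, labels)
```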

property labels

The label (i.e. which cluster) of each sample point in the data matrix. A label of -1 indicates that the point has been classified as noise and has not been assigned to a cluster.

Type:

numpy.ndarray of shape (n_samples, )
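The labels array can be post-processed to list the members of each cluster. The sketch below uses plain Python and a hypothetical labels list; the helper name group_by_label is not part of the library.

```python
from collections import defaultdict


def group_by_label(labels):
    """Map each cluster label to its sample indices; noise (-1) is returned separately."""
    clusters = defaultdict(list)
    noise = []
    for index, label in enumerate(labels):
        if label == -1:
            noise.append(index)
        else:
            clusters[label].append(index)
    return dict(clusters), noise


labels = [0, 0, 1, -1, 1, 0]  # hypothetical output for six samples
clusters, noise = group_by_label(labels)
print(clusters, noise)
```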

property n_clusters

The number of clusters found.

Type:

int

property n_core_samples

The number of core samples found in the data matrix.

Type:

int

property n_features

The number of features in the data matrix.

Type:

int

property n_samples

The number of samples in the data matrix.

Type:

int

da_status da_dbscan_set_data_s(da_handle handle, da_int n_samples, da_int n_features, const float *A, da_int lda)
da_status da_dbscan_set_data_d(da_handle handle, da_int n_samples, da_int n_features, const double *A, da_int lda)

Pass a data matrix to the da_handle object in preparation for DBSCAN clustering.

The data itself is not copied; a pointer to the data matrix is stored instead.

After calling this function you may use the option setting APIs to set options.

Parameters:
  • handle – [inout] a da_handle object, initialized with type da_handle_dbscan.

  • n_samples – [in] the number of rows of the data matrix, A. Constraint: n_samples \(\ge\) 1.

  • n_features – [in] the number of columns of the data matrix, A. Constraint: n_features \(\ge\) 1.

  • A – [in] the n_samples \(\times\) n_features data matrix. By default, it should be stored in column-major order, unless you have set the storage order option to row-major.

  • lda – [in] the leading dimension of the data matrix. Constraint: lda \(\ge\) n_samples if A is stored in column-major order, or lda \(\ge\) n_features if A is stored in row-major order.
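The role of lda can be illustrated with a short plain-Python sketch of how element (i, j) of A is located in the flat array under each storage order. The helper flat_index and its order strings are hypothetical labels for illustration, not library option values.

```python
def flat_index(i, j, lda, order="column-major"):
    """Offset of element (i, j) in the flat array holding A."""
    if order == "column-major":
        return i + j * lda  # each column is contiguous; requires lda >= n_samples
    if order == "row-major":
        return i * lda + j  # each row is contiguous; requires lda >= n_features
    raise ValueError(f"unknown order: {order!r}")


# A 3 x 2 matrix stored in column-major order with lda = 4,
# so one unused padding slot (None) follows each column.
A = [1.0, 2.0, 3.0, None, 10.0, 20.0, 30.0, None]
element = A[flat_index(2, 1, lda=4)]  # row 2, column 1
print(element)
```

Choosing lda larger than the minimum lets A refer to a sub-block of a larger allocated matrix without copying.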

Returns:

da_status. The function returns:

da_status da_dbscan_compute_s(da_handle handle)
da_status da_dbscan_compute_d(da_handle handle)

Compute DBSCAN clustering.

Computes DBSCAN clustering on the data matrix previously passed into the handle using da_dbscan_set_data_?.

Parameters:

handle – [inout] a da_handle object, initialized with type da_handle_dbscan and with data passed in via da_dbscan_set_data_?.

Returns:

da_status. The function returns:

Post:

After successful execution, da_handle_get_result_? can be queried with the following enum for floating-point output:

  • da_rinfo - return an array of size 9 containing the values of n_samples, n_features, lda, eps, min_samples, leaf_size, p, n_core_samples and n_clusters.
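Because da_rinfo is returned as a floating-point array, integer quantities such as n_clusters arrive as floats. The sketch below names the nine entries, assuming they are stored in the order listed above; the sample values and the helper unpack_rinfo are hypothetical.

```python
RINFO_FIELDS = ("n_samples", "n_features", "lda", "eps", "min_samples",
                "leaf_size", "p", "n_core_samples", "n_clusters")


def unpack_rinfo(rinfo):
    """Pair each da_rinfo entry with its name (assumes the listed order)."""
    if len(rinfo) != len(RINFO_FIELDS):
        raise ValueError("expected an array of size 9")
    return dict(zip(RINFO_FIELDS, rinfo))


rinfo = [7.0, 2.0, 7.0, 0.5, 3.0, 30.0, 2.0, 6.0, 2.0]  # hypothetical values
info = unpack_rinfo(rinfo)
n_clusters = int(info["n_clusters"])  # convert counts back to integers
print(n_clusters, info["eps"])
```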

In addition, da_handle_get_result_int can be queried with the following enums:

  • da_dbscan_n_clusters - return the number of clusters found.

  • da_dbscan_n_core_samples - return the number of core samples found, n_core_samples.

  • da_dbscan_labels - return an array of size n_samples containing the label (i.e. which cluster it is in) of each sample point. A label of -1 indicates that the point has been classified as noise and has not been assigned to a cluster.

  • da_dbscan_core_sample_indices - return an array of size n_core_samples containing the indices of the core samples.