DBSCAN clustering - 5.2 English - 68552

AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English

DBSCAN clustering partitions a set of \(n_{\mathrm{samples}}\) data points \(\{x_1, x_2, \dots, x_{n_{\mathrm{samples}}}\}\) into an unspecified number of clusters, determined at runtime by the density of points.

The algorithm is governed by two parameters, eps and min_samples. The eps parameter is the maximum distance between two samples for one to be considered in the neighborhood of the other. The min_samples parameter is the minimum number of samples that a neighborhood must contain for a point to be classed as a core sample.
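As an illustration of how the two parameters interact, the following minimal Python sketch computes the neighborhood of one point and tests whether it is a core sample. It is not the library's API: the helper names `eps_neighborhood` and `is_core_sample` are hypothetical, and the sketch assumes Euclidean distance, an inclusive comparison (distance <= eps), and that a point counts as a member of its own neighborhood.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices j whose distance to points[i] is at most eps.

    Assumptions: Euclidean distance, inclusive comparison, and the
    point i itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core_sample(points, i, eps, min_samples):
    """True if the neighborhood of points[i] holds at least min_samples points."""
    return len(eps_neighborhood(points, i, eps)) >= min_samples

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]

print(eps_neighborhood(points, 0, eps=1.0))               # [0, 1, 2]
print(is_core_sample(points, 0, eps=1.0, min_samples=3))  # True
print(is_core_sample(points, 3, eps=1.0, min_samples=3))  # False
```

Increasing eps enlarges every neighborhood, while increasing min_samples makes it harder for a point to qualify as a core sample.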

The algorithm works as follows:

  1. The neighborhood of each sample point (that is, the indices of the points within distance eps) is computed.

  2. The sample points are then considered in turn:

    • If a point is not already assigned to a cluster and its neighborhood contains fewer than min_samples points, it is classed as noise.

    • A point is classed as a core sample if its neighborhood contains at least min_samples points. A new cluster is created containing this point.

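    • The neighborhood of the core sample is then explored, and any points not already assigned to a cluster (including points previously classed as noise) are added to the cluster. If an added point is itself a core sample, its neighborhood is explored in the same way.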

  3. This process is repeated until all points have been assigned to a cluster or classed as noise.
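The steps above can be sketched in plain Python as follows. This is an illustrative brute-force version under stated assumptions, not the library's implementation: the function name `dbscan_sketch` is hypothetical, distances are Euclidean, noise is labelled -1, and cluster expansion continues through any newly added core samples, as in the standard formulation of DBSCAN.

```python
import math
from collections import deque

NOISE = -1      # label for noise points (an assumed convention)
UNVISITED = -2  # internal marker, never returned to the caller

def dbscan_sketch(points, eps, min_samples):
    """Return one label per point: a cluster index (0, 1, ...) or NOISE."""
    n = len(points)

    # Step 1: neighborhood of every point (brute force, O(n^2) distances).
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]

    labels = [UNVISITED] * n
    cluster = 0

    # Step 2: consider the points in turn.
    for i in range(n):
        if labels[i] != UNVISITED:
            continue  # already in a cluster or already classed as noise
        if len(neighborhoods[i]) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue

        # Core sample: create a new cluster and explore its neighborhood.
        labels[i] = cluster
        queue = deque(neighborhoods[i])
        while queue:
            j = queue.popleft()
            if labels[j] >= 0:
                continue  # already assigned to a cluster
            labels[j] = cluster  # unvisited or noise point joins the cluster
            if len(neighborhoods[j]) >= min_samples:
                queue.extend(neighborhoods[j])  # j is core: keep expanding
        cluster += 1

    # Step 3: the loop ends once every point has a cluster or is noise.
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(dbscan_sketch(points, eps=1.5, min_samples=3))
# [0, 0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is not passed in: two dense groups yield two labels, and the isolated point at (50, 50) is classed as noise. Because a point first marked as noise can later be absorbed as a border point of a cluster, the labels of border points may depend on the order in which points are visited.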