DBSCAN clustering groups a set of \(n_{\mathrm{samples}}\) data points \(\{x_1, x_2, \dots, x_{n_{\mathrm{samples}}}\}\) into clusters. The number of clusters is not specified in advance; it is determined at runtime by the density of the points, and points that fall in no dense region are classed as noise.
The algorithm is governed by two parameters, eps and min_samples.
The eps parameter is the maximum distance between two samples for one to be considered as in the neighborhood of the other. The min_samples parameter is the minimum number of samples that a neighborhood must contain for a point to be classed as a core sample.
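In symbols, the two parameters can be read as follows. This is only a sketch of the definitions: the distance function \(d\) (for example, the Euclidean distance) and the symbol \(N_{\mathrm{eps}}\) are notation introduced here, and counting a point as a member of its own neighborhood is an assumption that the text does not state.

\[
N_{\mathrm{eps}}(x_i) = \{\, x_j : d(x_i, x_j) \le \mathrm{eps} \,\}, \qquad
x_i \text{ is a core sample} \iff \lvert N_{\mathrm{eps}}(x_i) \rvert \ge \mathrm{min\_samples}.
\]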
The algorithm works as follows:
The neighborhood of each sample point (that is, the indices of the points within distance eps) is computed.
The sample points are then considered in turn:
If a point is not already assigned to a cluster and its neighborhood contains fewer than min_samples points, it is classed as noise.
A point is classed as a core sample if its neighborhood contains at least min_samples points. A new cluster is created containing this point.
The neighborhood of the core sample is then explored and any points not already assigned to a cluster are added to the cluster.
This process is repeated until all points have been assigned to a cluster or classed as noise.
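The steps above can be illustrated with a minimal pure-Python sketch. It is not the implementation of any particular library: the names `dbscan_sketch`, `region_query`, and `NOISE` are hypothetical, the Euclidean distance is assumed, and a point is assumed to belong to its own neighborhood. Two further assumptions follow the usual formulation of DBSCAN rather than anything stated above: exploration continues through neighbors that are themselves core samples, and a point first classed as noise joins a cluster if it is later found in the neighborhood of a core sample.

```python
from math import dist  # Euclidean distance (Python 3.8+); an assumed metric

NOISE = -1          # label used for noise points (a convention assumed here)
UNASSIGNED = None   # marker for points not yet visited


def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i].

    The point itself is included (assumption)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan_sketch(points, eps, min_samples):
    """Return one label per point: a cluster index 0, 1, ... or NOISE."""
    n = len(points)
    # Compute the neighborhood of every sample point up front.
    neighborhoods = [region_query(points, i, eps) for i in range(n)]
    labels = [UNASSIGNED] * n
    cluster = -1

    # Consider the sample points in turn.
    for i in range(n):
        if labels[i] is not UNASSIGNED:
            continue  # already in a cluster or classed as noise
        if len(neighborhoods[i]) < min_samples:
            labels[i] = NOISE
            continue
        # Point i is a core sample: create a new cluster containing it.
        cluster += 1
        labels[i] = cluster
        frontier = list(neighborhoods[i])
        # Explore the neighborhood, adding points not yet in a cluster.
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core sample (assumption)
                continue
            if labels[j] is not UNASSIGNED:
                continue
            labels[j] = cluster
            # Assumption: keep expanding through neighbors that are core samples.
            if len(neighborhoods[j]) >= min_samples:
                frontier.extend(neighborhoods[j])
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan_sketch(pts, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Computing every neighborhood with a linear scan costs quadratic time in the number of samples. The sketch favors clarity over speed; a spatial index would be the usual choice for larger data sets.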