ML.EVAL.CLUSTERING.SILHOUETTE_SCORE

Returns the mean silhouette coefficient over all samples.

Syntax

ML.EVAL.CLUSTERING.SILHOUETTE_SCORE(X, labels, metric, sample_size, random_state)

Arguments

Name | Type | Default | Description
X | object | | DataFrame or 2-D array object of the data that was clustered.
labels | object | | DataFrame or array object of predicted cluster labels, one per row of X.
metric | Any | "euclidean" | Distance metric used to compute pairwise distances (e.g. "euclidean", "manhattan", "cosine").
sample_size | Any | None | If set, compute the silhouette on a random subsample of this many rows instead of the full dataset.
random_state | Any | None | Random seed for sample_size selection. Use a fixed integer for reproducible results.

Returns

A single number between -1 and +1 — higher means better-separated clusters.

When to use

Use ML.EVAL.CLUSTERING.SILHOUETTE_SCORE to gauge how cleanly your clusters separate. For each sample, it compares the sample's average distance to the other points in its own cluster against its average distance to the points in the nearest other cluster, then averages the resulting per-sample coefficients across all samples.
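The comparison described above matches the standard definition of the silhouette coefficient, written out here as a sketch. The symbols a(i), b(i), s(i), C_I, and d are notation introduced for illustration, and it is an assumption that this function follows the standard definition exactly.

```latex
% Sample i belongs to cluster C_I; d(i, j) is the chosen distance metric.
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\; j \neq i} d(i, j)
\qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\qquad
\text{score} = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Because |b(i) - a(i)| can never exceed max{a(i), b(i)}, every s(i), and therefore the mean, falls between -1 and +1. By the usual convention a sample that is alone in its cluster gets s(i) = 0.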

The score lives in [-1, 1]:

  • +1 — clusters are tight and well-separated.
  • 0 — clusters overlap; samples sit on or near a boundary.
  • -1 — many samples are likely assigned to the wrong cluster.
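To make the range concrete, here is a minimal pure-Python sketch of the calculation. It is not the add-in's implementation: the helper names silhouette_samples and silhouette_score are hypothetical, the metric is fixed to Euclidean, and the four-point dataset is made up for illustration.

```python
import math


def silhouette_samples(X, labels):
    """Per-sample silhouette coefficients using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if not 2 <= len(clusters) <= len(X) - 1:
        raise ValueError("need between 2 and n_samples - 1 distinct labels")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: 0 by convention
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(X[i], X[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(X, labels):
    """Mean silhouette coefficient over all samples."""
    s = silhouette_samples(X, labels)
    return sum(s) / len(s)


# Two tight pairs of points, far apart from each other.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette_score(X, [0, 0, 1, 1])  # labels follow the real groups
bad = silhouette_score(X, [0, 1, 0, 1])   # labels cut across the groups

print(round(good, 2))  # roughly 0.90
print(round(bad, 2))   # roughly -0.45
```

The same data scores close to +1 when the labels follow the two visible groups and goes negative when each label mixes points from both groups, which is the behaviour the list above describes.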

It is especially useful alongside ML.INSPECT.INERTIA when picking n_clusters: inertia keeps shrinking as k grows, whereas the silhouette score tends to peak at a "natural" number of clusters and fall off on either side.
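The contrast between the two measures can be sketched in pure Python. Everything here is an illustrative assumption: the 1-D data has three obvious groups, the labelings for k = 2, 3, 4 are written by hand rather than produced by K-Means, and the helpers inertia and silhouette_1d are hypothetical names, not add-in functions.

```python
def group_by_label(xs, labels):
    groups = {}
    for x, lab in zip(xs, labels):
        groups.setdefault(lab, []).append(x)
    return groups


def inertia(xs, labels):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for members in group_by_label(xs, labels).values():
        center = sum(members) / len(members)
        total += sum((x - center) ** 2 for x in members)
    return total


def silhouette_1d(xs, labels):
    """Mean silhouette coefficient for 1-D data with absolute distance."""
    groups = group_by_label(xs, labels)
    scores = []
    for x, lab in zip(xs, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: 0 by convention
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        b = min(
            sum(abs(x - y) for y in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Three visible groups around 1, 5 and 9.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

# Hand-written labelings standing in for K-Means output at each k.
labelings = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],  # first two groups merged
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # one label per visible group
    4: [0, 0, 0, 1, 1, 1, 2, 3, 2],  # last group split in two
}

results = {k: (inertia(xs, lab), silhouette_1d(xs, lab)) for k, lab in labelings.items()}
for k, (inert, sil) in results.items():
    print(k, round(inert, 2), round(sil, 2))

best_k = max(results, key=lambda k: results[k][1])
print(best_k)  # 3
```

Inertia falls at every step, so on its own it never says when to stop adding clusters. The silhouette score is highest at k = 3 and lower on both sides, which is why the two are read together.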

Examples

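Score a fitted K-Means model's predicted labels on the data in A2:E101. The first three formulas go in H1, H2, and H3; put the scoring formula in any cell outside the range that H3 spills into: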

=ML.CLUSTERING.KMEANS(3, "k-means++", "auto", 300, 0.0001, 0)
=ML.FIT(H1, A2:E101)
=ML.PREDICT(H2, A2:E101)
=ML.EVAL.CLUSTERING.SILHOUETTE_SCORE(A2:E101, H3#)

Score multiple k values to find the most natural cluster count, lining the results up next to the inertia for the elbow chart:

A2: 2
B2: =ML.CLUSTERING.KMEANS(A2, "k-means++", "auto", 300, 0.0001, 0)
C2: =ML.FIT(B2, $A$25:$E$125)
D2: =ML.INSPECT.INERTIA(C2)
E2: =ML.PREDICT(C2, $A$25:$E$125)
F2: =ML.EVAL.CLUSTERING.SILHOUETTE_SCORE($A$25:$E$125, E2#)

Remarks

  • Pass the same feature matrix (X) you trained K-Means on, plus the predicted cluster labels — usually the output of ML.PREDICT against your fitted K-Means model.
  • The score is undefined when there is only one cluster (n_clusters = 1) and for trivial inputs where every sample is its own cluster; both configurations raise an error.
  • Silhouette scoring is O(n²) in memory and runtime; on very large datasets either subset the rows first or pass a sample_size to score on a random subset (use random_state for reproducibility).
  • Pre-scale your features with ML.PREPROCESSING.STANDARD_SCALER before scoring: silhouette uses Euclidean distance by default, so features with larger numeric ranges would otherwise dominate the distance comparison.
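The remarks above can be read as a short pre-flight routine, sketched here in pure Python. The helper names check_labels, standard_scale, and subsample are hypothetical, the scaling is a plain z-score using the population standard deviation (an assumption about what a standard scaler does), and the data is made up.

```python
import random
import statistics


def check_labels(labels):
    """Reject label sets for which the silhouette score is undefined."""
    n_labels = len(set(labels))
    if not 2 <= n_labels <= len(labels) - 1:
        raise ValueError(
            f"{n_labels} distinct labels for {len(labels)} samples; "
            "need between 2 and n_samples - 1"
        )


def standard_scale(X):
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; divide by 1 so it simply becomes 0.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def subsample(X, labels, sample_size, random_state):
    """Pick sample_size rows reproducibly; the same seed gives the same rows."""
    rng = random.Random(random_state)
    idx = sorted(rng.sample(range(len(X)), sample_size))
    return [X[i] for i in idx], [labels[i] for i in idx]


def pair_count(n):
    """Number of pairwise distances needed for n rows."""
    return n * (n - 1) // 2


# Made-up data: income in currency units next to age in years.
X = [[30000.0, 25.0], [30500.0, 60.0], [90000.0, 26.0], [90500.0, 61.0]]
labels = [0, 0, 1, 1]

check_labels(labels)                 # passes: 2 labels for 4 samples
X_scaled = standard_scale(X)         # both columns now have mean 0, std 1
X_sub, labels_sub = subsample(X_scaled, labels, sample_size=3, random_state=0)
check_labels(labels_sub)             # re-check: a subsample can lose a cluster

print(pair_count(100_000))  # 4999950000 distances on the full data
print(pair_count(5_000))    # 12497500 distances on a 5,000-row subsample
```

Re-checking the labels after subsampling is worth the extra line, since a small random subset can end up with only one cluster and would then fail for the same reason as n_clusters = 1.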

See also