ML.EVAL.CLUSTERING.SILHOUETTE_SCORE¶
Returns the mean silhouette coefficient over all samples.
Syntax¶
=ML.EVAL.CLUSTERING.SILHOUETTE_SCORE(X, labels, [metric], [sample_size], [random_state])
Arguments¶
| Name | Type | Default | Description |
|---|---|---|---|
| X | object | | DataFrame or 2-D array object of the data that was clustered. |
| labels | object | | DataFrame or array object of predicted cluster labels, one per row of X. |
| metric | Any | "euclidean" | Distance metric used to compute pairwise distances (e.g. 'euclidean', 'manhattan', 'cosine'). |
| sample_size | Any | None | If set, compute the silhouette on a random subsample of this many rows instead of the full dataset. |
| random_state | Any | None | Random seed for sample_size selection. Use a fixed integer for reproducible results. |
Returns¶
A single number between -1 and +1 — higher means better-separated clusters.
When to use¶
Use ML.EVAL.CLUSTERING.SILHOUETTE_SCORE to gauge how cleanly your clusters
separate. For each sample it compares its average distance to other points in
its own cluster against its average distance to the nearest other cluster, and
averages the result across all samples.
The score lives in [-1, 1]:
- +1 — clusters are tight and well-separated.
- 0 — clusters overlap; samples sit on or near a boundary.
- -1 — many samples are likely assigned to the wrong cluster.
It's especially handy alongside ML.INSPECT.INERTIA when picking
n_clusters: inertia always shrinks as k grows, but the silhouette score
peaks at a "natural" number of clusters and declines on either side.
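The per-sample comparison described above can be sketched directly in Python (assuming, as these functions appear to, a scikit-learn backend): for each sample, a is its mean distance to the rest of its own cluster, b is its mean distance to the nearest other cluster, and the silhouette is (b − a) / max(a, b), averaged over all samples. The data here is synthetic, purely for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_score

rng = np.random.default_rng(0)
# Two tight, well-separated blobs -> silhouette close to +1.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

D = pairwise_distances(X)  # O(n^2) pairwise distance matrix
scores = []
for i, li in enumerate(labels):
    own = (labels == li) & (np.arange(len(X)) != i)
    a = D[i, own].mean()                      # mean intra-cluster distance
    b = min(D[i, labels == lj].mean()         # mean distance to nearest other cluster
            for lj in np.unique(labels) if lj != li)
    scores.append((b - a) / max(a, b))

manual = float(np.mean(scores))
print(manual, silhouette_score(X, labels))  # the two values agree
```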
Examples¶
Score a fitted K-Means model's predicted labels on the data in A2:E101:
H1: =ML.CLUSTERING.KMEANS(3, "k-means++", "auto", 300, 0.0001, 0)
H2: =ML.FIT(H1, A2:E101)
H3: =ML.PREDICT(H2, A2:E101)
H4: =ML.EVAL.CLUSTERING.SILHOUETTE_SCORE(A2:E101, H3#)
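The four formulas above correspond roughly to this pipeline, sketched in Python under the assumption that these functions wrap scikit-learn (the synthetic three-blob matrix stands in for the spreadsheet range, and n_init=10 stands in for "auto" for portability across scikit-learn versions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Stand-in feature matrix: three well-separated blobs, 5 columns.
X = np.vstack([rng.normal(c, 0.4, (40, 5)) for c in (0.0, 3.0, 6.0)])

model = KMeans(n_clusters=3, init="k-means++", n_init=10,
               max_iter=300, tol=1e-4, random_state=0)  # mirrors the KMEANS arguments
labels = model.fit(X).predict(X)                        # ML.FIT + ML.PREDICT
score = silhouette_score(X, labels)                     # the SILHOUETTE_SCORE call
print(score)
```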
Score multiple k values to find the most natural cluster count, lining the
results up next to the inertia for the elbow chart:
A2: 2 B2: =ML.CLUSTERING.KMEANS(A2, "k-means++", "auto", 300, 0.0001, 0)
C2: =ML.FIT(B2, $A$25:$E$125)
D2: =ML.INSPECT.INERTIA(C2)
E2: =ML.PREDICT(C2, $A$25:$E$125)
F2: =ML.EVAL.CLUSTERING.SILHOUETTE_SCORE($A$25:$E$125, E2#)
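The k-sweep above can be sketched the same way (again assuming a scikit-learn backend): fit K-Means for each candidate k, record both inertia and silhouette, and pick the k where the silhouette peaks. The four-blob dataset here is synthetic, chosen so the natural cluster count is known.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

centers = [[0, 0], [6, 0], [0, 6], [6, 6]]  # four well-separated blobs
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                  random_state=0)

results = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                max_iter=300, tol=1e-4, random_state=0).fit(X)
    # Inertia always shrinks as k grows; silhouette peaks at the natural k.
    results[k] = (km.inertia_, silhouette_score(X, km.labels_))

best_k = max(results, key=lambda k: results[k][1])
print(best_k)  # 4: the silhouette peaks at the true number of blobs
```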
Remarks¶
- Pass the same feature matrix (X) you trained K-Means on, plus the predicted cluster labels — usually the output of ML.PREDICT against your fitted K-Means model.
- The score is undefined for n_clusters = 1 and for trivial inputs where every sample is its own cluster — those configurations raise an error.
- Silhouette scoring is O(n²) in memory and runtime; on very large datasets either subset the rows first or pass a sample_size to score on a random subset (use random_state for reproducibility).
- Pre-scale your features with ML.PREPROCESSING.STANDARD_SCALER before scoring — silhouette uses Euclidean distance by default, so unequal feature magnitudes will dominate the comparison.
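The last two remarks can be demonstrated in Python (scikit-learn assumed as the backend): subsampling bounds the O(n²) cost, and standardizing stops a large-magnitude feature from dominating the Euclidean distances. The dataset is contrived so that the informative feature is tiny and the noise feature is huge.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Feature 0 separates the clusters; feature 1 is pure noise on a huge scale.
sep = np.vstack([rng.normal(0, 0.2, (200, 1)), rng.normal(4, 0.2, (200, 1))])
X = np.hstack([sep, rng.normal(0, 1000, (400, 1))])
labels = np.array([0] * 200 + [1] * 200)

raw = silhouette_score(X, labels)      # noise feature dominates -> near 0
scaled = silhouette_score(StandardScaler().fit_transform(X), labels)
sub = silhouette_score(X, labels, sample_size=100, random_state=0)
print(raw, scaled, sub)
```

Scaling recovers a clearly positive score; the subsampled call returns a (reproducible) estimate at a quarter of the pairwise-distance cost.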
See also¶
- ML.EVAL.CLUSTERING.ADJUSTED_MUTUAL_INFO_SCORE
- ML.EVAL.CLUSTERING.ADJUSTED_RAND_SCORE
- ML.EVAL.CLUSTERING.COMPLETENESS_SCORE
- ML.EVAL.CLUSTERING.FOWLKES_MALLOWS_SCORE
- ML.EVAL.CLUSTERING.HOMOGENEITY_SCORE
- ML.EVAL.CLUSTERING.MUTUAL_INFO_SCORE
- ML.EVAL.CLUSTERING.NORMALIZED_MUTUAL_INFO_SCORE
- ML.EVAL.CLUSTERING.RAND_SCORE
- ML.EVAL.CLUSTERING.V_MEASURE_SCORE
- ML.CLUSTERING.KMEANS
- ML.INSPECT.INERTIA
- ML.PREDICT