
ML.DATASETS.MAKE_CIRCLES

Generates a synthetic dataset of two concentric circles — a classic non-linearly-separable shape.

Syntax

ML.DATASETS.MAKE_CIRCLES(n_samples, noise, factor, random_state)

Arguments

Name | Type | Default | Description
n_samples | int | 100 | Total number of points generated, split as evenly as possible between the outer and inner circles.
noise | float | None | Standard deviation of Gaussian noise added to each point. Use 0 (or leave blank) for clean circles.
factor | float | 0.8 | Ratio of the inner circle's radius to the outer circle's (0 < factor < 1). Smaller values shrink the inner circle, widening the gap between the rings.
random_state | int | None | Random seed for reproducible output.
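The sketch below shows what noise and factor do geometrically. It calls scikit-learn's `make_circles` directly and measures each point's distance from the origin. This assumes the spreadsheet function passes these arguments straight through to sklearn.

With noise=0, outer points sit at radius 1.0 and inner points at radius factor. With noise, both rings blur around those radii.

```python
import numpy as np
from sklearn.datasets import make_circles

# Noise-free rings (assumption: same semantics as the spreadsheet arguments).
X, y = make_circles(n_samples=100, noise=0, factor=0.3, random_state=0)

# Distance of every point from the origin.
radius = np.hypot(X[:, 0], X[:, 1])
print("outer radii range:", radius[y == 0].min(), radius[y == 0].max())
print("inner radii range:", radius[y == 1].min(), radius[y == 1].max())

# Same shape with Gaussian noise: radii now scatter around 1.0 and 0.3.
Xn, yn = make_circles(n_samples=100, noise=0.05, factor=0.3, random_state=0)
rn = np.hypot(Xn[:, 0], Xn[:, 1])
print("noisy outer mean radius:", rn[yn == 0].mean())
print("noisy inner mean radius:", rn[yn == 1].mean())
```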

Returns

Dataset

A Dataset (DataFrame) with columns [x_0, x_1, target], where target is 0 for the outer circle and 1 for the inner.
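A rough pandas equivalent of the returned Dataset is sketched below. It assumes the function wraps scikit-learn's `make_circles`. `make_circles_frame` is a hypothetical helper written for illustration, not part of the ML.* API. The call reuses the arguments from the Examples section.

```python
import pandas as pd
from sklearn.datasets import make_circles


def make_circles_frame(n_samples=100, noise=None, factor=0.8, random_state=None):
    """Hypothetical helper: sklearn's make_circles reshaped into the
    documented [x_0, x_1, target] column layout."""
    X, y = make_circles(n_samples=n_samples, noise=noise, factor=factor,
                        random_state=random_state)
    df = pd.DataFrame(X, columns=["x_0", "x_1"])
    df["target"] = y  # 0 = outer circle, 1 = inner circle
    return df


# 150 points, noise 0.05, inner circle at 30% of the outer radius, seed 0.
df = make_circles_frame(150, 0.05, 0.3, 0)
print(df.head(5))
```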

When to use

Use ML.DATASETS.MAKE_CIRCLES to generate a synthetic 2-D dataset of two concentric circles. This is the textbook shape that linear classifiers and linear PCA cannot separate, but that Kernel PCA with a suitable kernel (such as RBF) can separate cleanly.

A common use is to demonstrate when the kernel trick matters: side by side, plot the raw circles, the projection produced by linear PCA, and the projection produced by ML.DIM_REDUCTION.KERNEL_PCA(kernel="rbf"). Only the third visibly separates the two classes.
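Below is a rough Python analogue of that comparison. It uses scikit-learn's `PCA` and `KernelPCA` as stand-ins for linear PCA and ML.DIM_REDUCTION.KERNEL_PCA. `gamma=10` is an assumed value, borrowed from scikit-learn's own kernel PCA example. It is not a documented default of ML.DIM_REDUCTION.KERNEL_PCA.

Instead of plotting, the sketch prints where each class falls on the first component. Under linear PCA, both class means stay near zero because the rings remain nested.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=150, noise=0.05, factor=0.3, random_state=0)

# Linear PCA on 2-D data only centers and rotates/reflects the plane,
# so the inner ring stays inside the outer ring.
X_pca = PCA(n_components=2).fit_transform(X)

# RBF kernel PCA; gamma=10 is an assumption, tune it for your data.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

for name, Z in [("linear PCA", X_pca), ("kernel PCA", X_kpca)]:
    outer_mean = Z[y == 0, 0].mean()
    inner_mean = Z[y == 1, 0].mean()
    print(f"{name}: component-1 mean, outer={outer_mean:.3f}, inner={inner_mean:.3f}")
```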

Examples

Generate 150 points on two noisy circles, with the inner circle at 30% of the outer radius and a fixed seed:

=ML.DATASETS.MAKE_CIRCLES(150, 0.05, 0.3, 0)

If you enter the formula in cell B4, B4 now holds a Dataset reference. Preview the first few rows with ML.DATA.SAMPLE(B4, 5). The columns are x_0, x_1, and target (0 for the outer circle, 1 for the inner).

Remarks

  • n_samples is the total number of points; sklearn splits it as evenly as possible between the outer and inner circle (75 and 75 when you ask for 150; for an odd total, the inner circle gets the extra point).
  • noise is the standard deviation of Gaussian noise added to each point. Pass 0 (or leave blank) for clean, noise-free circles.
  • factor is the ratio of the inner circle's radius to the outer circle's, strictly between 0 and 1. Smaller factor means the inner circle is much smaller (and the two classes are more clearly separated by radius); larger factor makes the rings closer together.
  • random_state controls reproducibility. Pin it to an integer if you need the same dataset every run.
  • The rows are shuffled (sklearn's default), so the two classes are interleaved in random order down the cells rather than grouped. Use ML.DATA.QUERY with an ORDER BY target clause to group rows by class for plotting.
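The remarks above can be checked in Python. This sketch assumes the spreadsheet function mirrors scikit-learn's `make_circles` defaults. It covers the split for an odd total, reproducibility under a fixed `random_state`, and the shuffled row order. The pandas `sort_values` call plays the role of the ORDER BY target clause.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_circles

# Odd total: the outer circle gets n_samples // 2 points, the inner the rest.
_, y_odd = make_circles(n_samples=151, random_state=0)
print(np.bincount(y_odd))  # [75 76]: 75 outer (target 0), 76 inner (target 1)

# A pinned seed reproduces the dataset exactly.
X1, _ = make_circles(n_samples=150, noise=0.05, factor=0.3, random_state=0)
X2, _ = make_circles(n_samples=150, noise=0.05, factor=0.3, random_state=0)
print("identical:", np.array_equal(X1, X2))

# Rows arrive shuffled; a stable sort on target groups them by class.
X, y = make_circles(n_samples=150, noise=0.05, factor=0.3, random_state=0)
df = pd.DataFrame(X, columns=["x_0", "x_1"]).assign(target=y)
print("shuffled order:", df["target"].head(10).tolist())
grouped = df.sort_values("target", kind="mergesort").reset_index(drop=True)
print("grouped order: ", grouped["target"].head(10).tolist())
```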

See also