ML.DATASETS.MAKE_CIRCLES
Generates a synthetic dataset of two concentric circles — a classic non-linearly-separable shape.
Syntax
Arguments
| Name | Type | Default | Description |
|---|---|---|---|
| n_samples | int | 100 | Total number of points generated. Split evenly between the inner and outer circles. |
| noise | float | None | Standard deviation of Gaussian noise added to each point. Use 0 for clean circles. |
| factor | float | 0.8 | Ratio of the inner circle's radius to the outer circle's (0 < factor < 1). Smaller values move the inner circle further from the outer one. |
| random_state | int | None | Random seed for reproducible output. |
Returns
Dataset
A Dataset (DataFrame) with columns [x_0, x_1, target], where target is 0 for the outer circle and 1 for the inner.
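The shape of the returned rows can be sketched in plain Python. This is a minimal illustration of the geometry described above, not the add-in's or sklearn's implementation: make_circles_sketch is a hypothetical helper, the outer radius of 1 and the handling of an odd n_samples are assumptions, and the noise value (0.05) and seed (0) are arbitrary choices for the demonstration.

```python
import math
import random


def make_circles_sketch(n_samples=100, noise=None, factor=0.8, random_state=None):
    """Return rows of (x_0, x_1, target): outer circle = 0, inner circle = 1."""
    if not 0 < factor < 1:
        raise ValueError("factor must be strictly between 0 and 1")
    rng = random.Random(random_state)
    n_outer = n_samples // 2        # assumption: an odd remainder goes to the inner circle
    n_inner = n_samples - n_outer
    rows = []
    # Assumption: the outer circle has radius 1 and the inner circle has radius `factor`.
    for count, radius, target in ((n_outer, 1.0, 0), (n_inner, factor, 1)):
        for i in range(count):
            angle = 2 * math.pi * i / count  # evenly spaced around the circle
            rows.append([radius * math.cos(angle), radius * math.sin(angle), target])
    if noise:  # None or 0 leaves the circles clean
        for row in rows:
            row[0] += rng.gauss(0.0, noise)
            row[1] += rng.gauss(0.0, noise)
    rng.shuffle(rows)  # interleave the two classes in random order
    return [tuple(row) for row in rows]


# Illustrative call: 150 points, inner circle at 30% of the outer radius.
rows = make_circles_sketch(n_samples=150, noise=0.05, factor=0.3, random_state=0)
print(rows[:3])
```

Each row has the same layout as the Dataset columns: two coordinates followed by the class label.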
When to use
Use ML.DATASETS.MAKE_CIRCLES to generate a synthetic 2-D dataset of two
concentric circles: the textbook shape that linear classifiers and linear
PCA cannot separate, but that Kernel PCA with a suitable kernel (such as
RBF) can.
A common use is to demonstrate when the kernel trick matters: side by
side, plot the raw circles, the projection produced by linear PCA, and the
projection produced by ML.DIM_REDUCTION.KERNEL_PCA(kernel="rbf"). Only
the third visibly separates the two classes.
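Why a nonlinear map helps can be sketched without any library. The snippet below is not Kernel PCA; it swaps in a hand-picked nonlinear feature (squared distance from the origin) to show the underlying idea. The radii, point count, and the helper names circle and ranges_overlap are assumptions made for the illustration.

```python
import math

FACTOR = 0.3  # illustrative inner/outer radius ratio
N = 60        # points per circle


def circle(radius, n):
    """Return n evenly spaced, noise-free points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def ranges_overlap(a, b):
    """True if the intervals [min(a), max(a)] and [min(b), max(b)] intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


outer = circle(1.0, N)     # target 0
inner = circle(FACTOR, N)  # target 1

# Project both circles onto 18 different directions. Because the circles share
# a centre, the projected ranges overlap on every direction tested, so a single
# threshold on any of these linear projections cannot split the classes.
linear_overlaps = all(
    ranges_overlap(
        [math.cos(t) * x + math.sin(t) * y for x, y in outer],
        [math.cos(t) * x + math.sin(t) * y for x, y in inner],
    )
    for t in (k * math.pi / 18 for k in range(18))
)

# A nonlinear feature, the squared radius, separates them with one threshold.
r2_outer = [x * x + y * y for x, y in outer]
r2_inner = [x * x + y * y for x, y in inner]
radial_overlaps = ranges_overlap(r2_outer, r2_inner)

print(linear_overlaps)  # True
print(radial_overlaps)  # False
```

A kernel method reaches a similar separation without the feature being chosen by hand, which is what the side-by-side plot is meant to show.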
Examples
Generate 150 noisy points on two circles, with the inner circle at 30% of the outer radius and a fixed seed:
Cell B4 now holds a Dataset reference. Preview the first
few rows with ML.DATA.SAMPLE(B4, 5). The columns are x_0, x_1, and
target (0 or 1).
Remarks
- n_samples is the total number of points; sklearn splits it evenly between the inner and outer circle (75 and 75 when you ask for 150).
- noise is the standard deviation of Gaussian noise added to each point. Pass 0 (or leave blank) for clean, noise-free circles.
- factor is the ratio of the inner circle's radius to the outer circle's, strictly between 0 and 1. A smaller factor makes the inner circle much smaller (and the two classes more clearly separated by radius); a larger factor brings the rings closer together.
- random_state controls reproducibility. Pin it to an integer if you need the same dataset every run.
- The dataset is shuffled by default (sklearn's behavior), so the two classes are interleaved in random order down the rows. Use ML.DATA.QUERY with an ORDER BY target clause to group rows by class for plotting.
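The effect of grouping shuffled rows by class can be sketched in Python. This only mirrors what an ORDER BY target clause is described as doing; it is not the ML.DATA.QUERY implementation, and the six rows are made-up values in the (x_0, x_1, target) layout.

```python
from collections import Counter

# Hypothetical shuffled rows in the (x_0, x_1, target) layout; values are made up.
rows = [
    (0.95, 0.31, 0),
    (0.10, -0.28, 1),
    (-0.81, 0.59, 0),
    (-0.30, 0.02, 1),
    (0.24, 0.18, 1),
    (0.05, -1.00, 0),
]

# sorted() is stable, so rows keep their relative order within each class.
grouped = sorted(rows, key=lambda row: row[2])

# Count the rows per class to confirm the even split.
counts = Counter(row[2] for row in rows)
print(counts[0], counts[1])         # 3 3
print([row[2] for row in grouped])  # [0, 0, 0, 1, 1, 1]
```

Once grouped, each class occupies one contiguous block of rows, which makes it easier to plot the two circles as separate series.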