IR Sensor Calibration — K-Means Clustering
IR sensor calibration uses a K-Means clustering approach to automatically distinguish black from white surfaces. This technique is based on the research paper Applied Machine Learning in Sensor Calibration — A Clustering Technique by Abigail Liu, Aaron Xie, and Oliver Jiang (Los Altos Community Team 0399, GCER 2025).
The Problem
IR sensors return raw analog values that vary between sensors, surfaces, and environmental conditions. To make decisions like “am I on a black line?”, the system needs a threshold separating black readings from white readings.
Traditional approaches — such as taking a fixed percentile of the data — are vulnerable to skewed samples. If the robot spends most of its calibration drive on a white surface with only a brief pass over black, a percentile-based threshold can land in the wrong place. The paper demonstrates that percentile methods achieve only 92–98% accuracy and are susceptible to false positives on skewed data.
The Solution: K-Means Clustering (k=2)
Instead of relying on percentiles, the calibration system uses K-Means clustering with k=2 to separate sensor readings into two natural groups: one for white and one for black.
Sampling: During calibration, the robot drives across the game surface while IR sensors are sampled at 10 ms intervals (100 Hz). Each sensor accumulates a list of raw analog readings as the robot passes over both white and black areas.
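The sampling loop can be sketched as follows. This is a minimal illustration, not the project's actual code: `read_raw` is a hypothetical callable standing in for whatever function reads one raw analog value from the IR sensor.

```python
import time

def sample_ir(read_raw, duration_s=3.0, interval_s=0.010):
    """Collect raw IR readings at 10 ms intervals (100 Hz) while the
    robot drives across the surface. `read_raw` is a hypothetical
    callable returning one raw analog reading."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(read_raw())
        time.sleep(interval_s)
    return samples
```

The returned list is what gets handed to the clustering step; each sensor keeps its own list.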
Clustering: The collected samples are fed into a 1D K-Means algorithm:
- Initialize two centroids at the minimum and maximum of the data
- Assign each data point to its nearest centroid
- Recompute each centroid as the mean of its assigned points
- Repeat for up to 10 iterations (convergence is typically reached within 5, since the data is semi-sorted from the WHITE-BLACK-WHITE driving pattern)
- Return the two centroids in ascending order — the lower one becomes the white threshold, the higher one the black threshold
```
Samples: [180, 195, 210, 185, 2800, 3100, 2950, 190, 205, ...]
          └─ white cluster ─┘ └──── black ────┘  └─white─┘

K-Means centroids: white = 193.5, black = 2950.0
```
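The clustering steps above can be sketched as a minimal 1D K-Means with k=2. This is an illustrative sketch of the listed algorithm, not the project's actual implementation:

```python
def kmeans_1d(samples, max_iter=10):
    """1D K-Means with k=2: centroids start at the data's min and max,
    each point is assigned to its nearest centroid, and each centroid
    is recomputed as the mean of its assigned points. Returns the two
    centroids in ascending order."""
    c_lo, c_hi = float(min(samples)), float(max(samples))
    for _ in range(max_iter):
        # Assign each point to the nearest centroid.
        lo = [v for v in samples if abs(v - c_lo) <= abs(v - c_hi)]
        hi = [v for v in samples if abs(v - c_lo) > abs(v - c_hi)]
        # Recompute centroids as cluster means (keep old value if empty).
        new_lo = sum(lo) / len(lo) if lo else c_lo
        new_hi = sum(hi) / len(hi) if hi else c_hi
        if new_lo == c_lo and new_hi == c_hi:
            break  # converged
        c_lo, c_hi = new_lo, new_hi
    return c_lo, c_hi
```

On the sample data above, the lower centroid lands in the white range and the higher one at the mean of the black readings.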
Why Clustering Works Better
The paper compares three calibration algorithms:
| Algorithm | Approach | Success Rate | Handles Skewed Data? |
|---|---|---|---|
| 90th percentile | Use 90th percentile as BLACK threshold | 92% | No |
| Median of 80% range | Average 10th/90th percentile medians | 98% | No |
| K-Means clustering | Cluster into two groups, threshold at midpoint | 100% | Yes |
The key advantage is robustness to skewed data distributions. If the robot’s calibration drive crosses a black line only briefly, 90% of the samples may be white. Percentile methods get confused — they might place the “black” threshold at a white reading. K-Means correctly identifies even a small cluster of black readings and separates it from the white cluster.
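The failure mode can be demonstrated with illustrative numbers (not the paper's data): when 95% of the samples are white, a nearest-rank 90th percentile falls inside the white cluster, so a threshold derived from it sits on a white reading.

```python
import random

random.seed(0)
# Skewed calibration drive: 95 white-range readings, 5 black-range readings.
samples = [random.randint(180, 220) for _ in range(95)] \
        + [random.randint(2800, 3100) for _ in range(5)]

samples.sort()
p90 = samples[int(0.9 * (len(samples) - 1))]  # nearest-rank 90th percentile
# p90 is a white-range value (180-220), far below the black cluster.
```

K-Means on the same data would still place one centroid in each cluster, because assignment depends on distance to the centroids, not on how many samples each cluster holds.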
Validation
After clustering, the calibration is validated before being accepted:
- Minimum range check: The overall spread of readings must exceed 500 units. If all readings are similar, the sensor likely didn’t see both surfaces.
- Minimum separation check: The two centroids must be at least 700 units apart and at least 25% of the total data range. This prevents accepting calibrations where the clusters aren’t meaningfully distinct.
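The two checks can be sketched like this. The function name and signature are illustrative; the thresholds (500-unit range, 700-unit separation, 25% of range) come from the text above.

```python
def validate_calibration(samples, c_white, c_black,
                         min_range=500, min_sep=700, min_sep_frac=0.25):
    """Accept a calibration only if the readings span a wide enough
    range and the two centroids are meaningfully separated."""
    data_range = max(samples) - min(samples)
    if data_range < min_range:
        return False  # sensor likely never saw both surfaces
    sep = c_black - c_white
    return sep >= min_sep and sep >= min_sep_frac * data_range
```

A drive that only ever saw white fails the range check; two centroids crowded together fail the separation check.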
If validation fails, the BotUI shows a warning and lets you retry.
Soft Classification
After calibration, the IR sensor doesn’t just return “black” or “white” — it also provides a probability via linear interpolation between the two thresholds:
```
probabilityOfBlack(value):
    value <= white_threshold → 0.0
    value >= black_threshold → 1.0
    otherwise → (value - white_threshold) / (black_threshold - white_threshold)
```
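That interpolation is a few lines in practice; this sketch assumes the thresholds produced by calibration are passed in directly:

```python
def probability_of_black(value, white_threshold, black_threshold):
    """Soft classification: 0.0 at or below the white threshold,
    1.0 at or above the black threshold, linear in between."""
    if value <= white_threshold:
        return 0.0
    if value >= black_threshold:
        return 1.0
    return (value - white_threshold) / (black_threshold - white_threshold)
```

A reading halfway between the thresholds yields 0.5, which a proportional controller can use directly as an error signal.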
This enables more nuanced line-following behavior (e.g., proportional control) rather than binary on/off decisions. See Line Following for how the PID controller uses these probability values.