Methodology
How we measure global internet censorship — from raw probe measurements to a single per-country score, with a gradient-boosting classifier in the middle. We publish three honest performance numbers (LOCO 0.91 AUC, stratified 0.98 AUC, time-based 0.50 AUC) so you can pick the one that matches your risk tolerance.
Overview
The score is a composite built from multiple measurement networks and processed through our ML pipeline. It updates continuously based on live network measurements.
When a government blocks a new service, our data reflects the change within hours.
Data Sources
OONI Measurements
Sensor Network
External Sources
Multi-Source Correlation
No single measurement network captures the full picture of internet censorship. OONI provides active probing but has geographic gaps. CensoredPlanet provides remote measurement but lacks ground truth. IODA detects outages but not selective blocking.
Voidly operates its own 30-node network across 6 continents — testing VPN accessibility and censorship patterns every 5 minutes — then correlates these proprietary measurements with three external measurement networks (OONI, CensoredPlanet, IODA) to produce verified incidents with evidence chains. This turns ambiguous network anomalies into structured, citable censorship intelligence.
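As a rough illustration of what cross-source correlation can look like, the sketch below clusters anomaly signals per country by time and keeps only clusters reported by more than one network. The rule itself (a 6-hour window, at least two distinct sources), the field names `source`, `country`, `hour`, and the function `corroborated_incidents` are all assumptions made for this example; they are not Voidly's actual pipeline.

```python
from collections import defaultdict

def corroborated_incidents(signals, window_hours=6, min_sources=2):
    """Group signals per country into time clusters; keep multi-source clusters.

    Hypothetical rule for illustration only: a cluster becomes an incident
    when at least `min_sources` distinct networks contributed to it.
    """
    by_country = defaultdict(list)
    for signal in signals:
        by_country[signal["country"]].append(signal)

    incidents = []
    for country in sorted(by_country):
        ordered = sorted(by_country[country], key=lambda s: s["hour"])
        clusters = [[ordered[0]]]
        for signal in ordered[1:]:
            # Extend the current cluster while the gap to the previous signal is small.
            if signal["hour"] - clusters[-1][-1]["hour"] <= window_hours:
                clusters[-1].append(signal)
            else:
                clusters.append([signal])
        for cluster in clusters:
            sources = sorted({s["source"] for s in cluster})
            if len(sources) >= min_sources:
                # The cluster doubles as the evidence chain for the incident.
                incidents.append(
                    {"country": country, "sources": sources, "evidence": cluster}
                )
    return incidents

# Synthetic signals with placeholder country codes (not real data).
signals = [
    {"source": "voidly", "country": "AA", "hour": 0},
    {"source": "ooni", "country": "AA", "hour": 2},
    {"source": "ioda", "country": "AA", "hour": 40},
    {"source": "voidly", "country": "BB", "hour": 5},
]
incidents = corroborated_incidents(signals)
print(incidents)
```

In this toy input only the two "AA" signals two hours apart are corroborated; the isolated signals stay unverified anomalies.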
ML Model
Gradient boosting classifier v3.3 trained on 4,237 labeled samples (1,116 positive across 131 countries) from OONI, CensoredPlanet, IODA, and Voidly probes. LOCO median F1 0.87 (honest). The older v2 — which reported 99.8% F1 / 1.000 ROC AUC — was retired 2026-05-21 after an audit found country_risk_tier leakage was driving 85% of its score. Privacy-preserving training on aggregate data only — no raw user data is used.
Three splits, three honest numbers
We previously cited 0.998 F1 as the headline performance. Our own sentinel accuracy endpoint publishes a warning that this number overstates real-world performance by 47.9pp. We now publish all three splits so you can choose which one matches your risk tolerance:
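To make the difference between the splits concrete, here is a minimal sketch of a leave-one-country-out (LOCO) split and a time-based split over labeled rows. The row fields (`country`, `date`, `label`), the helper names, and the sample rows are hypothetical; the actual training code may differ.

```python
from collections import defaultdict

def loco_folds(rows):
    """Yield (held_out_country, train_rows, test_rows), one fold per country."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row)
    for country in sorted(by_country):
        test = by_country[country]
        # The held-out country contributes nothing to training.
        train = [r for r in rows if r["country"] != country]
        yield country, train, test

def time_split(rows, cutoff):
    """Train on rows strictly before `cutoff`, test on rows at or after it.

    Dates are ISO strings, so plain string comparison orders them correctly.
    """
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

# Synthetic rows with placeholder country codes (not real data).
rows = [
    {"country": "AA", "date": "2026-01-01", "label": 1},
    {"country": "AA", "date": "2026-02-01", "label": 1},
    {"country": "BB", "date": "2026-01-15", "label": 0},
    {"country": "BB", "date": "2026-03-01", "label": 1},
    {"country": "CC", "date": "2026-02-10", "label": 0},
]

folds = list(loco_folds(rows))
train_before, test_after = time_split(rows, "2026-02-01")
print(len(folds), len(train_before), len(test_after))
```

A stratified random split, by contrast, can place rows from the same country and the same week on both sides of the split, which is one reason it tends to produce the most optimistic number.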
Live calibration: The forecast was severely miscalibrated through 2026-05-20 (Brier 0.59, MAE 0.60): in the bucket where it predicted 5% risk, the actual incident rate was 65%. We fixed this on 2026-05-20 by refitting isotonic regression on 810 live (predicted, observed) pairs from sentinel_outcomes. In-sample Brier dropped to 0.22 and calibration MAE to 0.00. A watched-country gate prevents extrapolation to non-censoring countries. See the 90-day drift series → See the reliability diagram → Read the full refit writeup →
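The refit described above can be sketched with a small pool-adjacent-violators (PAV) implementation of isotonic regression plus a Brier score. This is a standard-library illustration on synthetic pairs, not the production code; the function names and the data are assumptions, and the numbers it prints are not the published ones.

```python
from itertools import groupby

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def fit_isotonic(preds, outcomes):
    """Pool-adjacent-violators fit; returns [(upper_pred, calibrated_value), ...]."""
    pairs = sorted(zip(preds, outcomes))
    blocks = []  # each block: [sum_of_outcomes, count, largest_pred_in_block]
    for x, group in groupby(pairs, key=lambda t: t[0]):
        ys = [y for _, y in group]  # pool tied predictions up front
        blocks.append([float(sum(ys)), len(ys), x])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]

def calibrate(model, p):
    """Step-function lookup: value of the first block whose upper bound covers p."""
    for upper, value in model:
        if p <= upper:
            return value
    return model[-1][1]

# Synthetic (predicted, observed) pairs: low predictions, high observed rates.
preds = [0.05] * 20 + [0.10] * 10
outcomes = [1] * 13 + [0] * 7 + [1] * 8 + [0] * 2

model = fit_isotonic(preds, outcomes)
calibrated = [calibrate(model, p) for p in preds]
before = brier_score(preds, outcomes)
after = brier_score(calibrated, outcomes)
print(model, round(before, 3), round(after, 3))
```

Because isotonic regression minimises squared error over all monotone mappings, and leaving the predictions unchanged is itself a monotone mapping, the in-sample Brier score cannot get worse after the refit. An out-of-sample check is therefore still needed before trusting the improvement.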
Censorship Classifier
Shutdown Forecast Model (live)
Live performance is from /v1/sentinel/accuracy — a public endpoint that auto-evaluates the forecast against actual incidents on a rolling 30-day window. Numbers update every 24h.
The forecast is a current-regime signal, not an onset predictor
An honest re-evaluation found the LOCO and stratified AUCs above are inflated by label autocorrelation. The forecast target target_7day is a 7-day sliding window — 98.9% autocorrelated day-to-day — so a trivial predict-yesterday baseline scores AUC 0.957. On a strict forward-temporal split the production model scores AUC 0.589 on the raw label, and on the rows where a shutdown actually begins (transitions) it scores AUC ~0.33 — below chance. Plain English: the forecast reliably reflects “this country is currently in a censored regime,” but has essentially zero skill at predicting a new shutdown before it happens. Read the full forecast onset-skill finding →
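The autocorrelation effect is easy to reproduce on a toy series. The sketch below assumes the label for day t is 1 when any event occurs in the following 7 days (the exact definition of target_7day is not given here, so this is an assumption), then scores a predict-yesterday baseline on all rows and on the transition rows only. The event days are synthetic.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

N_DAYS = 120
HORIZON = 7
event_days = {20, 21, 22, 70, 71}  # synthetic shutdown days
events = [1 if d in event_days else 0 for d in range(N_DAYS)]

# Assumed label: 1 if any event falls in the next HORIZON days.
target = [int(any(events[t + 1 : t + 1 + HORIZON])) for t in range(N_DAYS - HORIZON)]

# Predict-yesterday baseline: today's score is yesterday's label.
labels = target[1:]
scores = target[:-1]
auc_all = auc(labels, scores)

# Transition rows: days where the label differs from the previous day.
idx = [i for i in range(len(labels)) if labels[i] != scores[i]]
auc_transitions = auc([labels[i] for i in idx], [scores[i] for i in idx])

print(round(auc_all, 3), auc_transitions)
```

The baseline looks strong on all rows because the label rarely changes from one day to the next, yet it is wrong on every transition by construction. Any evaluation that mixes the two will mostly reward regime persistence.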
Feature Importance (Classifier v3, 2026-05-20)
v2 had country_risk_tier at 85% importance, a hardcoded label-derived feature that was effectively cheating. v3 drops it. The new top-3 share is ~73% across three genuine signals (interaction terms + measurement volume), with no single dominator. Live values from /v1/classifier/feature-importance.
Production classifier: v3.3 GradientBoosting with 16 features (13 base + 3 regime-similarity-weighted geographic contagion), 4,237 training samples (1,116 positive), 131 countries. LOCO median F1 0.870, stratified F1 0.729. See the v3 finding writeup for the full story on why v2's 99.8% was inflated.
- Multi-horizon forecast: separate 1d/7d/30d XGBoost+isotonic. LOCO AUC 0.91/0.88/0.84. Per-horizon SHAP + 90% conformal intervals + monotonicity check. Live at /v1/forecast/{cc}/multi-horizon. Honest caveat: these horizons use the same target_Nday sliding-window label as v1 and are evaluated on shuffled / LOCO splits, so the AUCs are inflated by label autocorrelation the same way; they are current-regime signals, not onset predictors. See the onset-skill finding.
- ACI online conformal (Gibbs & Candès 2021): replaces static isotonic with online update. Initial state α=0.10 → 0.21, empirical coverage 91.3%. Visible in every /v1/forecast/{cc}/7day response.
- CenDTect DBSCAN anomaly: per-country unsupervised rolling 45-day window. AUC 0.6506. Promoted as second-opinion signal. Live at /v1/anomaly/dbscan/{cc}.
- Per-domain HDBSCAN drift: weekly cron, surfaces novel blocking patterns at the domain level. Live at /v1/anomaly/domain-drift/leaderboard.
- Per-measurement classifier (Niaki KDD23): row-level XGBoost. AUC 1.0. Honest caveat: model reconstructs the labeling rule. Live at POST /v1/measurement/classify.
- GNN over AS topology: GraphSAGE 2-layer over 7,060-node CAIDA peering graph. LOOCV AUC 0.80 on 6 tier-1 ASNs. Honest caveat: n=6 is statistically underpowered. Live at /v1/forecast/asn-gnn/{asn}.
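For readers unfamiliar with the ACI item in the list above, the core of adaptive conformal inference (Gibbs & Candès 2021) is a one-line update: alpha_next = alpha_t + gamma * (alpha - err_t), where err_t is 1 when the realised outcome fell outside the prediction interval. The sketch below shows only that update rule on a fixed, synthetic miss pattern. The step size gamma = 0.005 and the helper names are assumptions, not the deployed configuration, and the loop does not feed alpha_t back into interval construction as a full implementation would.

```python
def aci_update(alpha_t, target_alpha, miss, gamma=0.005):
    """One ACI step: lower alpha_t after a miss, raise it after a covered outcome."""
    err = 1.0 if miss else 0.0
    return alpha_t + gamma * (target_alpha - err)

def run_aci(misses, target_alpha=0.10, gamma=0.005):
    """Apply the update over a sequence of miss indicators; return the final alpha_t."""
    alpha_t = target_alpha
    for miss in misses:
        alpha_t = aci_update(alpha_t, target_alpha, miss, gamma)
    return alpha_t

# Synthetic pattern: one miss every 12 steps, i.e. coverage above the 90% target.
misses = [(i % 12 == 11) for i in range(600)]
final_alpha = run_aci(misses)
print(round(final_alpha, 4))
```

When observed coverage runs above the target, as in this pattern, alpha_t drifts upward and the intervals are allowed to narrow; a run of misses pushes it back down and widens them.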
Full model history with metrics, training dates, and honest caveats: /atlas/changelog.
Scoring System
0-100 scale. 0 = complete freedom. 100 = total censorship.
Limitations
- ⚠ Scores are national averages; regional variations are not captured
- ⚠ VPN detection is underreported in highly restricted environments
- ⚠ Sample sizes vary by country, which affects confidence levels
- ⚠ Real-time events may take up to 24h to reflect in scores
- ⚠ Content filtering and throttling are harder to detect than blocking
- ⚠ Self-censorship and legal restrictions are not measured
Confidence Intervals
Each country score includes a confidence interval reflecting measurement certainty. Wider intervals indicate less data or greater variability.
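The relationship between data volume and interval width can be illustrated with a simple normal-approximation interval around a mean. This is only a sketch of the general idea: the interval method actually used for the index is not specified here, and the function name, the 1.96 multiplier, and the sample values are assumptions.

```python
import math
import statistics

def score_interval(samples, z=1.96):
    """Approximate 95% interval for the mean score, clipped to the 0-100 scale."""
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return max(0.0, mean - half_width), min(100.0, mean + half_width)

# Synthetic per-measurement scores: same spread, very different sample sizes.
few = [60, 70, 80]
many = [60, 70, 80] * 20

low_few, high_few = score_interval(few)
low_many, high_many = score_interval(many)
print(round(high_few - low_few, 2), round(high_many - low_many, 2))
```

With the same underlying variability, the country with 60 measurements gets a much narrower interval than the one with 3, which is the behaviour the paragraph above describes.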
Scores shown are illustrative examples from a point-in-time snapshot. Live scores update continuously on the Censorship Index.
Validation
Scores are validated against external benchmarks, known censorship events, and three independently-evaluated holdout splits. Continuous live evaluation runs daily and is published at /v1/sentinel/accuracy.
From Voidly's own sentinel endpoint: “Stratified AUC overstates real-world performance by 47.9pp vs. time-based split. Do not cite the stratified number as a deployment figure; use the loco_median or the prod_rolling block once it populates.” We follow that recommendation here. No independent third-party evaluation has been conducted; published tools + data are available for independent replication — see Reproducibility.
Update Pipeline
Citation
Use this data in research? Please cite:
APA Format
Voidly Research. (2026). Global Censorship Index. https://voidly.ai/censorship-index
BibTeX
@misc{voidly_censorship_index,
author = {Voidly Research},
title = {Global Censorship Index},
year = {2026},
url = {https://voidly.ai/censorship-index}
}
License: CC BY 4.0. Free to use with attribution.
Data Access
Contact
See also
- Censorship Index: Per-country score, blocked services, and risk tier for every country we track.
- Findings: Auto-generated, citable signals mined from the incident database every 6 hours.
- Reports & Digests: Live dashboard of recent activity, including incidents, top countries, and evidence sources.
- Open Data: Free downloadable datasets, public APIs, and SDKs (CC BY 4.0).
- RSS / Atom / JSON Feeds: Subscribe to incidents, findings, and AI-service status as push-friendly streams.
Cite this page
All Voidly Research data is licensed under CC BY 4.0. You can redistribute, remix, and build on it freely with attribution.