voidly

Methodology

How we measure global internet censorship — from raw probe measurements to a single per-country score, with a gradient-boosting classifier in the middle. We publish three honest performance numbers (LOCO 0.91 AUC, stratified 0.98 AUC, time-based 0.50 AUC) so you can pick the one that matches your risk tolerance.

Version: 2.2 (honest splits) · Updated: 2026-06-15 · License: CC BY 4.0
Contents
  1. Overview
  2. Data Sources
  3. Multi-Source Correlation
  4. ML Model
  5. Scoring System
  6. Limitations
  7. Confidence Intervals
  8. Validation
  9. Update Pipeline
  10. Citation
  11. Data Access
  12. Contact

Overview

The index is a composite score built from multiple measurement networks and processed through our ML pipeline. It updates continuously based on live network measurements.

When a government blocks a new service, our data reflects the change within hours.

Data Sources

OONI Measurements

Samples: 36,257,345
Coverage: 130 countries
Tests: Web, Messaging, Circumvention

Sensor Network

Nodes: 30
Coverage: 6 continents
Probes (24h): ~7,000+

External Sources

IODA: Internet outage detection (ASN-level)
CensoredPlanet: Remote DNS/HTTP blocking (50 countries)
Citizen Lab: Domain categorization (14K+ domains)

Probe coverage honesty: Our 30+ Voidly probe nodes are deployed in countries like the US, UK, Germany, Japan, Singapore, India, Brazil, and South Africa. That makes them useful for “is this domain reachable from outside” tests, but they cannot measure from inside top-censorship countries (IR, CN, RU, VE, EG, PK, MM, SY, TR, SD, KP, BY, UZ, TM, SA, VN, TH, LB, AZ), where most incidents actually happen. For inside-country measurements we rely on OONI (volunteer-run inside-country probes) and CensoredPlanet (remote DNS/HTTP probing). We're actively recruiting community probe operators in censored regions; see /probes.

Multi-Source Correlation

No single measurement network captures the full picture of internet censorship. OONI provides active probing but has geographic gaps. CensoredPlanet provides remote measurement but lacks ground truth. IODA detects outages but not selective blocking.

Voidly operates its own 30-node network across 6 continents — testing VPN accessibility and censorship patterns every 5 minutes — then correlates these proprietary measurements with three external measurement networks (OONI, CensoredPlanet, IODA) to produce verified incidents with evidence chains. This turns ambiguous network anomalies into structured, citable censorship intelligence.
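The correlation step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Anomaly` record, the `corroborated_incidents` helper, the 6-hour window, and the rule that two independent networks must agree are hypothetical and are not taken from the production pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    """One flagged anomaly from one network (hypothetical record shape)."""
    source: str   # e.g. "voidly", "ooni", "censoredplanet", "ioda"
    country: str  # ISO 3166-1 alpha-2 code
    hour: int     # hours since an arbitrary epoch


def corroborated_incidents(anomalies, min_sources=2, window_hours=6):
    """Group anomalies by (country, time window) and keep the groups that
    were flagged by at least `min_sources` distinct networks."""
    buckets = defaultdict(set)
    for a in anomalies:
        buckets[(a.country, a.hour // window_hours)].add(a.source)
    return {
        key: sorted(sources)  # the sorted source list acts as the evidence chain
        for key, sources in buckets.items()
        if len(sources) >= min_sources
    }


# Toy observations, not real measurements.
observations = [
    Anomaly("voidly", "IR", 13),
    Anomaly("ooni", "IR", 15),
    Anomaly("ioda", "IR", 16),
    Anomaly("ooni", "BR", 14),  # only one source, so it is not corroborated
]
incidents = corroborated_incidents(observations)
print(incidents)
```

Requiring agreement between independent networks is what separates an ambiguous single-source anomaly from an incident that can be cited with its sources attached.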

ML Model

Gradient boosting classifier v3.3 trained on 4,237 labeled samples (1,116 positive across 131 countries) from OONI, CensoredPlanet, IODA, and Voidly probes. LOCO median F1 0.87 (honest). The older v2 — which reported 99.8% F1 / 1.000 ROC AUC — was retired 2026-05-21 after an audit found country_risk_tier leakage was driving 85% of its score. Privacy-preserving training on aggregate data only — no raw user data is used.

Three splits, three honest numbers

We previously cited 0.998 F1 as the headline performance. Our own sentinel accuracy endpoint warns that the stratified AUC overstates real-world performance by 47.9pp relative to the time-based split. We now publish all three splits so you can choose the one that matches your risk tolerance:

Honest — LOCO median
AUC 0.91 · F1 0.55
Train on 18 countries, test on the 19th. Median across the 19 holdouts. The strongest figure we can defend.
Floor — Time-based
AUC 0.50 · F1 0.00
Train pre-T, test post-T. Random on novel events because the model hasn't seen enough new-pattern data yet.
Inflated — Stratified
AUC 0.98 · F1 0.79
15% random holdout. Has temporal leakage. Sanity-check only; do not cite for deployment claims.
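The difference between the LOCO and time-based splits is easiest to see in code. The sketch below uses only the standard library and toy rows. The field names (`country`, `date`, `label`) and the helper names are assumptions for illustration, not the production schema.

```python
def loco_splits(rows):
    """Leave-one-country-out: yield (held_out, train, test), one fold per country."""
    countries = sorted({r["country"] for r in rows})
    for held_out in countries:
        train = [r for r in rows if r["country"] != held_out]
        test = [r for r in rows if r["country"] == held_out]
        yield held_out, train, test


def time_split(rows, cutoff):
    """Time-based: train strictly before `cutoff`, test on or after it.
    ISO 8601 date strings compare correctly as plain strings."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


# Toy rows, not real measurements.
rows = [
    {"country": "AA", "date": "2026-01-01", "label": 0},
    {"country": "AA", "date": "2026-03-01", "label": 1},
    {"country": "BB", "date": "2026-01-15", "label": 0},
    {"country": "BB", "date": "2026-03-10", "label": 0},
    {"country": "CC", "date": "2026-02-01", "label": 1},
    {"country": "CC", "date": "2026-03-20", "label": 1},
]

folds = list(loco_splits(rows))
for held_out, train, test in folds:
    # No country appears on both sides of a LOCO fold.
    assert held_out not in {r["country"] for r in train}
    print(held_out, len(train), len(test))

train, test = time_split(rows, cutoff="2026-03-01")
print(len(train), len(test))
```

A random stratified holdout would instead shuffle all six rows, so rows from the same country and the same period can land on both sides. That is the leakage the inflated figure suffers from.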

Live calibration: The forecast was severely miscalibrated through 2026-05-20 (Brier 0.59, calibration MAE 0.60). Where the model predicted 5% risk, the actual incident rate in that bucket was 65%. We fixed this on 2026-05-20 by refitting isotonic regression on 810 live (predicted, observed) pairs from sentinel_outcomes. In-sample Brier dropped to 0.22 and calibration MAE to 0.00; because these are in-sample figures, they measure fit to the refit data rather than out-of-sample calibration. A watched-country gate prevents extrapolation to non-censoring countries. See the 90-day drift series → See the reliability diagram → Read the full refit writeup →
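For readers unfamiliar with these metrics, here is a minimal sketch of how a Brier score and a bucketed calibration error can be computed from (predicted, observed) pairs. The exact definition of "calibration MAE" used by the endpoint is not stated here. The count-weighted mean gap below is an assumption, and the data is a toy example.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def calibration_table(predicted, observed, n_buckets=10):
    """Per non-empty bucket: (mean predicted probability, observed rate, count)."""
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the last bucket
        buckets[idx].append((p, y))
    table = []
    for pairs in buckets:
        if pairs:
            mean_p = sum(p for p, _ in pairs) / len(pairs)
            rate = sum(y for _, y in pairs) / len(pairs)
            table.append((mean_p, rate, len(pairs)))
    return table


def calibration_mae(table):
    """Count-weighted mean absolute gap between predicted and observed rates
    (an assumed definition, similar to expected calibration error)."""
    total = sum(n for _, _, n in table)
    return sum(abs(mean_p - rate) * n for mean_p, rate, n in table) / total


# Toy pairs: a low-risk bucket that is badly under-predicted.
predicted = [0.05, 0.05, 0.05, 0.05, 0.9, 0.9]
observed = [1, 1, 1, 0, 1, 1]

table = calibration_table(predicted, observed)
print(brier_score(predicted, observed))
print(table)
print(calibration_mae(table))
```

A well-calibrated forecast has observed rates close to the mean prediction in every bucket. An isotonic refit learns a monotone mapping from the raw prediction to the observed rate, which is why the in-sample gap can be driven to nearly zero.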

Censorship Classifier

Algorithm: GradientBoosting
F1 (LOCO median): 0.55
AUC (LOCO median): 0.91
F1 (stratified, inflated): 0.79
AUC (stratified, inflated): 0.98
F1 (time-based, floor): 0.00
AUC (time-based, floor): 0.50
Training Samples: 4,237 labeled samples (1,116 positive) across 131 countries
Schedule: Weekly (Sundays @ 02:00 UTC)

Shutdown Forecast Model (live)

Algorithm: XGBoost + isotonic calibration
AUC (LOCO median): 0.91
F1 (LOCO median): 0.55
Live recall (prod_rolling): 36%
Live Brier score: 0.59
Live calibration MAE: 0.60 (poor; see callout above)
Training Samples: 14.6K historical records
Schedule: Weekly (Sundays @ 02:00 UTC)

Live performance is from /v1/sentinel/accuracy — a public endpoint that auto-evaluates the forecast against actual incidents on a rolling 30-day window. Numbers update every 24h.

The forecast is a current-regime signal, not an onset predictor

An honest re-evaluation found the LOCO and stratified AUCs above are inflated by label autocorrelation. The forecast target target_7day is a 7-day sliding window — 98.9% autocorrelated day-to-day — so a trivial predict-yesterday baseline scores AUC 0.957. On a strict forward-temporal split the production model scores AUC 0.589 on the raw label, and on the rows where a shutdown actually begins (transitions) it scores AUC ~0.33 — below chance. Plain English: the forecast reliably reflects “this country is currently in a censored regime,” but has essentially zero skill at predicting a new shutdown before it happens. Read the full forecast onset-skill finding →
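The autocorrelation effect can be reproduced on a toy series. The sketch below builds a 7-day sliding-window label from a daily event series and scores a predict-yesterday baseline with a rank-based AUC. The series and helper names are illustrative assumptions, so the resulting numbers will not match the production figures.

```python
def sliding_target(events, horizon=7):
    """target[t] = 1 if any event occurs in days t+1 .. t+horizon."""
    n = len(events)
    return [int(any(events[t + 1 : t + 1 + horizon])) for t in range(n - horizon)]


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy series: 60 days with two isolated shutdown days.
events = [0] * 60
events[20] = 1
events[45] = 1

target = sliding_target(events)

# Predict-yesterday baseline: today's score is simply yesterday's label.
labels = target[1:]
scores = target[:-1]

agreement = sum(a == b for a, b in zip(labels, scores)) / len(labels)
baseline_auc = auc(scores, labels)
print(f"day-to-day label agreement: {agreement:.3f}")
print(f"predict-yesterday AUC: {baseline_auc:.3f}")

# Onset rows: yesterday's label was 0, so the baseline has nothing to go on.
onset_labels = [y for y, s in zip(labels, scores) if s == 0]
onset_scores = [s for s in scores if s == 0]
print(f"AUC on rows where yesterday was quiet: {auc(onset_scores, onset_labels):.3f}")
```

Because the window slides one day at a time, consecutive labels almost always agree. A model can therefore score well by echoing the current regime, while on the rows where a shutdown begins the same baseline is no better than chance.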

Feature Importance (Classifier v3, 2026-05-20)

Honesty update: v2 had country_risk_tier at 85% importance — a hardcoded label-derived feature that was effectively cheating. v3 drops it. The new top-3 share is ~73% across three genuine signals (interaction terms + measurement volume), no single dominator. Live values from /v1/classifier/feature-importance.
anomaly_rate: 22.1%
month: 20.2%
measurement_count: 17.0%
neighbor_max_anomaly_7d: 8.9%
neighbor_incident_count_7d: 7.7%
neighbor_block_rate_7d: 7.6%
rate_count_interaction: 5.6%

Production classifier: v3.3 GradientBoosting — 16 features (13 base + 3 regime-similarity-weighted geographic contagion), 4,237 training samples (1,116 positive), 131 countries. LOCO median F1 0.870, stratified F1 0.729. See the v3 finding writeup for the full story on why v2's 99.8% was inflated.

New model ensemble (2026-05-21)
  • Multi-horizon forecast: separate 1d/7d/30d XGBoost+isotonic models. LOCO AUC 0.91/0.88/0.84. Per-horizon SHAP + 90% conformal intervals + monotonicity check. Live at /v1/forecast/{cc}/multi-horizon. Honest caveat: these horizons use the same target_Nday sliding-window label as v1 and are evaluated on shuffled / LOCO splits, so the AUCs are inflated by label autocorrelation in the same way. They are current-regime signals, not onset predictors. See the onset-skill finding.
  • ACI online conformal (Gibbs & Candès 2021): replaces static isotonic with an online update. Initial state α=0.10 → 0.21, empirical coverage 91.3%. Visible in every /v1/forecast/{cc}/7day response.
  • CenDTect DBSCAN anomaly: per-country unsupervised detector over a rolling 45-day window. AUC 0.6506. Promoted as a second-opinion signal. Live at /v1/anomaly/dbscan/{cc}.
  • Per-domain HDBSCAN drift: weekly cron that surfaces novel blocking patterns at the domain level. Live at /v1/anomaly/domain-drift/leaderboard.
  • Per-measurement classifier (Niaki KDD23): row-level XGBoost. AUC 1.0. Honest caveat: the model reconstructs the labeling rule. Live at POST /v1/measurement/classify.
  • GNN over AS topology: 2-layer GraphSAGE over a 7,060-node CAIDA peering graph. LOOCV AUC 0.80 on 6 tier-1 ASNs. Honest caveat: n=6 is statistically underpowered. Live at /v1/forecast/asn-gnn/{asn}.
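As background for the ACI item above, the core of adaptive conformal inference (Gibbs & Candès 2021) is a one-line online update of the working miscoverage level: α_{t+1} = α_t + γ(α − err_t), where err_t is 1 when the last interval missed the outcome. The sketch below shows only that update. The step size γ = 0.005 and the function name are assumptions, not the production configuration.

```python
def aci_update(alpha_t, miscovered, target_alpha=0.10, gamma=0.005):
    """One adaptive conformal inference step.

    A miss lowers alpha_t (wider intervals next time); a hit raises it slightly
    (narrower intervals). `gamma` is an assumed step size.
    """
    err = 1.0 if miscovered else 0.0
    return alpha_t + gamma * (target_alpha - err)


# Toy coverage history: nine covered outcomes followed by one miss,
# which is exactly the 90% target coverage.
history = [False] * 9 + [True]  # True means the interval missed the outcome

alpha = 0.10
for miscovered in history:
    alpha = aci_update(alpha, miscovered)
print(round(alpha, 6))
```

When the empirical miss rate equals the target, the increments and decrements cancel and the working level stays put. Sustained under-coverage pushes it down until the intervals widen enough to recover.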

Full model history with metrics, training dates, and honest caveats: /atlas/changelog.

Scoring System

0-100 scale. 0 = complete freedom. 100 = total censorship.

0-10 · Free · Minimal or no censorship
11-25 · Low · Limited content restrictions
26-45 · Medium · Significant restrictions on some platforms
46-70 · High · Widespread blocking of platforms and news
71-100 · Severe · Pervasive censorship / isolated internet
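The bands above translate directly into a lookup. This is a minimal sketch with a hypothetical helper name. It assumes each upper bound is inclusive, so a fractional score such as 10.5 falls into the next band up.

```python
# (inclusive upper bound, band name), in ascending order, from the scale above.
BANDS = [
    (10, "Free"),
    (25, "Low"),
    (45, "Medium"),
    (70, "High"),
    (100, "Severe"),
]


def score_band(score):
    """Map a 0-100 censorship score to its named band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score!r}")
    for upper, name in BANDS:
        if score <= upper:
            return name


for example in (0, 10, 11, 45.5, 71, 100):
    print(example, score_band(example))
```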

Limitations

  • Scores are national averages — regional variations not captured
  • VPN detection underreported in highly restricted environments
  • Sample sizes vary by country — affects confidence levels
  • Real-time events may take up to 24h to reflect in scores
  • Content filtering and throttling harder to detect than blocking
  • Self-censorship and legal restrictions not measured

Confidence Intervals

Each country score includes a confidence interval reflecting measurement certainty. Wider intervals indicate less data or greater variability.

Country · Score (illustrative) · Interval · Confidence · Note
Country A · 66% · ± 2% · high · Large sample
Country B · 42% · ± 4% · high
Country C · 31% · ± 3% · high
Country D · 21% · ± 7% · medium · Smaller sample

Scores shown are illustrative examples from a point-in-time snapshot. Live scores update continuously on the Censorship Index.

Validation

Scores are validated against external benchmarks, known censorship events, and three independently-evaluated holdout splits. Continuous live evaluation runs daily and is published at /v1/sentinel/accuracy.

Baseline: Freedom House, Freedom on the Net
Correlation: r = 0.87 (vs. Freedom House FOTN, self-reported)
Ground Truth: Known events (e.g. Iran shutdowns match score spikes)
Cross Validation: Three splits (stratified / time-based / LOCO)
Classifier F1 (LOCO, honest): 0.55
Classifier AUC (LOCO, honest): 0.91
Classifier F1 (stratified, inflated): 0.79 (do not cite as a deployment figure)
Classifier AUC (stratified, inflated): 0.98 (do not cite as a deployment figure)
Classifier F1 (time-based, floor): 0.00
Classifier AUC (time-based, floor): 0.50
Live Brier (forecast): 0.59 (calibration MAE 0.60)

From Voidly's own sentinel endpoint: “Stratified AUC overstates real-world performance by 47.9pp vs. time-based split. Do not cite the stratified number as a deployment figure; use the loco_median or the prod_rolling block once it populates.” We follow that recommendation here. No independent third-party evaluation has been conducted; published tools + data are available for independent replication — see Reproducibility.

Update Pipeline

OONI → Ingestion → Feature Engineering → ML Scoring → Index Update

Classifier Retrain: Weekly (Sundays @ 02:00 UTC)
Publication: Daily @ 03:00 UTC
Score Latency: ~6h (aggregated), ~5min (probes)
Raw Ingestion: Every 5min (probes), every 6h (OONI/IODA/CensoredPlanet)

Self-monitoring. Drift detection (KS test on rolling holdout) auto-triggers a retrain between scheduled runs. The D8 promote gate (G1–G7 + calibration) blocks any model that fails AUC, recall, or coverage thresholds, and a post-promote watchdog automatically rolls back if live performance regresses. No human is in the publish loop; see /status → Self-operating systems for live counters and source.
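To make the drift check concrete, here is a minimal standard-library sketch of a two-sample Kolmogorov-Smirnov test. Which quantity the production check compares, and at what threshold, is not stated here. The helper names, the 0.05 significance level, and the toy samples are assumptions. The critical value uses the standard large-sample approximation.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs, evaluated at every sample point."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def drift_detected(reference, current, significance=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(significance / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Toy samples standing in for a monitored quantity on two rolling windows.
reference = list(range(100))
shifted = list(range(50, 150))

print(ks_statistic(reference, shifted))
print(drift_detected(reference, shifted))
print(drift_detected(reference, reference))
```

In a pipeline like the one described, a positive result would be the signal that triggers a retrain ahead of the weekly schedule.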

Citation

Use this data in research? Please cite:

APA Format

Voidly Research. (2026). Global Censorship Index. https://voidly.ai/censorship-index

BibTeX

@misc{voidly_censorship_index,
  author = {Voidly Research},
  title = {Global Censorship Index},
  year = {2026},
  url = {https://voidly.ai/censorship-index}
}

License: CC BY 4.0 — Free to use with attribution

Data Access

Contact

Research: research@voidly.ai
Partnerships: partnerships@voidly.ai
General: team@voidly.ai