Methodology

How we measure global internet censorship — from raw probe measurements to a single per-country score, with a gradient-boosting classifier in the middle. We publish three honest performance numbers (LOCO 0.91 AUC, stratified 0.98 AUC, time-based 0.50 AUC) so you can pick the one that matches your risk tolerance.

Version: 2.2 (honest splits)
Updated: 2026-06-15
License: CC BY 4.0

Contents

01. Overview
02. Data Sources
03. Multi-Source Correlation
04. ML Model
05. Scoring System
06. Limitations
07. Confidence Intervals
08. Validation
09. Update Pipeline
10. Citation
11. Data Access
12. Contact

Overview

The index is a composite score built from multiple measurement networks and processed through our ML pipeline. It updates continuously based on live network measurements.

When a government blocks a new service, our data reflects the change within hours.

Data Sources

OONI Measurements

Samples: 36,257,345
Coverage: 130 countries
Tests: Web, Messaging, Circumvention

Sensor Network

Nodes: 30
Coverage: 6 continents
Probes (24h): ~7,000+

External Sources

IODA: Internet outage detection (ASN-level)
CensoredPlanet: Remote DNS/HTTP blocking (50 countries)
Citizen Lab: Domain categorization (14K+ domains)

Probe coverage honesty: Our 30+ Voidly probe nodes are deployed in countries like the US, UK, Germany, Japan, Singapore, India, Brazil, and South Africa — useful for “is this domain reachable from outside” tests but not from inside top-censorship countries (IR, CN, RU, VE, EG, PK, MM, SY, TR, SD, KP, BY, UZ, TM, SA, VN, TH, LB, AZ) where most incidents actually happen. For inside-country measurements we rely on OONI (volunteer-run inside-country probes) and CensoredPlanet (remote DNS/HTTP probing). We're actively recruiting community probe operators in censored regions — see /probes.

Multi-Source Correlation

No single measurement network captures the full picture of internet censorship. OONI provides active probing but has geographic gaps. CensoredPlanet provides remote measurement but lacks ground truth. IODA detects outages but not selective blocking.

Voidly operates its own 30-node network across 6 continents — testing VPN accessibility and censorship patterns every 5 minutes — then correlates these proprietary measurements with three external measurement networks (OONI, CensoredPlanet, IODA) to produce verified incidents with evidence chains. This turns ambiguous network anomalies into structured, citable censorship intelligence.

ML Model

Gradient boosting classifier v3.3 trained on 4,237 labeled samples (1,116 positive across 131 countries) from OONI, CensoredPlanet, IODA, and Voidly probes. LOCO median F1 0.87 (honest). The older v2 — which reported 99.8% F1 / 1.000 ROC AUC — was retired 2026-05-21 after an audit found country_risk_tier leakage was driving 85% of its score. Privacy-preserving training on aggregate data only — no raw user data is used.

Three splits, three honest numbers

We previously cited 0.998 F1 as the headline performance. Our own sentinel accuracy endpoint publishes a warning that this number overstates real-world performance by 47.9pp. We now publish all three splits so you can choose which one matches your risk tolerance:

Honest — LOCO median

AUC 0.91 · F1 0.55

Train on 18 countries, test on the 19th. Median across the 19 holdouts. The strongest figure we can defend.

Floor — Time-based

AUC 0.50 · F1 0.00

Train pre-T, test post-T. Random on novel events because the model hasn't seen enough new-pattern data yet.

Inflated — Stratified

AUC 0.98 · F1 0.79

15% random holdout. Has temporal leakage. Sanity-check only; do not cite for deployment claims.
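The LOCO procedure described above (hold out one country, train on the rest, take the median across holdouts) can be sketched in a few lines. This is a minimal illustration, not Voidly's pipeline: the row fields `country`, `label`, and `anomaly_rate`, the `fit`/`predict` callables, and the toy threshold model are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, median


def loco_splits(rows):
    """Yield (held_out_country, train_rows, test_rows), one split per country."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row)
    for held_out in sorted(by_country):
        # Training data is every row from every other country.
        train = [r for c, rs in by_country.items() if c != held_out for r in rs]
        yield held_out, train, by_country[held_out]


def f1_score(y_true, y_pred):
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def loco_median_f1(rows, fit, predict):
    """Median F1 across all leave-one-country-out holdouts."""
    scores = []
    for _, train, test in loco_splits(rows):
        model = fit(train)
        y_pred = [predict(model, r) for r in test]
        y_true = [r["label"] for r in test]
        scores.append(f1_score(y_true, y_pred))
    return median(scores)


# Hypothetical stand-in model: flag a row when its anomaly_rate exceeds
# the mean anomaly_rate seen in the training countries.
def fit_threshold(train_rows):
    return mean(r["anomaly_rate"] for r in train_rows)


def predict_threshold(threshold, row):
    return int(row["anomaly_rate"] > threshold)
```

Because the held-out country contributes nothing to training, a feature that merely memorizes per-country labels cannot help on the test rows, which is why this split is harder than a random 15% holdout.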

Live calibration: The forecast was severely miscalibrated through 2026-05-20 (Brier 0.59, MAE 0.60). Predicted 5% risk; actual incident rate in that bucket was 65%. Fixed 2026-05-20 by refitting isotonic regression on 810 live (predicted, observed) pairs from sentinel_outcomes. In-sample Brier dropped to 0.22, calibration MAE to 0.00. Watched-country gate prevents extrapolation to non-censoring countries. See the 90-day drift series → See the reliability diagram → Read the full refit writeup →
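The two calibration metrics named above can be worked through by hand. The sketch below computes a Brier score and a binned calibration error on made-up data that mirrors the "predicted 5%, observed 65%" bucket. The data is constructed for the arithmetic and is not drawn from sentinel_outcomes. The binning in `calibration_mae` (ten equal-width bins, unweighted mean over non-empty bins) is an assumption, since the page does not define its calibration MAE.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_mae(probs, outcomes, n_bins=10):
    """Mean absolute gap between mean prediction and observed rate per bin.

    Assumed definition: equal-width bins, unweighted mean over non-empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    gaps = []
    for bucket in bins:
        if bucket:
            mean_pred = sum(p for p, _ in bucket) / len(bucket)
            observed = sum(y for _, y in bucket) / len(bucket)
            gaps.append(abs(mean_pred - observed))
    return sum(gaps) / len(gaps)


# Toy bucket: 20 forecasts of 5% risk, 13 of which saw an incident (65%).
toy_probs = [0.05] * 20
toy_outcomes = [1] * 13 + [0] * 7

toy_brier = brier_score(toy_probs, toy_outcomes)    # (13*0.95**2 + 7*0.05**2) / 20
toy_gap = calibration_mae(toy_probs, toy_outcomes)  # |0.05 - 0.65|
```

On this toy bucket the Brier score works out to 0.5875 and the calibration gap to 0.60, which shows how a forecast that is confidently low on a high-incidence bucket produces numbers of the size quoted above.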

Censorship Classifier

Algorithm: GradientBoosting
F1 (LOCO median): 0.55
AUC (LOCO median): 0.91
F1 (stratified, inflated): 0.79
AUC (stratified, inflated): 0.98
F1 (time-based, floor): 0.00
AUC (time-based, floor): 0.50
Training Samples: 4,237 labeled incidents (1,116 positive) across 131 countries
Schedule: Weekly (Sundays @ 02:00 UTC)

Shutdown Forecast Model (live)

Algorithm: XGBoost + isotonic calibration
AUC (LOCO median): 0.91
F1 (LOCO median): 0.55
Live recall (prod_rolling): 36%
Live Brier score: 0.59
Live calibration MAE: 0.60 (poor; see callout above)
Training Samples: 14.6K historical records
Schedule: Weekly (Sundays @ 02:00 UTC)

Live performance is from /v1/sentinel/accuracy — a public endpoint that auto-evaluates the forecast against actual incidents on a rolling 30-day window. Numbers update every 24h.

The forecast is a current-regime signal, not an onset predictor

An honest re-evaluation found the LOCO and stratified AUCs above are inflated by label autocorrelation. The forecast target target_7day is a 7-day sliding window — 98.9% autocorrelated day-to-day — so a trivial predict-yesterday baseline scores AUC 0.957. On a strict forward-temporal split the production model scores AUC 0.589 on the raw label, and on the rows where a shutdown actually begins (transitions) it scores AUC ~0.33 — below chance. Plain English: the forecast reliably reflects “this country is currently in a censored regime,” but has essentially zero skill at predicting a new shutdown before it happens. Read the full forecast onset-skill finding →
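The autocorrelation argument above can be seen on a toy series. The sketch below builds a sliding-window label and measures how often "predict yesterday's label" agrees with today's. It assumes target_7day means "any shutdown day in the next 7 days", which the page does not spell out, and the 30-day event series is invented for illustration.

```python
def sliding_window_label(events, horizon=7):
    """label[t] = 1 if any event falls in days t+1 .. t+horizon (assumed definition)."""
    return [int(any(events[t + 1 : t + 1 + horizon])) for t in range(len(events))]


def persistence_agreement(labels):
    """Fraction of days on which yesterday's label equals today's label."""
    pairs = list(zip(labels[:-1], labels[1:]))
    return sum(prev == cur for prev, cur in pairs) / len(pairs)


def onset_days(labels):
    """Days where the label flips from 0 to 1, i.e. a new regime begins."""
    return [t for t in range(1, len(labels)) if labels[t - 1] == 0 and labels[t] == 1]


# Toy series: 30 days with one shutdown spanning days 20-22.
toy_events = [0] * 30
for day in (20, 21, 22):
    toy_events[day] = 1

toy_labels = sliding_window_label(toy_events)
toy_agreement = persistence_agreement(toy_labels)  # high, despite zero foresight
toy_onsets = onset_days(toy_labels)                # the only rows that need foresight
```

A three-day event lights up nine consecutive label days, so copying yesterday's label is right on 27 of 29 days here. The one onset day is, by construction, a day the persistence baseline gets wrong, which is why a score dominated by non-onset rows says little about predicting new shutdowns.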

Feature Importance (Classifier v3, 2026-05-20)

Honesty update: v2 had country_risk_tier at 85% importance — a hardcoded label-derived feature that was effectively cheating. v3 drops it. The new top-3 share is ~73% across three genuine signals (interaction terms + measurement volume), no single dominator. Live values from /v1/classifier/feature-importance.

anomaly_rate: 22.1%
month: 20.2%
measurement_count: 17.0%
neighbor_max_anomaly_7d: 8.9%
neighbor_incident_count_7d: 7.7%
neighbor_block_rate_7d: 7.6%
rate_count_interaction: 5.6%

Production classifier: v3.3 GradientBoosting — 16 features (13 base + 3 regime-similarity-weighted geographic contagion), 4,237 training samples (1,116 positive), 131 countries. LOCO median F1 0.870, stratified F1 0.729. See the v3 finding writeup for the full story on why v2's 99.8% was inflated.

New model ensemble (2026-05-21)

Multi-horizon forecast — separate 1d/7d/30d XGBoost+isotonic. LOCO AUC 0.91/0.88/0.84. Per-horizon SHAP + 90% conformal intervals + monotonicity check. Live at /v1/forecast/{cc}/multi-horizon. Honest caveat: these horizons use the same target_Nday sliding-window label as v1 and are evaluated on shuffled / LOCO splits, so the AUCs are inflated by label autocorrelation the same way — they are current-regime signals, not onset predictors. See the onset-skill finding.
ACI online conformal (Gibbs & Candès 2021) — replaces static isotonic with online update. Initial state α=0.10 → 0.21, empirical coverage 91.3%. Visible in every /v1/forecast/{cc}/7day response.
CenDTect DBSCAN anomaly — per-country unsupervised rolling 45-day window. AUC 0.6506. Promoted as second-opinion signal. Live at /v1/anomaly/dbscan/{cc}.
Per-domain HDBSCAN drift — weekly cron, surfaces novel blocking patterns at the domain level. Live at /v1/anomaly/domain-drift/leaderboard.
Per-measurement classifier (Niaki KDD23): row-level XGBoost. AUC 1.0. Honest caveat: the model reconstructs the labeling rule, so this figure is not evidence of predictive skill. Live at POST /v1/measurement/classify.
GNN over AS topology — GraphSAGE 2-layer over 7,060-node CAIDA peering graph. LOOCV AUC 0.80 on 6 tier-1 ASNs. Honest caveat: n=6 is statistically underpowered. Live at /v1/forecast/asn-gnn/{asn}.
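The ACI item in the list above refers to the adaptive conformal inference update of Gibbs & Candès (2021), which adjusts the working miscoverage level after each observation: α_{t+1} = α_t + γ(α − err_t), where err_t is 1 when the realized outcome fell outside the interval and 0 otherwise. A minimal sketch of that update rule follows. The step size `gamma` and the function names are assumptions, and nothing here reflects how the production endpoint stores or applies its state.

```python
def aci_step(alpha_t, target_alpha, covered, gamma=0.005):
    """One ACI update: alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t)."""
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err)


def aci_trajectory(covered_flags, target_alpha=0.10, gamma=0.005):
    """Run the update over a sequence of coverage outcomes.

    covered_flags[t] is True when the interval issued at step t contained
    the realized outcome. Returns every alpha value, starting from the target.
    """
    alpha_t = target_alpha
    history = [alpha_t]
    for covered in covered_flags:
        alpha_t = aci_step(alpha_t, target_alpha, covered, gamma)
        history.append(alpha_t)
    return history
```

A miss lowers the working level, so the next interval is wider; a covered outcome raises it slightly, so intervals tighten again. The working level can drift outside [0, 1] after a long run of misses or hits, and handling that case is left out of this sketch.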

Full model history with metrics, training dates, and honest caveats: /atlas/changelog.

Scoring System

0-100 scale. 0 = complete freedom. 100 = total censorship.

0-10: Free. Minimal or no censorship.
11-25: Low. Limited content restrictions.
26-45: Medium. Significant restrictions on some platforms.
46-70: High. Widespread blocking of platforms and news.
71-100: Severe. Pervasive censorship / isolated internet.
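For consumers of the data, the band boundaries above translate directly into a lookup. This is a small sketch with a hypothetical function name. It treats each upper bound as inclusive, so a fractional score such as 10.5 lands in the next band up; the page only lists integer ranges, so that tie-breaking is an assumption.

```python
# (inclusive upper bound, band name), taken from the ranges listed above.
BANDS = [
    (10, "Free"),
    (25, "Low"),
    (45, "Medium"),
    (70, "High"),
    (100, "Severe"),
]


def score_band(score):
    """Map a 0-100 censorship score to its named band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score!r}")
    for upper, name in BANDS:
        if score <= upper:
            return name
    raise AssertionError("unreachable: the last band's upper bound is 100")
```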

Limitations

⚠ Scores are national averages; regional variations are not captured.
⚠ VPN detection is underreported in highly restricted environments.
⚠ Sample sizes vary by country, which affects confidence levels.
⚠ Real-time events may take up to 24h to be reflected in scores.
⚠ Content filtering and throttling are harder to detect than blocking.
⚠ Self-censorship and legal restrictions are not measured.

Confidence Intervals

Each country score includes a confidence interval reflecting measurement certainty. Wider intervals indicate less data or greater variability.

Illustrative scores by country:

Country A: score 66%, interval ± 2%, confidence high (large sample)
Country B: score 42%, interval ± 4%, confidence high
Country C: score 31%, interval ± 3%, confidence high
Country D: score 21%, interval ± 7%, confidence medium (smaller sample)

Scores shown are illustrative examples from a point-in-time snapshot. Live scores update continuously on the Censorship Index.

Validation

Scores are validated against external benchmarks, known censorship events, and three independently-evaluated holdout splits. Continuous live evaluation runs daily and is published at /v1/sentinel/accuracy.

Baseline: Freedom House, Freedom on the Net
Correlation: r = 0.87 (vs. Freedom House FOTN, self-reported)
Ground Truth: Known events (e.g. Iran shutdowns match score spikes)
Cross Validation: Three splits: stratified / time-based / LOCO
Classifier F1 (LOCO, honest): 0.55
Classifier AUC (LOCO, honest): 0.91
Classifier F1 (stratified, inflated): 0.79 (do not cite as a deployment figure)
Classifier AUC (stratified, inflated): 0.98 (do not cite as a deployment figure)
Classifier F1 (time-based, floor): 0.00
Classifier AUC (time-based, floor): 0.50
Live Brier (forecast): 0.59 (calibration MAE 0.60)

From Voidly's own sentinel endpoint: “Stratified AUC overstates real-world performance by 47.9pp vs. time-based split. Do not cite the stratified number as a deployment figure; use the loco_median or the prod_rolling block once it populates.” We follow that recommendation here. No independent third-party evaluation has been conducted; published tools + data are available for independent replication — see Reproducibility.

Update Pipeline

OONI → Ingestion → Feature Engineering → ML Scoring → Index Update

Classifier Retrain: Weekly (Sundays @ 02:00 UTC)
Publication: Daily @ 03:00 UTC
Score Latency: ~6h (aggregated), ~5min (probes)
Raw Ingestion: Every 5min (probes), every 6h (OONI/IODA/CensoredPlanet)

Self-monitoring. Drift detection (a KS test on a rolling holdout) automatically triggers a retrain between scheduled runs. The D8 promote gate (G1–G7 + calibration) blocks any model that fails its AUC, recall, or coverage thresholds, and a post-promote watchdog automatically rolls back the model if live performance regresses. There is no human in the publish loop; see /status → Self-operating systems for live counters and source.
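The drift check mentioned above is a two-sample Kolmogorov-Smirnov test. A dependency-free sketch of the statistic and a standard large-sample rejection threshold is given below. Which feature or score is compared, the window sizes, and the significance level are not stated on this page, so the `alpha=0.05` default and the function names are assumptions.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum of the gap is attained at one of the observed points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drift_detected(reference, recent, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    Critical value: sqrt(-ln(alpha / 2) / 2) * sqrt((n + m) / (n * m)).
    """
    n, m = len(reference), len(recent)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical
```

In a pipeline like the one described, `reference` would be values from the window the current model was validated on and `recent` the newest window; a True result would be the signal to retrain ahead of the weekly schedule.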

Citation

Use this data in research? Please cite:

APA Format

Voidly Research. (2026). Global Censorship Index. https://voidly.ai/censorship-index

BibTeX

@misc{voidly_censorship_index,
  author = {Voidly Research},
  title = {Global Censorship Index},
  year = {2026},
  url = {https://voidly.ai/censorship-index}
}

License: CC BY 4.0 — Free to use with attribution

Data Access

/data/censorship-index.json: Full dataset with schema.org markup
/data/censorship-index.csv: For Excel, Sheets, Pandas
/data/methodology: Machine-readable methodology
/api-docs: Detection, prediction, realtime
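A minimal way to pull the CSV export with only the standard library is sketched below. The host is taken from the citation URL on this page and is an assumption for these relative paths. The column names are not documented here, so the parser returns whatever header the file provides rather than assuming a schema.

```python
import csv
import io
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://voidly.ai"  # assumption: host from the citation URL


def dataset_url(path):
    """Resolve one of the relative data paths against the assumed host."""
    return urljoin(BASE_URL, path)


def parse_index_csv(text):
    """Parse CSV text into a list of dicts keyed by the file's own header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_index_csv(timeout=30):
    """Download and parse the CSV export (requires network access)."""
    with urlopen(dataset_url("/data/censorship-index.csv"), timeout=timeout) as resp:
        return parse_index_csv(resp.read().decode("utf-8"))
```

Calling `fetch_index_csv()` returns one dict per row; inspect the keys of the first row to see which columns the export actually contains before relying on any of them.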

Contact

Research: research@voidly.ai
Partnerships: partnerships@voidly.ai
General: team@voidly.ai


All Voidly Research data is licensed under CC BY 4.0. You can redistribute, remix, and build on it freely with attribution.
