Backend Analysis

NOTE This should not be confused with the analysis stage of the OONI data processing pipeline!

The analysis package bundles miscellaneous scripts, services and tools. These are ancillary components that are not updated often and might not justify a dedicated Debian package each.

Deployed using the analysis package 📦

https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/

Data flows from various updaters:

[Diagram: ellipses represent data; rectangles represent processes. Purple components belong to the backend.]

See the following subchapters for details:

CitizenLab test list updater

This component fetches the test lists from CitizenLab Test List 💡 and populates the citizenlab table ⛁ and citizenlab_flip table ⛁.

The git repository https://github.com/citizenlab/test-lists.git is cloned as an unauthenticated user.

Database writes are performed as the citizenlab user.

The tables have few constraints on the database side: most of the validation is done in the script and it is meant to be strict. The updater overwrites the citizenlab_flip table ⛁ and then swaps it with the citizenlab table ⛁ atomically. If git cloning, verification or the table overwrite fails, the final swap does not happen, leaving the citizenlab table unaltered.
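
The overwrite-then-swap pattern can be sketched as follows. This is a minimal illustration, not the updater's actual code: it uses an in-memory SQLite database and three renames inside one transaction in place of the production database's swap, and the column names and the `validate_row` and `make_db` helpers are hypothetical.

```python
import sqlite3


def validate_row(row):
    """Hypothetical strict check: reject anything that does not look sane."""
    url, cc, category_code = row
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"invalid URL: {url!r}")
    if len(cc) != 2 or not cc.isupper():
        raise ValueError(f"invalid country code: {cc!r}")
    if not category_code:
        raise ValueError("empty category code")


def update_citizenlab(conn, rows):
    """Fill citizenlab_flip, then swap it with citizenlab in one transaction."""
    rows = list(rows)
    for row in rows:
        validate_row(row)  # any failure aborts before a table is touched

    conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM citizenlab_flip")
        conn.executemany("INSERT INTO citizenlab_flip VALUES (?, ?, ?)", rows)
        # Swap the two tables; readers see either the old or the new content.
        conn.execute("ALTER TABLE citizenlab RENAME TO citizenlab_swap")
        conn.execute("ALTER TABLE citizenlab_flip RENAME TO citizenlab")
        conn.execute("ALTER TABLE citizenlab_swap RENAME TO citizenlab_flip")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def make_db():
    """Create an in-memory database with both tables (illustrative schema)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for name in ("citizenlab", "citizenlab_flip"):
        conn.execute(f"CREATE TABLE {name} (url TEXT, cc TEXT, category_code TEXT)")
    return conn
```

If validation raises, the function returns before opening the transaction, so the live table keeps its previous content.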

It is deployed using the analysis package 📦 and started by the ooni-update-citizenlab ⏲ Systemd timer.

Logs are generated as the analysis.citizenlab_test_lists_updater unit.

It also generates the following metrics with the citizenlab_test_lists_updater prefix:

Metric name | Type | Description
fetch_citizen_lab_lists | timer | Fetch duration
update_citizenlab_table | timer | Update duration
citizenlab_test_list_len | gauge | Table size
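
As an illustration of how such metric names end up on the wire, the sketch below formats timer and gauge samples in the plain-text StatsD format (`name:value|ms` for timers, `name:value|g` for gauges). It uses only the standard library; the helper names, host and port are assumptions, and the real updater goes through analysis/analysis/metrics.py rather than code like this.

```python
import socket
import time
from contextlib import contextmanager

PREFIX = "citizenlab_test_lists_updater"


def format_timer(name, millis):
    """Build a StatsD timer sample, value in milliseconds."""
    return f"{PREFIX}.{name}:{millis}|ms"


def format_gauge(name, value):
    """Build a StatsD gauge sample."""
    return f"{PREFIX}.{name}:{value}|g"


def send(line, host="127.0.0.1", port=8125):
    """Send one sample as a UDP datagram; the address is an assumption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode(), (host, port))


@contextmanager
def timer(name, emit=send):
    """Time the enclosed block and emit the duration as a timer sample."""
    start = time.monotonic()
    try:
        yield
    finally:
        emit(format_timer(name, int((time.monotonic() - start) * 1000)))
```

A block wrapped in `with timer("fetch_citizen_lab_lists"):` would then report its duration under the prefixed name.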

The updater lives in one file: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/citizenlab_test_lists_updater.py

To run the updater manually during development:

PYTHONPATH=analysis ./run_analysis --update-citizenlab --dry-run --stdout

Fingerprint updater

This component fetches measurement fingerprints as CSV files from https://github.com/ooni/blocking-fingerprints and populates fingerprints_dns table ⛁, fingerprints_dns_tmp table ⛁, fingerprints_http table ⛁ and fingerprints_http_tmp table ⛁.

The tables without _tmp are used by the Fastpath ⚙.

The CSV files are fetched directly without git-cloning.

Database writes are performed as the api user, configured in https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/analysis.py#L64

The tables have no constraints on the database side and basic validation is performed by the script: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/fingerprints_updater.py#L91
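
A rough idea of what this kind of script-side validation can look like is sketched below. The column names (`name`, `pattern`, `confidence_no_fp`) and the rules are assumptions made for illustration; the actual checks are in fingerprints_updater.py.

```python
import csv
import io

REQUIRED_COLUMNS = ("name", "pattern", "confidence_no_fp")  # hypothetical


def parse_fingerprints(csv_text):
    """Parse CSV text and return validated rows; raise ValueError on bad input."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["name"] or not row["pattern"]:
            raise ValueError(f"line {lineno}: empty name or pattern")
        try:
            row["confidence_no_fp"] = int(row["confidence_no_fp"])
        except (TypeError, ValueError):
            raise ValueError(
                f"line {lineno}: confidence_no_fp is not an integer"
            ) from None
        rows.append(row)

    if not rows:
        raise ValueError("no fingerprints found")
    return rows
```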

The updater overwrites the tables ending with _tmp and then swaps them with the "real" tables atomically. In case of failure the final swap does not happen, leaving the "real" tables unaltered.

It is deployed using the analysis package 📦 and started by the ooni-update-citizenlab ⏲ Systemd timer.

Logs are generated as the analysis.fingerprints_updater unit.

It also generates the following metrics with the fingerprints_updater prefix:

Metric name | Type | Description
fetch_csv | timer | CSV fetch duration
fingerprints_update_progress | gauge | Update progress
fingerprints_dns_tmp_len | gauge | DNS table size
fingerprints_http_tmp_len | gauge | HTTP table size

See the Fingerprint updater dashboard 📊 on Grafana.

The updater lives primarily in https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/fingerprints_updater.py and is called by the analysis.py script.

To run the updater manually during development:

PYTHONPATH=analysis ./run_analysis --update-citizenlab --dry-run --stdout

ASN metadata updater

This component fetches ASN metadata from https://archive.org/download/ip2country-as (generated via: https://github.com/ooni/historical-geoip)

It populates the asnmeta table ⛁ and asnmeta_tmp table ⛁.

The asnmeta table is used by the private API ⚙, see: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/ooniapi/private.py#L923 and https://api.ooni.io/apidocs/#/default/get_api___asnmeta

Database writes are performed as the api user, configured in https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/analysis.py#L64

The table has no constraints on the database side and basic validation is performed by the script: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/asnmeta_updater.py#L95

Logs are generated as the analysis.asnmeta_updater unit.

It also generates the following metrics with the asnmeta_updater prefix:

Metric name | Type | Description
fetch_data | timer | Data fetch duration
asnmeta_update_progress | gauge | Update progress
asnmeta_tmp_len | gauge | Table size

See the ASN metadata updater dashboard 📊 on Grafana.

To run the updater manually during development:

PYTHONPATH=analysis ./run_analysis --update-asnmeta --stdout

GeoIP downloader

Fetches GeoIP databases, installed by the ooni-api ⚙. Started by the ooni-download-geoip timer ⏲ on backend-fsn.ooni.org 🖥.

Lives at https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/debian/ooni_download_geoip.py

Updates asn.mmdb and cc.mmdb in /var/lib/ooniapi/

Can be monitored with the GeoIP MMDB database dashboard 📊 and by running:

sudo journalctl --identifier ooni_download_geoip

Database backup tool

The backup tool is a service that regularly backs up ClickHouse ⚙ tables to S3. It also exports tables in CSV.zstd format for public consumption.

Unlike similar tools, it is designed to:

  • extract data in chunks and upload it without creating temporary files

  • work without transaction support in the database (not available in ClickHouse)

  • work without transactional filesystems and without interrupting the database workload

It is configured by Ansible 🔧 using the /etc/ooni/db-backup.conf file. It runs as a Systemd service, see the ooni-db-backup timer ⏲.

It compresses data using https://facebook.github.io/zstd/ during the upload.

The tool chunks tables as needed and adds sleeps between chunks to prevent a query backlog from impacting database performance.
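
The chunk-and-throttle idea can be sketched as below. All names (`chunk_ranges`, `export_table`, the `fetch_chunk` and `upload` callbacks) and the default sizes are hypothetical, and zstd compression is left out to keep the sketch dependency-free; it only shows how a table can be walked in bounded chunks with a pause between queries.

```python
import time


def chunk_ranges(total_rows, chunk_size):
    """Yield (offset, limit) pairs that together cover total_rows."""
    for offset in range(0, total_rows, chunk_size):
        yield offset, min(chunk_size, total_rows - offset)


def export_table(total_rows, fetch_chunk, upload, chunk_size=100_000,
                 pause_s=1.0, sleep=time.sleep):
    """Fetch each chunk, hand it to upload() and pause before the next query."""
    uploaded = 0
    for offset, limit in chunk_ranges(total_rows, chunk_size):
        data = fetch_chunk(offset, limit)  # one bounded query per chunk
        upload(data)                       # passed on directly, no temporary file
        uploaded += 1
        sleep(pause_s)                     # leave room for other database queries
    return uploaded
```

Passing `sleep` as a parameter keeps the pause configurable and makes the loop easy to exercise in tests without waiting.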

Logs are generated as the ooni-db-backup unit.

It also generates the following metrics with the db-backup prefix:

Metric name | Type | Description
upload_to_s3 | timer | Data upload duration
run_export | timer | Data export duration
table_{tblname}_backup_time_ms | timer | Table backup time

See the Database backup dashboard 📊 on Grafana and Metrics list 💡 for application metrics.

Monitor with:

sudo journalctl -f --identifier ooni-db-backup

Future improvements:

Ancillary modules

analysis/analysis/analysis.py is the main analysis script and acts as a wrapper to other components.

analysis/analysis/metrics.py is a tiny wrapper for the Statsd Python library.

Social media blocking event detector

Blocking event detector currently under development. Documented in https://docs.google.com/document/d/1WQ6_ybnPbO_W6Tq-xKuHQslG1dSPr4jUbZ3jQLaMqdw/edit

Deployed by the detector package 📦.

See the Monitor blocking event detections notebook 📔, the Event detector dashboard and the Detector timer ⏲.