Backend Analysis
NOTE: This should not be confused with the analysis stage of the OONI data processing pipeline!
Miscellaneous scripts, services and tools: ancillary components that are not updated often and might not justify a dedicated Debian package for each of them.
Deployed using the analysis package 📦:
https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/
Data flows from various updaters. In the data-flow diagram, ellipses represent data, rectangles represent processes, and purple components belong to the backend.
See the following subchapters for details:
CitizenLab test list updater
This component fetches the test lists from CitizenLab Test List 💡 and populates the citizenlab table ⛁ and citizenlab_flip table ⛁.
The git repository https://github.com/citizenlab/test-lists.git is cloned as an unauthenticated user.
Database writes are performed as the citizenlab
user.
The tables have few constraints on the database side: most of the validation is done in the script and it is meant to be strict. The updater overwrites the citizenlab_flip table ⛁ and then swaps it with the citizenlab table ⛁ atomically. In case of failure during git cloning, verification or the table overwrite, the final swap does not happen, leaving the citizenlab table unaltered.
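As a minimal sketch of this flip-and-swap pattern (not the updater's actual code), assuming a local ClickHouse reachable via clickhouse_driver, an Atomic database (so that EXCHANGE TABLES is atomic) and illustrative column names:

```python
# Minimal sketch of the flip-and-swap pattern; column names, connection
# parameters and the sample row are illustrative assumptions.
from clickhouse_driver import Client

client = Client("localhost", user="citizenlab")

validated_rows = [
    ("example.com", "https://example.com/", "ZZ", "MISC"),  # output of the validation step
]

# Rebuild the _flip table from scratch with the freshly validated rows.
client.execute("TRUNCATE TABLE citizenlab_flip")
client.execute(
    "INSERT INTO citizenlab_flip (domain, url, cc, category_code) VALUES",
    validated_rows,
)

# Swap the flip table with the live one only if everything above succeeded;
# on any earlier failure the citizenlab table is left untouched.
client.execute("EXCHANGE TABLES citizenlab_flip AND citizenlab")
```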
It is deployed using the analysis package 📦 and started by the ooni-update-citizenlab Systemd timer ⏲.
Logs are generated as the analysis.citizenlab_test_lists_updater unit.
It also generates the following metrics with the citizenlab_test_lists_updater prefix:
Metric name | Type | Description |
---|---|---|
fetch_citizen_lab_lists | timer | Fetch duration |
update_citizenlab_table | timer | Update duration |
citizenlab_test_list_len | gauge | Table size |
The updater lives in one file: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/citizenlab_test_lists_updater.py
To run the updater manually during development:
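As a hedged sketch (the function name and the shape of the configuration object are assumptions to be checked against the file linked above), the updater can be driven from a Python shell in the backend's analysis directory:

```python
# Hypothetical development run; the entry-point name and the configuration
# fields are assumptions, not verified against the source file.
from types import SimpleNamespace

from analysis import citizenlab_test_lists_updater as updater

conf = SimpleNamespace(dry_run=True)        # assumed minimal configuration
updater.update_citizenlab_test_lists(conf)  # assumed entry point
```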
Fingerprint updater
This component fetches measurement fingerprints as CSV files from https://github.com/ooni/blocking-fingerprints and populates the fingerprints_dns table ⛁, fingerprints_dns_tmp table ⛁, fingerprints_http table ⛁ and fingerprints_http_tmp table ⛁.
The tables without the _tmp suffix are used by the Fastpath ⚙.
The CSV files are fetched directly without git-cloning.
Database writes are performed as the api
user, configured in
https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/analysis.py#L64
The tables have no constraints on the database side and basic validation is performed by the script: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/fingerprints_updater.py#L91
The updater overwrites the tables ending with _tmp and then swaps them with the “real” tables atomically. In case of failure the final swap does not happen, leaving the “real” tables unaltered.
It is deployed using the analysis package 📦 and started by the ooni-update-citizenlab Systemd timer ⏲.
Logs are generated as the analysis.fingerprints_updater unit.
It also generates the following metrics with the fingerprints_updater prefix:
Metric name | Type | Description |
---|---|---|
fetch_csv | timer | CSV fetch duration |
fingerprints_update_progress | gauge | Update progress |
fingerprints_dns_tmp_len | gauge | DNS table size |
fingerprints_http_tmp_len | gauge | HTTP table size |
See the Fingerprint updater dashboard 📈 on Grafana.
The updater lives primarily in
https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/fingerprints_updater.py
and it's called by the analysis.py script.
To run the updater manually during development:
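Analogous to the CitizenLab sketch above, with the same caveat that the entry-point name is an assumption:

```python
# Hypothetical development run; see the caveats in the CitizenLab sketch.
from types import SimpleNamespace

from analysis import fingerprints_updater

fingerprints_updater.update_fingerprints(SimpleNamespace(dry_run=True))  # assumed name
```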
ASN metadata updater
This component fetches ASN metadata from https://archive.org/download/ip2country-as (generated via https://github.com/ooni/historical-geoip).
It populates the asnmeta table ⛁ and asnmeta_tmp table ⛁.
The asnmeta table is used by the private API ⚙, see: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/ooniapi/private.py#L923 and https://api.ooni.io/apidocs/#/default/get_api___asnmeta
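For illustration only, the endpoint can be queried over HTTP; the path is derived from the apidocs link above and the query parameter name is an assumption:

```python
# Hypothetical query against the private API; the "asn" parameter name is an
# assumption, check the apidocs page linked above for the real signature.
import requests

resp = requests.get("https://api.ooni.io/api/_/asnmeta", params={"asn": 30722})
resp.raise_for_status()
print(resp.json())
```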
Database writes are performed as the api
user, configured in
https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/analysis.py#L64
The table has no constraints on the database side and basic validation is performed by the script: https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/analysis/analysis/asnmeta_updater.py#L95
Logs are generated as the analysis.asnmeta_updater unit.
It also generates the following metrics with the asnmeta_updater prefix:
Metric name | Type | Description |
---|---|---|
fetch_data | timer | Data fetch duration |
asnmeta_update_progress | gauge | Update progress |
asnmeta_tmp_len | gauge | Table size |
See the ASN metadata updater dashboard 📈 on Grafana.
To run the updater manually during development:
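Again analogous to the CitizenLab sketch, with the entry-point name being an assumption:

```python
# Hypothetical development run; see the caveats in the CitizenLab sketch.
from types import SimpleNamespace

from analysis import asnmeta_updater

asnmeta_updater.update_asnmeta(SimpleNamespace(dry_run=True))  # assumed name
```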
GeoIP downloader
Fetches GeoIP databases. It is installed by the ooni-api ⚙ and started by the ooni-download-geoip timer ⏲ on backend-fsn.ooni.org 🖥.
Lives at https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/debian/ooni_download_geoip.py
Updates asn.mmdb and cc.mmdb in /var/lib/ooniapi/.
Can be monitored with the GeoIP MMDB database dashboard 📈 and by running:
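As a hedged stand-in for the original check, the freshness of the downloaded databases can be verified with the maxminddb Python library; the paths come from above, while the age threshold is arbitrary:

```python
# Illustrative freshness check for the MMDB files, not the documented command.
import time

import maxminddb

for path in ("/var/lib/ooniapi/asn.mmdb", "/var/lib/ooniapi/cc.mmdb"):
    reader = maxminddb.open_database(path)
    age_days = (time.time() - reader.metadata().build_epoch) / 86400
    print(f"{path}: built {age_days:.0f} days ago")
    assert age_days < 60, f"{path} looks stale"  # arbitrary threshold
    reader.close()
```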
Database backup tool
The backup tool is a service that regularly backs up ClickHouse ⚙ tables to S3. It also exports tables in CSV.zstd format for public consumption.
Unlike similar tools, it is designed to:

- extract data in chunks and upload it without creating temporary files
- work without requiring transaction support in the database (not available in ClickHouse)
- work without requiring transactional filesystems or interrupting the database workload
It is configured by Ansible 🔧 using the /etc/ooni/db-backup.conf file and runs as a Systemd service, see the ooni-db-backup timer ⏲.
It compresses data using zstd (https://facebook.github.io/zstd/) during the upload.
The tool chunks tables and sleeps between chunks as needed to prevent a query backlog from impacting database performance.
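A minimal sketch of this chunk, compress and upload flow, assuming clickhouse_driver, zstandard and boto3; the table name, chunk size, bucket and key layout are placeholders and the real tool's output format differs:

```python
# Illustrative sketch only: stream a table out of ClickHouse in chunks,
# compress each chunk with zstd and upload it to S3 without temporary files.
# Table name, chunk size, bucket and key layout are placeholders.
import time

import boto3
import zstandard
from clickhouse_driver import Client

client = Client("localhost")
s3 = boto3.client("s3")
compressor = zstandard.ZstdCompressor()

def upload(rows, part):
    payload = compressor.compress("\n".join(rows).encode())
    s3.put_object(
        Bucket="example-ooni-backup",               # placeholder bucket
        Key=f"some_table/part-{part:05d}.csv.zst",
        Body=payload,
    )

chunk, part = [], 0
for row in client.execute_iter("SELECT * FROM some_table"):
    chunk.append(",".join(map(str, row)))           # naive CSV formatting
    if len(chunk) >= 100_000:                       # placeholder chunk size
        upload(chunk, part)
        chunk, part = [], part + 1
        time.sleep(1)                               # throttle to spare the DB

if chunk:
    upload(chunk, part)                             # flush the final chunk
```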
Logs are generated as the ooni-db-backup unit.
It also generates the following metrics with the db-backup prefix:
Metric name | Type | Description |
---|---|---|
upload_to_s3 | timer | Data upload duration |
run_export | timer | Data export duration |
table_{tblname}_backup_time_ms | timer | Table backup time |
See the Database backup dashboard 📈 on Grafana and Metrics list 💡 for application metrics.
Monitor with:
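As one hedged option, confirm that fresh objects keep appearing in the destination bucket; the bucket name and freshness window below are placeholders:

```python
# Illustrative staleness check on the backup destination; bucket name and
# freshness window are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="example-ooni-backup", MaxKeys=1000)
objects = resp.get("Contents", [])
assert objects, "no backup objects found"
newest = max(obj["LastModified"] for obj in objects)
print("newest backup object:", newest)
assert datetime.now(timezone.utc) - newest < timedelta(days=8), "backups look stale"
```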
Future improvements:

- database schema backup. For extracting the schema see Database schema check 💡
Ancillary modules
analysis/analysis/analysis.py is the main analysis script and acts as a wrapper around the other components.
analysis/analysis/metrics.py is a tiny wrapper for the Statsd Python library.
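For illustration, this is roughly how such metrics reach Statsd using the statsd PyPI client; the in-house wrapper may expose a slightly different interface:

```python
# Illustrative emission of the metric types used above (timer and gauge)
# with the statsd client; the metrics.py wrapper's API may differ.
from statsd import StatsClient

metrics = StatsClient("localhost", 8125, prefix="citizenlab_test_lists_updater")

with metrics.timer("fetch_citizen_lab_lists"):
    test_list = list(range(1000))   # stand-in for the real fetch step

metrics.gauge("citizenlab_test_list_len", len(test_list))
```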
Social media blocking event detector
The blocking event detector is currently under development. It is documented in https://docs.google.com/document/d/1WQ6_ybnPbO_W6Tq-xKuHQslG1dSPr4jUbZ3jQLaMqdw/edit
Deployed by the detector package 📦.
See the Monitor blocking event detections notebook 📔, the Event detector dashboard and the Detector timer ⏲.