Database
ATTENTION We have since switched to using PostgreSQL instead of EmbeddedRocksDB, so parts of this documentation need updating.
The backend uses both the native ClickHouse table engine MergeTree and “foreign” database engines, for example EmbeddedRocksDB.
The MergeTree family is column-oriented, eventually consistent, non-transactional and optimized for OLAP workloads. It is suitable for very large numbers of records and columns.
The existing tables using MergeTree are meant to support replication in the future by switching to ReplicatedMergeTree. This is crucial for implementing active/standby and even active/active setups, both for increased reliability and performance.
This choice is reflected in how record insertion and deduplication are performed: where possible, the API codebase assumes that multiple API instances can insert and fetch records without leading to inconsistent results. This requires special care in the implementation because MergeTree tables support neither immediate consistency nor transactions.
EmbeddedRocksDB is based on RocksDB (https://rocksdb.org/), a key-value, row-oriented, low-latency engine. Some backend workloads are better suited to this engine. EmbeddedRocksDB does not support replication.
important Some workloads would be better suited for a transactional and immediately consistent database, e.g. OONI Run and Incident management. See https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map
To get an overview of the existing tables and engines use:
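A minimal sketch, assuming the backend tables live in the default database:

```sql
SELECT name, engine
FROM system.tables
WHERE database = 'default'
ORDER BY name
```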
An overview of the more important tables:
- accounts table: EmbeddedRocksDB
- asnmeta table: MergeTree
- citizenlab table: ReplacingMergeTree
- citizenlab_flip table: ReplacingMergeTree
- counters_asn_test_list table: MaterializedView
- counters_test_list table: MaterializedView
- fastpath table: ReplacingMergeTree
- fingerprints_dns table: EmbeddedRocksDB
- fingerprints_dns_tmp table: EmbeddedRocksDB
- fingerprints_http table: EmbeddedRocksDB
- fingerprints_http_tmp table: EmbeddedRocksDB
- incidents table: ReplacingMergeTree
- jsonl table: MergeTree
- msmt_feedback table: ReplacingMergeTree
- oonirun table: ReplacingMergeTree
- session_expunge table: EmbeddedRocksDB
- test_groups table: Join
- url_priorities table: CollapsingMergeTree
note As ClickHouse does not support transactions, there are some workarounds to implement atomic updates of whole tables.
One way is to use two tables with the same schema, where one table receives updates and the other is used for reading, and swap them once the writes are completed. This is used by the API, the Test helper rotation and other components. The SQL syntax is:
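A minimal sketch of the swap, using the citizenlab / citizenlab_flip pair described below; note that EXCHANGE TABLES requires the Atomic database engine:

```sql
-- atomically swap the table used for reads with the freshly updated one
EXCHANGE TABLES citizenlab AND citizenlab_flip
```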
accounts table
Used for authentication. Assigns roles to accounts (by account id). The default value for accounts not present in the table is “user”. As such, the table currently tracks only admin roles.
Schema:
To create and update account roles see:
Creating admin API accounts
asnmeta table
Contains ASN lookup data used by the API.
Schema:
asnmeta_tmp table
Temporary table, see the asnmeta table.
Schema:
citizenlab table
Contains data from the CitizenLab URL testing list repository.
Schema:
Receives writes from the CitizenLab test list updater.
Used by the CitizenLab API.
citizenlab_flip table
Temporary table. See the CitizenLab test list updater.
counters_asn_test_list table
A MATERIALIZED VIEW table that, despite the name, is updated continuously by ClickHouse as new measurements are inserted into the fastpath table.
It contains statistics on the incoming measurement flow, grouped by week, probe_cc, probe_asn and input. It is used by Prioritization.
Schema:
counters_test_list table
Similar to the counters_asn_test_list table; the main differences are that this table has daily granularity and does not discriminate by probe_asn.
Schema:
fastpath table
This table stores the output of the Fastpath. It is usually the largest table in the database and receives the largest amount of read and write traffic.
It is used by multiple entry points in the API, primarily for measurement listing and presentation, and by the Aggregation and MAT.
Schema:
See Fastpath deduplication for deduplicating the table records.
fingerprints_dns table
Stores measurement DNS fingerprints. The contents are used by the Fastpath to detect confirmed measurements.
It is updated by the Fingerprint updater.
Schema:
fingerprints_dns_tmp table
Temporary table. See the Fingerprint updater.
fingerprints_http table
Stores measurement HTTP fingerprints. The contents are used by the Fastpath to detect confirmed measurements.
It is updated by the Fingerprint updater.
Schema:
fingerprints_http_tmp table
Temporary table. See the Fingerprint updater.
incidents table
Stores incidents. See Incident management.
Schema:
jsonl table
This table provides a method to look up measurements in JSONL files stored in the S3 data bucket.
It is written by the Measurement uploader just after Postcans and JSONL files are uploaded to the bucket.
It is used by multiple entry points in the API, primarily by get_measurement_meta.
Schema:
msmt_feedback table
Used for Measurement feedback.
Schema:
oonirun table
Used for OONI Run.
Schema:
session_expunge table
Used for authentication. It stores
Schema:
obs_openvpn table
Table used by OpenVPN tests. Written by the Fastpath and read by the API.
Schema:
test_groups table
Contains the definition of test groups. Updated manually and read by the API, mainly to show grouping in Explorer.
Schema:
test_helper_instances table
List of live, draining and destroyed test helper instances. Used internally by the Test helper rotation.
Schema:
url_priorities table
This table stores rules to compute priorities for URLs used in the Web connectivity test.
See Prioritization, Prioritization management and the Prioritization rules UI.
Schema:
ClickHouse system tables
ClickHouse has many system tables that can be used for monitoring performance and debugging.
Tables matching the following names (wildcard) can grow in size and also cause unnecessary I/O load:
Fastpath deduplication
ClickHouse does not deduplicate all records deterministically, for performance reasons. Full deduplication can be performed with the following command; it is often required after running the fastpath to reprocess old measurements. Deduplication is CPU- and IO-intensive.
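A sketch of such a run, using OPTIMIZE on the fastpath table:

```sql
-- forces a full merge of all parts, deduplicating ReplacingMergeTree records;
-- expect heavy CPU and IO usage while it runs
OPTIMIZE TABLE fastpath FINAL
```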
Dropping tables
ClickHouse has a protection against accidental table drops. To allow dropping a table once, create the force_drop_table flag file (typically /var/lib/clickhouse/flags/force_drop_table); after use the flag file is automatically removed.
Investigating table sizes
To monitor ClickHouse’s performance and table size growth there is a dashboard and the ClickHouse queries notebook.
To investigate table and index sizes the following query is useful:
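A minimal sketch based on the system.parts system table (index sizes omitted for brevity):

```sql
SELECT
    table,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC
```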
important The system tables named asynchronous_metric_log, query_log and query_thread_log can be useful for debugging and performance optimization but grow over time and create additional I/O traffic. Also see ClickHouse system tables.
Possible workarounds are:
- Drop old records.
- Implement sampling writes that write only a percentage of the records.
- Disable logging for specific users or queries.
- Disable logging in the codebase running the query. See https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/fastpath/fastpath/db.py#L177 for an example of custom settings.
- Disable logging altogether.
Also see Disable unnecessary ClickHouse system tables.
Investigating database performance
If needed, a simple script can be run to generate additional metrics and logs to help identify slow queries.
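For example, recent slow queries can be extracted from system.query_log; a sketch, with an arbitrary 1-second threshold:

```sql
SELECT event_time, query_duration_ms, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20
```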
Continuous Deployment: Database schema changes
The database schema required by the API is stored in https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/tests/integ/clickhouse_1_schema.sql
In a typical CD workflow it is possible to deploy database schema changes in a way that provides safety against outages and requires no downtime.
Any component that can be deployed independently from others can be made responsible for creating and updating the tables it needs. If multiple components access the same table itβs important to coordinate their code updates and deployments during schema changes.
important Typical CD requires deployments to be performed incrementally to speed up and simplify rollouts. It does not guarantee that package versions can be rolled forward or back by skipping intermediate versions.
Supporting such a use case would require more complex database schema versioning. Due to hardware and time constraints we also need to be able to manually create new tables (especially on the test stage), perform analysis work, run performance and feature tests, do troubleshooting, etc. As such, at this time the backend components do not check the database schema against a stored “template” at install or start time.
note A starting point for more automated database versioning existed at https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/api/database_upgrade_schema.py
note In the future such a check could be implemented to generate a warning, enabled only on non-test CD stages.
note One way to make database changes at package install time is to use the debian/postinst script. For example see https://github.com/ooni/backend/blob/0ec9fba0eb9c4c440dcb7456f2aab529561104ae/fastpath/debian/postinst
important When writing SQL queries, explicitly list the columns you need to read or write and access them by name: any new column will then be ignored.
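A minimal illustration; the column names are examples only:

```sql
-- an explicit column list keeps the query stable when new columns are added;
-- SELECT * would silently change shape instead
SELECT measurement_uid, report_id, input
FROM fastpath
LIMIT 10
```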
Some examples of database schema change workflows:
Adding a new column to the fastpath
The following workflow can be tweaked or simplified to add columns or tables to other components.
note Multiple deployments or restarts should not trigger failures: always add columns using ALTER TABLE <name> ADD COLUMN IF NOT EXISTS <name> <other parameters>
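For example, adding a hypothetical blocking_type column to the fastpath table:

```sql
-- idempotent: repeated deployments or restarts do not fail
ALTER TABLE fastpath ADD COLUMN IF NOT EXISTS blocking_type String DEFAULT ''
```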
- The new column is added either manually or by deploying a new version of the fastpath package that adds the column at install or start time. Ensure the rollout reaches all the CD stages.
- Prepare, code-review, merge and deploy a new version of the fastpath that writes data to the new column. Ensure the rollout reaches all the CD stages.
- Verify the contents of the new column, e.g. using a Jupyter Notebook.
- Prepare, code-review, merge and deploy a new version of the API that reads data from the new column. Ensure the rollout reaches all the CD stages.
This workflow guarantees that the same schema and codebase have been tested on all stages before reaching production.
Renaming a column or table
This is a more complex workflow especially when multiple components access the same table or column. It can also be used for changing a column type and other invasive changes.
For each of the following steps ensure the rollout reaches all the CD stages.
- Create a new column or table.
- Prepare, code-review, merge and deploy a new version of all the components that write data. Write all data to both the old and new column/table.
- If required, run one-off queries that migrate existing data from the old column/table to the new one. Using IF NOT EXISTS can be useful.
- Prepare, code-review, merge and deploy a new version of all the components that read data. Update queries to only read from the new column/table.
- Prepare, code-review, merge and deploy a new version of all the components that write data. Write data only to the new column/table.
- Run one-off queries to delete the old column/table. Using IF EXISTS can be useful for idempotent runs; see the sketch after this list.
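A minimal sketch of the final cleanup step, assuming a hypothetical old_scores column on the fastpath table:

```sql
-- idempotent: safe to run more than once
ALTER TABLE fastpath DROP COLUMN IF EXISTS old_scores
```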
Database schema check
To compare the database schemas across backend hosts you can use:
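A minimal SQL sketch, assuming the tables live in the default database; run it on each host and save the output to a file:

```sql
SELECT create_table_query
FROM system.tables
WHERE database = 'default'
ORDER BY name
```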
The generated files can be compared more easily using meld.
Also related: Database backup tool