Welcome to SynDB
SynDB is a platform for finding, sharing, and meta-analyzing synapse metrics derived from high-resolution microscopy. It supports federated deployments where institutions retain data sovereignty while participating in cross-institutional analysis.
Quick Links
- Installation — get started with the CLI or GUI
- Upload — share your data
- Search — find datasets
- SyQL — query neuroanatomical data
- Graph Analysis — network analysis on connectomes
- Federation — join as a federated node
- API Reference — full route map and auth details
- CLI Reference — all commands and options
Resources
Why use SynDB?
SynDB serves three audiences: data owners who produce microscopy data, data scientists who analyze it, and institutions that want to participate in federated analysis without giving up control of their data.
Image data owner
- Data sharing: Others can use your data for teaching, increasing its educational value.
- Citations: Whenever your data is used in a publication, you will be cited, increasing your visibility in the scientific community.
- Provenance tracking: Version history, lineage, and auto-generated citations (BibTeX, RIS) for your datasets.
Data scientist
- Meta-analysis: Compare data across thousands of experiments using cross-dataset meta-analysis.
- SyQL queries: A declarative query language that resolves metadata into optimized SQL.
- Graph analysis: Network analysis on connectome data — motifs, shortest paths, reachability, cross-dataset comparison.
- Data visualization: Use the data to create visualizations for publications or presentations.
- Statistical modelling: Use the data to create models that can predict outcomes in future experiments.
Node operator / Institution
- Data sovereignty: Keep your data on your infrastructure — it never leaves your network.
- Federated meta-analysis: Participate in cross-institutional queries without transferring data.
- Minimal footprint: A federation node requires only ClickHouse and the syndb-node binary.
- Schema sync: The hub pushes DDL migrations to your node automatically.
See Federation Overview for setup details.
Installation
The SynDB platform provides several UIs aimed at different user groups. We recommend the UIs for those getting started with SynDB. For advanced users, the API is the most flexible way to interact with the platform; see the Advanced section.
User interfaces
The SynDB interfaces are implemented in Python. To run them you need a Python environment.
Tip
Set up a Python environment
This requires two things: (1) a Python interpreter installed on your system, and (2) environment management for the SynDB packages.
There are many solutions to both; we recommend pyenv for the first and pipx for the second. Follow the installation guide for your operating system.
Install
pipx:
pipx install syndb-cli[gui]
pip:
pip install syndb-cli[gui]
Upgrade
To upgrade the SynDB CLI along with the GUI (if installed), run the following command:
pipx:
pipx upgrade syndb-cli
pip:
pip install syndb-cli[gui] --upgrade
Advanced
syndb-cli without GUI
pipx:
pipx install syndb-cli
pip:
pip install syndb-cli
Direct API usage
The API can be accessed through the OpenAPI documentation. For a more tailored approach, you may interact with the API through the syndb-data Python package:
poetry:
poetry add syndb-data
pip:
pip install syndb-data
Alternatively, you may generate your own language bindings using openapi-generator; you will need the SynDB OpenAPI schema.
Quick start
Command line interface
Following the installation, you may run the SynDB CLI using the following command:
syndb
The CLI's built-in help will guide you through the available commands and options. See the upload documentation for uploading with the CLI.
Graphical user interface
After installing the SynDB CLI, which contains the GUI, you may run the GUI using the following command:
syndb gui
The GUI opens in your default web browser; if the browser is already open, a new tab is created. You may need to refresh the new tab to see the GUI.
Tip
Dark Mode
Use the Dark Reader browser extension for dark mode in the GUI.
Next steps
- Authenticate — set up your account and verify for academic access
- Search data
- Upload data
- SyQL queries — query neuroanatomical data (requires academic verification)
- Federation — join as a federated node
- API documentation
Authentication
SynDB uses PASETO v4 tokens for authentication. Access tokens authorize API requests; refresh tokens obtain new access tokens without re-authenticating.
Account Types
| Type | How to create | Capabilities |
|---|---|---|
| Regular | POST /v1/user/auth/register or CLI syndb user register | Browse, search datasets |
| Academic | Verify via CILogon (institutional login) | All regular + SyQL, graph analysis, meta-analysis, upload, jobs |
| Service | POST /v1/user/auth/register-service with X-Service-Secret header | Same as Academic (auto-verified) |
| SuperUser | Promoted by existing superuser | All + federation admin, ontology management |
Academic verification is required for compute-intensive operations: query execution, graph analysis, analytics, meta-analysis, and dataset upload.
Registration & Login
CLI:
syndb user register
syndb user login
API:
# Register
curl -X POST https://api.syndb.xyz/v1/user/auth/register \
-H "Content-Type: application/json" \
-d '{"email": "[email protected]", "password": "...", "display_name": "Jane Doe"}'
# Login — returns access_token and refresh_token
curl -X POST https://api.syndb.xyz/v1/user/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "[email protected]", "password": "..."}'
Token Lifecycle
- Login returns an access token (15 min TTL) and a refresh token (30 day TTL)
- Use the access token in requests: Authorization: Bearer <access_token>
- When the access token expires, exchange the refresh token for a new pair:
curl -X POST https://api.syndb.xyz/v1/user/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{"refresh_token": "..."}'
- Each refresh rotates the token; the old refresh token is invalidated
Refresh tokens use family-based rotation: reuse of a revoked token invalidates the entire family, forcing re-authentication.
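Family-based rotation can be sketched with a small in-memory model. This is illustrative only, not SynDB's actual server code; the TokenFamily class and its behavior are assumptions based on the description above.

```python
import secrets

class TokenFamily:
    """Illustrative (hypothetical) model of refresh-token family rotation."""

    def __init__(self):
        self.current = secrets.token_urlsafe(32)  # the one currently valid refresh token
        self.seen = {self.current}                # every token ever issued in this family
        self.revoked = False

    def refresh(self, presented: str) -> str:
        if self.revoked:
            raise PermissionError("family revoked: re-authentication required")
        if presented == self.current:
            # Normal rotation: invalidate the old token, issue a new one.
            new = secrets.token_urlsafe(32)
            self.seen.add(new)
            self.current = new
            return new
        if presented in self.seen:
            # Reuse of an already-rotated token: assume theft, kill the whole family.
            self.revoked = True
            raise PermissionError("token reuse detected: family revoked")
        raise PermissionError("unknown token")
```

The key property is the second branch: presenting any previously rotated token revokes the entire family, so a stolen refresh token cannot be replayed after the legitimate client has rotated past it.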
OAuth Providers
Authenticate through institutional or social identity providers:
| Provider | Use case | Scopes |
|---|---|---|
| CILogon | Academic institutional login (universities, research labs) | openid, email, org.cilogon.userinfo |
| GitHub | Social login + ORCID association | user:email |
| Google | Social login | openid, email, profile |
| GitLab | Social login (supports self-hosted instances) | read_user |
| ORCID | Researcher ID association (requires existing account) | openid |
All OAuth flows use PKCE (Proof Key for Code Exchange) with SHA-256.
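The PKCE S256 derivation (RFC 7636) can be reproduced with the Python standard library; this sketch is for understanding the flow, not a SynDB API:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) per the RFC 7636 S256 method."""
    # A high-entropy, URL-safe verifier (43-128 chars after stripping padding).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # challenge = BASE64URL(SHA-256(ASCII(verifier))), without '=' padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The client sends the challenge when starting the OAuth flow and the verifier when exchanging the authorization code, so an intercepted code alone is useless.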
Academic Verification via CILogon
CILogon links your institutional identity to your SynDB account, automatically verifying you as an academic user:
- Log in to SynDB
- Navigate to CILogon verification (or GET /v1/user/authenticate/cilogon/authorize)
- Authenticate with your institution’s SSO
- Your account is marked as verified, unlocking SyQL, graph analysis, and upload
Service Accounts
For automated pipelines and integrations:
curl -X POST https://api.syndb.xyz/v1/user/auth/register-service \
-H "Content-Type: application/json" \
-H "X-Service-Secret: <SERVICE_SECRET>" \
-d '{"email": "[email protected]", "password": "..."}'
Service accounts are auto-verified and bypass academic checks. The X-Service-Secret must match the server’s SERVICE_SECRET environment variable.
Logout
# Revokes the refresh token
curl -X POST https://api.syndb.xyz/v1/user/auth/logout \
-H "Content-Type: application/json" \
-d '{"refresh_token": "..."}'
Overview
The SynDB data platform is accessible through the API. Through search, you can find and download high-level metrics; through upload, you can share your data so it becomes part of meta-analytical studies.
Composition
The SynDB data platform is designed to provide a comprehensive and organized repository of high-resolution microscopy data and associated metadata. The composition of SynDB can be broken down into three main components: Metadata, Image Metrics, and Raw Data. Each of these components plays a crucial role in the functionality and utility of the platform.
Metadata
Metadata is used to define and retrieve datasets. It describes the data in the respective dataset:
- Brain region
- Sourcing model animal
- Genetic manipulations (mutations)
- Microscopy method
- Publication information
The metadata is defined by the data owner during upload.
Warning
Dataset
You must split your dataset into individual SynDB datasets if any of these fields differ within your own dataset.
Image metrics
The image metrics in SynDB are derived from high-resolution microscopy assays, processed using specialized algorithms and models. These metrics form the primary data of interest within the platform. Each neuronal compartment and structure has its own unique set of metric categories, which necessitates distinct database schemas. We refer to these as SynDB tables throughout this documentation.
To facilitate efficient data management, every imaging metric is linked to a dataset via its ID. This linkage enables robust search capabilities by filtering through metadata, thus avoiding the need to handle terabytes of raw data directly. You can learn more about how dataset metadata filtration works in the article on search.
The flexible data model of SynDB supports this functionality by defining specific parameters for each compartment and structure. These varied models are unified into comprehensive datasets through dataset metadata, which effectively organizes data groups across the platform. This unified approach ensures that users can efficiently access and analyze the vast array of imaging metrics available in SynDB.
Raw data
Raw data is the original data from which the metrics are derived. The raw data is stored in the database and can be requested from its metric counterpart. Raw data sets currently include meshes and SWC files. These are included at the discretion of the data owner.
Organization & Tracking
- Collections & Tags: Group datasets into curated collections and apply tags for discovery.
- Provenance & Citations: Track version history, data lineage, and generate citations in BibTeX/RIS format. Export metadata as JSON-LD for linked data integration.
Search
The search feature filters through datasets based on the search terms provided by the user. The search terms can be combined to narrow down the search results.
By default, every search field is combined with AND, meaning every term must match in the resulting datasets.
Note
TODO
Add capabilities to customize the logical operators in the search, e.g., AND, OR, NOT.
Download the search results
Following the search, you may download the imaging derived metrics of the datasets from the search results. You will get a single .tar.xz file with parquet files inside. You may read parquet files using the pandas or polars library in Python.
Note
Other languages
Apache parquet is a file format supported by most popular programming languages. You may find libraries for reading parquet files in your preferred language.
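A minimal sketch of unpacking the downloaded archive and locating the parquet files, using only the standard library; the function name and archive layout shown are assumptions for illustration:

```python
import tarfile
from pathlib import Path

def extract_parquet_files(archive: str, out_dir: str) -> list[Path]:
    """Extract a SynDB .tar.xz download and return paths to the parquet members."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(out)
    return sorted(out.rglob("*.parquet"))

# Each file can then be loaded with e.g. pandas (or polars):
# import pandas as pd
# frames = {p.stem: pd.read_parquet(p) for p in extract_parquet_files("results.tar.xz", "results")}
```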
Upload
Note
Prerequisites
This article requires that you understand how data is stored on SynDB, we recommend reading through the overview article if you are uncertain.
Uploading to SynDB is a multistep process, and requires understanding of the SynDB dataset model.
The process
Preparation
We recommend following the guide in the exact sequence provided; later steps assume the earlier ones are complete.
Terms and conditions
You must accept the terms and conditions before uploading data. The terms include:
- Statement that the data is not false or misleading
- Redistribution rights
- Data licensing agreement with the license of your choice (see the guide to picking a license); the default license is ODC-BY.
Data structuring
SynDB utilizes data standardization to facilitate uploads. Your imaging metrics must be in a tabular data format; for instance, .xlsx, .csv, or .parquet. Read more about the data structuring in the contributor’s guide.
Login
Once you enter the upload page, you will be prompted to log in to your SynDB account if you are not already; furthermore, you must verify your academic status by logging in to your institution’s account.
The upload
You can upload data using the CLI or the GUI, or a mix of both. We recommend using the GUI for your first upload.
1. Assign IDs, and correlate relations
Each SynDB unit requires a unique ID assigned before being uploaded to the platform. The GUI does this automatically; the CLI does not. When you have multiple SynDB tables under one dataset, these are expected to have relations with each other.
Warning
Dataset integrity
Uploading unrelated SynDB table data under the same dataset is disallowed, as it may lead to undefined behaviour.
For example, you cannot upload a table of neurons and a table of synapses under the same dataset unless each synapse relates to a neuron in that table of neurons.
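Before a CLI upload, a check along these lines can catch unrelated tables early. The column names ('id', 'neuron_id') are assumptions for illustration; adjust them to your actual schema.

```python
def check_synapse_relations(neurons: list[dict], synapses: list[dict]) -> list[dict]:
    """Return the synapses whose neuron reference is missing from the neuron table.

    Assumes (hypothetically) that neurons carry an 'id' column and synapses a
    'neuron_id' foreign key.
    """
    neuron_ids = {n["id"] for n in neurons}
    return [s for s in synapses if s.get("neuron_id") not in neuron_ids]
```

An empty return value means every synapse is related to a neuron in the same dataset, satisfying the integrity rule above.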
GUI
The GUI will automatically assign UUIDs to each SynDB unit. The relations are correlated based on the top-down hierarchy of the tables; you may find the latest version of the hierarchy in the source on GitHub.
CLI
TODO
2. Selecting or creating the SynDB dataset metadata
As mentioned in the overview article, every dataset has metadata defined by the data owner during upload. You can either select an existing dataset or create a new one.
3. Confirm and upload
Before the upload starts, you will be prompted to confirm the dataset and the data you are uploading. Once you confirm, the upload begins; it should be relatively quick.
Delete owned datasets
You may at any time delete datasets that you own. This will remove the dataset and all the data associated with it. The deletion is permanent and cannot be undone.
External sources
SynDB supports importing connectomics data from 20+ major connectome datasets. This page covers the most common imports. See the CLI Reference for the full list of supported datasets.
Note
Dataset UUID
The <syndb-dataset-id> is the UUID of the SynDB dataset that will be associated with the imported data. You can copy and paste it from the dataset management page on the GUI.
FlyWire
FlyWire connectomics data can be imported from CAVE CSV exports.
Validate your FlyWire data directory:
syndb etl flywire validate --data-dir external_datasets/FlyWire
Import into your dataset:
syndb etl flywire import \
--data-dir external_datasets/FlyWire \
--dataset-id <syndb-dataset-id> \
--table neurons \
--table synapses
FlyWire also supports a synapses-detailed table for individual synapse positions (large, batched import).
Hemibrain
The Hemibrain v1.2.1 dataset from Janelia FlyEM can be downloaded directly from Google Cloud Storage.
Download the dataset:
syndb etl hemibrain download --output-dir external_datasets/Hemibrain --extract
Validate the data directory:
syndb etl hemibrain validate --data-dir external_datasets/Hemibrain
Import into your dataset:
syndb etl hemibrain import \
--data-dir external_datasets/Hemibrain \
--dataset-id <syndb-dataset-id> \
--table neurons \
--table synapses
MANC (Male Adult Nerve Cord)
The MANC / MaleCNS v0.9 dataset from Janelia FlyEM uses Apache Arrow Feather files.
Download the dataset:
syndb etl manc download --output-dir external_datasets/MANC
Validate the data directory:
syndb etl manc validate --data-dir external_datasets/MANC
Import into your dataset:
syndb etl manc import \
--data-dir external_datasets/MANC \
--dataset-id <syndb-dataset-id> \
--table neurons \
--table synapses
Warning
Download size
The MANC dataset includes the connectome-weights Feather file (~1.1 GB). Ensure sufficient disk space before downloading.
Collections & Tags
Organize datasets into curated collections and apply tags for discovery.
Tags
Tags are free-form metadata labels attached to datasets. They surface in search results and help users discover related data.
Add Tags
Tags are assigned during dataset creation or updated afterward via the dataset metadata endpoints.
Search by Tags
curl "https://api.syndb.xyz/v1/search?q=drosophila+mushroom+body"
The full-text search indexes dataset tags alongside titles and descriptions. See Search.
Collections
Collections are curated groupings of datasets — for example, “All Drosophila connectomes” or “Lab X publication datasets.”
Create a Collection
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/neurodata/collections \
-d '{
"name": "Drosophila Connectomes",
"description": "All Drosophila melanogaster connectome datasets"
}'
List Collections
curl https://api.syndb.xyz/v1/neurodata/collections
Add Datasets to a Collection
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/neurodata/collections/{collection_id}/datasets \
-d '{"dataset_id": "..."}'
Collections are useful for meta-analysis — pass a collection’s dataset IDs to the meta-analysis endpoint to compare all datasets in the group.
Provenance & Citations
SynDB tracks dataset lineage, version history, and generates machine-readable citations.
Version History
Each dataset maintains a version history. View all versions:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/versions
Provenance Chain
The provenance endpoint shows the audit trail — who created, modified, or derived from the dataset:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/provenance
Lineage
Track derived-from relationships between datasets:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/lineage
Citations
Generate citations in standard formats:
# BibTeX
curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/citation?format=bibtex"
# RIS (for EndNote, Zotero)
curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/citation?format=ris"
JSON-LD
Export dataset metadata as linked data for integration with knowledge graphs and semantic web tools:
curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/jsonld"
Returns a JSON-LD document following schema.org and neuroscience ontology standards.
Access Requests
For restricted datasets, request access from the dataset owner:
# Request access
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/access-request
# Check access status
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/access
The dataset creator receives the request and can approve or deny it.
SyQL Query Language
SyQL (SynDB Query Language) is a declarative query language for neuroanatomical data. It resolves dataset metadata into optimized ClickHouse SQL, handles access control, and submits queries to the async job system.
Requires Academic verification.
Workflow
SyQL has a three-stage pipeline:
| Stage | Endpoint | What it does |
|---|---|---|
| Plan | POST /v1/syql/plan | Parse → validate → resolve metadata → return logical plan |
| Explain | POST /v1/syql/explain | Plan + compile to SQL → return compiled query and advisories |
| Execute | POST /v1/syql/exec | Plan + compile + submit to job queue → return job ID |
Use plan to validate syntax. Use explain to preview the generated SQL before committing to execution. Use exec when you’re ready to run.
Plan
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/plan \
-d '{"query": "SELECT mesh_volume, brain_region FROM neurons WHERE dataset_id = '\''...'\'' LIMIT 1000"}'
Returns the parsed logical plan: resolved tables, columns, filters, and metadata.
Explain
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/explain \
-d '{"query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\'' LIMIT 1000"}'
Returns:
- The compiled ClickHouse SQL
- Query advisories (e.g., missing indexes, large scan warnings)
- Estimated cost
Execute
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/exec \
-d '{"query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\'' LIMIT 1000"}'
Returns a job_id. Track and download results via the Jobs System.
Cancel
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/cancel \
-d '{"query_id": "..."}'
Federation Scope
Add "scope": "federation" to fan the query out across all federated nodes:
{
"query": "SELECT COUNT(*) FROM synapses GROUP BY dataset_id",
"scope": "federation"
}
See Cross-Cluster Queries for details.
Saved Queries
Frequently used SyQL queries can be saved for reuse. See Saved Queries.
Saved Queries
Save SyQL queries for reuse, sharing, and scheduled re-execution.
Requires Academic verification.
Save a Query
From SyQL
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/queries/from-syql \
-d '{
"name": "Mushroom body neuron volumes",
"query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\''",
"description": "All neuron mesh volumes in the mushroom body"
}'
Direct Save
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/queries \
-d '{
"name": "My query",
"query": "...",
"description": "..."
}'
List Saved Queries
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/queries
Get a Query
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/queries/{query_id}
Update
curl -X PUT -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/queries/{query_id} \
-d '{"name": "Updated name", "query": "...", "description": "..."}'
Delete
curl -X DELETE -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/queries/{query_id}
Run a Saved Query
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/queries/{query_id}/run
Submits the query to the job system and returns a job ID.
CLI
syndb query list
syndb query save --name "My query" --query "SELECT ..."
syndb query show {query_id}
syndb query run {query_id}
syndb query status {query_id}
syndb query update {query_id} --name "New name"
syndb query delete {query_id}
Analytics
Pre-computed analytics endpoints for dataset exploration. These query ClickHouse materialized views and return results quickly (cached for 5 minutes).
Requires Academic verification.
Dataset Summary
Row counts per compartment type:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/analytics/{dataset_id}/summary
Returns counts for neurons, synapses, dendrites, axons, pre-synaptic terminals, dendritic spines, vesicles, mitochondria, and other compartment types present in the dataset.
Neuron Morphometrics
Morphological statistics for neurons in a dataset:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/analytics/{dataset_id}/neuron-morphometrics
Returns distributions of mesh volume, surface area, sphericity, terminal count, and other morphometric features.
Z-Score Comparison
Standardized comparison of a metric across multiple datasets:
curl -H "Authorization: Bearer $TOKEN" \
"https://api.syndb.xyz/v1/analytics/zscore-comparison?metric=mesh_volume&dataset_ids=uuid1,uuid2,uuid3"
Returns per-dataset z-scores normalized against the pooled distribution — useful for identifying outlier datasets.
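The normalization can be reproduced locally from per-dataset values. A sketch assuming a simple pooled mean and population standard deviation; the endpoint's exact statistics may differ:

```python
from statistics import mean, pstdev

def zscores_vs_pool(per_dataset: dict[str, list[float]]) -> dict[str, float]:
    """Z-score each dataset's mean against the pooled distribution of all values."""
    pooled = [v for values in per_dataset.values() for v in values]
    mu, sigma = mean(pooled), pstdev(pooled)
    if sigma == 0:
        return {name: 0.0 for name in per_dataset}
    return {name: (mean(values) - mu) / sigma for name, values in per_dataset.items()}
```

Datasets whose z-score magnitude is large relative to the others are candidates for outlier inspection.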
Graph Summary
Network-level statistics for connectome datasets:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/analytics/{dataset_id}/graph-summary
Returns: node count, edge count, density, number of connected components, mean clustering coefficient.
Reciprocity
Fraction of bidirectional synaptic connections:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/analytics/{dataset_id}/reciprocity
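On a directed edge list, reciprocity is the fraction of edges whose reverse also exists; a self-contained sketch of that definition:

```python
def reciprocity(edges: list[tuple[str, str]]) -> float:
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)
```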
Degree Distribution
Top neurons by connectivity:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/analytics/{dataset_id}/degree-distribution
Returns in-degree, out-degree, and total degree distributions.
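The same degree table can be computed from a raw edge list; a sketch, with ties broken alphabetically for determinism:

```python
from collections import Counter

def degree_table(edges: list[tuple[str, str]]) -> list[tuple[str, int, int, int]]:
    """Return (neuron, in_degree, out_degree, total) rows sorted by total, descending."""
    out_deg = Counter(u for u, _ in edges)   # edges leaving each neuron
    in_deg = Counter(v for _, v in edges)    # edges arriving at each neuron
    nodes = set(out_deg) | set(in_deg)
    rows = [(n, in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes]
    return sorted(rows, key=lambda r: (-r[3], r[0]))
```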
Graph Analysis
In-memory graph analysis on connectome datasets. SynDB constructs a directed graph from synapse data in ClickHouse (up to 10M edges) and runs network algorithms using petgraph.
Requires Academic verification.
Graph Metrics
Basic network statistics:
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/graph/{dataset_id}/metrics
Returns: node count, edge count, density, number of connected components, mean clustering coefficient, diameter, hub neurons (highest centrality).
Motif Analysis (Triadic Census)
Count all 16 three-node subgraph patterns (triadic census):
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/graph/{dataset_id}/motifs
Compare by Synapse Type
Compare motif distributions across different neurotransmitter types:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/graph/{dataset_id}/motifs/compare-synapse-types
Shortest Path
Find the shortest path between two neurons:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/graph/{dataset_id}/shortest-path \
-d '{"source": "neuron-id-1", "target": "neuron-id-2"}'
Uses Dijkstra’s algorithm. Supports configurable edge weight modes.
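The computation is the classic Dijkstra over a weighted, directed edge list. A minimal sketch (non-negative weights assumed):

```python
import heapq

def shortest_path(edges, source, target):
    """Dijkstra over directed weighted edges [(u, v, w), ...].

    Returns (cost, path) or (inf, []) when target is unreachable.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            # Reconstruct the path by walking predecessors back to the source.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []
```

The configurable edge-weight modes mentioned above would correspond to how `w` is derived from each synapse (e.g. unit weight vs. a score).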
Reachability
Find all neurons reachable within N hops:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/graph/{dataset_id}/reachability \
-d '{"source": "neuron-id", "max_hops": 3}'
BFS traversal, maximum 100 hops.
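The hop-bounded BFS this endpoint describes can be sketched as:

```python
from collections import deque

def reachable_within(edges, source, max_hops):
    """All neurons reachable from source in at most max_hops BFS steps (source excluded)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {source: 0}  # neuron -> hop distance
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if seen[u] == max_hops:
            continue  # do not expand past the hop budget
        for v in adj.get(u, []):
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    seen.pop(source)
    return set(seen)
```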
Reachability Curve
Sample how reachability grows with hop count:
curl -H "Authorization: Bearer $TOKEN" \
"https://api.syndb.xyz/v1/graph/{dataset_id}/reachability-curve?max_hops=20&samples=100"
Returns the fraction of the network reachable at each hop distance, sampled from random starting neurons (max 500 samples, max 20 hops).
Full Analysis
Run metrics + motifs + hub neuron detection in one call:
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/graph/{dataset_id}/full-analysis
Cross-Dataset Comparison
Compare graph properties across multiple datasets:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
"https://api.syndb.xyz/v1/graph/compare" \
-d '{"dataset_ids": ["uuid-1", "uuid-2", "uuid-3"]}'
Graph Precompute (CLI)
For large datasets, precompute graph metrics and store results in ClickHouse materialized tables:
syndb graph-precompute --dataset-id {uuid}
This is a batch operation typically run as part of the ETL pipeline or as a Kubernetes job.
Meta-Analysis
Cross-dataset meta-analysis computes effect sizes and heterogeneity statistics across multiple datasets, enabling comparisons that no single dataset can answer.
Requires Academic verification.
Cross-Dataset Analysis
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/meta-analysis/analyze \
-d '{
"table": "neurons",
"metric": "mesh_volume",
"group_by": "brain_region",
"dataset_ids": ["uuid-1", "uuid-2", "uuid-3"]
}'
Parameters
| Field | Required | Description |
|---|---|---|
| table | Yes | Target table: neurons, synapses, dendrites, axons, pre_synaptic_terminals, dendritic_spines, vesicles, mitochondria |
| metric | Yes | Column to analyze (e.g., mesh_volume, mesh_surface_area, connection_score) |
| group_by | No | Grouping column (e.g., brain_region, neurotransmitter) |
| dataset_ids | No | Specific datasets (omit for all accessible datasets) |
| scope | No | "local" (default) or "federation" |
| cluster_ids | No | Specific federation clusters (when scope is federation) |
Atlas Comparison
Compare dataset metrics against reference atlases (pre-aggregated materialized views):
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/meta-analysis/atlas-compare \
-d '{
"dataset_id": "...",
"table": "neurons",
"metric": "mesh_volume"
}'
Federation Scope
To run meta-analysis across federated nodes:
{
"table": "synapses",
"metric": "connection_score",
"group_by": "neurotransmitter",
"scope": "federation",
"cluster_ids": ["cluster-uuid-1", "cluster-uuid-2"]
}
The hub fans the aggregation out to each specified cluster and merges the results. See Cross-Cluster Queries.
Omit cluster_ids to include all healthy clusters.
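The merge step can be illustrated with partial aggregates: each cluster returns per-group (count, sum, sum-of-squares) tuples, and the hub combines them without ever seeing row-level data. This is a sketch of the idea, not the actual hub implementation:

```python
import math

def merge_partials(partials):
    """Merge per-cluster partial aggregates into pooled statistics.

    `partials` maps cluster -> {group: (n, sum, sum_sq)}; returns
    {group: (n, mean, stdev)} pooled across all clusters.
    """
    totals = {}
    for groups in partials.values():
        for g, (n, s, ss) in groups.items():
            tn, ts, tss = totals.get(g, (0, 0.0, 0.0))
            totals[g] = (tn + n, ts + s, tss + ss)
    merged = {}
    for g, (n, s, ss) in totals.items():
        mu = s / n
        var = max(ss / n - mu * mu, 0.0)  # population variance; clamp rounding error
        merged[g] = (n, mu, math.sqrt(var))
    return merged
```

Because counts, sums, and sums of squares compose additively, only these small summaries need to cross cluster boundaries, which is what preserves data sovereignty.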
Jobs System
Long-running queries execute asynchronously through the job system. Submit a job, check its status, and download results when ready.
Requires Academic verification.
Workflow
Submit job → Job queued → Job running → Job completed → Download result
→ Job failed (check error, rerun)
Submit a Query Job
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/jobs \
-d '{"query": "SELECT * FROM neurons WHERE dataset_id = '\''...'\''", "format": "arrow"}'
Returns a job_id for tracking.
Submit a Graph Job
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/jobs/graph \
-d '{"dataset_id": "...", "analysis": "full"}'
Check Status
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/jobs/{job_id}
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a worker |
| running | Currently executing |
| completed | Results available for download |
| failed | Execution error (check error_message) |
| cancelled | Cancelled by user |
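A polling loop over these statuses might look like the following; `fetch_status` stands in for the GET call above and `wait_for_job` is a hypothetical helper, not part of the SynDB CLI:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_job(fetch_status, job_id, poll_seconds=2.0, timeout=600.0):
    """Poll fetch_status(job_id) until a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {timeout}s")
```

On "completed", fetch the result endpoint; on "failed", inspect error_message and consider the rerun endpoint.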
List Your Jobs
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/jobs
Download Results
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/jobs/{job_id}/result \
-o result.arrow
- Query jobs: Arrow IPC format (readable by pandas, polars, DuckDB)
- Graph jobs: JSON
Cancel a Job
curl -X DELETE -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/jobs/{job_id}
Rerun a Job
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/jobs/{job_id}/rerun
Creates a new job with the same parameters.
Configuration
| Parameter | Default | Environment Variable |
|---|---|---|
| Max concurrent workers | 4 | JOB_QUEUE_MAX_WORKERS |
| Result TTL | 24 hours | JOB_RESULT_TTL_HOURS |
| Max result size | 1 GB | JOB_MAX_RESULT_BYTES |
Results are stored in S3 and automatically cleaned up after the TTL expires.
Federation Overview
SynDB federation allows multiple institutions to participate in a shared neuroscience data network while retaining full control of their data. Each institution runs a node with its own ClickHouse instance; a central hub coordinates queries across all nodes.
Why Federate?
| Concern | Without federation | With federation |
|---|---|---|
| Data sovereignty | Upload all data to a central server | Data stays on your infrastructure |
| Meta-analysis | Limited to datasets on one instance | Query across all participating institutions |
| Compliance | Data leaves your network | Data never leaves — only query results cross boundaries |
| Latency | Single point of access | Local reads are fast; cross-cluster queries pay network cost |
Key Concepts
Hub — The coordinating instance that runs the full SynDB stack (API, PostgreSQL, ClickHouse, Meilisearch, S3). It maintains a registry of federated clusters, monitors their health, and routes cross-cluster queries.
Node — A lightweight participant running ClickHouse and the syndb-node binary. Nodes register with the hub via libp2p or HTTP, receive schema migrations, and respond to delegated queries.
Schema versioning — The hub pushes ClickHouse DDL migrations to all nodes. Queries only route to nodes whose schema version is compatible.
Health monitoring — The hub periodically checks each node’s health. Nodes are classified as Healthy, Degraded, Unreachable, or Unknown. Unhealthy nodes are excluded from federation queries.
Federation password — A shared secret that nodes present when registering with the hub. Prevents unauthorized clusters from joining.
When to Federate vs. Upload
Federate when:
- Institutional policy requires data to stay on-premise
- You have existing ClickHouse infrastructure
- You want to contribute to cross-institutional meta-analysis without data transfer
Upload directly when:
- You don’t have infrastructure to maintain
- Your data has no residency requirements
- You want the simplest path to sharing
Architecture at a Glance
┌─────────────────────────────────┐
│               Hub               │
│  API + PostgreSQL + ClickHouse  │
│   + S3 + Meilisearch + libp2p   │
└──────┬──────────────┬───────────┘
       │ libp2p/QUIC  │ libp2p/QUIC
  ┌────▼─────┐   ┌────▼─────┐
  │ Node A   │   │ Node B   │
  │ CH + CLI │   │ CH + CLI │
  └──────────┘   └──────────┘
Queries flow: User → Hub API → Hub ClickHouse → remote() to Node ClickHouse → results aggregated at Hub.
See Architecture for the full technical breakdown.
Federation Architecture
Components
Hub
The hub runs the full SynDB stack and coordinates the federation:
| Component | Role |
|---|---|
| syndb-api | HTTP API (port 8080) + Arrow Flight (port 50051) |
| PostgreSQL | User accounts, dataset metadata, cluster registry, job queue, benchmarks |
| ClickHouse | Local data warehouse + remote() queries to nodes |
| S3/MinIO | Mesh files, job results, ETL staging |
| Meilisearch | Full-text search index |
| HubRegistryActor | libp2p actor managing cluster registration and health |
| FederationHealthMonitor | Periodic health checks with circuit-breaker logic |
Node
Nodes are lightweight — no PostgreSQL, no S3, no Meilisearch:
| Component | Role |
|---|---|
| syndb-node | Federation daemon with Arrow Flight server (port 50052) |
| ClickHouse | Local data warehouse (HTTP port 8124, native port 9003/9440) |
| ClusterActor | libp2p actor handling hub communication |
Networking: libp2p
Federation uses libp2p for peer-to-peer communication:
- Transport: QUIC with built-in TLS 1.3 (encrypted, multiplexed)
- Discovery: mDNS for LAN (zero-config), DHT for WAN
- NAT traversal: Relay nodes for peers behind NAT
- Actor model: kameo actors manage the swarm event loop
DHT Registration
Services register under well-known names in the DHT:
| Name | Actor |
|---|---|
| `syndb-hub` | HubRegistryActor |
| `syndb-cluster:{name}` | ClusterActor |
The ClusterActor on each node looks up syndb-hub in the DHT to find and register with the hub.
Actor Messages
The ClusterActor handles these message types:
| Message | Direction | Purpose |
|---|---|---|
| `HealthPing` | Hub → Node | Periodic liveness check |
| `SchemaSync` | Hub → Node | Push DDL migrations |
| `DatasetCatalogRequest` | Hub → Node | Discover datasets on node |
| `GetFlightEndpoint` | Hub → Node | Resolve Flight address for data transfer |
| `AnalyticsQuery` | Hub → Node | Delegated analytics computation |
| `OntologySync` | Hub → Node | Push ontology terms |
Data Plane
Two mechanisms move data between hub and nodes:
ClickHouse remote()
For SQL queries, the hub compiles a remote('node-host:port', 'syndb', 'table', 'user', 'password') call that executes directly on the node’s ClickHouse and streams results back.
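As a rough illustration of the SQL shape the hub produces (the function, credentials, and quoting below are hypothetical; the real query compiler is internal to the hub):

```python
def compile_remote_query(host, native_port, table, predicate, user="federation"):
    """Illustrative sketch of the remote() call the hub builds for one node.

    The real compiler also handles credentials and escaping; this only
    shows the shape of the generated SQL.
    """
    return (
        f"SELECT * FROM remote('{host}:{native_port}', 'syndb', "
        f"'{table}', '{user}', '<password>') WHERE {predicate}"
    )

sql = compile_remote_query("ch.partner-lab.edu", 9440, "neurons",
                           "brain_region = 'mushroom_body'")
print(sql)
```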
Arrow Flight (Internal)
For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to the node’s internal Flight server (port 50052). Results stream back as Arrow IPC batches.
Schema Versioning
Each ClickHouse DDL migration has a version number. The hub tracks the current version and each node’s version:
- Hub receives a schema sync request (`POST /v1/federation/schema/sync`)
- Hub sends pending migrations to each active node via a `SchemaSync` message
- Nodes apply migrations and report their new version
- Queries only route to nodes whose schema version is compatible
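The compatibility gate in the last step can be sketched as a filter over the node registry (illustrative only; exact-version matching is an assumption here, and the hub's actual rule may be more permissive):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    schema_version: int
    healthy: bool = True

def routable(nodes, hub_version):
    """Nodes eligible for federation queries: healthy and schema-compatible.

    Compatibility is modeled as 'same version as the hub' for the sketch.
    """
    return [n.name for n in nodes if n.healthy and n.schema_version == hub_version]

nodes = [Node("lab-a", 12), Node("lab-b", 11), Node("lab-c", 12, healthy=False)]
print(routable(nodes, hub_version=12))  # ['lab-a']
```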
Health Monitoring
The FederationHealthMonitorActor runs on the hub:
| State | Meaning | Query routing |
|---|---|---|
| Healthy | Responds to pings, schema compatible | Included |
| Degraded | Responds but slow or partially failing | Included with lower priority |
| Unreachable | Failed consecutive pings | Excluded |
| Unknown | Newly registered, not yet checked | Excluded until first successful ping |
Health transitions are logged and stored in PostgreSQL for audit.
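An illustrative mapping from ping history to these states (the thresholds of 3 consecutive failures and a 500 ms slow cutoff are invented for the example; the monitor's real tuning is internal):

```python
def classify(ping_latencies_ms):
    """Classify a node from recent pings. Newest-last list; None = ping failed."""
    if not ping_latencies_ms:
        return "Unknown"          # newly registered, never checked
    recent = ping_latencies_ms[-3:]
    if len(recent) == 3 and all(p is None for p in recent):
        return "Unreachable"      # failed consecutive pings
    if any(p is None or p > 500 for p in recent):
        return "Degraded"         # slow or partially failing
    return "Healthy"

print(classify([]))                  # Unknown
print(classify([12, 15, 11]))        # Healthy
print(classify([12, None, 800]))     # Degraded
print(classify([None, None, None]))  # Unreachable
```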
Concurrency Model
- Lock-free reads: The hub’s cluster registry uses `papaya` concurrent hash maps — reads never block, even under high query load
- Actor isolation: Each cluster connection is managed by its own actor, preventing one slow node from blocking others
- Supervisor trees: Actor failures are caught and restarted by the kameo supervisor
Node Setup
This guide walks through joining the SynDB federation as a node operator.
Prerequisites
- ClickHouse instance with a `syndb` database
- Network reachability to the hub (or mDNS on the same LAN)
- The federation password (provided by the hub administrator)
- `syndb-cli` binary (with federation feature)
Step 1: Initialize
syndb federation init \
--cluster_name "my-lab-node" \
--clickhouse_endpoint "clickhouse.mylab.edu" \
--clickhouse_http_port 8123 \
--clickhouse_port 9440 \
--federation_password "$SYNDB_FEDERATION_PASSWORD" \
--institution "My University" \
--contact_email "[email protected]"
This command:
- Bootstraps a libp2p swarm and discovers the hub via mDNS or configured multiaddrs
- Registers the node with the hub (presenting the federation password)
- Applies any pending ClickHouse schema migrations
- Saves configuration to `~/.config/syndb/federation.json`
Optional flags
| Flag | Default | Description |
|---|---|---|
| `--listen_addr` | OS-assigned | libp2p listen address (e.g., `/ip4/0.0.0.0/udp/4001/quic-v1`) |
| `--description` | — | Human-readable cluster description |
Step 2: Verify
# Show federation config
syndb federation status
# Test connectivity (3s mDNS discovery + hub + ClickHouse check)
syndb federation test
federation test performs:
- Bootstraps a temporary libp2p swarm with mDNS discovery
- Looks up the hub in the DHT
- Tests ClickHouse connectivity
Step 3: Sync Schema
If the hub has newer schema migrations:
# Preview changes
syndb federation sync-schema --dry_run true
# Apply
syndb federation sync-schema
This uses an HTTP fallback via SYNDB_HUB_URL if libp2p is unavailable.
Step 4: Confirm Registration
List all federated clusters to verify your node appears:
export SYNDB_HUB_URL="https://api.syndb.xyz"
syndb federation clusters
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `SYNDB_FEDERATION_PASSWORD` | Yes | — | Shared secret for hub registration |
| `SYNDB_HUB_URL` | For HTTP fallback | — | Hub API URL (e.g., `https://api.syndb.xyz`) |
| `FEDERATION_CLUSTER_NAME` | Yes (node mode) | — | Unique cluster identifier |
| `FEDERATION_NODE_FLIGHT_PORT` | No | 50052 | Internal Flight gRPC port |
| `FEDERATION_NODE_FLIGHT_ADVERTISE` | No | localhost:50052 | Advertised Flight endpoint |
| `FEDERATION_ENABLE_MDNS` | No | true | Enable mDNS for LAN discovery |
| `FEDERATION_LISTEN_ADDR` | No | OS-assigned | libp2p listen address |
| `FEDERATION_HUB_MULTIADDRS` | No | — | Comma-separated hub multiaddrs for WAN |
| `FEDERATION_CLUSTER_NATIVE_PORT` | No | 9440 | ClickHouse native port for remote() queries |
Docker Compose (Development)
For local development, the federation profile starts a hub and one node:
docker compose --profile federation up -d
This starts:
- `clickhouse-node` — ClickHouse on HTTP 8124, native 9003
- `clickhouse-node-setup` — Creates federation user on the node
- `clickhouse-hub-fed-setup` — Creates federation user on the hub
- `syndb-node` — Federation daemon with Flight on 50052, libp2p on 4001
All services use network_mode: host and discover each other via localhost.
Removing a Node
syndb federation logout
This deletes ~/.config/syndb/federation.json. The hub administrator can also deactivate the cluster via DELETE /v1/federation/clusters/{id}.
Hub Administration
All hub administration endpoints require SuperUser authentication.
Federation Status
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/status
{
"total_clusters": 5,
"active_clusters": 4,
"healthy": 3,
"degraded": 1,
"unreachable": 0,
"schema_version": 12
}
Cluster Management
List Clusters
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters
Returns each cluster’s ID, name, endpoint, port, health status, and active flag.
Register a Cluster
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/federation/clusters \
-d '{
"name": "partner-lab",
"endpoint": "ch.partner-lab.edu",
"port": 9440,
"description": "Partner Lab ClickHouse node",
"institution": "Partner University",
"contact_email": "[email protected]"
}'
Clusters can also self-register via POST /v1/federation/register using the federation password (no SuperUser required).
Deactivate a Cluster
curl -X DELETE -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{cluster_id}
Sets is_active = false. The cluster is excluded from future queries but its record is preserved.
Health Checks
Single Cluster
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/health
Verification Tests
Three targeted tests for diagnosing cluster issues:
# Test ClickHouse connectivity and measure latency
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/connectivity
# Verify schema version compatibility
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/schema
# Run a test cross-cluster query
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/query
Schema Sync
Push pending DDL migrations to all active clusters:
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/schema/sync
Get the current schema version and migrations:
# All migrations
curl -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/schema
# Migrations since version 10
curl -H "Authorization: Bearer $TOKEN" \
"https://api.syndb.xyz/v1/federation/schema?since_version=10"
Benchmarks
Track federation query performance:
# Submit a benchmark record
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/federation/benchmarks \
-d '{
"cluster_id": "...",
"query_type": "remote_single",
"latency_ms": 145,
"row_count": 50000,
"cluster_count": 1,
"payload_bytes": 2048000,
"success": true
}'
# List benchmarks with filters
curl -H "Authorization: Bearer $TOKEN" \
"https://api.syndb.xyz/v1/federation/benchmarks?query_type=remote_single&limit=50"
# Aggregate stats grouped by query type
curl -H "Authorization: Bearer $TOKEN" \
"https://api.syndb.xyz/v1/federation/benchmarks/aggregate?since=2024-01-01"
Query Types
| Type | Description |
|---|---|
| `remote_single` | Query to one remote cluster |
| `remote_multi` | Query spanning multiple clusters |
| `federation_union` | Union across all federated clusters |
| `federation_search` | Federated search |
| `health_check` | Health check probe |
Cross-Cluster Queries
Federation queries let you analyze data across all participating nodes from a single API call.
How It Works
- User submits a query via SyQL or meta-analysis endpoint with federation scope
- Hub resolves targets — checks dataset locality index to determine which nodes hold relevant data
- Hub compiles remote queries — generates ClickHouse `remote('node:port', 'syndb', 'table', 'user', 'pass')` calls
- Nodes execute locally — each node runs its portion of the query against local data
- Hub aggregates — results stream back and are merged at the hub
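The fan-out and aggregation steps amount to a scatter-gather over per-node result sets; a sketch with hypothetical node names (illustrative only: in practice the hub fans out concurrently and merges streamed Arrow batches, not Python lists):

```python
def federated_query(nodes, run_on_node):
    """Scatter a query to every node holding relevant data, gather results.

    run_on_node(node) stands in for the remote()/Flight call; each node
    returns only rows from its own local data.
    """
    merged = []
    for node in nodes:  # the real hub fans out concurrently
        merged.extend(run_on_node(node))
    return merged

# Two hypothetical nodes, each holding a disjoint slice of the data.
local_data = {
    "lab-a": [{"neuron": 1, "region": "mushroom_body"}],
    "lab-b": [{"neuron": 7, "region": "mushroom_body"}],
}
rows = federated_query(["lab-a", "lab-b"], lambda n: local_data[n])
print(len(rows))  # 2
```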
SyQL with Federation Scope
SyQL queries can target the federation by specifying scope in the request:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/exec \
-d '{
"query": "SELECT neuron FROM neurons WHERE brain_region = '\''mushroom_body'\''",
"scope": "federation"
}'
The hub transparently fans the query out to nodes that hold matching datasets.
Meta-Analysis Across Clusters
Specify cluster_ids to include specific nodes:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/meta-analysis/analyze \
-d '{
"table": "neurons",
"metric": "mesh_volume",
"group_by": "brain_region",
"scope": "federation",
"cluster_ids": ["uuid-1", "uuid-2"]
}'
Omit cluster_ids to query all healthy clusters.
Data Plane: Arrow Flight
For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to each node’s internal Flight server:
- Hub sends a Flight `DoGet` request to the node’s advertised Flight endpoint (default port 50052)
- Results stream back as Arrow IPC record batches
- The hub merges batches from multiple nodes before returning to the client
Limitations
| Constraint | Detail |
|---|---|
| Latency | Cross-cluster queries add network round-trip time per node |
| Schema compatibility | Nodes must be at a compatible schema version; incompatible nodes are excluded |
| Node health | Only Healthy and Degraded nodes receive queries; Unreachable nodes are skipped |
| Delegation timeout | Default 30s (FEDERATION_DELEGATION_TIMEOUT_SECS); long-running queries may need async jobs |
| No cross-node joins | Each node executes independently; joins happen only against local data |
Best Practices
- Use async jobs (`POST /v1/jobs`) for large federation queries to avoid HTTP timeouts
- Check federation status before running large queries to know which nodes are available
- Prefer meta-analysis endpoints for cross-dataset aggregation — they handle fan-out efficiently
- Monitor benchmarks to track federation query performance over time
Federation Troubleshooting
Node Cannot Find Hub
Symptom: syndb federation init or syndb federation test hangs during hub discovery.
Causes and fixes:
| Cause | Fix |
|---|---|
| mDNS blocked by firewall | Open UDP port 5353 or set FEDERATION_ENABLE_MDNS=false and use explicit multiaddrs |
| Hub and node on different networks | Set FEDERATION_HUB_MULTIADDRS to the hub’s libp2p address (e.g., /ip4/hub-ip/udp/4001/quic-v1) |
| Hub not running | Verify hub process is up and listening on its libp2p port |
Registration Rejected
Symptom: "Invalid federation password" error.
Fix: Ensure SYNDB_FEDERATION_PASSWORD matches the hub’s FEDERATION_PASSWORD exactly. Check for trailing whitespace or newlines in environment variables.
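One way to spot an invisible trailing newline or space is to dump the variable byte by byte; this is a generic shell technique, not a SynDB command (`od -c` prints control characters literally):

```shell
# Any \n or trailing spaces in the output indicate a corrupted secret.
printf '%s' "$SYNDB_FEDERATION_PASSWORD" | od -c
```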
Schema Version Mismatch
Symptom: Node excluded from federation queries; hub logs show schema incompatibility.
Fix:
# Check current schema
syndb federation status
# Sync to latest
syndb federation sync-schema
If sync fails, verify the node’s ClickHouse is reachable and the syndb database exists.
Health States
| State | Meaning | Action |
|---|---|---|
| Healthy | All checks pass | None |
| Degraded | Responds but slow or partially failing | Check ClickHouse load, disk space, network |
| Unreachable | Failed consecutive pings | Check firewall, ClickHouse process, network connectivity |
| Unknown | Newly registered | Wait for first health check cycle or trigger manual verify |
Trigger a manual health check from the hub:
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{id}/verify
Docker Compose Issues
Port Conflicts
The federation profile uses network_mode: host. Check for conflicts:
- Hub ClickHouse: HTTP 8123, native 9002
- Node ClickHouse: HTTP 8124, native 9003
- Federation Flight: 50052
- libp2p: UDP 4001
Node Fails to Start
Check that hub ClickHouse setup containers completed first:
docker compose --profile federation logs clickhouse-hub-fed-setup
docker compose --profile federation logs clickhouse-node-setup
These create the federation user on each ClickHouse instance. If they fail, the node cannot authenticate for remote() queries.
Connectivity Test Sequence
Run targeted tests to isolate the failure:
# 1. Test ClickHouse connectivity
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{id}/test/connectivity
# 2. Test schema compatibility
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{id}/test/schema
# 3. Test cross-cluster query
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/federation/clusters/{id}/test/query
Each test returns a pass/fail result with latency and error details. Work through them in order — later tests depend on earlier ones passing.
Docker Compose
Local development and single-machine deployment using Docker Compose.
Base Stack
docker compose up -d
Starts the core services:
| Service | Port | Description |
|---|---|---|
| `syndb-api` | 8080 (HTTP), 50051 (Flight) | REST API + Arrow Flight |
| `syndb-ui` | — (reverse-proxied) | Web frontend |
| `postgres` | 5433 | Metadata, users, access control |
| `clickhouse` | 8123 (HTTP), 9002 (native) | Data warehouse |
| `minio` | 9000 (API), 9001 (console) | Object storage |
| `meilisearch` | 7700 | Full-text search |
All services use network_mode: host — they bind directly to the host network.
Federation Profile
docker compose --profile federation up -d
Adds federation services on top of the base stack:
| Service | Port | Description |
|---|---|---|
| `clickhouse-node` | 8124 (HTTP), 9003 (native) | Node ClickHouse |
| `clickhouse-node-setup` | — | Creates federation user on node |
| `clickhouse-hub-fed-setup` | — | Creates federation user on hub |
| `syndb-node` | 50052 (Flight), 4001/UDP (libp2p) | Federation node daemon |
Note: The `federation` and `federation-world` profiles share port 8124 and are mutually exclusive. `federation-world` runs 5 regional ClickHouse nodes for benchmarking only.
ETL Profile
Run dataset imports:
docker compose --profile etl run syndb-etl <dataset> <command>
Example:
docker compose --profile etl run syndb-etl hemibrain download
docker compose --profile etl run syndb-etl hemibrain import
Version Management
All service versions are defined in versions.nix. After changing versions:
just sync-versions
This regenerates .env with the correct image tags.
Image Building
Build container images from Nix:
just stack-prepare
This builds OCI images for syndb-api, syndb-node, syndb-cli-etl, and the UI.
Volumes
| Volume | Service | Content |
|---|---|---|
| `clickhouse-data` | clickhouse | ClickHouse data |
| `clickhouse-node-data` | clickhouse-node | Node ClickHouse data |
| `postgres-data` | postgres | PostgreSQL data |
| `minio-data` | minio | S3 object storage |
| `meilisearch-data` | meilisearch | Search index |
Cleanup
ClickHouse creates files with UID 100100 and restrictive permissions. To clean volumes:
podman unshare rm -rf <volume-path>
Prefer keeping data in Docker volumes rather than bind mounts to avoid permission issues.
Kubernetes & Helm
Production deployment on Kubernetes using Helm charts.
Charts Overview
| Chart | Description |
|---|---|
| `syndb-hub` | Hub deployment (API, UI, depends on syndb-clickhouse) |
| `syndb-federation-node` | Federation node (syndb-node, depends on syndb-clickhouse) |
| `syndb-clickhouse` | Shared ClickHouse subchart (used by both hub and node) |
| `syndb-etl` | ETL batch jobs (download, prepare, import, graph-precompute) |
| `nautilus` | Umbrella chart for the NRP Nautilus cluster deployment |
Charts are located under infrastructure/helm/.
Hub Deployment
The hub chart deploys the full SynDB stack. Key values:
syndb-clickhouse:
clusterName: syndb-hub
shardRegions:
- name: dc1
region: dc1
replicas: 3
api:
image: syndb-api-rust:latest
flightPort: 50051
resources:
requests:
cpu: "1"
memory: 2Gi
The chart also creates a remote_servers.xml ConfigMap for ClickHouse cluster topology.
Node Deployment
Deploy a federation node at your institution:
syndb-clickhouse:
clusterName: syndb-node
shardRegions:
- name: dc1
region: dc1
replicas: 2
nodeApi:
enabled: true
image: syndb-api-rust:latest
flightPort: 50052
libp2pPort: 4001
hubMultiaddrs: "/ip4/<hub-ip>/udp/4001/quic-v1"
federationPassword: "<shared-secret>"
resources:
requests:
cpu: 500m
memory: 512Mi
When nodeApi.enabled=true, the chart deploys:
- A Deployment running `syndb-node` with Flight (TCP) and libp2p (UDP) ports
- A Service exposing both ports
- Environment variables auto-populated from values (cluster name, endpoints, passwords)
In Kubernetes, mDNS is disabled — use hubMultiaddrs for explicit hub discovery.
ETL Jobs
ETL runs as Kubernetes Jobs:
jobs:
hemibrain:
download:
enabled: true
resources:
requests: { cpu: "500m", memory: "2Gi" }
limits: { cpu: "600m", memory: "4Gi" }
import:
enabled: true
resources:
requests: { cpu: "1", memory: "4Gi" }
limits: { cpu: "1200m", memory: "6Gi" }
Important: Kubernetes Jobs are immutable. Before running `helm upgrade` when resource values have changed, delete failed or running ETL jobs:
kubectl delete job -n syndb -l app=syndb-etl --field-selector status.successful!=1
emptyDir warning: `emptyDir` volumes default to tmpfs and count against the pod’s memory cgroup limit. Add the expected emptyDir data size to the memory limit.
Applying Changes
just nautilus-apply
Or manually:
helm upgrade --install syndb infrastructure/helm/nautilus/ \
-n syndb --create-namespace \
-f infrastructure/helm/nautilus/values.yaml
Environment Reference
All configuration is via environment variables. The single source of truth for versions is versions.nix; run just sync-versions to regenerate .env.
Database
| Variable | Default | Description |
|---|---|---|
| `POSTGRES_HOST` | — | PostgreSQL host |
| `POSTGRES_PORT` | 5433 | PostgreSQL port |
| `POSTGRES_USERNAME` | — | PostgreSQL user |
| `POSTGRES_PASSWORD` | — | PostgreSQL password |
| `POSTGRES_PATH` | — | Database name |
| `POSTGRES_READ_HOST` | Same as write host | Read replica host |
| `DB_POOL_MAX` | 20 | Max connection pool size |
| `DB_POOL_MIN` | 2 | Min connection pool size |
| `DB_CONNECT_TIMEOUT_SECS` | 10 | Connection timeout |
| `CLICKHOUSE_HOST` | — | ClickHouse host |
| `CLICKHOUSE_PORT` | 8123 | ClickHouse HTTP port |
Object Storage (S3/MinIO)
| Variable | Default | Description |
|---|---|---|
| `S3_ACCESS_KEY` | — | Access key |
| `S3_SECRET_KEY` | — | Secret key |
| `S3_ENDPOINT` | — | Custom endpoint (for MinIO) |
| `S3_REGION` | — | AWS region |
Bucket names: syndb-mesh, syndb-swb, syndb-search, syndb-jobs. No underscores allowed in bucket names.
Authentication
| Variable | Default | Description |
|---|---|---|
| `PASSLIB_SECRET` | — | PASETO v4.local symmetric key (minimum 32 bytes) |
| `SERVICE_SECRET` | — | Service account registration secret |
| `UI_BASE_URL` | — | OAuth callback redirect base URL |
| `ACCESS_TOKEN_LIFETIME` | 900 (15 min) | Access token TTL in seconds |
| `REFRESH_TOKEN_LIFETIME` | 2592000 (30 days) | Refresh token TTL in seconds |
OAuth Providers
| Variable | Description |
|---|---|
| `OA_GITHUB_ID`, `OA_GITHUB_SECRET` | GitHub OAuth app credentials |
| `OA_GOOGLE_ID`, `OA_GOOGLE_SECRET` | Google OAuth credentials |
| `OA_ORCID_ID`, `OA_ORCID_SECRET` | ORCID OAuth credentials |
| `OA_CILOGON_ID`, `OA_CILOGON_SECRET` | CILogon OAuth credentials |
| `OA_GITLAB_ID`, `OA_GITLAB_SECRET` | GitLab OAuth credentials |
| `OA_GITLAB_URL` | Custom GitLab instance URL |
| `OA_ORCID_SANDBOX` | Use sandbox.orcid.org (false) |
| `OA_CILOGON_SANDBOX` | Use test.cilogon.org (false) |
| `OAUTH_PROVIDER_BASE_URL` | Override provider URLs (testing) |
Federation
| Variable | Default | Description |
|---|---|---|
| `FEDERATION_LISTEN_ADDR` | OS-assigned | libp2p listen address |
| `FEDERATION_ENABLE_MDNS` | true | Enable mDNS LAN discovery |
| `FEDERATION_HUB_MULTIADDRS` | — | Comma-separated hub multiaddrs for WAN |
| `FEDERATION_CLUSTER_NAME` | — | Cluster identifier (required for node mode) |
| `FEDERATION_CLUSTER_DESCRIPTION` | — | Cluster description |
| `FEDERATION_CLUSTER_INSTITUTION` | — | Institution name |
| `FEDERATION_PASSWORD` | — | Shared federation secret |
| `FEDERATION_CLUSTER_NATIVE_PORT` | 9440 | ClickHouse native port for remote() |
| `FEDERATION_NODE_FLIGHT_PORT` | 50052 | Internal Flight gRPC port |
| `FEDERATION_NODE_FLIGHT_ADVERTISE` | localhost:50052 | Advertised Flight endpoint |
| `FEDERATION_DELEGATION_TIMEOUT_SECS` | 30 | Timeout for delegated requests |
Server
| Variable | Default | Description |
|---|---|---|
| `DEV_MODE` | false | Permissive CORS, data seeding |
| `DEBUG` | false | Verbose SQL logging |
| `TESTING` | false | Skip federation/job queue init |
| `REQUEST_TIMEOUT_SECS` | 60 | HTTP handler timeout |
| `HTTP_CLIENT_TIMEOUT_SECS` | 30 | Internal HTTP client timeout |
| `UPLOAD_TIMEOUT` | 21600 (6 hours) | Upload timeout |
| `FLIGHT_PORT` | 50051 | Arrow Flight server port |
| `REQUIRE_AUTHENTICATION` | true | Require auth for protected endpoints |
Rate Limiting
| Variable | Default | Description |
|---|---|---|
| `RATE_LIMIT_PER_SECOND` | 100 | Sustained request rate per IP |
| `RATE_LIMIT_BURST` | 200 | Burst capacity per IP |
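The per-IP limiter behaves like a standard token bucket; a sketch using the documented defaults (illustrative only, since the actual limiter lives in the hub's middleware layer):

```python
class TokenBucket:
    """Token bucket: `rate` tokens/s refill, up to `burst` capacity."""

    def __init__(self, rate=100, burst=200):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller receives HTTP 429

bucket = TokenBucket()
# 250 requests in the same instant: burst absorbs 200, the rest are limited.
results = [bucket.allow(now=0.0) for _ in range(250)]
print(results.count(True))  # 200
```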
Job Queue
| Variable | Default | Description |
|---|---|---|
| `JOB_QUEUE_MAX_WORKERS` | 4 | Max concurrent job workers |
| `JOB_RESULT_TTL_HOURS` | 24 | Result retention |
| `JOB_MAX_RESULT_BYTES` | 1073741824 (1 GB) | Max result size |
Search
| Variable | Default | Description |
|---|---|---|
| `MEILISEARCH_HOST` | — | Meilisearch host |
| `MEILISEARCH_PORT` | 7700 | Meilisearch port |
| `MEILISEARCH_API_KEY` | — | Meilisearch API key |
API Overview
Base URL: https://api.syndb.xyz/v1
Interactive OpenAPI documentation: api.syndb.xyz/docs
OpenAPI spec: GET /openapi.json
Authentication
Pass a PASETO access token in the Authorization header:
Authorization: Bearer <access_token>
See Authentication for how to obtain tokens.
Content Types
- Requests: `application/json`
- Responses: `application/json` (API), Apache Arrow IPC (job results), BibTeX/RIS (citations)
Error Format
{
"error": "Human-readable error message"
}
Standard HTTP status codes: 400 (bad request), 401 (unauthenticated), 403 (insufficient permissions), 404 (not found), 409 (conflict), 429 (rate limited).
Route Map
No Authentication Required
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check (Postgres, ClickHouse, S3, Meilisearch) |
| POST | /v1/user/auth/login | Login |
| POST | /v1/user/auth/register | Register |
| POST | /v1/user/auth/register-service | Register service account (X-Service-Secret) |
| POST | /v1/user/auth/refresh | Refresh token |
| POST | /v1/user/auth/logout | Logout |
| GET | /v1/user/profile/{user_id} | Public profile lookup |
| GET | /v1/search | Full-text dataset search |
| GET | /v1/federation/ping | Federation health |
| POST | /v1/federation/register | Cluster self-registration (federation password) |
Authenticated User (AuthUser)
| Method | Path | Description |
|---|---|---|
| GET | /v1/user/profile | Current user profile |
| PATCH | /v1/user/profile | Update profile |
| POST | /v1/user/profile/scientist-tag | Generate scientist tag |
| GET | /v1/user/authenticate/cilogon | Academic metadata |
Academic / Verified User (AcademicUser)
| Method | Path | Description |
|---|---|---|
| POST | /v1/neurodata/datasets | Register dataset |
| GET | /v1/neurodata/datasets/owned | User’s datasets |
| POST | /v1/neurodata/download-urls | Pre-signed S3 download URLs |
| POST | /v1/syql/plan | Parse and validate SyQL |
| POST | /v1/syql/exec | Execute SyQL query |
| POST | /v1/syql/explain | Explain query plan |
| POST | /v1/syql/cancel | Cancel running query |
| GET | /v1/queries | List saved queries |
| POST | /v1/queries | Save query |
| POST | /v1/queries/{id}/run | Execute saved query |
| POST | /v1/jobs | Submit query job |
| POST | /v1/jobs/graph | Submit graph analysis job |
| GET | /v1/jobs | List jobs |
| GET | /v1/jobs/{id}/result | Download job result |
| GET | /v1/analytics/{id}/summary | Dataset summary stats |
| GET | /v1/analytics/{id}/neuron-morphometrics | Morphometric statistics |
| GET | /v1/analytics/{id}/graph-summary | Graph-level statistics |
| GET | /v1/analytics/{id}/reciprocity | Synapse reciprocity |
| GET | /v1/analytics/{id}/degree-distribution | Degree distribution |
| GET | /v1/analytics/zscore-comparison | Z-score comparison |
| GET | /v1/graph/{id}/metrics | Graph metrics |
| POST | /v1/graph/{id}/motifs | Triadic census |
| POST | /v1/graph/{id}/shortest-path | Shortest path |
| POST | /v1/graph/{id}/reachability | Reachability analysis |
| POST | /v1/graph/{id}/full-analysis | Full graph analysis |
| POST | /v1/graph/compare | Cross-dataset comparison |
| POST | /v1/meta-analysis/analyze | Cross-dataset meta-analysis |
| POST | /v1/meta-analysis/atlas-compare | Atlas comparison |
SuperUser
| Method | Path | Description |
|---|---|---|
| GET | /v1/federation/status | Federation overview |
| GET | /v1/federation/clusters | List clusters |
| POST | /v1/federation/clusters | Register cluster |
| DELETE | /v1/federation/clusters/{id} | Deactivate cluster |
| POST | /v1/federation/schema/sync | Sync schema to clusters |
| POST | /v1/ontology/terms | Create ontology term |
| PUT | /v1/ontology/terms/{id} | Update term |
| POST | /v1/ontology/import-csv | Bulk import terms |
Middleware Stack
Requests pass through these layers in order:
- Request ID — UUID v7, propagated via `X-Request-ID`
- Tracing — structured request/response logging
- Rate limiting — per-IP token bucket (see Rate Limiting)
- Timeout — 60s default, 408 on expiry
- CORS — permissive in dev mode, restricted to `api_domain` in production
- Compression — automatic response compression
- Body limit — 100 MB max request body
- API version — `api-version: v1` response header
Health Check
curl https://api.syndb.xyz/health
{
"status": "healthy",
"components": {
"postgres": { "status": "ok", "latency_ms": 5 },
"clickhouse": { "status": "ok", "latency_ms": 12 },
"storage": { "status": "ok", "latency_ms": 8 },
"meilisearch": { "status": "ok" }
}
}
Status is degraded if any component fails. Meilisearch is optional — its absence does not degrade the overall status.
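The aggregation rule can be sketched as follows (the set of required components is taken from the example response above; the helper itself is illustrative):

```python
REQUIRED = ("postgres", "clickhouse", "storage")

def overall_status(components):
    """'healthy' unless a required component fails; Meilisearch is optional."""
    for name in REQUIRED:
        if components.get(name, {}).get("status") != "ok":
            return "degraded"
    return "healthy"

print(overall_status({
    "postgres": {"status": "ok"},
    "clickhouse": {"status": "ok"},
    "storage": {"status": "ok"},
    # meilisearch absent: overall status is still healthy
}))  # healthy
```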
CLI Reference
The SynDB CLI (syndb) provides command-line access to all platform features.
Installation
# With GUI
pipx install "syndb-cli[gui]"
# Without GUI
pipx install syndb-cli
Global Options
| Option | Environment Variable | Description |
|---|---|---|
| `--server-url` | `SYNDB_SERVER_URL` | API base URL |
| `--flight-url` | `SYNDB_FLIGHT_URL` | Arrow Flight endpoint |
| `--flight-port` | `SYNDB_FLIGHT_PORT` | Arrow Flight port |
Commands
user — Account Management
| Command | Description |
|---|---|
| `syndb user register` | Create a new account |
| `syndb user login` | Authenticate and store token |
| `syndb user logout` | Revoke token |
query — Saved Queries
| Command | Description |
|---|---|
| `syndb query list` | List saved queries |
| `syndb query save` | Save a new query |
| `syndb query show {id}` | Show query details |
| `syndb query update {id}` | Update a saved query |
| `syndb query delete {id}` | Delete a saved query |
| `syndb query run {id}` | Execute a saved query |
| `syndb query status {id}` | Check execution status |
dataset — Dataset Management
| Command | Description |
|---|---|
| `syndb dataset new` | Register a new dataset |
| `syndb dataset prepare` | Validate and convert to Parquet |
| `syndb dataset validate` | Schema validation only |
| `syndb dataset upload` | Upload data via Arrow Flight |
| `syndb dataset download` | Download dataset via Arrow Flight |
| `syndb dataset mesh-upload` | Upload 3D mesh files (.glb) |
etl — Dataset Import Pipeline
Each dataset supports download, validate, and import subcommands:
syndb etl <dataset> download
syndb etl <dataset> validate
syndb etl <dataset> import [--tables <table1,table2>]
Available Datasets
| Dataset | Key | Description |
|---|---|---|
| FlyWire | flywire | Whole-brain Drosophila connectome |
| Hemibrain | hemibrain | Janelia FlyEM v1.2.1 |
| MANC | manc | Male Adult Nerve Cord |
| MICrONS | microns | Mouse visual cortex |
| H01 | h01 | Human cortical tissue |
| BANC | banc | Brain And Nerve Cord |
| FANC | fanc | Female Adult Nerve Cord |
| Fish1 | fish1 | Zebrafish brain |
| Optic Lobe | optic-lobe | Drosophila optic lobe |
| Male CNS | male-cns | Male central nervous system |
| C. elegans Hermaphrodite | c-elegans-herm | Complete hermaphrodite |
| C. elegans Male | c-elegans-male | Complete male |
| C. elegans Developmental | c-elegans-dev | Developmental stages |
| Platynereis | platynereis | Marine annelid |
| L1 Larval | l1-larval | Drosophila L1 larval brain |
| Spine Morphometry (Kasthuri) | spine-kasthuri | Dendritic spine morphometry |
| Spine Morphometry (Ofer) | spine-ofer | Dendritic spine morphometry |
| Spine Morphometry (MICrONS) | spine-microns | Dendritic spine morphometry |
| Allen Cell Types | allen-cell-types | Allen Institute reference |
| NeuroMorpho | neuromorpho | NeuroMorpho.org archive |
federation — Federation Management
| Command | Description |
|---|---|
| `syndb federation init` | Initialize federation config and register with hub |
| `syndb federation status` | Show federation configuration |
| `syndb federation sync-schema` | Sync ClickHouse schema from hub |
| `syndb federation test` | Test connectivity (mDNS + hub + ClickHouse) |
| `syndb federation clusters` | List all federated clusters |
| `syndb federation logout` | Remove federation config |
See Node Setup for detailed usage.
graph-precompute — Batch Graph Computation
syndb graph-precompute --dataset-id {uuid}
Pre-computes graph metrics and stores results in ClickHouse materialized tables.
k8s — Kubernetes Administration
| Command | Description |
|---|---|
| `syndb k8s jobs` | List ETL jobs |
| `syndb k8s status` | View job status |
| `syndb k8s cleanup` | Clean up completed/failed jobs |
bench — Benchmarking
Performance testing suite for API and federation queries.
completions — Shell Completions
syndb completions bash > ~/.local/share/bash-completion/completions/syndb
syndb completions zsh > ~/.zfunc/_syndb
syndb completions fish > ~/.config/fish/completions/syndb.fish
Ontology & Vocabularies
SynDB uses controlled vocabularies to standardize dataset metadata — brain regions, species, microscopy techniques, and neurotransmitter types.
Browsing Terms
List All Vocabularies
curl https://api.syndb.xyz/v1/ontology/vocabularies
List Terms in a Vocabulary
curl "https://api.syndb.xyz/v1/ontology/terms?vocabulary=brain_region"
Search Terms
curl "https://api.syndb.xyz/v1/ontology/terms?search=mushroom"
Term Hierarchy
# Get child terms
curl https://api.syndb.xyz/v1/ontology/terms/{term_id}/children
# Get ancestor terms
curl https://api.syndb.xyz/v1/ontology/terms/{term_id}/ancestors
Validating Terms
Before submitting dataset metadata, validate that your terms exist:
curl -X POST -H "Content-Type: application/json" \
https://api.syndb.xyz/v1/ontology/terms/validate \
-d '{"terms": ["mushroom_body", "lateral_horn"]}'
Returns which terms are valid and which are unrecognized.
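For example, a minimal Python client for this endpoint, using only the standard library. The shape of the JSON response (which keys report valid vs. unrecognized terms) is not documented here, so inspect the payload rather than assuming field names:

```python
import json
import urllib.request

API = "https://api.syndb.xyz"  # public base URL from the examples above

def build_validate_request(terms):
    """Build the POST request for the term-validation endpoint."""
    return urllib.request.Request(
        f"{API}/v1/ontology/terms/validate",
        data=json.dumps({"terms": terms}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def validate_terms(terms):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_validate_request(terms)) as resp:
        return json.load(resp)
```

Call `validate_terms(["mushroom_body", "lateral_horn"])` before submitting metadata, and reject the upload locally if any term comes back unrecognized.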
Administration (SuperUser)
Create a Term
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/ontology/terms \
-d '{
"name": "calyx",
"vocabulary": "brain_region",
"parent_id": "mushroom_body_term_id",
"description": "Input region of the mushroom body"
}'
Update a Term
curl -X PUT -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/ontology/terms/{term_id} \
-d '{"description": "Updated description"}'
Deprecate a Term
curl -X PATCH -H "Authorization: Bearer $TOKEN" \
https://api.syndb.xyz/v1/ontology/terms/{term_id}/deprecate
Deprecated terms remain in the system but are flagged in search results and validation.
Bulk Import
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: text/csv" \
https://api.syndb.xyz/v1/ontology/import-csv \
--data-binary @terms.csv
CSV format: name,vocabulary,parent_name,description
Integration with Datasets
When creating or updating dataset metadata, brain region, species, and microscopy fields are validated against the ontology. Invalid terms are rejected with an error listing the closest matches.
Rate Limiting
SynDB enforces per-IP rate limiting using a token bucket algorithm.
Defaults
| Parameter | Default | Environment Variable |
|---|---|---|
| Requests per second | 100 | RATE_LIMIT_PER_SECOND |
| Burst capacity | 200 | RATE_LIMIT_BURST |
The bucket refills at the sustained rate. Burst capacity allows short spikes above the sustained rate.
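The algorithm can be modelled in a few lines of Python. This is an illustrative sketch of token-bucket behaviour with the defaults above, not SynDB's server-side implementation:

```python
import time

class TokenBucket:
    """Toy token bucket: refills at `rate` tokens/sec up to `burst` capacity."""

    def __init__(self, rate=100.0, burst=200):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        """Consume one token if available; refill based on elapsed time first."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A fresh bucket admits a burst of 200 immediate requests; once drained, requests are admitted at roughly 100 per second as tokens refill.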
Client IP Detection
The rate limiter identifies clients by IP address, checked in order:
1. `X-Forwarded-For` header (first address)
2. `X-Real-IP` header
3. Localhost (fallback for direct connections)

Behind a reverse proxy, ensure `X-Forwarded-For` is set correctly.
Response on Limit
When the rate limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Too many requests
Client Handling
Respect the Retry-After header and implement exponential backoff:
import time
import requests

def request_with_backoff(url, headers, max_retries=3):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        wait = int(resp.headers.get("Retry-After", 1)) * (2 ** attempt)
        time.sleep(wait)
    raise Exception("Rate limited after retries")
For batch operations, throttle to well under 100 req/s to leave headroom for interactive use.
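A simple way to do that on the client is to space requests evenly. The sketch below caps at 50 requests per second, an arbitrary safety margin rather than a documented threshold:

```python
import time

def throttled(items, max_per_second=50):
    """Yield items no faster than max_per_second, leaving headroom
    below the default 100 req/s server limit."""
    interval = 1.0 / max_per_second
    next_slot = time.monotonic()
    for item in items:
        now = time.monotonic()
        if now < next_slot:
            time.sleep(next_slot - now)
        next_slot = max(now, next_slot) + interval
        yield item

# e.g. for url in throttled(urls): requests.get(url, headers=headers)
```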
Choosing a license for your dataset
When sharing datasets derived from microscopy data, selecting an appropriate license is crucial for ensuring the proper use and distribution of your work. Different licenses offer varying degrees of freedom and control over your data. Here, we outline some popular licenses, their key features, and considerations to help you choose the right one for your needs.
Considerations for Choosing a License
- Intended Use: Determine whether you want your data to be used freely or with certain restrictions, such as non-commercial use only.
- Credit and Attribution: Decide if you want to receive credit for your work and if it’s important for you to see how others are using your data.
- Derivative Works: Consider whether you want derivative works to be allowed and if they should be shared under the same terms.
- Commercial Use: Reflect on whether you want to permit commercial use of your data. Your institution may have specific policies regarding commercial use.
Licenses
The following are some common licenses used for sharing data on the web, which we also use on the SynDB platform.
Tip
Use ODC over CC
We recommend the ODC licenses for datasets on SynDB. You are free to use Creative Commons licenses as well, just note that they are not designed for data. The default license for SynDB is ODC-BY.
Open Data Commons (ODC) Licenses
Open Data Commons (ODC) licenses are specifically tailored for datasets and databases, focusing on maximizing accessibility and proper attribution in data sharing.
PDDL (Public Domain Dedication and License)
Places the dataset in the public domain, allowing unrestricted use and maximizing openness and usability.
ODC-BY (Attribution License)
Allows use with proper credit to the original creator, ensuring acknowledgment while enabling broad use.
ODC-ODbL (Open Database License)
Permits sharing, modifying, and using the dataset with attribution and requires derivative databases to be shared under the same license, promoting open access and collaborative improvement while keeping derivative databases equally accessible.
Creative Commons (CC) Licenses
Creative Commons (CC) licenses are versatile and well-suited for a wide range of creative works, including datasets.
CC0 (Public Domain Dedication)
Allows the use of the dataset without any restrictions, making it ideal for maximizing usability and dissemination.
CC BY (Attribution)
Allows users to use the dataset as long as they provide appropriate credit to the original creator, ensuring wide use while acknowledging the creator’s work.
CC BY-SA (Attribution-ShareAlike)
Permits use of the dataset with appropriate credit and requires sharing derivative works under the same license, keeping derivative works open and shareable under the same terms.
CC BY-NC (Attribution-NonCommercial)
Allows use of the dataset for non-commercial purposes with proper credit, ruling out commercial use while still enabling academic and research use.
CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)
Permits non-commercial use with appropriate credit and requires derivative works to be shared under the same license, keeping them open on the same terms.
Conclusion
Selecting the right license for your dataset derived from microscopy data is essential for controlling how your data is used and ensuring it meets your sharing objectives. By weighing the options against your specific needs, you can choose a license that balances openness, credit, and control, fostering collaboration and advancement in your field.
Metrics structuring for contribution
Note
Prerequisites
This article requires that you understand how data is stored on SynDB; we recommend reading through the overview article if you are uncertain.
This article is a guide for contributors who wish to upload their data to SynDB. Please don’t hesitate to ask for help on the Discord channel if you have any questions; this part can be challenging.
Data structuring
Schema
Each SynDB table has its own schema, which can be found on our GitHub repository. The schema defines the supported column names and their data types. The data must be structured in a way that is compatible with the schema of the table you are contributing to.
The column names and the data stored under them must be in a format that is compatible with the type. You can find the supported column names for each SynDB table in the schema files on our GitHub repository. You may use the glossary at the end of this article for reference.
Note
Nano
We use nanometers as the unit for all measurements; this includes volume, radius, and distance.
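If your measurements are in micrometers, convert them before structuring the table. A small helper (the function name is illustrative):

```python
NM_PER_UM = 1_000  # 1 micrometer = 1,000 nanometers

def um_to_nm(value_um, power=1):
    """Convert a micrometer-based measurement to nanometers.

    Use power=1 for lengths (radius, distance) and power=3 for volumes,
    since 1 um^3 = 10^9 nm^3.
    """
    return value_um * NM_PER_UM ** power
```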
Sourcing raw data
You may upload the source raw data files, including meshes or SWC files, to SynDB. Place the absolute path to the file in your table file. The following are supported:
- Meshes in `.glb` format, column name: `mesh_path`
- SWC files, `.swc`, column name: `swc_path`
This list is the main tracker for the supported formats. You may request additional formats on the Discord channel. The SynDB team will review the request and consider adding the new format to the platform.
Columns
Most column types are self-explanatory, but some require additional explanation.
Identifiers and relations
The `cid` column defined in your table can have any unique hashable value; it will be replaced by a UUID when uploaded to SynDB. When uploading a relational dataset, the `cid` column in the parent is used to correlate children to their parent via `parent_id`: the hashable value in the parent's `cid` column must match the `parent_id` in the child. `parent_enum` can be omitted, as the compartments are defined at the tabular level and will therefore be added automatically.
Example
Notice the `parent_id` column in the child table: it holds the `cid` of the corresponding row in the parent table. The `parent_enum` column is not present in the child table, as it is implied by the tabular file name.
vesicle.csv, child
| cid | neurotransmitter | voxel_radius | distance_to_active_zone | minimum_normal_length | parent_id | centroid_z | centroid_x | centroid_y |
|---|---|---|---|---|---|---|---|---|
| 0 | glutamate | 26.9129 | 705.2450 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 1 | glutamate | 25.5388 | 615.0213 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 2 | glutamate | 29.5260 | 513.0701 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 3 | glutamate | 30.5131 | 479.9224 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 4 | glutamate | 28.3977 | 454.8248 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 5 | glutamate | 30.2033 | 459.7557 | 23 | 2 | 4505.232 | 1996.224 | 4953.6 |
| 6 | glutamate | 33.4548 | 374.8131 | 23 | 2 | 4505.232 | 1996.224 | 4953.6 |
| 7 | glutamate | 32.0890 | 455.9293 | 23 | 4 | 4505.232 | 1996.224 | 4953.6 |
axon.csv, parent
| voxel_volume | mitochondria_count | total_mitochondria_volume | cid |
|---|---|---|---|
| 385668034.56 | 1 | 93208043.52 | 1 |
| 1492089016.32 | 4 | 412054179.84 | 2 |
| 327740497.92 | 0 | 0 | 4 |
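Given tables like the ones above, a quick pre-upload sanity check that every child `parent_id` matches a `cid` in the parent table might look like this (a sketch using Python's `csv` module; file names are whatever your tables are called):

```python
import csv

def orphaned_children(parent_csv, child_csv):
    """Return the cids of child rows whose parent_id has no matching
    cid in the parent table."""
    with open(parent_csv, newline="") as f:
        parent_cids = {row["cid"] for row in csv.DictReader(f)}
    with open(child_csv, newline="") as f:
        return [row["cid"] for row in csv.DictReader(f)
                if row["parent_id"] not in parent_cids]
```

An empty result means the relation is consistent; any cids returned point at rows that would fail to correlate after upload.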
Glossary
| Key | Description |
|---|---|
| `dataset_id` | The unique identifier for the dataset, of type uuid. |
| `cid` | The unique identifier for a SynDB unit within the dataset, of type uuid. |
| `parent_id` | The CID of the parent component, of type uuid. |
| `parent_enum` | An integer representing the type or category of the parent component, of type int. |
| `polarity` | The polarity of the neuron, of type ascii. |
| `voxel_volume` | The volume of the voxel, of type double. |
| `voxel_radius` | The radius of the voxel, of type double. |
| `s3_mesh_location` | The location of the mesh in S3 storage, of type smallint. |
| `mesh_volume` | The volume of the mesh, of type double. |
| `mesh_surface_area` | The surface area of the mesh, of type double. |
| `mesh_area_volume_ratio` | The ratio of the surface area to the volume of the mesh, of type double. |
| `mesh_sphericity` | The sphericity of the mesh, of type double. |
| `centroid_z` | The z-coordinate of the centroid, of type double. |
| `centroid_x` | The x-coordinate of the centroid, of type double. |
| `centroid_y` | The y-coordinate of the centroid, of type double. |
| `s3_swb_location` | The location of the SWB in S3 storage, of type smallint. |
| `terminal_count` | The count of terminals, of type int. |
| `mitochondria_count` | The count of mitochondria, of type int. |
| `total_mitochondria_volume` | The total volume of mitochondria, of type double. |
| `neuron_id` | The unique identifier for the associated neuron, of type uuid. |
| `vesicle_count` | The count of vesicles, of type int. |
| `total_vesicle_volume` | The total volume of vesicles, of type double. |
| `forms_synapse_with` | The unique identifier of the synapse that the component forms with, of type uuid. |
| `connection_score` | The score representing the strength or quality of the connection, of type double. |
| `cleft_score` | The score for the synaptic cleft, of type int. |
| `GABA` | The concentration or presence of GABA neurotransmitter, of type double. |
| `acetylcholine` | The concentration or presence of acetylcholine neurotransmitter, of type double. |
| `glutamate` | The concentration or presence of glutamate neurotransmitter, of type double. |
| `octopamine` | The concentration or presence of octopamine neurotransmitter, of type double. |
| `serine` | The concentration or presence of serine neurotransmitter, of type double. |
| `dopamine` | The concentration or presence of dopamine neurotransmitter, of type double. |
| `root_id` | The external root identifier from the source platform (e.g. FlyWire), of type int. |
| `pre_id` | The unique identifier of the pre-synaptic component, of type uuid. |
| `post_id` | The unique identifier of the post-synaptic component, of type uuid. |
| `dendritic_spine_count` | The count of dendritic spines, of type int. |
| `neurotransmitter` | The type of neurotransmitter present in a vesicle, of type ascii. |
| `distance_to_active_zone` | The distance from the vesicle to the active zone, of type double. |
| `minimum_normal_length` | The minimum normal length, of type int. |
| `ribosome_count` | The count of ribosomes within the endoplasmic reticulum, of type int. |
Build Caching
SynDB uses multiple layers of caching to keep compile times short across local development, CI pipelines, and production deploys.
Cargo compiler flags
Configured in .cargo/config.toml, these flags speed up every local cargo
invocation:
| Flag | Effect |
|---|---|
| `-C link-arg=-fuse-ld=mold` | Mold linker — significantly faster than the default `ld` or `lld` (Linux only) |
| `-Zshare-generics=y` | Share monomorphized generics between crates, reducing codegen work |
| `-Zthreads=8` | Parallel compiler frontend (parsing, macro expansion, type checking) |
| `codegen-backend = "cranelift"` | Dev profile uses Cranelift instead of LLVM for faster debug builds |
| `codegen-backend = "llvm"` (for deps) | Dependencies still use LLVM for better optimization |
CI caching (GitHub Actions)
In CI, syndb-ci runs tests and builds directly on the host (no Docker for
the ci subcommand). Cargo artifacts are cached between runs via
Swatinem/rust-cache@v2, which
persists target/ and the cargo registry keyed by branch and Cargo.lock
hash.
For integration tests (local-stack-test, e2e-test), syndb-ci uses
bollard to start ephemeral Docker containers (PostgreSQL, ClickHouse, MinIO)
on a shared Docker network. Test binaries run on the host with environment
variables pointing at localhost:<port>. No cargo cache volumes are needed
inside containers — the host target/ is used directly.
Nix OCI cache
Local stack images (just stack-prepare) for the API and ETL are built with
Nix and cached using syndb-ci nix-oci-cache. This command uses Nix
store paths as content-addressed fingerprints to skip unnecessary rebuilds:
- `nix build .#oci-syndb-api` produces a store path (a hash of all inputs)
- The script compares the current store path against a stamp file (`/tmp/.oci-syndb-api.storepath`)
- If unchanged, the build is skipped entirely
- If changed, the new tarball is copied to `/tmp/` and loaded into Docker
This means just stack-prepare is near-instant when source hasn’t changed.
Nix Crane dependency caching
For nix flake check (CI) and Nix-based OCI image builds, the project uses
Crane with a split dependency build:
# nix/rust.nix
mkCargoArtifacts = system:
craneLib.buildDepsOnly (mkCommonArgs system);
buildDepsOnly compiles all workspace dependencies into a cached Nix
derivation. Subsequent builds of workspace crates reuse these artifacts,
so only the project’s own code is recompiled. Since Nix derivations are
content-addressed, the dependency cache is automatically invalidated when
Cargo.lock changes.
UI source-hash cache
The UI image (just _stack-prepare-ui) uses a source-hash stamp to skip
rebuilds when Python source files haven’t changed:
- SHA-256 of all files in `packages/ui/src/`, `packages/syndb-ql/python/`, `pyproject.toml`, `uv.lock`, and `syndb-ql-python/Cargo.toml`
- Compared against `/tmp/.syndb-ui.srchash`
- If the hash matches and `syndb-ui:dev` exists in Docker, the build is skipped
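The stamp pattern itself is straightforward. A sketch in Python (illustrative, not the actual script; the real one hashes the file set listed above):

```python
import hashlib
from pathlib import Path

def source_hash(paths):
    """SHA-256 over file paths and contents, in sorted order for stability."""
    h = hashlib.sha256()
    for p in sorted(Path(p) for p in paths):
        h.update(str(p).encode())
        h.update(p.read_bytes())
    return h.hexdigest()

def needs_rebuild(paths, stamp_file):
    """Compare the current hash against the stamp; refresh the stamp if stale."""
    current = source_hash(paths)
    stamp = Path(stamp_file)
    if stamp.exists() and stamp.read_text().strip() == current:
        return False
    stamp.write_text(current)
    return True
```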
Summary
| Layer | Scope | Mechanism | Invalidation |
|---|---|---|---|
| Cargo flags | Local dev | Mold, Cranelift, parallel frontend | N/A (always active) |
| GitHub Actions cache | CI pipelines | Swatinem/rust-cache@v2 | Branch + Cargo.lock hash |
| Nix OCI cache | stack-prepare (API, ETL) | Nix store-path stamps | Content-addressed (any input change) |
| Crane deps | nix flake check, OCI images | buildDepsOnly derivation | Cargo.lock changes |
| UI source hash | stack-prepare (UI) | SHA-256 file stamp | Source file changes |
Troubleshooting
Find up-to-date explanations of different types of errors and pointers on how to resolve them.
403, Forbidden
Verification
Academic verification is required for computationally or network-heavy tasks. This is to ensure that the resources are not being misused. You may verify yourself after registering on the platform — see Authentication for details on CILogon verification.
Dataset
A dataset belongs to its creator and to the groups with which the creator chooses to share ownership. If you are unable to access a dataset, you likely fall into neither of these categories. You may request access to the dataset from the creator.
429, Too Many Requests
You have exceeded the rate limit (100 requests/second by default). Respect the Retry-After header and implement exponential backoff. See Rate Limiting.
Job Failures
If a submitted job fails:
- Check the job status: `GET /v1/jobs/{job_id}` — the `error_message` field describes the failure
- Common causes: query timeout, result too large (>1 GB), ClickHouse resource limits
- Rerun the job: `POST /v1/jobs/{job_id}/rerun`
See Jobs System for details.
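For scripted recovery, a minimal helper that builds the status and rerun requests (standard library only; the Bearer token follows the authenticated examples elsewhere in these docs, and response fields other than `error_message` are assumptions to verify against the API reference):

```python
import urllib.request

API = "https://api.syndb.xyz"  # base URL from the API examples

def build_job_request(job_id, token, rerun=False):
    """GET /v1/jobs/{job_id} for status, or POST /v1/jobs/{job_id}/rerun."""
    path = f"/v1/jobs/{job_id}" + ("/rerun" if rerun else "")
    return urllib.request.Request(
        f"{API}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST" if rerun else "GET",
    )
```

Send with `urllib.request.urlopen(...)`, decode the JSON, and inspect `error_message` before deciding whether a rerun is worthwhile.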
SyQL Errors
- Parse errors: Check SyQL syntax in the error message. Use `POST /v1/syql/plan` to validate without executing.
- Resolution errors: A referenced table or column does not exist. Check the Data Structuring guide for valid column names.
- Timeout: Large queries may exceed the 60s HTTP timeout. Use `POST /v1/syql/exec` to submit as an async job instead.
Federation Issues
See Federation Troubleshooting for:
- Node discovery failures (mDNS, multiaddrs)
- Schema version mismatches
- Cluster health states
- Docker Compose federation profile issues