Welcome to SynDB

SynDB is a platform for finding, sharing, and meta-analyzing synapse metrics derived from high-resolution microscopy. It supports federated deployments where institutions retain data sovereignty while participating in cross-institutional analysis.

Why use SynDB?

SynDB serves three audiences: data owners who produce microscopy data, data scientists who analyze it, and institutions that want to participate in federated analysis without giving up control of their data.

Image data owner

  • Data sharing: Others can use your data to teach, increasing the educational value of the data.
  • Citations: Whenever your data is used in a publication, you will be cited, increasing your visibility in the scientific community.
  • Provenance tracking: Version history, lineage, and auto-generated citations (BibTeX, RIS) for your datasets.

Data scientist

  • Meta-analysis: Compare data across thousands of experiments using cross-dataset meta-analysis.
  • SyQL queries: A declarative query language that resolves metadata into optimized SQL.
  • Graph analysis: Network analysis on connectome data — motifs, shortest paths, reachability, cross-dataset comparison.
  • Data visualization: Use the data to create visualizations for publications or presentations.
  • Statistical modelling: Use the data to create models that can predict outcomes in future experiments.

Node operator / Institution

  • Data sovereignty: Keep your data on your infrastructure — it never leaves your network.
  • Federated meta-analysis: Participate in cross-institutional queries without transferring data.
  • Minimal footprint: A federation node requires only ClickHouse and the syndb-node binary.
  • Schema sync: The hub pushes DDL migrations to your node automatically.

See Federation Overview for setup details.

Installation

The SynDB platform provides several UIs aimed at different user groups. If you are getting started with SynDB, we recommend the UIs. For advanced users, the API is the most flexible way to interact with the platform; see the Advanced section.

User interfaces

The SynDB interfaces are implemented in Python, so you need a working Python environment to run them.

Tip

Setup Python environment

This requires two things: (1) a Python interpreter installed on your system, and (2) an environment manager for the SynDB packages.

There are many solutions to both requirements; we recommend pyenv for the first and pipx for the second. Follow each tool's installation guide for your operating system.

Install

pipx:

pipx install "syndb-cli[gui]"

pip:

pip install "syndb-cli[gui]"

Upgrade

To upgrade the SynDB CLI along with the GUI (if installed), run the following command:

pipx:

pipx upgrade syndb-cli

pip:

pip install "syndb-cli[gui]" --upgrade

Advanced

syndb-cli without GUI

pipx:

pipx install syndb-cli

pip:

pip install syndb-cli

Direct API usage

The API can be accessed through the OpenAPI documentation. For a more tailored approach, you may interact with the API through the syndb-data Python package:

poetry:

poetry add syndb-data

pip:

pip install syndb-data

Alternatively, you may generate your own language bindings using openapi-generator; you will need the SynDB OpenAPI schema.

Quick start

Command line interface

Following the installation, you may run the SynDB CLI using the following command:

syndb

The internal documentation of the CLI will guide you through the available commands and options. See the upload documentation for uploading with the CLI.

Graphical user interface

After installing the SynDB CLI, which contains the GUI, you may run the GUI using the following command:

syndb gui

The GUI will open in your default web browser; if the browser is already open, it opens in a new tab. You may need to refresh the page before the GUI appears.

Tip

Dark Mode

Use the Dark Reader browser extension for dark mode in the GUI.

Next steps

Authentication

SynDB uses PASETO v4 tokens for authentication. Access tokens authorize API requests; refresh tokens obtain new access tokens without re-authenticating.

Account Types

Type      | How to create                                                     | Capabilities
Regular   | POST /v1/user/auth/register or CLI syndb user register            | Browse, search datasets
Academic  | Verify via CILogon (institutional login)                          | All regular + SyQL, graph analysis, meta-analysis, upload, jobs
Service   | POST /v1/user/auth/register-service with X-Service-Secret header  | Same as Academic (auto-verified)
SuperUser | Promoted by existing superuser                                    | All + federation admin, ontology management

Academic verification is required for compute-intensive operations: query execution, graph analysis, analytics, meta-analysis, and dataset upload.

Registration & Login

CLI:

syndb user register
syndb user login

API:

# Register
curl -X POST https://api.syndb.xyz/v1/user/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "...", "display_name": "Jane Doe"}'

# Login — returns access_token and refresh_token
curl -X POST https://api.syndb.xyz/v1/user/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "..."}'

Token Lifecycle

  1. Login returns an access token (15 min TTL) and a refresh token (30 day TTL)
  2. Use the access token in requests: Authorization: Bearer <access_token>
  3. When the access token expires, exchange the refresh token for a new pair:
    curl -X POST https://api.syndb.xyz/v1/user/auth/refresh \
      -H "Content-Type: application/json" \
      -d '{"refresh_token": "..."}'
    
  4. Each refresh rotates the token — the old refresh token is invalidated

Refresh tokens use family-based rotation: reuse of a revoked token invalidates the entire family, forcing re-authentication.
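
The lifecycle above can be sketched client-side as a small helper that refreshes proactively. TokenPair, get_valid_access_token, and refresh_fn are hypothetical names for illustration; a real refresh_fn would call POST /v1/user/auth/refresh:

```python
import time

class TokenPair:
    """Holds an access/refresh token pair with the TTLs described above."""
    def __init__(self, access_token, refresh_token, access_ttl=15 * 60):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = time.time() + access_ttl

    def access_expired(self, skew=30):
        # Treat the token as expired slightly early to absorb clock skew.
        return time.time() >= self.expires_at - skew

def get_valid_access_token(pair, refresh_fn):
    """Return a usable access token, refreshing via refresh_fn when needed.

    refresh_fn(refresh_token) stands in for POST /v1/user/auth/refresh and
    must return a new (access_token, refresh_token) pair -- remember that
    each refresh rotates the refresh token, invalidating the old one.
    """
    if pair.access_expired():
        new_access, new_refresh = refresh_fn(pair.refresh_token)
        pair.access_token = new_access
        pair.refresh_token = new_refresh  # old refresh token is now invalid
        pair.expires_at = time.time() + 15 * 60
    return pair.access_token
```

Storing the rotated refresh token immediately matters: reusing the old one trips the family-based rotation check and forces re-authentication.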

OAuth Providers

Authenticate through institutional or social identity providers:

Provider | Use case                                                   | Scopes
CILogon  | Academic institutional login (universities, research labs) | openid, email, org.cilogon.userinfo
GitHub   | Social login + ORCID association                           | user:email
Google   | Social login                                               | openid, email, profile
GitLab   | Social login (supports self-hosted instances)              | read_user
ORCID    | Researcher ID association (requires existing account)      | openid

All OAuth flows use PKCE (Proof Key for Code Exchange) with SHA-256.

Academic Verification via CILogon

CILogon links your institutional identity to your SynDB account, automatically verifying you as an academic user:

  1. Log in to SynDB
  2. Navigate to CILogon verification (or GET /v1/user/authenticate/cilogon/authorize)
  3. Authenticate with your institution’s SSO
  4. Your account is marked as verified — unlocking SyQL, graph analysis, and upload

Service Accounts

For automated pipelines and integrations:

curl -X POST https://api.syndb.xyz/v1/user/auth/register-service \
  -H "Content-Type: application/json" \
  -H "X-Service-Secret: <SERVICE_SECRET>" \
  -d '{"email": "[email protected]", "password": "..."}'

Service accounts are auto-verified and bypass academic checks. The X-Service-Secret must match the server’s SERVICE_SECRET environment variable.

Logout

# Revokes the refresh token
curl -X POST https://api.syndb.xyz/v1/user/auth/logout \
  -H "Content-Type: application/json" \
  -d '{"refresh_token": "..."}'

Overview

The SynDB data platform is accessible through the API. Through search, you can find and download high-level metrics; through upload, you can share your own data so that it can become part of meta-analytical studies.

Composition

The SynDB data platform provides a comprehensive, organized repository of high-resolution microscopy data and associated metadata. It is composed of three main components — Metadata, Image Metrics, and Raw Data — each of which plays a distinct role in the functionality and utility of the platform.

Metadata

Metadata is used to define and retrieve datasets. For each dataset it records:

  • Brain region
  • Sourcing model animal
  • Genetic manipulations (mutations)
  • Microscopy method
  • Publication information

The metadata is defined by the data owner during upload.

Warning

Dataset

You must split your data into separate SynDB datasets if any of these fields differ within it.

Image metrics

The image metrics in SynDB are derived from high-resolution microscopy assays, processed using sophisticated algorithms and models. These metrics form the primary data of interest within the platform. Each neuronal compartment and structure has its own unique set of metric categories, which necessitates distinct database schemas. We refer to these schemas as SynDB tables throughout this documentation.

To facilitate efficient data management, every imaging metric is linked to a dataset via its ID. This linkage enables robust search capabilities by filtering through metadata, thus avoiding the need to handle terabytes of raw data directly. You can learn more about how dataset metadata filtering works in the article on search.

The flexible data model of SynDB supports this functionality by defining specific parameters for each compartment and structure. These varied models are unified into comprehensive datasets through dataset metadata, which effectively organizes data groups across the platform. This unified approach ensures that users can efficiently access and analyze the vast array of imaging metrics available in SynDB.

Raw data

Raw data is the original data from which the metrics are derived. The raw data is stored in the database and can be requested from its metric counterpart. Raw data sets currently include meshes and SWC files. These are included at the discretion of the data owner.

Organization & Tracking

  • Collections & Tags: Group datasets into curated collections and apply tags for discovery.
  • Provenance & Citations: Track version history, data lineage, and generate citations in BibTeX/RIS format. Export metadata as JSON-LD for linked data integration.

Search

The search feature filters datasets based on the search terms provided by the user. Terms can be combined to narrow down the results.

By default, every search field is combined with AND, meaning every term must match in the resulting datasets.
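
The AND semantics can be illustrated with a toy filter (a sketch for intuition, not the server implementation, which uses a full-text index):

```python
def matches_all_terms(dataset, terms):
    """AND semantics: every search term must appear somewhere in the
    dataset's searchable fields (here a flat dict of metadata strings)."""
    haystack = " ".join(str(v).lower() for v in dataset.values())
    return all(term.lower() in haystack for term in terms)

datasets = [
    {"title": "Drosophila mushroom body synapses", "region": "mushroom body"},
    {"title": "Mouse cortex neurons", "region": "cortex"},
]
# Only datasets containing BOTH terms survive the filter.
hits = [d for d in datasets if matches_all_terms(d, ["drosophila", "synapses"])]
```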

Note

TODO

Add capabilities to customize the logical operators in the search, e.g., AND, OR, NOT.

Download the search results

Following the search, you may download the imaging-derived metrics of the datasets in the search results. You will receive a single .tar.xz file containing Parquet files, which you can read using the pandas or polars library in Python.
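
A minimal sketch of unpacking such a download with the standard library (the function name and archive layout here are illustrative assumptions):

```python
import tarfile
from pathlib import Path

def extract_metrics(archive_path, out_dir):
    """Unpack a SynDB search download (.tar.xz) and return its Parquet files.

    Each returned path can then be loaded with pandas.read_parquet(path)
    or polars.read_parquet(path).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:xz") as tar:
        tar.extractall(out_dir)
    return sorted(out_dir.rglob("*.parquet"))
```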

Note

Other languages

Apache Parquet is a file format supported by most popular programming languages. You can find libraries for reading Parquet files in your preferred language.

Upload

Note

Prerequisites

This article requires that you understand how data is stored on SynDB; we recommend reading through the overview article if you are uncertain.

Uploading to SynDB is a multistep process and requires an understanding of the SynDB dataset model.

The process

Preparation

We recommend following the guide in the exact sequence provided; each step builds on the previous one.

Terms and conditions

You must accept the terms and conditions before uploading data. The terms include:

  • Statement that the data is not false or misleading
  • Redistribution rights
  • Data licensing agreement with the license of your choice; see the guide to picking a license. The default license is ODC-BY.

Data structuring

SynDB utilizes data standardization to facilitate uploads. Your imaging metrics must be in a tabular data format; for instance, .xlsx, .csv, or .parquet. Read more about the data structuring in the contributor’s guide.

Login

Once you enter the upload page, you will be prompted to log in to your SynDB account if you are not already logged in. You must also verify your academic status by logging in to your institution's account.

The upload

You can upload data using the CLI, the GUI, or a mix of both. We recommend using the GUI for your first upload.

1. Assign IDs, and correlate relations

Each SynDB unit requires a unique ID assigned before it is uploaded to the platform. The GUI does this automatically; the CLI does not. When you have multiple SynDB tables under one dataset, these tables are expected to have relations to each other.

Warning

Dataset integrity

Uploading unrelated SynDB table data under the same dataset is disallowed, as it may lead to undefined behaviour.

In other words, you cannot upload a table of neurons and a table of synapses under the same dataset unless each synapse relates to a neuron from that table of neurons.
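
What the GUI automates here can be sketched as follows; the column name neuron_id and both function names are hypothetical, chosen only to illustrate ID assignment and the integrity rule:

```python
import uuid

def assign_ids(rows):
    """Give every SynDB unit (row) a unique UUID, as the GUI does automatically."""
    for row in rows:
        row["id"] = str(uuid.uuid4())
    return rows

def check_relations(neurons, synapses):
    """Dataset integrity: every synapse must reference a neuron from the
    neuron table uploaded under the same dataset."""
    neuron_ids = {n["id"] for n in neurons}
    orphans = [s for s in synapses if s.get("neuron_id") not in neuron_ids]
    if orphans:
        raise ValueError(f"{len(orphans)} synapse(s) reference no neuron in this dataset")
    return True
```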

GUI

The GUI automatically assigns UUIDs to each SynDB unit. Relations are correlated based on the top-down hierarchy of the tables; you can find the latest version of the hierarchy in the source on GitHub.

CLI

TODO

2. Selecting or creating the SynDB dataset metadata

As mentioned in the overview article, every dataset has metadata defined by the data owner during upload. You can either select an existing dataset or create a new one.

3. Confirm and upload

Before the upload starts, you will be prompted to confirm the dataset and the data you are uploading. Once you confirm, the upload begins; it should be relatively quick.

Delete owned datasets

You may at any time delete datasets that you own. This will remove the dataset and all the data associated with it. The deletion is permanent and cannot be undone.

External sources

SynDB supports importing connectomics data from 20+ major connectome datasets. This page covers the most common imports. See the CLI Reference for the full list of supported datasets.

Note

Dataset UUID

The <syndb-dataset-id> is the UUID of the SynDB dataset that will be associated with the imported data. You can copy and paste it from the dataset management page on the GUI.

FlyWire

FlyWire connectomics data can be imported from CAVE CSV exports.

Validate your FlyWire data directory:

syndb etl flywire validate --data-dir external_datasets/FlyWire

Import into your dataset:

syndb etl flywire import \
  --data-dir external_datasets/FlyWire \
  --dataset-id <syndb-dataset-id> \
  --table neurons \
  --table synapses

FlyWire also supports a synapses-detailed table for individual synapse positions (large, batched import).

Hemibrain

The Hemibrain v1.2.1 dataset from Janelia FlyEM can be downloaded directly from Google Cloud Storage.

Download the dataset:

syndb etl hemibrain download --output-dir external_datasets/Hemibrain --extract

Validate the data directory:

syndb etl hemibrain validate --data-dir external_datasets/Hemibrain

Import into your dataset:

syndb etl hemibrain import \
  --data-dir external_datasets/Hemibrain \
  --dataset-id <syndb-dataset-id> \
  --table neurons \
  --table synapses

MANC (Male Adult Nerve Cord)

The MANC / MaleCNS v0.9 dataset from Janelia FlyEM uses Apache Arrow Feather files.

Download the dataset:

syndb etl manc download --output-dir external_datasets/MANC

Validate the data directory:

syndb etl manc validate --data-dir external_datasets/MANC

Import into your dataset:

syndb etl manc import \
  --data-dir external_datasets/MANC \
  --dataset-id <syndb-dataset-id> \
  --table neurons \
  --table synapses

Warning

Download size

The MANC dataset includes the connectome-weights Feather file (~1.1 GB). Ensure sufficient disk space before downloading.

Collections & Tags

Organize datasets into curated collections and apply tags for discovery.

Tags

Tags are free-form metadata labels attached to datasets. They surface in search results and help users discover related data.

Add Tags

Tags are assigned during dataset creation or updated afterward via the dataset metadata endpoints.

Search by Tags

curl "https://api.syndb.xyz/v1/search?q=drosophila+mushroom+body"

The full-text search indexes dataset tags alongside titles and descriptions. See Search.

Collections

Collections are curated groupings of datasets — for example, “All Drosophila connectomes” or “Lab X publication datasets.”

Create a Collection

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/neurodata/collections \
  -d '{
    "name": "Drosophila Connectomes",
    "description": "All Drosophila melanogaster connectome datasets"
  }'

List Collections

curl https://api.syndb.xyz/v1/neurodata/collections

Add Datasets to a Collection

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/neurodata/collections/{collection_id}/datasets \
  -d '{"dataset_id": "..."}'

Collections are useful for meta-analysis — pass a collection’s dataset IDs to the meta-analysis endpoint to compare all datasets in the group.
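
For example, a small helper (hypothetical function name, assuming the collection response lists its datasets with a dataset_id field) that turns a collection into a request body for the meta-analysis endpoint:

```python
def meta_analysis_payload(collection, table, metric, group_by=None):
    """Build the body for POST /v1/meta-analysis/analyze from a collection
    object as returned by the collections endpoint (field names assumed)."""
    payload = {
        "table": table,
        "metric": metric,
        "dataset_ids": [d["dataset_id"] for d in collection["datasets"]],
    }
    if group_by:
        payload["group_by"] = group_by
    return payload
```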

Provenance & Citations

SynDB tracks dataset lineage, version history, and generates machine-readable citations.

Version History

Each dataset maintains a version history. View all versions:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/versions

Provenance Chain

The provenance endpoint shows the audit trail — who created, modified, or derived from the dataset:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/provenance

Lineage

Track derived-from relationships between datasets:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/lineage

Citations

Generate citations in standard formats:

# BibTeX
curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/citation?format=bibtex"

# RIS (for EndNote, Zotero)
curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/citation?format=ris"

JSON-LD

Export dataset metadata as linked data for integration with knowledge graphs and semantic web tools:

curl "https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/jsonld"

Returns a JSON-LD document following schema.org and neuroscience ontology standards.

Access Requests

For restricted datasets, request access from the dataset owner:

# Request access
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/access-request

# Check access status
curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/neurodata/datasets/{dataset_id}/access

The dataset creator receives the request and can approve or deny it.

SyQL Query Language

SyQL (SynDB Query Language) is a declarative query language for neuroanatomical data. It resolves dataset metadata into optimized ClickHouse SQL, handles access control, and submits queries to the async job system.

Requires Academic verification.

Workflow

SyQL has a three-stage pipeline:

Stage   | Endpoint              | What it does
Plan    | POST /v1/syql/plan    | Parse → validate → resolve metadata → return logical plan
Explain | POST /v1/syql/explain | Plan + compile to SQL → return compiled query and advisories
Execute | POST /v1/syql/exec    | Plan + compile + submit to job queue → return job ID

Use plan to validate syntax. Use explain to preview the generated SQL before committing to execution. Use exec when you’re ready to run.

Plan

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/plan \
  -d '{"query": "SELECT mesh_volume, brain_region FROM neurons WHERE dataset_id = '\''...'\'' LIMIT 1000"}'

Returns the parsed logical plan: resolved tables, columns, filters, and metadata.

Explain

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/explain \
  -d '{"query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\'' LIMIT 1000"}'

Returns:

  • The compiled ClickHouse SQL
  • Query advisories (e.g., missing indexes, large scan warnings)
  • Estimated cost

Execute

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/exec \
  -d '{"query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\'' LIMIT 1000"}'

Returns a job_id. Track and download results via the Jobs System.

Cancel

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/cancel \
  -d '{"query_id": "..."}'

Federation Scope

Add "scope": "federation" to fan the query out across all federated nodes:

{
  "query": "SELECT COUNT(*) FROM synapses GROUP BY dataset_id",
  "scope": "federation"
}

See Cross-Cluster Queries for details.

Saved Queries

Frequently used SyQL queries can be saved for reuse. See Saved Queries.

Saved Queries

Save SyQL queries for reuse, sharing, and scheduled re-execution.

Requires Academic verification.

Save a Query

From SyQL

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/queries/from-syql \
  -d '{
    "name": "Mushroom body neuron volumes",
    "query": "SELECT mesh_volume FROM neurons WHERE brain_region = '\''mushroom_body'\''",
    "description": "All neuron mesh volumes in the mushroom body"
  }'

Direct Save

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/queries \
  -d '{
    "name": "My query",
    "query": "...",
    "description": "..."
  }'

List Saved Queries

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/queries

Get a Query

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/queries/{query_id}

Update

curl -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/queries/{query_id} \
  -d '{"name": "Updated name", "query": "...", "description": "..."}'

Delete

curl -X DELETE -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/queries/{query_id}

Run a Saved Query

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/queries/{query_id}/run

Submits the query to the job system and returns a job ID.

CLI

syndb query list
syndb query save --name "My query" --query "SELECT ..."
syndb query show {query_id}
syndb query run {query_id}
syndb query status {query_id}
syndb query update {query_id} --name "New name"
syndb query delete {query_id}

Analytics

Pre-computed analytics endpoints for dataset exploration. These query ClickHouse materialized views and return results quickly (cached for 5 minutes).

Requires Academic verification.

Dataset Summary

Row counts per compartment type:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/analytics/{dataset_id}/summary

Returns counts for neurons, synapses, dendrites, axons, pre-synaptic terminals, dendritic spines, vesicles, mitochondria, and other compartment types present in the dataset.

Neuron Morphometrics

Morphological statistics for neurons in a dataset:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/analytics/{dataset_id}/neuron-morphometrics

Returns distributions of mesh volume, surface area, sphericity, terminal count, and other morphometric features.

Z-Score Comparison

Standardized comparison of a metric across multiple datasets:

curl -H "Authorization: Bearer $TOKEN" \
  "https://api.syndb.xyz/v1/analytics/zscore-comparison?metric=mesh_volume&dataset_ids=uuid1,uuid2,uuid3"

Returns per-dataset z-scores normalized against the pooled distribution — useful for identifying outlier datasets.
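
For intuition, the standardization can be sketched as follows — a simplified model of the computation, not the server implementation, which works on the full pooled distribution in ClickHouse:

```python
from statistics import mean, pstdev

def zscore_comparison(values_by_dataset):
    """Per-dataset z-score of each dataset's mean against the pooled
    distribution of all values across datasets."""
    pooled = [v for values in values_by_dataset.values() for v in values]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {ds: (mean(vals) - mu) / sigma for ds, vals in values_by_dataset.items()}
```

A dataset whose z-score sits far from zero is an outlier relative to the pooled distribution.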

Graph Summary

Network-level statistics for connectome datasets:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/analytics/{dataset_id}/graph-summary

Returns: node count, edge count, density, number of connected components, mean clustering coefficient.

Reciprocity

Fraction of bidirectional synaptic connections:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/analytics/{dataset_id}/reciprocity

Degree Distribution

Top neurons by connectivity:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/analytics/{dataset_id}/degree-distribution

Returns in-degree, out-degree, and total degree distributions.

Graph Analysis

In-memory graph analysis on connectome datasets. SynDB constructs a directed graph from synapse data in ClickHouse (up to 10M edges) and runs network algorithms using petgraph.

Requires Academic verification.

Graph Metrics

Basic network statistics:

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/metrics

Returns: node count, edge count, density, number of connected components, mean clustering coefficient, diameter, hub neurons (highest centrality).

Motif Analysis (Triadic Census)

Count all 16 three-node subgraph patterns (triadic census):

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/motifs

Compare by Synapse Type

Compare motif distributions across different neurotransmitter types:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/motifs/compare-synapse-types

Shortest Path

Find the shortest path between two neurons:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/shortest-path \
  -d '{"source": "neuron-id-1", "target": "neuron-id-2"}'

Uses Dijkstra’s algorithm. Supports configurable edge weight modes.

Reachability

Find all neurons reachable within N hops:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/reachability \
  -d '{"source": "neuron-id", "max_hops": 3}'

BFS traversal, maximum 100 hops.

Reachability Curve

Sample how reachability grows with hop count:

curl -H "Authorization: Bearer $TOKEN" \
  "https://api.syndb.xyz/v1/graph/{dataset_id}/reachability-curve?max_hops=20&samples=100"

Returns the fraction of the network reachable at each hop distance, sampled from random starting neurons (max 500 samples, max 20 hops).
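
The curve for a single starting neuron can be approximated with a plain breadth-first search; the dict-of-lists adjacency representation here is a simplifying assumption, and the real endpoint averages over sampled starting neurons:

```python
from collections import deque

def reachability_curve(adjacency, source, max_hops):
    """Fraction of the graph reachable from `source` within each hop count."""
    n = len(adjacency)
    seen = {source}
    frontier = deque([source])
    curve = []
    for _ in range(max_hops):
        next_frontier = deque()
        while frontier:
            node = frontier.popleft()
            for nbr in adjacency.get(node, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        # Fraction of all nodes reached so far (the source counts as reached).
        curve.append(len(seen) / n)
    return curve
```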

Full Analysis

Run metrics + motifs + hub neuron detection in one call:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/graph/{dataset_id}/full-analysis

Cross-Dataset Comparison

Compare graph properties across multiple datasets:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.syndb.xyz/v1/graph/compare" \
  -d '{"dataset_ids": ["uuid-1", "uuid-2", "uuid-3"]}'

Graph Precompute (CLI)

For large datasets, precompute graph metrics and store results in ClickHouse materialized tables:

syndb graph-precompute --dataset-id {uuid}

This is a batch operation typically run as part of the ETL pipeline or as a Kubernetes job.

Meta-Analysis

Cross-dataset meta-analysis computes effect sizes and heterogeneity statistics across multiple datasets, enabling comparisons that no single dataset can answer.

Requires Academic verification.

Cross-Dataset Analysis

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/meta-analysis/analyze \
  -d '{
    "table": "neurons",
    "metric": "mesh_volume",
    "group_by": "brain_region",
    "dataset_ids": ["uuid-1", "uuid-2", "uuid-3"]
  }'

Parameters

Field       | Required | Description
table       | Yes      | Target table: neurons, synapses, dendrites, axons, pre_synaptic_terminals, dendritic_spines, vesicles, mitochondria
metric      | Yes      | Column to analyze (e.g., mesh_volume, mesh_surface_area, connection_score)
group_by    | No       | Grouping column (e.g., brain_region, neurotransmitter)
dataset_ids | No       | Specific datasets (omit for all accessible datasets)
scope       | No       | "local" (default) or "federation"
cluster_ids | No       | Specific federation clusters (when scope is "federation")

Atlas Comparison

Compare dataset metrics against reference atlases (pre-aggregated materialized views):

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/meta-analysis/atlas-compare \
  -d '{
    "dataset_id": "...",
    "table": "neurons",
    "metric": "mesh_volume"
  }'

Federation Scope

To run meta-analysis across federated nodes:

{
  "table": "synapses",
  "metric": "connection_score",
  "group_by": "neurotransmitter",
  "scope": "federation",
  "cluster_ids": ["cluster-uuid-1", "cluster-uuid-2"]
}

The hub fans the aggregation out to each specified cluster and merges the results. See Cross-Cluster Queries.

Omit cluster_ids to include all healthy clusters.
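
Conceptually, the hub's merge step combines partial aggregates returned by each cluster. A sketch under an assumed per-group {count, mean} shape (the real wire format is not documented here):

```python
def merge_cluster_aggregates(partials):
    """Merge per-cluster partial aggregates into federation-wide results.

    Each partial maps group -> {"count": n, "mean": m}. Means combine as
    count-weighted averages, so no raw rows ever leave a cluster.
    """
    merged = {}
    for partial in partials:
        for group, agg in partial.items():
            slot = merged.setdefault(group, {"count": 0, "mean": 0.0})
            total = slot["count"] + agg["count"]
            # Weighted mean of the two partial means.
            slot["mean"] = (slot["mean"] * slot["count"] + agg["mean"] * agg["count"]) / total
            slot["count"] = total
    return merged
```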

Jobs System

Long-running queries execute asynchronously through the job system. Submit a job, check its status, and download results when ready.

Requires Academic verification.

Workflow

Submit job → Job queued → Job running → Job completed → Download result
                                      → Job failed (check error, rerun)
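
The workflow above amounts to a polling loop. A client-side sketch — wait_for_job is a hypothetical helper, and fetch_status stands in for GET /v1/jobs/{job_id}:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_job(job_id, fetch_status, poll_interval=1.0, timeout=600.0):
    """Poll until the job reaches a terminal state, then return that state.

    fetch_status(job_id) should return the job's status string
    (pending / running / completed / failed / cancelled).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

On "completed", fetch the result from the download endpoint; on "failed", inspect error_message and consider a rerun.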

Submit a Query Job

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/jobs \
  -d '{"query": "SELECT * FROM neurons WHERE dataset_id = '\''...'\''", "format": "arrow"}'

Returns a job_id for tracking.

Submit a Graph Job

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/jobs/graph \
  -d '{"dataset_id": "...", "analysis": "full"}'

Check Status

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/jobs/{job_id}

Status    | Meaning
pending   | Queued, waiting for a worker
running   | Currently executing
completed | Results available for download
failed    | Execution error (check error_message)
cancelled | Cancelled by user

List Your Jobs

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/jobs

Download Results

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/jobs/{job_id}/result \
  -o result.arrow

  • Query jobs: Arrow IPC format (readable by pandas, polars, DuckDB)
  • Graph jobs: JSON

Cancel a Job

curl -X DELETE -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/jobs/{job_id}

Rerun a Job

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/jobs/{job_id}/rerun

Creates a new job with the same parameters.

Configuration

Parameter              | Default  | Environment Variable
Max concurrent workers | 4        | JOB_QUEUE_MAX_WORKERS
Result TTL             | 24 hours | JOB_RESULT_TTL_HOURS
Max result size        | 1 GB     | JOB_MAX_RESULT_BYTES

Results are stored in S3 and automatically cleaned up after the TTL expires.

Federation Overview

SynDB federation allows multiple institutions to participate in a shared neuroscience data network while retaining full control of their data. Each institution runs a node with its own ClickHouse instance; a central hub coordinates queries across all nodes.

Why Federate?

Concern          | Without federation                   | With federation
Data sovereignty | Upload all data to a central server  | Data stays on your infrastructure
Meta-analysis    | Limited to datasets on one instance  | Query across all participating institutions
Compliance       | Data leaves your network             | Data never leaves — only query results cross boundaries
Latency          | Single point of access               | Local reads are fast; cross-cluster queries pay network cost

Key Concepts

Hub — The coordinating instance that runs the full SynDB stack (API, PostgreSQL, ClickHouse, Meilisearch, S3). It maintains a registry of federated clusters, monitors their health, and routes cross-cluster queries.

Node — A lightweight participant running ClickHouse and the syndb-node binary. Nodes register with the hub via libp2p or HTTP, receive schema migrations, and respond to delegated queries.

Schema versioning — The hub pushes ClickHouse DDL migrations to all nodes. Queries only route to nodes whose schema version is compatible.

Health monitoring — The hub periodically checks each node’s health. Nodes are classified as Healthy, Degraded, Unreachable, or Unknown. Unhealthy nodes are excluded from federation queries.

Federation password — A shared secret that nodes present when registering with the hub. Prevents unauthorized clusters from joining.

When to Federate vs. Upload

Federate when:

  • Institutional policy requires data to stay on-premise
  • You have existing ClickHouse infrastructure
  • You want to contribute to cross-institutional meta-analysis without data transfer

Upload directly when:

  • You don’t have infrastructure to maintain
  • Your data has no residency requirements
  • You want the simplest path to sharing

Architecture at a Glance

┌─────────────────────────────────┐
│              Hub                │
│  API + PostgreSQL + ClickHouse  │
│    + S3 + Meilisearch + libp2p  │
└──────┬──────────────┬───────────┘
       │ libp2p/QUIC  │ libp2p/QUIC
  ┌────▼─────┐   ┌────▼─────┐
  │  Node A  │   │  Node B  │
  │ CH + CLI │   │ CH + CLI │
  └──────────┘   └──────────┘

Queries flow: User → Hub API → Hub ClickHouse → remote() to Node ClickHouse → results aggregated at Hub.

See Architecture for the full technical breakdown.

Federation Architecture

Components

Hub

The hub runs the full SynDB stack and coordinates the federation:

| Component | Role |
|---|---|
| syndb-api | HTTP API (port 8080) + Arrow Flight (port 50051) |
| PostgreSQL | User accounts, dataset metadata, cluster registry, job queue, benchmarks |
| ClickHouse | Local data warehouse + remote() queries to nodes |
| S3/MinIO | Mesh files, job results, ETL staging |
| Meilisearch | Full-text search index |
| HubRegistryActor | libp2p actor managing cluster registration and health |
| FederationHealthMonitor | Periodic health checks with circuit-breaker logic |

Node

Nodes are lightweight — no PostgreSQL, no S3, no Meilisearch:

| Component | Role |
|---|---|
| syndb-node | Federation daemon with Arrow Flight server (port 50052) |
| ClickHouse | Local data warehouse (HTTP port 8124, native port 9003/9440) |
| ClusterActor | libp2p actor handling hub communication |

Networking: libp2p

Federation uses libp2p for peer-to-peer communication:

  • Transport: QUIC with built-in TLS 1.3 (encrypted, multiplexed)
  • Discovery: mDNS for LAN (zero-config), DHT for WAN
  • NAT traversal: Relay nodes for peers behind NAT
  • Actor model: kameo actors manage the swarm event loop

DHT Registration

Services register under well-known names in the DHT:

| Name | Actor |
|---|---|
| syndb-hub | HubRegistryActor |
| syndb-cluster:{name} | ClusterActor |

The ClusterActor on each node looks up syndb-hub in the DHT to find and register with the hub.

Actor Messages

The ClusterActor handles these message types:

| Message | Direction | Purpose |
|---|---|---|
| HealthPing | Hub → Node | Periodic liveness check |
| SchemaSync | Hub → Node | Push DDL migrations |
| DatasetCatalogRequest | Hub → Node | Discover datasets on node |
| GetFlightEndpoint | Hub → Node | Resolve Flight address for data transfer |
| AnalyticsQuery | Hub → Node | Delegated analytics computation |
| OntologySync | Hub → Node | Push ontology terms |

Data Plane

Two mechanisms move data between hub and nodes:

ClickHouse remote()

For SQL queries, the hub compiles a remote('node-host:port', 'syndb', 'table', 'user', 'password') call that executes directly on the node’s ClickHouse and streams results back.
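
As a sketch of this compilation step, the following builds a remote() call from a node's registered endpoint. The function and its parameters are illustrative, not the hub's actual code:

```python
def compile_remote_query(host: str, native_port: int, table: str,
                         predicate: str, user: str, password: str) -> str:
    """Wrap a per-node query in ClickHouse's remote() table function.

    Illustrative only: the hub's real compiler also handles quoting,
    multi-node fan-out, and credential management.
    """
    remote = f"remote('{host}:{native_port}', 'syndb', '{table}', '{user}', '{password}')"
    return f"SELECT * FROM {remote} WHERE {predicate}"

sql = compile_remote_query("ch.partner-lab.edu", 9440, "neurons",
                           "brain_region = 'mushroom_body'", "federation", "s3cret")
print(sql)
```

The generated statement runs on the hub's ClickHouse, which opens a native-protocol connection to the node and streams rows back.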

Arrow Flight (Internal)

For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to the node’s internal Flight server (port 50052). Results stream back as Arrow IPC batches.

Schema Versioning

Each ClickHouse DDL migration has a version number. The hub tracks the current version and each node’s version:

  1. Hub receives a schema sync request (POST /v1/federation/schema/sync)
  2. Hub sends pending migrations to each active node via SchemaSync message
  3. Nodes apply migrations and report their new version
  4. Queries only route to nodes whose schema version is compatible
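
The routing rule in step 4 can be sketched as a simple filter; this assumes "compatible" means an exact version match, which may be stricter than the hub's real rule:

```python
def routable_nodes(hub_schema_version: int, nodes: list) -> list:
    """Return names of nodes eligible for query routing.

    A minimal sketch of the compatibility check; field names are assumptions.
    """
    return [n["name"] for n in nodes if n["schema_version"] == hub_schema_version]

nodes = [
    {"name": "node-a", "schema_version": 12},  # up to date
    {"name": "node-b", "schema_version": 11},  # missed a migration
]
print(routable_nodes(12, nodes))
```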

Health Monitoring

The FederationHealthMonitorActor runs on the hub:

| State | Meaning | Query routing |
|---|---|---|
| Healthy | Responds to pings, schema compatible | Included |
| Degraded | Responds but slow or partially failing | Included with lower priority |
| Unreachable | Failed consecutive pings | Excluded |
| Unknown | Newly registered, not yet checked | Excluded until first successful ping |

Health transitions are logged and stored in PostgreSQL for audit.
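
The classification above can be sketched as a pure function. The thresholds here (3 failures, 1000 ms) are assumptions for illustration; the monitor's actual circuit-breaker parameters are not documented on this page:

```python
from enum import Enum

class NodeHealth(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNREACHABLE = "unreachable"
    UNKNOWN = "unknown"

def classify(checked: bool, consecutive_failures: int, latency_ms: float,
             max_failures: int = 3, slow_ms: float = 1000.0) -> NodeHealth:
    # Thresholds are illustrative assumptions, not the hub's real config.
    if not checked:
        return NodeHealth.UNKNOWN
    if consecutive_failures >= max_failures:
        return NodeHealth.UNREACHABLE
    if consecutive_failures > 0 or latency_ms > slow_ms:
        return NodeHealth.DEGRADED
    return NodeHealth.HEALTHY
```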

Concurrency Model

  • Lock-free reads: The hub’s cluster registry uses papaya concurrent hash maps — reads never block, even under high query load
  • Actor isolation: Each cluster connection is managed by its own actor, preventing one slow node from blocking others
  • Supervisor trees: Actor failures are caught and restarted by the kameo supervisor

Node Setup

This guide walks through joining the SynDB federation as a node operator.

Prerequisites

  • ClickHouse instance with a syndb database
  • Network reachability to the hub (or mDNS on the same LAN)
  • The federation password (provided by the hub administrator)
  • syndb-cli binary (with federation feature)

Step 1: Initialize

syndb federation init \
  --cluster_name "my-lab-node" \
  --clickhouse_endpoint "clickhouse.mylab.edu" \
  --clickhouse_http_port 8123 \
  --clickhouse_port 9440 \
  --federation_password "$SYNDB_FEDERATION_PASSWORD" \
  --institution "My University" \
  --contact_email "[email protected]"

This command:

  1. Bootstraps a libp2p swarm and discovers the hub via mDNS or configured multiaddrs
  2. Registers the node with the hub (presenting the federation password)
  3. Applies any pending ClickHouse schema migrations
  4. Saves configuration to ~/.config/syndb/federation.json

Optional flags

| Flag | Default | Description |
|---|---|---|
| --listen_addr | OS-assigned | libp2p listen address (e.g., /ip4/0.0.0.0/udp/4001/quic-v1) |
| --description | — | Human-readable cluster description |

Step 2: Verify

# Show federation config
syndb federation status

# Test connectivity (3s mDNS discovery + hub + ClickHouse check)
syndb federation test

federation test performs:

  1. Bootstraps a temporary libp2p swarm with mDNS discovery
  2. Looks up the hub in the DHT
  3. Tests ClickHouse connectivity

Step 3: Sync Schema

If the hub has newer schema migrations:

# Preview changes
syndb federation sync-schema --dry_run true

# Apply
syndb federation sync-schema

This uses an HTTP fallback via SYNDB_HUB_URL if libp2p is unavailable.

Step 4: Confirm Registration

List all federated clusters to verify your node appears:

export SYNDB_HUB_URL="https://api.syndb.xyz"
syndb federation clusters

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| SYNDB_FEDERATION_PASSWORD | Yes | — | Shared secret for hub registration |
| SYNDB_HUB_URL | For HTTP fallback | — | Hub API URL (e.g., https://api.syndb.xyz) |
| FEDERATION_CLUSTER_NAME | Yes (node mode) | — | Unique cluster identifier |
| FEDERATION_NODE_FLIGHT_PORT | No | 50052 | Internal Flight gRPC port |
| FEDERATION_NODE_FLIGHT_ADVERTISE | No | localhost:50052 | Advertised Flight endpoint |
| FEDERATION_ENABLE_MDNS | No | true | Enable mDNS for LAN discovery |
| FEDERATION_LISTEN_ADDR | No | OS-assigned | libp2p listen address |
| FEDERATION_HUB_MULTIADDRS | No | — | Comma-separated hub multiaddrs for WAN |
| FEDERATION_CLUSTER_NATIVE_PORT | No | 9440 | ClickHouse native port for remote() queries |

Docker Compose (Development)

For local development, the federation profile starts a hub and one node:

docker compose --profile federation up -d

This starts:

  • clickhouse-node — ClickHouse on HTTP 8124, native 9003
  • clickhouse-node-setup — Creates federation user on the node
  • clickhouse-hub-fed-setup — Creates federation user on the hub
  • syndb-node — Federation daemon with Flight on 50052, libp2p on 4001

All services use network_mode: host and discover each other via localhost.

Removing a Node

syndb federation logout

This deletes ~/.config/syndb/federation.json. The hub administrator can also deactivate the cluster via DELETE /v1/federation/clusters/{id}.

Hub Administration

All hub administration endpoints require SuperUser authentication.

Federation Status

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/status
{
  "total_clusters": 5,
  "active_clusters": 4,
  "healthy": 3,
  "degraded": 1,
  "unreachable": 0,
  "schema_version": 12
}

Cluster Management

List Clusters

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters

Returns each cluster’s ID, name, endpoint, port, health status, and active flag.

Register a Cluster

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/federation/clusters \
  -d '{
    "name": "partner-lab",
    "endpoint": "ch.partner-lab.edu",
    "port": 9440,
    "description": "Partner Lab ClickHouse node",
    "institution": "Partner University",
    "contact_email": "[email protected]"
  }'

Clusters can also self-register via POST /v1/federation/register using the federation password (no SuperUser required).

Deactivate a Cluster

curl -X DELETE -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{cluster_id}

Sets is_active = false. The cluster is excluded from future queries but its record is preserved.

Health Checks

Single Cluster

curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/health

Verification Tests

Three targeted tests for diagnosing cluster issues:

# Test ClickHouse connectivity and measure latency
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/connectivity

# Verify schema version compatibility
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/schema

# Run a test cross-cluster query
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{cluster_id}/test/query

Schema Sync

Push pending DDL migrations to all active clusters:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/schema/sync

Get the current schema version and migrations:

# All migrations
curl -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/schema

# Migrations since version 10
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.syndb.xyz/v1/federation/schema?since_version=10"

Benchmarks

Track federation query performance:

# Submit a benchmark record
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/federation/benchmarks \
  -d '{
    "cluster_id": "...",
    "query_type": "remote_single",
    "latency_ms": 145,
    "row_count": 50000,
    "cluster_count": 1,
    "payload_bytes": 2048000,
    "success": true
  }'

# List benchmarks with filters
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.syndb.xyz/v1/federation/benchmarks?query_type=remote_single&limit=50"

# Aggregate stats grouped by query type
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.syndb.xyz/v1/federation/benchmarks/aggregate?since=2024-01-01"

Query Types

| Type | Description |
|---|---|
| remote_single | Query to one remote cluster |
| remote_multi | Query spanning multiple clusters |
| federation_union | Union across all federated clusters |
| federation_search | Federated search |
| health_check | Health check probe |

Cross-Cluster Queries

Federation queries let you analyze data across all participating nodes from a single API call.

How It Works

  1. User submits a query via SyQL or meta-analysis endpoint with federation scope
  2. Hub resolves targets — checks dataset locality index to determine which nodes hold relevant data
  3. Hub compiles remote queries — generates ClickHouse remote('node:port', 'syndb', 'table', 'user', 'pass') calls
  4. Nodes execute locally — each node runs its portion of the query against local data
  5. Hub aggregates — results stream back and are merged at the hub
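
Steps 2 through 5 can be condensed into a fan-out-and-merge sketch. Here `run_on_node` stands in for the per-node remote() execution, and merging is plain concatenation rather than the hub's streaming aggregation:

```python
def federated_query(nodes, run_on_node):
    """Fan a compiled query out to eligible nodes and merge rows at the hub.

    Simplified sketch: only Healthy and Degraded nodes receive the query,
    and results are concatenated in node order.
    """
    merged = []
    for node in nodes:
        if node["health"] in ("healthy", "degraded"):
            merged.extend(run_on_node(node))
    return merged

# Stub executor returning fake local rows per node
fake_rows = {"node-a": [{"neuron": 1}], "node-b": [{"neuron": 2}]}
rows = federated_query(
    [{"name": "node-a", "health": "healthy"},
     {"name": "node-b", "health": "degraded"},
     {"name": "node-c", "health": "unreachable"}],  # skipped
    lambda n: fake_rows.get(n["name"], []),
)
print(len(rows))
```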

SyQL with Federation Scope

SyQL queries can target the federation by specifying scope in the request:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/exec \
  -d '{
    "query": "SELECT neuron FROM neurons WHERE brain_region = '\''mushroom_body'\''",
    "scope": "federation"
  }'

The hub transparently fans the query out to nodes that hold matching datasets.

Meta-Analysis Across Clusters

Specify cluster_ids to include specific nodes:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/meta-analysis/analyze \
  -d '{
    "table": "neurons",
    "metric": "mesh_volume",
    "group_by": "brain_region",
    "scope": "federation",
    "cluster_ids": ["uuid-1", "uuid-2"]
  }'

Omit cluster_ids to query all healthy clusters.

Data Plane: Arrow Flight

For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to each node’s internal Flight server:

  • Hub sends a Flight DoGet request to the node’s advertised Flight endpoint (default port 50052)
  • Results stream back as Arrow IPC record batches
  • The hub merges batches from multiple nodes before returning to the client

Limitations

| Constraint | Detail |
|---|---|
| Latency | Cross-cluster queries add network round-trip time per node |
| Schema compatibility | Nodes must be at a compatible schema version; incompatible nodes are excluded |
| Node health | Only Healthy and Degraded nodes receive queries; Unreachable nodes are skipped |
| Delegation timeout | Default 30s (FEDERATION_DELEGATION_TIMEOUT_SECS); long-running queries may need async jobs |
| No cross-node joins | Each node executes independently; joins happen only against local data |

Best Practices

  • Use async jobs (POST /v1/jobs) for large federation queries to avoid HTTP timeouts
  • Check federation status before running large queries to know which nodes are available
  • Prefer meta-analysis endpoints for cross-dataset aggregation — they handle fan-out efficiently
  • Monitor benchmarks to track federation query performance over time
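
A simple polling loop covers the first recommendation. `get_status` is a caller-supplied function (e.g., wrapping GET /v1/jobs), and the status strings here are assumptions about the job API's vocabulary:

```python
import time

def wait_for_job(get_status, job_id, timeout_s=600.0, poll_s=5.0):
    """Poll an async job until it finishes instead of holding an HTTP request open.

    Sketch only: status values "completed"/"failed" are illustrative.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} still running after {timeout_s}s")

# Stub that completes on the second poll
states = iter(["running", "completed"])
print(wait_for_job(lambda _id: next(states), "job-123", poll_s=0.01))
```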

Federation Troubleshooting

Node Cannot Find Hub

Symptom: syndb federation init or syndb federation test hangs during hub discovery.

Causes and fixes:

| Cause | Fix |
|---|---|
| mDNS blocked by firewall | Open UDP port 5353 or set FEDERATION_ENABLE_MDNS=false and use explicit multiaddrs |
| Hub and node on different networks | Set FEDERATION_HUB_MULTIADDRS to the hub’s libp2p address (e.g., /ip4/hub-ip/udp/4001/quic-v1) |
| Hub not running | Verify hub process is up and listening on its libp2p port |

Registration Rejected

Symptom: "Invalid federation password" error.

Fix: Ensure SYNDB_FEDERATION_PASSWORD matches the hub’s FEDERATION_PASSWORD exactly. Check for trailing whitespace or newlines in environment variables.

Schema Version Mismatch

Symptom: Node excluded from federation queries; hub logs show schema incompatibility.

Fix:

# Check current schema
syndb federation status

# Sync to latest
syndb federation sync-schema

If sync fails, verify the node’s ClickHouse is reachable and the syndb database exists.

Health States

| State | Meaning | Action |
|---|---|---|
| Healthy | All checks pass | None |
| Degraded | Responds but slow or partially failing | Check ClickHouse load, disk space, network |
| Unreachable | Failed consecutive pings | Check firewall, ClickHouse process, network connectivity |
| Unknown | Newly registered | Wait for first health check cycle or trigger manual verify |

Trigger a manual health check from the hub:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{id}/verify

Docker Compose Issues

Port Conflicts

The federation profile uses network_mode: host. Check for conflicts:

  • Hub ClickHouse: HTTP 8123, native 9002
  • Node ClickHouse: HTTP 8124, native 9003
  • Federation Flight: 50052
  • libp2p: UDP 4001

Node Fails to Start

Check that hub ClickHouse setup containers completed first:

docker compose --profile federation logs clickhouse-hub-fed-setup
docker compose --profile federation logs clickhouse-node-setup

These create the federation user on each ClickHouse instance. If they fail, the node cannot authenticate for remote() queries.

Connectivity Test Sequence

Run targeted tests to isolate the failure:

# 1. Test ClickHouse connectivity
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{id}/test/connectivity

# 2. Test schema compatibility
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{id}/test/schema

# 3. Test cross-cluster query
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/federation/clusters/{id}/test/query

Each test returns a pass/fail result with latency and error details. Work through them in order — later tests depend on earlier ones passing.

Docker Compose

Local development and single-machine deployment using Docker Compose.

Base Stack

docker compose up -d

Starts the core services:

| Service | Port | Description |
|---|---|---|
| syndb-api | 8080 (HTTP), 50051 (Flight) | REST API + Arrow Flight |
| syndb-ui | — (reverse-proxied) | Web frontend |
| postgres | 5433 | Metadata, users, access control |
| clickhouse | 8123 (HTTP), 9002 (native) | Data warehouse |
| minio | 9000 (API), 9001 (console) | Object storage |
| meilisearch | 7700 | Full-text search |

All services use network_mode: host — they bind directly to the host network.

Federation Profile

docker compose --profile federation up -d

Adds federation services on top of the base stack:

| Service | Port | Description |
|---|---|---|
| clickhouse-node | 8124 (HTTP), 9003 (native) | Node ClickHouse |
| clickhouse-node-setup | — | Creates federation user on node |
| clickhouse-hub-fed-setup | — | Creates federation user on hub |
| syndb-node | 50052 (Flight), 4001/UDP (libp2p) | Federation node daemon |

Note: The federation and federation-world profiles share port 8124 and are mutually exclusive. federation-world runs 5 regional ClickHouse nodes for benchmarking only.

ETL Profile

Run dataset imports:

docker compose --profile etl run syndb-etl <dataset> <command>

Example:

docker compose --profile etl run syndb-etl hemibrain download
docker compose --profile etl run syndb-etl hemibrain import

Version Management

All service versions are defined in versions.nix. After changing versions:

just sync-versions

This regenerates .env with the correct image tags.

Image Building

Build container images from Nix:

just stack-prepare

This builds OCI images for syndb-api, syndb-node, syndb-cli-etl, and the UI.

Volumes

| Volume | Service | Content |
|---|---|---|
| clickhouse-data | clickhouse | ClickHouse data |
| clickhouse-node-data | clickhouse-node | Node ClickHouse data |
| postgres-data | postgres | PostgreSQL data |
| minio-data | minio | S3 object storage |
| meilisearch-data | meilisearch | Search index |

Cleanup

ClickHouse creates files with UID 100100 and restrictive permissions. To clean volumes:

podman unshare rm -rf <volume-path>

Prefer keeping data in Docker volumes rather than bind mounts to avoid permission issues.

Kubernetes & Helm

Production deployment on Kubernetes using Helm charts.

Charts Overview

| Chart | Description |
|---|---|
| syndb-hub | Hub deployment (API, UI, depends on syndb-clickhouse) |
| syndb-federation-node | Federation node (syndb-node, depends on syndb-clickhouse) |
| syndb-clickhouse | Shared ClickHouse subchart (used by both hub and node) |
| syndb-etl | ETL batch jobs (download, prepare, import, graph-precompute) |
| nautilus | Umbrella chart for the NRP Nautilus cluster deployment |

Charts are located under infrastructure/helm/.

Hub Deployment

The hub chart deploys the full SynDB stack. Key values:

syndb-clickhouse:
  clusterName: syndb-hub
  shardRegions:
    - name: dc1
      region: dc1
      replicas: 3

api:
  image: syndb-api-rust:latest
  flightPort: 50051
  resources:
    requests:
      cpu: "1"
      memory: 2Gi

The chart also creates a remote_servers.xml ConfigMap for ClickHouse cluster topology.

Node Deployment

Deploy a federation node at your institution:

syndb-clickhouse:
  clusterName: syndb-node
  shardRegions:
    - name: dc1
      region: dc1
      replicas: 2

nodeApi:
  enabled: true
  image: syndb-api-rust:latest
  flightPort: 50052
  libp2pPort: 4001
  hubMultiaddrs: "/ip4/<hub-ip>/udp/4001/quic-v1"
  federationPassword: "<shared-secret>"
  resources:
    requests:
      cpu: 500m
      memory: 512Mi

When nodeApi.enabled=true, the chart deploys:

  • A Deployment running syndb-node with Flight (TCP) and libp2p (UDP) ports
  • A Service exposing both ports
  • Environment variables auto-populated from values (cluster name, endpoints, passwords)

In Kubernetes, mDNS is disabled — use hubMultiaddrs for explicit hub discovery.

ETL Jobs

ETL runs as Kubernetes Jobs:

jobs:
  hemibrain:
    download:
      enabled: true
      resources:
        requests: { cpu: "500m", memory: "2Gi" }
        limits: { cpu: "600m", memory: "4Gi" }
    import:
      enabled: true
      resources:
        requests: { cpu: "1", memory: "4Gi" }
        limits: { cpu: "1200m", memory: "6Gi" }

Important: Kubernetes Jobs are immutable. If resource values have changed, delete failed or still-running ETL jobs before running helm upgrade:

kubectl delete job -n syndb -l app=syndb-etl --field-selector status.successful!=1

emptyDir warning: emptyDir volumes default to tmpfs and count against the pod’s memory cgroup limit. Add expected emptyDir data size to the memory limit.

Applying Changes

just nautilus-apply

Or manually:

helm upgrade --install syndb infrastructure/helm/nautilus/ \
  -n syndb --create-namespace \
  -f infrastructure/helm/nautilus/values.yaml

Environment Reference

All configuration is via environment variables. The single source of truth for versions is versions.nix; run just sync-versions to regenerate .env.

Database

| Variable | Default | Description |
|---|---|---|
| POSTGRES_HOST | — | PostgreSQL host |
| POSTGRES_PORT | 5433 | PostgreSQL port |
| POSTGRES_USERNAME | — | PostgreSQL user |
| POSTGRES_PASSWORD | — | PostgreSQL password |
| POSTGRES_PATH | — | Database name |
| POSTGRES_READ_HOST | Same as write host | Read replica host |
| DB_POOL_MAX | 20 | Max connection pool size |
| DB_POOL_MIN | 2 | Min connection pool size |
| DB_CONNECT_TIMEOUT_SECS | 10 | Connection timeout |
| CLICKHOUSE_HOST | — | ClickHouse host |
| CLICKHOUSE_PORT | 8123 | ClickHouse HTTP port |

Object Storage (S3/MinIO)

| Variable | Default | Description |
|---|---|---|
| S3_ACCESS_KEY | — | Access key |
| S3_SECRET_KEY | — | Secret key |
| S3_ENDPOINT | — | Custom endpoint (for MinIO) |
| S3_REGION | — | AWS region |

Bucket names: syndb-mesh, syndb-swb, syndb-search, syndb-jobs. No underscores allowed in bucket names.
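
A minimal client-side check for that naming constraint might look like the following; note that S3's full naming rules are stricter than this sketch (length limits, no leading hyphens, and more):

```python
import re

def valid_bucket_name(name: str) -> bool:
    """Reject underscores and uppercase, per the constraint noted above.

    Simplified: real S3 rules also restrict length and edge characters.
    """
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]*", name) is not None

for bucket in ("syndb-mesh", "syndb-swb", "syndb-search", "syndb-jobs"):
    assert valid_bucket_name(bucket)
assert not valid_bucket_name("syndb_jobs")  # underscore rejected
```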

Authentication

| Variable | Default | Description |
|---|---|---|
| PASSLIB_SECRET | — | PASETO v4.local symmetric key (minimum 32 bytes) |
| SERVICE_SECRET | — | Service account registration secret |
| UI_BASE_URL | — | OAuth callback redirect base URL |
| ACCESS_TOKEN_LIFETIME | 900 (15 min) | Access token TTL in seconds |
| REFRESH_TOKEN_LIFETIME | 2592000 (30 days) | Refresh token TTL in seconds |

OAuth Providers

| Variable | Description |
|---|---|
| OA_GITHUB_ID, OA_GITHUB_SECRET | GitHub OAuth app credentials |
| OA_GOOGLE_ID, OA_GOOGLE_SECRET | Google OAuth credentials |
| OA_ORCID_ID, OA_ORCID_SECRET | ORCID OAuth credentials |
| OA_CILOGON_ID, OA_CILOGON_SECRET | CILogon OAuth credentials |
| OA_GITLAB_ID, OA_GITLAB_SECRET | GitLab OAuth credentials |
| OA_GITLAB_URL | Custom GitLab instance URL |
| OA_ORCID_SANDBOX | Use sandbox.orcid.org (default false) |
| OA_CILOGON_SANDBOX | Use test.cilogon.org (default false) |
| OAUTH_PROVIDER_BASE_URL | Override provider URLs (testing) |

Federation

| Variable | Default | Description |
|---|---|---|
| FEDERATION_LISTEN_ADDR | OS-assigned | libp2p listen address |
| FEDERATION_ENABLE_MDNS | true | Enable mDNS LAN discovery |
| FEDERATION_HUB_MULTIADDRS | — | Comma-separated hub multiaddrs for WAN |
| FEDERATION_CLUSTER_NAME | — | Cluster identifier (required for node mode) |
| FEDERATION_CLUSTER_DESCRIPTION | — | Cluster description |
| FEDERATION_CLUSTER_INSTITUTION | — | Institution name |
| FEDERATION_PASSWORD | — | Shared federation secret |
| FEDERATION_CLUSTER_NATIVE_PORT | 9440 | ClickHouse native port for remote() |
| FEDERATION_NODE_FLIGHT_PORT | 50052 | Internal Flight gRPC port |
| FEDERATION_NODE_FLIGHT_ADVERTISE | localhost:50052 | Advertised Flight endpoint |
| FEDERATION_DELEGATION_TIMEOUT_SECS | 30 | Timeout for delegated requests |

Server

| Variable | Default | Description |
|---|---|---|
| DEV_MODE | false | Permissive CORS, data seeding |
| DEBUG | false | Verbose SQL logging |
| TESTING | false | Skip federation/job queue init |
| REQUEST_TIMEOUT_SECS | 60 | HTTP handler timeout |
| HTTP_CLIENT_TIMEOUT_SECS | 30 | Internal HTTP client timeout |
| UPLOAD_TIMEOUT | 21600 (6 hours) | Upload timeout |
| FLIGHT_PORT | 50051 | Arrow Flight server port |
| REQUIRE_AUTHENTICATION | true | Require auth for protected endpoints |

Rate Limiting

| Variable | Default | Description |
|---|---|---|
| RATE_LIMIT_PER_SECOND | 100 | Sustained request rate per IP |
| RATE_LIMIT_BURST | 200 | Burst capacity per IP |
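
A token bucket with these defaults behaves as sketched below. This is a behavioral model of the limiter, not the server's implementation; `now` is passed in explicitly to keep the example deterministic:

```python
class TokenBucket:
    """Per-IP token bucket: 100 req/s sustained, burst capacity 200."""

    def __init__(self, rate: float = 100.0, burst: int = 200):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket()
burst = sum(bucket.allow(0.0) for _ in range(201))
print(burst)  # 200 requests pass, the 201st is rejected
```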

Job Queue

| Variable | Default | Description |
|---|---|---|
| JOB_QUEUE_MAX_WORKERS | 4 | Max concurrent job workers |
| JOB_RESULT_TTL_HOURS | 24 | Result retention |
| JOB_MAX_RESULT_BYTES | 1073741824 (1 GB) | Max result size |

Search

| Variable | Default | Description |
|---|---|---|
| MEILISEARCH_HOST | — | Meilisearch host |
| MEILISEARCH_PORT | 7700 | Meilisearch port |
| MEILISEARCH_API_KEY | — | Meilisearch API key |

API Overview

Base URL: https://api.syndb.xyz/v1

Interactive OpenAPI documentation: api.syndb.xyz/docs

OpenAPI spec: GET /openapi.json

Authentication

Pass a PASETO access token in the Authorization header:

Authorization: Bearer <access_token>

See Authentication for how to obtain tokens.

Content Types

  • Requests: application/json
  • Responses: application/json (API), Apache Arrow IPC (job results), BibTeX/RIS (citations)

Error Format

{
  "error": "Human-readable error message"
}

Standard HTTP status codes: 400 (bad request), 401 (unauthenticated), 403 (insufficient permissions), 404 (not found), 409 (conflict), 429 (rate limited).
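
A client can translate this envelope into exceptions with a small helper. The exception type here is this example's choice, not something the API mandates:

```python
import json

def parse_response(status: int, body: str):
    """Decode SynDB's JSON error envelope, raising on non-2xx statuses.

    Sketch only: a production client would also handle non-JSON bodies
    and retry on 429.
    """
    payload = json.loads(body)
    if 200 <= status < 300:
        return payload
    raise RuntimeError(f"HTTP {status}: {payload.get('error', 'unknown error')}")

ok = parse_response(200, '{"status": "healthy"}')
try:
    parse_response(404, '{"error": "dataset not found"}')
except RuntimeError as exc:
    print(exc)
```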

Route Map

No Authentication Required

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check (Postgres, ClickHouse, S3, Meilisearch) |
| POST | /v1/user/auth/login | Login |
| POST | /v1/user/auth/register | Register |
| POST | /v1/user/auth/register-service | Register service account (X-Service-Secret) |
| POST | /v1/user/auth/refresh | Refresh token |
| POST | /v1/user/auth/logout | Logout |
| GET | /v1/user/profile/{user_id} | Public profile lookup |
| GET | /v1/search | Full-text dataset search |
| GET | /v1/federation/ping | Federation health |
| POST | /v1/federation/register | Cluster self-registration (federation password) |

Authenticated User (AuthUser)

| Method | Path | Description |
|---|---|---|
| GET | /v1/user/profile | Current user profile |
| PATCH | /v1/user/profile | Update profile |
| POST | /v1/user/profile/scientist-tag | Generate scientist tag |
| GET | /v1/user/authenticate/cilogon | Academic metadata |

Academic / Verified User (AcademicUser)

| Method | Path | Description |
|---|---|---|
| POST | /v1/neurodata/datasets | Register dataset |
| GET | /v1/neurodata/datasets/owned | User’s datasets |
| POST | /v1/neurodata/download-urls | Pre-signed S3 download URLs |
| POST | /v1/syql/plan | Parse and validate SyQL |
| POST | /v1/syql/exec | Execute SyQL query |
| POST | /v1/syql/explain | Explain query plan |
| POST | /v1/syql/cancel | Cancel running query |
| GET | /v1/queries | List saved queries |
| POST | /v1/queries | Save query |
| POST | /v1/queries/{id}/run | Execute saved query |
| POST | /v1/jobs | Submit query job |
| POST | /v1/jobs/graph | Submit graph analysis job |
| GET | /v1/jobs | List jobs |
| GET | /v1/jobs/{id}/result | Download job result |
| GET | /v1/analytics/{id}/summary | Dataset summary stats |
| GET | /v1/analytics/{id}/neuron-morphometrics | Morphometric statistics |
| GET | /v1/analytics/{id}/graph-summary | Graph-level statistics |
| GET | /v1/analytics/{id}/reciprocity | Synapse reciprocity |
| GET | /v1/analytics/{id}/degree-distribution | Degree distribution |
| GET | /v1/analytics/zscore-comparison | Z-score comparison |
| GET | /v1/graph/{id}/metrics | Graph metrics |
| POST | /v1/graph/{id}/motifs | Triadic census |
| POST | /v1/graph/{id}/shortest-path | Shortest path |
| POST | /v1/graph/{id}/reachability | Reachability analysis |
| POST | /v1/graph/{id}/full-analysis | Full graph analysis |
| POST | /v1/graph/compare | Cross-dataset comparison |
| POST | /v1/meta-analysis/analyze | Cross-dataset meta-analysis |
| POST | /v1/meta-analysis/atlas-compare | Atlas comparison |

SuperUser

| Method | Path | Description |
|---|---|---|
| GET | /v1/federation/status | Federation overview |
| GET | /v1/federation/clusters | List clusters |
| POST | /v1/federation/clusters | Register cluster |
| DELETE | /v1/federation/clusters/{id} | Deactivate cluster |
| POST | /v1/federation/schema/sync | Sync schema to clusters |
| POST | /v1/ontology/terms | Create ontology term |
| PUT | /v1/ontology/terms/{id} | Update term |
| POST | /v1/ontology/import-csv | Bulk import terms |

Middleware Stack

Requests pass through these layers in order:

  1. Request ID — UUID v7, propagated via X-Request-ID
  2. Tracing — structured request/response logging
  3. Rate limiting — per-IP token bucket (see Rate Limiting)
  4. Timeout — 60s default, 408 on expiry
  5. CORS — permissive in dev mode, restricted to api_domain in production
  6. Compression — automatic response compression
  7. Body limit — 100 MB max request body
  8. API version — api-version: v1 response header

Health Check

curl https://api.syndb.xyz/health
{
  "status": "healthy",
  "components": {
    "postgres": { "status": "ok", "latency_ms": 5 },
    "clickhouse": { "status": "ok", "latency_ms": 12 },
    "storage": { "status": "ok", "latency_ms": 8 },
    "meilisearch": { "status": "ok" }
  }
}

Status is degraded if any component fails. Meilisearch is optional — its absence does not degrade the overall status.

CLI Reference

The SynDB CLI (syndb) provides command-line access to all platform features.

Installation

# With GUI (quoted so the brackets survive shells like zsh)
pipx install 'syndb-cli[gui]'

# Without GUI
pipx install syndb-cli

Global Options

| Option | Environment Variable | Description |
|---|---|---|
| --server-url | SYNDB_SERVER_URL | API base URL |
| --flight-url | SYNDB_FLIGHT_URL | Arrow Flight endpoint |
| --flight-port | SYNDB_FLIGHT_PORT | Arrow Flight port |

Commands

user — Account Management

| Command | Description |
|---|---|
| syndb user register | Create a new account |
| syndb user login | Authenticate and store token |
| syndb user logout | Revoke token |

query — Saved Queries

| Command | Description |
|---|---|
| syndb query list | List saved queries |
| syndb query save | Save a new query |
| syndb query show {id} | Show query details |
| syndb query update {id} | Update a saved query |
| syndb query delete {id} | Delete a saved query |
| syndb query run {id} | Execute a saved query |
| syndb query status {id} | Check execution status |

dataset — Dataset Management

| Command | Description |
|---|---|
| syndb dataset new | Register a new dataset |
| syndb dataset prepare | Validate and convert to Parquet |
| syndb dataset validate | Schema validation only |
| syndb dataset upload | Upload data via Arrow Flight |
| syndb dataset download | Download dataset via Arrow Flight |
| syndb dataset mesh-upload | Upload 3D mesh files (.glb) |

etl — Dataset Import Pipeline

Each dataset supports download, validate, and import subcommands:

syndb etl <dataset> download
syndb etl <dataset> validate
syndb etl <dataset> import [--tables <table1,table2>]

Available Datasets

| Dataset | Key | Description |
|---|---|---|
| FlyWire | flywire | Whole-brain Drosophila connectome |
| Hemibrain | hemibrain | Janelia FlyEM v1.2.1 |
| MANC | manc | Male Adult Nerve Cord |
| MICrONS | microns | Mouse visual cortex |
| H01 | h01 | Human cortical tissue |
| BANC | banc | Brain And Nerve Cord |
| FANC | fanc | Female Adult Nerve Cord |
| Fish1 | fish1 | Zebrafish brain |
| Optic Lobe | optic-lobe | Drosophila optic lobe |
| Male CNS | male-cns | Male central nervous system |
| C. elegans Hermaphrodite | c-elegans-herm | Complete hermaphrodite |
| C. elegans Male | c-elegans-male | Complete male |
| C. elegans Developmental | c-elegans-dev | Developmental stages |
| Platynereis | platynereis | Marine annelid |
| L1 Larval | l1-larval | Drosophila L1 larval brain |
| Spine Morphometry (Kasthuri) | spine-kasthuri | Dendritic spine morphometry |
| Spine Morphometry (Ofer) | spine-ofer | Dendritic spine morphometry |
| Spine Morphometry (MICrONS) | spine-microns | Dendritic spine morphometry |
| Allen Cell Types | allen-cell-types | Allen Institute reference |
| NeuroMorpho | neuromorpho | NeuroMorpho.org archive |

federation — Federation Management

| Command | Description |
|---|---|
| syndb federation init | Initialize federation config and register with hub |
| syndb federation status | Show federation configuration |
| syndb federation sync-schema | Sync ClickHouse schema from hub |
| syndb federation test | Test connectivity (mDNS + hub + ClickHouse) |
| syndb federation clusters | List all federated clusters |
| syndb federation logout | Remove federation config |

See Node Setup for detailed usage.

graph-precompute — Batch Graph Computation

syndb graph-precompute --dataset-id {uuid}

Pre-computes graph metrics and stores results in ClickHouse materialized tables.

k8s — Kubernetes Administration

| Command | Description |
|---|---|
| syndb k8s jobs | List ETL jobs |
| syndb k8s status | View job status |
| syndb k8s cleanup | Clean up completed/failed jobs |

bench — Benchmarking

Performance testing suite for API and federation queries.

completions — Shell Completions

syndb completions bash > ~/.local/share/bash-completion/completions/syndb
syndb completions zsh > ~/.zfunc/_syndb
syndb completions fish > ~/.config/fish/completions/syndb.fish

Ontology & Vocabularies

SynDB uses controlled vocabularies to standardize dataset metadata — brain regions, species, microscopy techniques, and neurotransmitter types.

Browsing Terms

List All Vocabularies

curl https://api.syndb.xyz/v1/ontology/vocabularies

List Terms in a Vocabulary

curl "https://api.syndb.xyz/v1/ontology/terms?vocabulary=brain_region"

Search Terms

curl "https://api.syndb.xyz/v1/ontology/terms?search=mushroom"

Term Hierarchy

# Get child terms
curl https://api.syndb.xyz/v1/ontology/terms/{term_id}/children

# Get ancestor terms
curl https://api.syndb.xyz/v1/ontology/terms/{term_id}/ancestors

Validating Terms

Before submitting dataset metadata, validate that your terms exist:

curl -X POST -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/ontology/terms/validate \
  -d '{"terms": ["mushroom_body", "lateral_horn"]}'

Returns which terms are valid and which are unrecognized.

Administration (SuperUser)

Create a Term

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/ontology/terms \
  -d '{
    "name": "calyx",
    "vocabulary": "brain_region",
    "parent_id": "mushroom_body_term_id",
    "description": "Input region of the mushroom body"
  }'

Update a Term

curl -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/ontology/terms/{term_id} \
  -d '{"description": "Updated description"}'

Deprecate a Term

curl -X PATCH -H "Authorization: Bearer $TOKEN" \
  https://api.syndb.xyz/v1/ontology/terms/{term_id}/deprecate

Deprecated terms remain in the system but are flagged in search results and validation.

Bulk Import

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: text/csv" \
  https://api.syndb.xyz/v1/ontology/import-csv \
  --data-binary @terms.csv

CSV format: name,vocabulary,parent_name,description
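For example, a minimal `terms.csv` in this format might look like the following (the term names reuse the `calyx` example above; the second row is illustrative):

```csv
name,vocabulary,parent_name,description
calyx,brain_region,mushroom_body,Input region of the mushroom body
pedunculus,brain_region,mushroom_body,Stalk of the mushroom body
```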

Integration with Datasets

When creating or updating dataset metadata, brain region, species, and microscopy fields are validated against the ontology. Invalid terms are rejected with an error listing the closest matches.

Rate Limiting

SynDB enforces per-IP rate limiting using a token bucket algorithm.

Defaults

| Parameter | Default | Environment Variable |
|---|---|---|
| Requests per second | 100 | `RATE_LIMIT_PER_SECOND` |
| Burst capacity | 200 | `RATE_LIMIT_BURST` |

The bucket refills at the sustained rate. Burst capacity allows short spikes above the sustained rate.
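In pseudocode terms, the bucket behaves like the sketch below (this is an illustration of the algorithm, not SynDB's actual limiter; the defaults mirror the table above):

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst` tokens."""

    def __init__(self, rate: float = 100.0, burst: float = 200.0):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst  # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill at the sustained rate, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token per request
            return True
        return False
```

A full bucket lets `burst` requests through back-to-back; after that, requests are admitted at the sustained `rate`.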

Client IP Detection

The rate limiter identifies clients by IP address, checked in order:

  1. X-Forwarded-For header (first address)
  2. X-Real-IP header
  3. Localhost (fallback for direct connections)

Behind a reverse proxy, ensure X-Forwarded-For is set correctly.
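The lookup order above amounts to the following (a sketch, not SynDB's actual implementation; the function name is illustrative):

```python
def client_ip(headers: dict, fallback: str = "127.0.0.1") -> str:
    """Resolve the client IP using the header precedence described above."""
    forwarded = headers.get("X-Forwarded-For")
    if forwarded:
        # The header may carry a comma-separated proxy chain;
        # the first address is the original client.
        return forwarded.split(",")[0].strip()
    real_ip = headers.get("X-Real-IP")
    if real_ip:
        return real_ip.strip()
    return fallback  # direct connection: fall back to localhost
```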

Response on Limit

When the rate limit is exceeded:

HTTP/1.1 429 Too Many Requests
Retry-After: 1

Too many requests

Client Handling

Respect the Retry-After header and implement exponential backoff:

import time
import requests

def request_with_backoff(url, headers, max_retries=3):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After, doubling the wait on each retry
        wait = int(resp.headers.get("Retry-After", 1)) * (2 ** attempt)
        time.sleep(wait)
    raise RuntimeError("Rate limited after retries")

For batch operations, throttle to well under 100 req/s to leave headroom for interactive use.
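One simple way to stay under the limit in a batch job is a fixed-interval client-side throttle (a sketch; the class name is illustrative):

```python
import time

class Throttle:
    """Sleep as needed so calls never exceed `rate` per second."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.next_slot = 0.0  # monotonic time of the next permitted call

    def wait(self) -> None:
        now = time.monotonic()
        if now < self.next_slot:
            time.sleep(self.next_slot - now)
        # Schedule the next call one interval later.
        self.next_slot = max(now, self.next_slot) + self.interval
```

Calling `throttle.wait()` before each request with, say, `Throttle(50)` keeps a batch at 50 req/s, leaving headroom under the 100 req/s default.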

Choosing a license for your dataset

When sharing datasets derived from microscopy data, selecting an appropriate license is crucial for ensuring the proper use and distribution of your work. Different licenses offer varying degrees of freedom and control over your data. Here, we outline some popular licenses, their key features, and considerations to help you choose the right one for your needs.

Considerations for Choosing a License

  • Intended Use: Determine whether you want your data to be used freely or with certain restrictions, such as non-commercial use only.
  • Credit and Attribution: Decide if you want to receive credit for your work and if it’s important for you to see how others are using your data.
  • Derivative Works: Consider whether you want derivative works to be allowed and if they should be shared under the same terms.
  • Commercial Use: Reflect on whether you want to permit commercial use of your data. Your institution may have specific policies regarding commercial use.

Licenses

The following are some common licenses used for sharing data on the web, which we also use on the SynDB platform.

Tip

Use ODC over CC

We recommend the ODC licenses for datasets on SynDB. You are free to use Creative Commons licenses as well; just note that they are not designed for data. The default license for SynDB is ODC-BY.

Open Data Commons (ODC) Licenses

Open Data Commons (ODC) licenses are specifically tailored for datasets and databases, focusing on maximizing accessibility and proper attribution in data sharing.

PDDL (Public Domain Dedication and License)

Places the dataset in the public domain, allowing unrestricted use and maximizing openness and usability.

ODC-BY (Attribution License)

Allows use with proper credit to the original creator, ensuring acknowledgment while enabling broad use.

ODC-ODbL (Open Database License)

Permits sharing, modifying, and using the dataset with attribution and requires derivative databases to be shared under the same license, promoting open access and collaborative improvement while keeping derivative databases equally accessible.

Creative Commons (CC) Licenses

Creative Commons (CC) licenses are versatile and well-suited for a wide range of creative works, including datasets.

CC0 (Public Domain Dedication)

Allows the use of the dataset without any restrictions, making it ideal for maximizing usability and dissemination.

CC BY (Attribution)

Allows users to use the dataset as long as they provide appropriate credit to the original creator, ensuring wide use while acknowledging the creator’s work.

CC BY-SA (Attribution-ShareAlike)

Permits use of the dataset with appropriate credit and requires sharing derivative works under the same license, keeping derivative works open and shareable under the same terms.

CC BY-NC (Attribution-NonCommercial)

Allows use for non-commercial purposes with proper credit, restricting use to non-commercial purposes while still enabling academic and research use.

CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)

Permits non-commercial use with appropriate credit and sharing of derivative works under the same license, ensuring non-commercial use and open sharing under the same terms.

Conclusion

Selecting the right license for a dataset derived from microscopy data is essential for controlling how your data is used and ensuring it meets your sharing objectives. By considering the options and your specific needs, you can choose a license that balances openness, credit, and control, fostering collaboration and advancement in your field.

Metrics structuring for contribution

Note

Prerequisites

This article assumes that you understand how data is stored on SynDB; if you are uncertain, we recommend reading through the overview article first.

This article is a guide for contributors who wish to upload their data to SynDB. Please don’t hesitate to ask for help on the Discord channel if you have any questions; this part can be challenging.

Data structuring

Schema

Each SynDB table has its own schema, which can be found in our GitHub repository. The schema defines the supported column names and their data types, and your data must be structured to be compatible with the schema of the table you are contributing to.

The column names, and the data stored under them, must be in a format compatible with each column's type. The supported column names for each SynDB table are listed in the schema; you may also use the glossary at the end of this article for reference.

Note

Nano

We use nanometers as the unit for all measurements; this includes volume, radius, and distance.

Sourcing raw data

You may upload the raw source data files, such as meshes or SWC morphologies, to SynDB. Place the absolute path to each file in your table file. The following are supported:

  • Meshes in .glb format, column name: mesh_path
  • SWC files, .swc, column name: swc_path

This list is the main tracker for the supported formats. You may request additional formats on the Discord channel. The SynDB team will review the request and consider adding the new format to the platform.

Columns

Most column types are self-explanatory, but some require additional explanation.

Identifiers and relations

The cid column defined in your table can hold any unique hashable value; it is replaced by a UUID when uploaded to SynDB. When uploading a relational dataset, the cid column of the parent is used to correlate children to their parent via parent_id, meaning the hashable value in the parent's cid column must match the parent_id in the child. parent_enum can be omitted: compartments are defined at the table level and will therefore be added automatically.

Example

Notice the parent_id column in the child table: it holds the cid of the parent table. The parent_enum column is not present in the child table, as it is inferred from the tabular file's name.

vesicle.csv, child

| cid | neurotransmitter | voxel_radius | distance_to_active_zone | minimum_normal_length | parent_id | centroid_z | centroid_x | centroid_y |
|---|---|---|---|---|---|---|---|---|
| 0 | glutamate | 26.9129 | 705.2450 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 1 | glutamate | 25.5388 | 615.0213 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 2 | glutamate | 29.5260 | 513.0701 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 3 | glutamate | 30.5131 | 479.9224 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 4 | glutamate | 28.3977 | 454.8248 | 23 | 1 | 4505.232 | 1996.224 | 4953.6 |
| 5 | glutamate | 30.2033 | 459.7557 | 23 | 2 | 4505.232 | 1996.224 | 4953.6 |
| 6 | glutamate | 33.4548 | 374.8131 | 23 | 2 | 4505.232 | 1996.224 | 4953.6 |
| 7 | glutamate | 32.0890 | 455.9293 | 23 | 4 | 4505.232 | 1996.224 | 4953.6 |

axon.csv, parent

| voxel_volume | mitochondria_count | total_mitochondria_volume | cid |
|---|---|---|---|
| 385668034.5619 | 3 | 208043.52 | 1 |
| 1492089016.3244 | 1 | 2054179.84 | 2 |
| 327740497.9200 | | | 4 |
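Before uploading, it can be worth checking that every child row references an existing parent cid. A minimal sketch (the function name and file layout are illustrative, not part of SynDB):

```python
import csv

def check_relations(parent_csv: str, child_csv: str) -> list:
    """Return parent_id values in the child table with no matching parent cid."""
    with open(parent_csv, newline="") as f:
        parent_cids = {row["cid"] for row in csv.DictReader(f)}
    with open(child_csv, newline="") as f:
        return [row["parent_id"] for row in csv.DictReader(f)
                if row["parent_id"] not in parent_cids]
```

An empty result means the tables correlate; any values returned are orphaned children that would fail to resolve on upload.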

Glossary

| Key | Description |
|---|---|
| `dataset_id` | The unique identifier for the dataset, of type uuid. |
| `cid` | The unique identifier for a SynDB unit within the dataset, of type uuid. |
| `parent_id` | The cid of the parent component, of type uuid. |
| `parent_enum` | An integer representing the type or category of the parent component, of type int. |
| `polarity` | The polarity of the neuron, of type ascii. |
| `voxel_volume` | The volume of the voxel, of type double. |
| `voxel_radius` | The radius of the voxel, of type double. |
| `s3_mesh_location` | The location of the mesh in S3 storage, of type smallint. |
| `mesh_volume` | The volume of the mesh, of type double. |
| `mesh_surface_area` | The surface area of the mesh, of type double. |
| `mesh_area_volume_ratio` | The ratio of the surface area to the volume of the mesh, of type double. |
| `mesh_sphericity` | The sphericity of the mesh, of type double. |
| `centroid_z` | The z-coordinate of the centroid, of type double. |
| `centroid_x` | The x-coordinate of the centroid, of type double. |
| `centroid_y` | The y-coordinate of the centroid, of type double. |
| `s3_swb_location` | The location of the SWB in S3 storage, of type smallint. |
| `terminal_count` | The count of terminals, of type int. |
| `mitochondria_count` | The count of mitochondria, of type int. |
| `total_mitochondria_volume` | The total volume of mitochondria, of type double. |
| `neuron_id` | The unique identifier for the associated neuron, of type uuid. |
| `vesicle_count` | The count of vesicles, of type int. |
| `total_vesicle_volume` | The total volume of vesicles, of type double. |
| `forms_synapse_with` | The unique identifier of the synapse that the component forms with, of type uuid. |
| `connection_score` | The score representing the strength or quality of the connection, of type double. |
| `cleft_score` | The score for the synaptic cleft, of type int. |
| `GABA` | The concentration or presence of GABA neurotransmitter, of type double. |
| `acetylcholine` | The concentration or presence of acetylcholine neurotransmitter, of type double. |
| `glutamate` | The concentration or presence of glutamate neurotransmitter, of type double. |
| `octopamine` | The concentration or presence of octopamine neurotransmitter, of type double. |
| `serine` | The concentration or presence of serine neurotransmitter, of type double. |
| `dopamine` | The concentration or presence of dopamine neurotransmitter, of type double. |
| `root_id` | The external root identifier from the source platform (e.g. FlyWire), of type int. |
| `pre_id` | The unique identifier of the pre-synaptic component, of type uuid. |
| `post_id` | The unique identifier of the post-synaptic component, of type uuid. |
| `dendritic_spine_count` | The count of dendritic spines, of type int. |
| `neurotransmitter` | The type of neurotransmitter present in a vesicle, of type ascii. |
| `distance_to_active_zone` | The distance from the vesicle to the active zone, of type double. |
| `minimum_normal_length` | The minimum normal length, of type int. |
| `ribosome_count` | The count of ribosomes within the endoplasmic reticulum, of type int. |

Build Caching

SynDB uses multiple layers of caching to keep compile times short across local development, CI pipelines, and production deploys.

Cargo compiler flags

Configured in .cargo/config.toml, these flags speed up every local cargo invocation:

| Flag | Effect |
|---|---|
| `-C link-arg=-fuse-ld=mold` | Mold linker — significantly faster than the default ld or lld (Linux only) |
| `-Zshare-generics=y` | Share monomorphized generics between crates, reducing codegen work |
| `-Zthreads=8` | Parallel compiler frontend (parsing, macro expansion, type checking) |
| `codegen-backend = "cranelift"` | Dev profile uses Cranelift instead of LLVM for faster debug builds |
| `codegen-backend = "llvm"` (for deps) | Dependencies still use LLVM for better optimization |

CI caching (GitHub Actions)

In CI, syndb-ci runs tests and builds directly on the host (no Docker for the ci subcommand). Cargo artifacts are cached between runs via Swatinem/rust-cache@v2, which persists target/ and the cargo registry keyed by branch and Cargo.lock hash.

For integration tests (local-stack-test, e2e-test), syndb-ci uses bollard to start ephemeral Docker containers (PostgreSQL, ClickHouse, MinIO) on a shared Docker network. Test binaries run on the host with environment variables pointing at localhost:<port>. No cargo cache volumes are needed inside containers — the host target/ is used directly.

Nix OCI cache

Local stack images (just stack-prepare) for the API and ETL are built with Nix and cached using syndb-ci nix-oci-cache. This command uses Nix store paths as content-addressed fingerprints to skip unnecessary rebuilds:

  1. nix build .#oci-syndb-api produces a store path (a hash of all inputs)
  2. The script compares the current store path against a stamp file (/tmp/.oci-syndb-api.storepath)
  3. If unchanged, the build is skipped entirely
  4. If changed, the new tarball is copied to /tmp/ and loaded into Docker

This means just stack-prepare is near-instant when source hasn’t changed.
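The stamp-file comparison reduces to a few lines. A Python sketch of the logic (function names and paths are illustrative, not the actual `syndb-ci` code):

```python
from pathlib import Path

def needs_rebuild(store_path: str, stamp: Path) -> bool:
    """Skip the build when the content-addressed store path is unchanged."""
    return not (stamp.exists() and stamp.read_text().strip() == store_path)

def record_build(store_path: str, stamp: Path) -> None:
    """After a successful build and Docker load, remember the store path."""
    stamp.write_text(store_path + "\n")
```

Because a Nix store path hashes all inputs, comparing two strings is equivalent to comparing entire build closures.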

Nix Crane dependency caching

For nix flake check (CI) and Nix-based OCI image builds, the project uses Crane with a split dependency build:

# nix/rust.nix
mkCargoArtifacts = system:
    craneLib.buildDepsOnly (mkCommonArgs system);

buildDepsOnly compiles all workspace dependencies into a cached Nix derivation. Subsequent builds of workspace crates reuse these artifacts, so only the project’s own code is recompiled. Since Nix derivations are content-addressed, the dependency cache is automatically invalidated when Cargo.lock changes.

UI source-hash cache

The UI image (just _stack-prepare-ui) uses a source-hash stamp to skip rebuilds when Python source files haven’t changed:

  1. SHA-256 of all files in packages/ui/src/, packages/syndb-ql/python/, pyproject.toml, uv.lock, and syndb-ql-python/Cargo.toml
  2. Compared against /tmp/.syndb-ui.srchash
  3. If the hash matches and syndb-ui:dev exists in Docker, the build is skipped
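The stamp is just a digest over the watched files. A sketch of the idea (the file patterns here are illustrative, not the exact set listed above):

```python
import hashlib
from pathlib import Path

def source_hash(root: Path, patterns=("*.py", "*.toml", "*.lock")) -> str:
    """SHA-256 over relative file names and contents, in a stable order."""
    h = hashlib.sha256()
    for path in sorted(p for pat in patterns for p in root.rglob(pat)):
        h.update(str(path.relative_to(root)).encode())
        h.update(path.read_bytes())
    return h.hexdigest()
```

Hashing names as well as contents means renames also invalidate the stamp; sorting keeps the digest independent of filesystem traversal order.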

Summary

| Layer | Scope | Mechanism | Invalidation |
|---|---|---|---|
| Cargo flags | Local dev | Mold, Cranelift, parallel frontend | N/A (always active) |
| GitHub Actions cache | CI pipelines | Swatinem/rust-cache@v2 | Branch + Cargo.lock hash |
| Nix OCI cache | stack-prepare (API, ETL) | Nix store-path stamps | Content-addressed (any input change) |
| Crane deps | nix flake check, OCI images | buildDepsOnly derivation | Cargo.lock changes |
| UI source hash | stack-prepare (UI) | SHA-256 file stamp | Source file changes |

Troubleshooting

This page collects explanations of common error types and pointers on how to resolve them.

403, Forbidden

Verification

Academic verification is required for computationally or network-heavy tasks. This is to ensure that the resources are not being misused. You may verify yourself after registering on the platform — see Authentication for details on CILogon verification.

Dataset

A dataset belongs to its creator and to any groups the creator chooses to share ownership with. If you are unable to access a dataset, you fall into neither of these categories. You may request access to the dataset from the creator.

429, Too Many Requests

You have exceeded the rate limit (100 requests/second by default). Respect the Retry-After header and implement exponential backoff. See Rate Limiting.

Job Failures

If a submitted job fails:

  1. Check the job status: GET /v1/jobs/{job_id} — the error_message field describes the failure
  2. Common causes: query timeout, result too large (>1 GB), ClickHouse resource limits
  3. Rerun the job: POST /v1/jobs/{job_id}/rerun

See Jobs System for details.

SyQL Errors

  • Parse errors: Check SyQL syntax in the error message. Use POST /v1/syql/plan to validate without executing.
  • Resolution errors: A referenced table or column does not exist. Check the Data Structuring guide for valid column names.
  • Timeout: Large queries may exceed the 60s HTTP timeout. Use POST /v1/syql/exec to submit as an async job instead.

Federation Issues

See Federation Troubleshooting for:

  • Node discovery failures (mDNS, multiaddrs)
  • Schema version mismatches
  • Cluster health states
  • Docker Compose federation profile issues