Federation Overview

SynDB federation allows multiple institutions to participate in a shared neuroscience data network while retaining full control of their data. Each institution runs a node with its own ClickHouse instance; a central hub coordinates queries across all nodes.

Why Federate?

Concern	Without federation	With federation
Data sovereignty	Upload all data to a central server	Data stays on your infrastructure
Meta-analysis	Limited to datasets on one instance	Query across all participating institutions
Compliance	Data leaves your network	Data never leaves — only query results cross boundaries
Latency	Single point of access	Local reads are fast; cross-cluster queries pay network cost

Key Concepts

Hub — The coordinating instance that runs the full SynDB stack (API, PostgreSQL, ClickHouse, Meilisearch, S3). It maintains a registry of federated clusters, monitors their health, and routes cross-cluster queries.

Node — A lightweight participant running ClickHouse and the syndb-node binary. Nodes register with the hub via libp2p or HTTP, receive schema migrations, and respond to delegated queries.

Schema versioning — The hub pushes ClickHouse DDL migrations to all nodes. Queries only route to nodes whose schema version is compatible.
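
As an illustration of version-gated routing, here is a minimal sketch. It assumes that a schema version is a single increasing migration number and that "compatible" means the node has applied at least the migration a query needs. Neither assumption is confirmed by this page, and the `Node` type and `compatible_nodes` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    # Assumption: the schema version is a monotonically increasing migration number.
    schema_version: int


def compatible_nodes(nodes: list[Node], required_version: int) -> list[Node]:
    """Keep only nodes that have applied at least the required migration."""
    return [n for n in nodes if n.schema_version >= required_version]


nodes = [Node("node-a", 7), Node("node-b", 5), Node("node-c", 8)]
routable = compatible_nodes(nodes, required_version=7)
print([n.name for n in routable])  # node-b is skipped until it catches up
```

Under this rule a node that lags behind on migrations is skipped rather than sent a query it cannot answer.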

Health monitoring — The hub periodically checks each node’s health. Nodes are classified as Healthy, Degraded, Unreachable, or Unknown. Unhealthy nodes are excluded from federation queries.
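
The four states can be modelled as a small enumeration. The sketch below is illustrative only: the probe inputs, the 500 ms threshold, and the choice to route to Healthy nodes alone are all assumptions, since this page does not say how the hub classifies a node or whether Degraded nodes still receive queries.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


def classify(reachable: Optional[bool], latency_ms: Optional[float],
             slow_ms: float = 500.0) -> Health:
    """Map one probe result to a state (hypothetical rules and threshold)."""
    if reachable is None:          # never probed yet
        return Health.UNKNOWN
    if not reachable:              # probe failed
        return Health.UNREACHABLE
    if latency_ms is not None and latency_ms > slow_ms:
        return Health.DEGRADED     # answered, but slowly
    return Health.HEALTHY


def routable(statuses: dict[str, Health],
             allowed: frozenset = frozenset({Health.HEALTHY})) -> list[str]:
    """Names of nodes whose state is in the allowed set (assumed: Healthy only)."""
    return sorted(name for name, state in statuses.items() if state in allowed)


statuses = {
    "node-a": classify(True, 40.0),
    "node-b": classify(True, 900.0),
    "node-c": classify(False, None),
    "node-d": classify(None, None),
}
print(routable(statuses))
```

Passing a different `allowed` set, for example one that also contains `Health.DEGRADED`, changes the exclusion policy without touching the classifier.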

Federation password — A shared secret that nodes present when registering with the hub. Prevents unauthorized clusters from joining.
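
A shared-secret check of this kind is usually done with a constant-time comparison so that response timing does not leak how much of the secret matched. The sketch below shows that general technique using Python's standard `hmac.compare_digest`. It is not SynDB's actual registration code, and `password_matches` is a hypothetical helper.

```python
import hmac


def password_matches(presented: str, expected: str) -> bool:
    """Compare two secrets in constant time.

    Assumption: the hub holds the expected secret in plain form. A real
    deployment might store and compare a hash instead.
    """
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))


print(password_matches("correct-horse", "correct-horse"))
print(password_matches("wrong-guess", "correct-horse"))
```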

When to Federate vs. Upload

Federate when:

Institutional policy requires data to stay on-premise
You have existing ClickHouse infrastructure
You want to contribute to cross-institutional meta-analysis without data transfer

Upload directly when:

You don’t have infrastructure to maintain
Your data has no residency requirements
You want the simplest path to sharing

Architecture at a Glance

┌─────────────────────────────────┐
│               Hub               │
│  API + PostgreSQL + ClickHouse  │
│  + S3 + Meilisearch + libp2p    │
└──────┬──────────────┬───────────┘
       │ libp2p/QUIC  │ libp2p/QUIC
  ┌────▼─────┐   ┌────▼─────┐
  │  Node A  │   │  Node B  │
  │ CH + CLI │   │ CH + CLI │
  └──────────┘   └──────────┘

Query flow: User → Hub API → Hub ClickHouse → remote() call to each node's ClickHouse → results aggregated at the hub.
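
To make the `remote()` step concrete, the sketch below builds the kind of SQL the hub's ClickHouse could run. ClickHouse's `remote` table function accepts a comma-separated address list and treats each address as a shard, combining the rows it gets back. The host names, the `syndb` database, the `neurons` table, and the `federated_count_sql` helper are hypothetical, credentials are omitted, and the queries SynDB actually generates may look different.

```python
import re

# Conservative patterns so that only plain identifiers and host[:port]
# strings are ever interpolated into the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ADDR = re.compile(r"^[A-Za-z0-9.\-]+(:\d+)?$")


def federated_count_sql(node_addrs: list[str], database: str, table: str) -> str:
    """Build a row-count query that fans out to every listed node."""
    if not node_addrs:
        raise ValueError("at least one node address is required")
    for ident in (database, table):
        if not _IDENT.match(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    for addr in node_addrs:
        if not _ADDR.match(addr):
            raise ValueError(f"invalid address: {addr!r}")
    # Commas separate shards in the remote() address expression.
    addresses = ",".join(node_addrs)
    return f"SELECT count() FROM remote('{addresses}', {database}, {table})"


sql = federated_count_sql(
    ["node-a.example.org:9000", "node-b.example.org:9000"],  # hypothetical hosts
    database="syndb",   # hypothetical database name
    table="neurons",    # hypothetical table name
)
print(sql)
```

In this sketch the routable node list is simply an input, so only nodes that passed whatever health and schema checks the hub applies would appear in the address list.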

See Architecture for the full technical breakdown.
