Federation Overview
SynDB federation allows multiple institutions to participate in a shared neuroscience data network while retaining full control of their data. Each institution runs a node with its own ClickHouse instance; a central hub coordinates queries across all nodes.
Why Federate?
| Concern | Without federation | With federation |
|---|---|---|
| Data sovereignty | Upload all data to a central server | Data stays on your infrastructure |
| Meta-analysis | Limited to datasets on one instance | Query across all participating institutions |
| Compliance | Data leaves your network | Data never leaves your network; only query results cross boundaries |
| Latency | Single point of access | Local reads are fast; cross-cluster queries pay network cost |
Key Concepts
Hub — The coordinating instance that runs the full SynDB stack (API, PostgreSQL, ClickHouse, Meilisearch, S3). It maintains a registry of federated clusters, monitors their health, and routes cross-cluster queries.
Node — A lightweight participant running ClickHouse and the syndb-node binary. Nodes register with the hub via libp2p or HTTP, receive schema migrations, and respond to delegated queries.
Schema versioning — The hub pushes ClickHouse DDL migrations to all nodes. Queries only route to nodes whose schema version is compatible.
Health monitoring — The hub periodically checks each node’s health. Nodes are classified as Healthy, Degraded, Unreachable, or Unknown. Unhealthy nodes are excluded from federation queries.
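The routing rule implied by schema versioning and health monitoring can be sketched as a simple filter over the hub's registry. This is an illustrative sketch, not SynDB's implementation: the `Node` type, the integer schema version, and the "at least the minimum version" compatibility test are all assumptions, as is the choice to route only to Healthy nodes (the text does not say whether Degraded nodes still receive queries).

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    # The four states named in the health monitoring description
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class Node:
    # Hypothetical registry entry; field names are illustrative
    name: str
    schema_version: int
    health: Health


# Assumption: only Healthy nodes receive federated queries.
ROUTABLE = frozenset({Health.HEALTHY})


def routable_nodes(nodes, min_schema_version):
    """Return nodes that are routable and meet the schema version floor."""
    return [
        n for n in nodes
        if n.health in ROUTABLE and n.schema_version >= min_schema_version
    ]


registry = [
    Node("node-a", 12, Health.HEALTHY),
    Node("node-b", 11, Health.HEALTHY),      # schema too old
    Node("node-c", 12, Health.UNREACHABLE),  # excluded by health
]
eligible = routable_nodes(registry, min_schema_version=12)
print([n.name for n in eligible])  # ['node-a']
```

Keeping the routable states in one set makes the policy easy to change, for example to include Degraded nodes if the real hub does so.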
Federation password — A shared secret that nodes present when registering with the hub. Prevents unauthorized clusters from joining.
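As a generic illustration of checking a shared secret at registration time, the sketch below compares a presented password against the expected one in constant time. How SynDB actually transmits and verifies the federation password is not described here, so the function name and the hashing step are assumptions.

```python
import hashlib
import hmac


def password_matches(presented: str, expected: str) -> bool:
    """Compare two secrets without leaking where they first differ."""
    # Hash both sides so compare_digest always sees equal-length inputs
    a = hashlib.sha256(presented.encode("utf-8")).digest()
    b = hashlib.sha256(expected.encode("utf-8")).digest()
    return hmac.compare_digest(a, b)


print(password_matches("s3cret", "s3cret"))  # True
print(password_matches("guess", "s3cret"))   # False
```

A constant-time comparison avoids timing side channels that a plain `==` on strings can expose.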
When to Federate vs. Upload
Federate when:
- Institutional policy requires data to stay on-premise
- You have existing ClickHouse infrastructure
- You want to contribute to cross-institutional meta-analysis without data transfer
Upload directly when:
- You don’t have infrastructure to maintain
- Your data has no residency requirements
- You want the simplest path to sharing
Architecture at a Glance
┌─────────────────────────────────┐
│               Hub               │
│  API + PostgreSQL + ClickHouse  │
│   + S3 + Meilisearch + libp2p   │
└──────┬──────────────┬───────────┘
       │ libp2p/QUIC  │ libp2p/QUIC
  ┌────▼─────┐   ┌────▼─────┐
  │  Node A  │   │  Node B  │
  │ CH + CLI │   │ CH + CLI │
  └──────────┘   └──────────┘
Query flow: User → Hub API → Hub ClickHouse → remote() to each node's ClickHouse → results aggregated at the hub.
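To make the remote() step concrete, the following sketch builds the kind of SQL the hub's ClickHouse could run to aggregate a count across nodes. It uses ClickHouse's `remote()` table function in its `remote('address', db.table)` form; the host names, the `syndb` database, and the `neurons` table are hypothetical, and the real hub may construct its queries differently (for example, passing credentials to `remote()`).

```python
def federated_count_sql(node_addrs, database, table):
    """Build one SELECT that sums row counts across node ClickHouse servers."""
    # One sub-select per node, each reading through remote()
    per_node = [
        f"SELECT count() AS n FROM remote('{addr}', {database}.{table})"
        for addr in node_addrs
    ]
    union = "\n  UNION ALL\n  ".join(per_node)
    # The outer query aggregates the per-node results at the hub
    return f"SELECT sum(n) AS total FROM (\n  {union}\n)"


sql = federated_count_sql(
    ["node-a.example.org:9000", "node-b.example.org:9000"],
    database="syndb",   # hypothetical database name
    table="neurons",    # hypothetical table name
)
print(sql)
```

The string interpolation here assumes trusted inputs taken from the hub's own registry; user-supplied values should never be spliced into SQL this way.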
See Architecture for the full technical breakdown.