Federation Overview

SynDB federation allows multiple institutions to participate in a shared neuroscience data network while retaining full control of their data. Each institution runs a node with its own ClickHouse instance; a central hub coordinates queries across all nodes.

Why Federate?

| Concern          | Without federation                  | With federation                                               |
|------------------|-------------------------------------|---------------------------------------------------------------|
| Data sovereignty | Upload all data to a central server | Data stays on your infrastructure                             |
| Meta-analysis    | Limited to datasets on one instance | Query across all participating institutions                   |
| Compliance       | Data leaves your network            | Data never leaves; only query results cross boundaries        |
| Latency          | Single point of access              | Local reads are fast; cross-cluster queries pay network cost  |

Key Concepts

Hub — The coordinating instance that runs the full SynDB stack (API, PostgreSQL, ClickHouse, Meilisearch, S3). It maintains a registry of federated clusters, monitors their health, and routes cross-cluster queries.

Node — A lightweight participant running ClickHouse and the syndb-node binary. Nodes register with the hub via libp2p or HTTP, receive schema migrations, and respond to delegated queries.

Schema versioning — The hub pushes ClickHouse DDL migrations to all nodes. Queries only route to nodes whose schema version is compatible.
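Here is a minimal sketch of the hub-side bookkeeping this implies. It assumes migrations are numbered with sequential integers. It also assumes a node counts as compatible when it has applied at least a minimum required version and is not ahead of the hub. `Migration`, `pending_migrations`, `is_compatible`, the integer versioning scheme, and the `neurons` table are all hypothetical illustrations, not SynDB's actual API or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    """One ClickHouse DDL migration with a sequential version number (hypothetical scheme)."""
    version: int
    ddl: str


def pending_migrations(node_version: int, migrations: list[Migration]) -> list[Migration]:
    """Return the migrations a node has not applied yet, in ascending version order."""
    return sorted(
        (m for m in migrations if m.version > node_version),
        key=lambda m: m.version,
    )


def is_compatible(node_version: int, min_required: int, hub_version: int) -> bool:
    """Assumed rule: route to a node only if min_required <= node_version <= hub_version."""
    return min_required <= node_version <= hub_version


# Hypothetical migration history for an illustrative `neurons` table.
MIGRATIONS = [
    Migration(1, "CREATE TABLE IF NOT EXISTS neurons (id UInt64) ENGINE = MergeTree ORDER BY id"),
    Migration(2, "ALTER TABLE neurons ADD COLUMN IF NOT EXISTS dataset_id UUID"),
    Migration(3, "ALTER TABLE neurons ADD COLUMN IF NOT EXISTS volume Float64"),
]

if __name__ == "__main__":
    # A node at version 1 still needs migrations 2 and 3.
    print([m.version for m in pending_migrations(1, MIGRATIONS)])  # [2, 3]
    # With min_required=2, a node at version 1 is skipped for routing until it migrates.
    print(is_compatible(1, min_required=2, hub_version=3))  # False
    print(is_compatible(3, min_required=2, hub_version=3))  # True
```

Keeping "which DDL to push" separate from "which nodes may receive queries" lets a node lag slightly behind the hub without being excluded, provided it meets the minimum version.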

Health monitoring — The hub periodically checks each node’s health. Nodes are classified as Healthy, Degraded, Unreachable, or Unknown. Unhealthy nodes are excluded from federation queries.
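A minimal sketch of how probe results might map onto the four states follows. It assumes the hub keeps each node's last few probe latencies in milliseconds, using `None` for a failed probe. The 500 ms threshold, the function names, and the policy that only Healthy nodes are eligible are hypothetical. A real deployment might also route to Degraded nodes.

```python
from enum import Enum
from typing import Optional, Sequence


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


# Hypothetical threshold: a successful probe slower than this counts as "slow".
SLOW_MS = 500.0


def classify(probes: Sequence[Optional[float]]) -> Health:
    """Classify a node from recent probe latencies (ms); None marks a failed probe."""
    if not probes:
        return Health.UNKNOWN  # the node has not been checked yet
    if all(p is None for p in probes):
        return Health.UNREACHABLE  # every recent probe failed
    failed = sum(p is None for p in probes)
    slow = sum(p is not None and p > SLOW_MS for p in probes)
    if failed or slow:
        return Health.DEGRADED  # reachable, but some probes failed or were slow
    return Health.HEALTHY


def eligible(nodes: dict[str, Sequence[Optional[float]]]) -> list[str]:
    """Assumed policy: only Healthy nodes take part in federation queries."""
    return sorted(name for name, probes in nodes.items() if classify(probes) is Health.HEALTHY)


if __name__ == "__main__":
    nodes = {
        "node-a": [12.0, 15.0, 11.0],
        "node-b": [12.0, None, 900.0],
        "node-c": [None, None, None],
        "node-d": [],
    }
    for name, probes in nodes.items():
        print(name, classify(probes).value)
    print(eligible(nodes))  # ['node-a']
```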

Federation password — A shared secret that nodes present when registering with the hub. Prevents unauthorized clusters from joining.
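Below is a minimal sketch of the hub-side check, using only the Python standard library. Both secrets are hashed with SHA-256 first, so `hmac.compare_digest` always compares equal-length values in constant time. `RegistrationRequest`, its fields, `register`, and the in-memory registry are hypothetical stand-ins for whatever the hub actually receives over libp2p or HTTP.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationRequest:
    """Hypothetical registration payload a node sends to the hub."""
    cluster_id: str
    clickhouse_url: str
    federation_password: str


def _digest(secret: str) -> bytes:
    # Hash both sides so compare_digest always sees equal-length inputs.
    return hashlib.sha256(secret.encode("utf-8")).digest()


def verify_federation_password(req: RegistrationRequest, expected: str) -> bool:
    """Constant-time comparison of the shared secret presented by a registering node."""
    return hmac.compare_digest(_digest(req.federation_password), _digest(expected))


def register(req: RegistrationRequest, expected: str, registry: dict[str, str]) -> bool:
    """Add the node to the registry only if it presents the correct password."""
    if not verify_federation_password(req, expected):
        return False
    registry[req.cluster_id] = req.clickhouse_url
    return True


if __name__ == "__main__":
    registry: dict[str, str] = {}
    ok = RegistrationRequest("lab-a", "https://ch.lab-a.example:8443", "s3cret")
    bad = RegistrationRequest("rogue", "https://ch.rogue.example:8443", "guess")
    print(register(ok, "s3cret", registry))   # True
    print(register(bad, "s3cret", registry))  # False
    print(registry)  # {'lab-a': 'https://ch.lab-a.example:8443'}
```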

When to Federate vs. Upload

Federate when:

  • Institutional policy requires data to stay on-premise
  • You have existing ClickHouse infrastructure
  • You want to contribute to cross-institutional meta-analysis without data transfer

Upload directly when:

  • You don’t have infrastructure to maintain
  • Your data has no residency requirements
  • You want the simplest path to sharing

Architecture at a Glance

┌─────────────────────────────────┐
│            Hub                  │
│  API + PostgreSQL + ClickHouse  │
│  + S3 + Meilisearch + libp2p    │
└──────┬──────────────┬───────────┘
       │ libp2p/QUIC  │ libp2p/QUIC
  ┌────▼────┐    ┌────▼────┐
  │ Node A  │    │ Node B  │
  │ CH + CLI│    │ CH + CLI│
  └─────────┘    └─────────┘

Query flow: User → Hub API → Hub ClickHouse → remote() calls to each node's ClickHouse → results aggregated at the Hub.
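The following is a minimal sketch of the kind of SQL the hub's ClickHouse might run for the remote() step. It assumes the hub already has the eligible nodes' native-protocol addresses (`host:port`) from its registry. It relies on ClickHouse's `remote()` table function treating a comma-separated address list as separate shards, so each node aggregates its own data and the hub merges the partial results. The database `syndb`, table `neurons`, column `dataset_id`, and the omission of user/password arguments are hypothetical simplifications.

```python
import re

# Conservative validation so registry values cannot inject SQL into the query text.
ADDRESS_RE = re.compile(r"[A-Za-z0-9.-]+:\d{1,5}")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def federated_count_query(addresses: list[str], database: str, table: str) -> str:
    """Build a query that counts rows per dataset across all given nodes."""
    if not addresses:
        raise ValueError("no eligible nodes to query")
    for addr in addresses:
        if not ADDRESS_RE.fullmatch(addr):
            raise ValueError(f"invalid address: {addr!r}")
    for ident in (database, table):
        if not IDENT_RE.fullmatch(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    # Comma-separated addresses make remote() treat each node as its own shard;
    # GROUP BY partials are computed per node and merged on the hub.
    shards = ",".join(addresses)
    return (
        "SELECT dataset_id, count() AS n "
        f"FROM remote('{shards}', {database}.{table}) "
        "GROUP BY dataset_id"
    )


if __name__ == "__main__":
    # Hypothetical node addresses using ClickHouse's default native TCP port 9000.
    print(federated_count_query(["node-a.example:9000", "node-b.example:9000"], "syndb", "neurons"))
```

Building the address list from the registry at query time means nodes filtered out by health or schema checks simply never appear in the remote() call.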

See Architecture for the full technical breakdown.