# Federation Architecture

## Components

### Hub
The hub runs the full SynDB stack and coordinates the federation:
| Component | Role |
|---|---|
| syndb-api | HTTP API (port 8080) + Arrow Flight (port 50051) |
| PostgreSQL | User accounts, dataset metadata, cluster registry, job queue, benchmarks |
| ClickHouse | Local data warehouse + remote() queries to nodes |
| S3/MinIO | Mesh files, job results, ETL staging |
| Meilisearch | Full-text search index |
| HubRegistryActor | libp2p actor managing cluster registration and health |
| FederationHealthMonitor | Periodic health checks with circuit-breaker logic |
### Node
Nodes are lightweight — no PostgreSQL, no S3, no Meilisearch:
| Component | Role |
|---|---|
| syndb-node | Federation daemon with Arrow Flight server (port 50052) |
| ClickHouse | Local data warehouse (HTTP port 8124, native port 9003/9440) |
| ClusterActor | libp2p actor handling hub communication |
## Networking: libp2p
Federation uses libp2p for peer-to-peer communication:
- Transport: QUIC with built-in TLS 1.3 (encrypted, multiplexed)
- Discovery: mDNS for LAN (zero-config), DHT for WAN
- NAT traversal: Relay nodes for peers behind NAT
- Actor model: kameo actors manage the swarm event loop
### DHT Registration
Services register under well-known names in the DHT:
| Name | Actor |
|---|---|
| `syndb-hub` | HubRegistryActor |
| `syndb-cluster:{name}` | ClusterActor |
The ClusterActor on each node looks up `syndb-hub` in the DHT to find and register with the hub.
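The naming convention can be captured in a small helper; this is an illustrative sketch of the key scheme only, not the actual registration code:

```rust
// Illustrative helper for the DHT naming convention above; the real
// registration logic lives in the ClusterActor and is not shown here.
const HUB_KEY: &str = "syndb-hub";

/// DHT key under which a named cluster registers,
/// e.g. "syndb-cluster:lab-west" (cluster name is hypothetical).
fn cluster_key(name: &str) -> String {
    format!("syndb-cluster:{name}")
}

fn main() {
    assert_eq!(cluster_key("lab-west"), "syndb-cluster:lab-west");
    println!("{HUB_KEY} / {}", cluster_key("lab-west"));
}
```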
## Actor Messages
The ClusterActor handles these message types:
| Message | Direction | Purpose |
|---|---|---|
| HealthPing | Hub → Node | Periodic liveness check |
| SchemaSync | Hub → Node | Push DDL migrations |
| DatasetCatalogRequest | Hub → Node | Discover datasets on node |
| GetFlightEndpoint | Hub → Node | Resolve Flight address for data transfer |
| AnalyticsQuery | Hub → Node | Delegated analytics computation |
| OntologySync | Hub → Node | Push ontology terms |
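One way to picture this message set is as a single enum dispatched by the node. The payload fields and handler responses below are illustrative assumptions, not the actual message definitions:

```rust
// Hypothetical model of the hub → node message set from the table above.
// Variant payloads are assumptions for illustration only.
enum ClusterMessage {
    HealthPing,
    SchemaSync { migrations: Vec<String> },
    DatasetCatalogRequest,
    GetFlightEndpoint,
    AnalyticsQuery { sql: String },
    OntologySync { terms: Vec<String> },
}

/// Dispatch stub standing in for the node-side handler; each arm
/// returns a label describing the reply the hub would receive.
fn handle(msg: &ClusterMessage) -> &'static str {
    match msg {
        ClusterMessage::HealthPing => "pong",
        ClusterMessage::SchemaSync { .. } => "migrations applied",
        ClusterMessage::DatasetCatalogRequest => "dataset catalog",
        ClusterMessage::GetFlightEndpoint => "flight endpoint",
        ClusterMessage::AnalyticsQuery { .. } => "query result",
        ClusterMessage::OntologySync { .. } => "terms upserted",
    }
}

fn main() {
    assert_eq!(handle(&ClusterMessage::HealthPing), "pong");
    let msg = ClusterMessage::AnalyticsQuery { sql: "SELECT 1".into() };
    println!("{}", handle(&msg));
}
```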
## Data Plane
Two mechanisms move data between hub and nodes:
### ClickHouse remote()
For SQL queries, the hub compiles a `remote('node-host:port', 'syndb', 'table', 'user', 'password')` call that executes directly on the node’s ClickHouse and streams results back.
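For example, a count over a node-local table might compile to something like the following. The host, table name, and credentials are placeholders (the native port 9003 is taken from the node component table above):

```sql
-- Placeholder host, table, and credentials. The inner query executes on
-- the node's ClickHouse; results stream back to the hub.
SELECT count()
FROM remote('node-host:9003', 'syndb', 'measurements', 'user', 'password')
WHERE created_at >= now() - INTERVAL 1 DAY;
```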
### Arrow Flight (Internal)
For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to the node’s internal Flight server (port 50052). Results stream back as Arrow IPC batches.
## Schema Versioning
Each ClickHouse DDL migration has a version number. The hub tracks the current version and each node’s version:
1. Hub receives a schema sync request (`POST /v1/federation/schema/sync`)
2. Hub sends pending migrations to each active node via a `SchemaSync` message
3. Nodes apply migrations and report their new version
4. Queries only route to nodes whose schema version is compatible
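A sketch of that routing filter, assuming "compatible" means the node's applied version exactly matches the hub's current version (the real rule may allow a node to lag by some number of migrations):

```rust
// Sketch of version-gated query routing. Exact-match compatibility is an
// assumption; node names and versions are hypothetical.
#[derive(Debug)]
struct NodeSchema {
    name: String,
    version: u32,
}

/// Nodes eligible for query routing at the hub's current schema version.
fn routable(hub_version: u32, nodes: &[NodeSchema]) -> Vec<&NodeSchema> {
    nodes.iter().filter(|n| n.version == hub_version).collect()
}

fn main() {
    let nodes = vec![
        NodeSchema { name: "node-a".into(), version: 12 },
        NodeSchema { name: "node-b".into(), version: 11 }, // pending migration: excluded
    ];
    let eligible = routable(12, &nodes);
    assert_eq!(eligible.len(), 1);
    println!("routing to: {}", eligible[0].name);
}
```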
## Health Monitoring
The FederationHealthMonitorActor runs on the hub:
| State | Meaning | Query routing |
|---|---|---|
| Healthy | Responds to pings, schema compatible | Included |
| Degraded | Responds but slow or partially failing | Included with lower priority |
| Unreachable | Failed consecutive pings | Excluded |
| Unknown | Newly registered, not yet checked | Excluded until first successful ping |
Health transitions are logged and stored in PostgreSQL for audit.
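The circuit-breaker transitions implied by the table can be sketched as a small state machine. The failure threshold of 3 and the "slow response means Degraded" rule are assumptions, not the hub's actual configuration:

```rust
// Simplified circuit breaker for the health states above. Threshold and
// transition details are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Health {
    Unknown,
    Healthy,
    Degraded,
    Unreachable,
}

struct Breaker {
    state: Health,
    consecutive_failures: u32,
    threshold: u32,
}

impl Breaker {
    fn new() -> Self {
        // Newly registered nodes start Unknown and are excluded
        // until their first successful ping.
        Breaker { state: Health::Unknown, consecutive_failures: 0, threshold: 3 }
    }

    /// Record one ping result; `slow` marks a successful but slow response.
    /// Failures below the threshold leave the state unchanged.
    fn record(&mut self, ok: bool, slow: bool) {
        if ok {
            self.consecutive_failures = 0;
            self.state = if slow { Health::Degraded } else { Health::Healthy };
        } else {
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.threshold {
                self.state = Health::Unreachable;
            }
        }
    }

    /// Only Healthy and Degraded nodes receive queries.
    fn routable(&self) -> bool {
        matches!(self.state, Health::Healthy | Health::Degraded)
    }
}

fn main() {
    let mut b = Breaker::new();
    assert!(!b.routable()); // Unknown: excluded until first successful ping
    b.record(true, false);
    assert_eq!(b.state, Health::Healthy);
    b.record(false, false);
    b.record(false, false);
    b.record(false, false); // third consecutive failure trips the breaker
    assert_eq!(b.state, Health::Unreachable);
    println!("final state: {:?}", b.state);
}
```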
## Concurrency Model
- Lock-free reads: The hub’s cluster registry uses `papaya` concurrent hash maps, so reads never block even under high query load
- Actor isolation: Each cluster connection is managed by its own actor, preventing one slow node from blocking others
- Supervisor trees: Actor failures are caught and restarted by the kameo supervisor