Cross-Cluster Queries
Federation queries let you analyze data across all participating nodes from a single API call.
How It Works
- User submits a query via SyQL or meta-analysis endpoint with federation scope
- Hub resolves targets: checks the dataset locality index to determine which nodes hold relevant data
- Hub compiles remote queries: generates ClickHouse remote('node:port', 'syndb', 'table', 'user', 'pass') calls
- Nodes execute locally: each node runs its portion of the query against local data
- Hub aggregates: results stream back and are merged at the hub
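The compile step above can be sketched roughly as follows. This is an illustration only, not the hub's actual code: the `Node` type, the `compile_remote_queries` helper, and the credentials are hypothetical, and the only thing taken from this page is the shape of the `remote(...)` call with the `syndb` database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A federation node as the hub might see it (hypothetical shape)."""
    host: str
    port: int


def quote(value: str) -> str:
    """Quote a value as a ClickHouse string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def remote_call(node: Node, table: str, user: str, password: str) -> str:
    """Build the remote(...) table expression for one node."""
    address = f"{node.host}:{node.port}"
    args = [address, "syndb", table, user, password]
    return "remote(" + ", ".join(quote(a) for a in args) + ")"


def compile_remote_queries(nodes, table, columns, user, password):
    """Return one SELECT per target node, keyed by node address.

    Column names are inserted as-is, so they must come from a trusted source.
    """
    select_list = ", ".join(columns)
    return {
        f"{n.host}:{n.port}": (
            f"SELECT {select_list} FROM {remote_call(n, table, user, password)}"
        )
        for n in nodes
    }


if __name__ == "__main__":
    targets = [Node("node-a", 9000), Node("node-b", 9000)]
    for address, sql in compile_remote_queries(
        targets, "neurons", ["neuron"], "reader", "secret"
    ).items():
        print(address, "->", sql)
```

Generating one statement per node keeps each node's work independent, which matches the "nodes execute locally" step; how the real hub batches or combines these statements is not specified here.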
SyQL with Federation Scope
SyQL queries can target the federation by specifying scope in the request:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/exec \
-d '{
"query": "SELECT neuron FROM neurons WHERE brain_region = '\''mushroom_body'\''",
"scope": "federation"
}'
The hub transparently fans the query out to nodes that hold matching datasets.
Meta-Analysis Across Clusters
Specify cluster_ids to include specific nodes:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/meta-analysis/analyze \
-d '{
"table": "neurons",
"metric": "mesh_volume",
"group_by": "brain_region",
"scope": "federation",
"cluster_ids": ["uuid-1", "uuid-2"]
}'
Omit cluster_ids to query all healthy clusters.
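Because each node only sees its own rows, a grouped metric has to be combined from per-node partial results at the hub. The sketch below assumes, purely for illustration, that the metric is reduced to a mean and that each node returns a (sum, count) pair per group; the actual aggregation function and wire format used by the meta-analysis endpoint are not stated on this page, and `merge_partial_means` is a hypothetical name.

```python
from collections import defaultdict


def merge_partial_means(partials):
    """Merge per-node partial aggregates into one mean per group.

    `partials` is an iterable of per-node results, each a dict mapping
    group key -> (sum, count). Groups with a zero total count are dropped.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for node_result in partials:
        for group, (partial_sum, partial_count) in node_result.items():
            totals[group][0] += partial_sum
            totals[group][1] += partial_count
    return {
        group: total_sum / total_count
        for group, (total_sum, total_count) in totals.items()
        if total_count > 0
    }


if __name__ == "__main__":
    # Hypothetical mesh_volume partials grouped by brain_region.
    node_a = {"mushroom_body": (30.0, 3), "optic_lobe": (8.0, 2)}
    node_b = {"mushroom_body": (10.0, 1)}
    print(merge_partial_means([node_a, node_b]))
```

Shipping sums and counts rather than per-node means avoids the bias that averaging averages would introduce when nodes hold different numbers of rows.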
Data Plane: Arrow Flight
For large result sets and non-SQL workloads (graph analysis, analytics), the hub delegates to each node’s internal Flight server:
- Hub sends a Flight DoGet request to the node’s advertised Flight endpoint (default port 50052)
- Results stream back as Arrow IPC record batches
- The hub merges batches from multiple nodes before returning to the client
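The fan-out and merge pattern described above can be modeled with the standard library alone. In this sketch `fetch` is a stand-in for a Flight DoGet call and a "batch" is just a list of row dicts; a real client would more likely use `pyarrow.flight` and Arrow record batches. The function name and the thread-pool approach are assumptions, not the hub's implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def _drain(fetch, endpoint):
    """Consume one node's batch stream fully and return it as a list."""
    return list(fetch(endpoint))


def merge_node_streams(endpoints, fetch, max_workers=8):
    """Fetch batch streams from all endpoints concurrently and concatenate them.

    `fetch(endpoint)` returns an iterable of batches (each a list of row dicts).
    Batches from one node keep their order; nodes are appended in the order
    their streams finish, so the overall row order is not deterministic.
    """
    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_drain, fetch, ep) for ep in endpoints]
        for future in as_completed(futures):
            # future.result() re-raises any error from that node's fetch.
            for batch in future.result():
                rows.extend(batch)
    return rows


if __name__ == "__main__":
    # Hypothetical in-memory streams keyed by Flight endpoint.
    streams = {
        "node-a:50052": [[{"id": 1}], [{"id": 2}]],
        "node-b:50052": [[{"id": 3}]],
    }
    merged = merge_node_streams(list(streams), lambda ep: iter(streams[ep]))
    print(sorted(row["id"] for row in merged))
```

This version buffers each node's stream before merging, which is simple but holds a full node result in memory; a streaming merge would forward batches as they arrive.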
Limitations
| Constraint | Detail |
|---|---|
| Latency | Cross-cluster queries add network round-trip time per node |
| Schema compatibility | Nodes must be at a compatible schema version; incompatible nodes are excluded |
| Node health | Only Healthy and Degraded nodes receive queries; Unreachable nodes are skipped |
| Delegation timeout | Default 30s (FEDERATION_DELEGATION_TIMEOUT_SECS); long-running queries may need async jobs |
| No cross-node joins | Each node executes independently; joins happen only against local data |
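The health and schema rules in the table amount to a filter over the known nodes, optionally narrowed by cluster_ids. The sketch below is one way to express that filter; `NodeInfo`, `select_targets`, and the idea that compatibility means "same major schema version" are assumptions made for the example, since the page does not define the compatibility rule.

```python
from dataclasses import dataclass

# Statuses that still receive queries, per the Node health row.
QUERYABLE_STATUSES = {"Healthy", "Degraded"}


@dataclass(frozen=True)
class NodeInfo:
    cluster_id: str
    status: str        # "Healthy", "Degraded", or "Unreachable"
    schema_major: int  # hypothetical: compatibility judged by major version


def select_targets(nodes, hub_schema_major, cluster_ids=None):
    """Return the nodes that should receive a federated query.

    A node is kept when it is Healthy or Degraded, its schema matches the
    hub's major version, and it is in `cluster_ids` (if that list is given).
    """
    wanted = set(cluster_ids) if cluster_ids is not None else None
    return [
        node
        for node in nodes
        if node.status in QUERYABLE_STATUSES
        and node.schema_major == hub_schema_major
        and (wanted is None or node.cluster_id in wanted)
    ]


if __name__ == "__main__":
    known = [
        NodeInfo("uuid-1", "Healthy", 3),
        NodeInfo("uuid-2", "Degraded", 3),
        NodeInfo("uuid-3", "Unreachable", 3),
        NodeInfo("uuid-4", "Healthy", 2),
    ]
    print([n.cluster_id for n in select_targets(known, hub_schema_major=3)])
```

Note that a requested cluster that is unreachable or incompatible is silently dropped in this sketch; a caller that needs to know which clusters were excluded would have to compare the result against the requested list.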
Best Practices
- Use async jobs (POST /v1/jobs) for large federation queries to avoid HTTP timeouts
- Check federation status before running large queries to know which nodes are available
- Prefer meta-analysis endpoints for cross-dataset aggregation — they handle fan-out efficiently
- Monitor benchmarks to track federation query performance over time