Cross-Cluster Queries

Federation queries let you analyze data across all participating nodes from a single API call.

How It Works

  1. A user submits a query, through SyQL or a meta-analysis endpoint, with federation scope.
  2. The hub resolves targets: it checks the dataset locality index to determine which nodes hold relevant data.
  3. The hub compiles remote queries: it generates ClickHouse remote('node:port', 'syndb', 'table', 'user', 'pass') calls.
  4. Nodes execute locally: each node runs its portion of the query against its local data.
  5. The hub aggregates: results stream back to the hub and are merged there.
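Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the hub's implementation. The locality index contents, dataset names, host names, port 9000 and the reader/secret credentials are all hypothetical. The query text is built by plain string interpolation, which is only acceptable for trusted input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host: str
    port: int


# Hypothetical locality index: dataset name -> nodes holding a copy of it.
LOCALITY_INDEX = {
    "dataset_a": [Node("node-a.internal", 9000)],
    "dataset_b": [Node("node-b.internal", 9000), Node("node-c.internal", 9000)],
}


def resolve_targets(datasets):
    """Step 2: return the distinct nodes holding any requested dataset, in first-seen order."""
    targets = {}
    for name in datasets:
        for node in LOCALITY_INDEX.get(name, []):
            targets.setdefault((node.host, node.port), node)
    return list(targets.values())


def compile_remote_query(node, table, columns, where, user="reader", password="secret"):
    """Step 3: point a single-table query at one node via the ClickHouse remote() table function.

    No escaping is done here; never pass untrusted strings.
    """
    source = (
        f"remote('{node.host}:{node.port}', 'syndb', '{table}', "
        f"'{user}', '{password}')"
    )
    return f"SELECT {columns} FROM {source} WHERE {where}"


if __name__ == "__main__":
    for node in resolve_targets(["dataset_b", "dataset_a"]):
        print(compile_remote_query(node, "neurons", "neuron", "brain_region = 'mushroom_body'"))
```

Deduplicating by (host, port) means a node that holds several requested datasets is queried only once.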

SyQL with Federation Scope

SyQL queries can target the federation by setting "scope": "federation" in the request body:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/exec \
  -d '{
    "query": "SELECT neuron FROM neurons WHERE brain_region = '\''mushroom_body'\''",
    "scope": "federation"
  }'

The hub transparently fans the query out to nodes that hold matching datasets.
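The same request can be sent from Python with only the standard library. This is a sketch that mirrors the curl call: it assumes the token is in the TOKEN environment variable, as in the shell example, and that the endpoint returns JSON, since the response shape is not described here. Building the body with json.dumps also removes the need for the '\'' quoting trick used in the shell version.

```python
import json
import os
import urllib.request

API_BASE = "https://api.syndb.xyz/v1"


def build_syql_request(query, token, scope="federation"):
    """Build (but do not send) a POST to the SyQL exec endpoint."""
    body = json.dumps({"query": query, "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/syql/exec",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_syql_request(
        "SELECT neuron FROM neurons WHERE brain_region = 'mushroom_body'",
        os.environ["TOKEN"],
    )
    # Assumption: the response body is JSON.
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(json.load(resp))
```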

Meta-Analysis Across Clusters

To restrict the analysis to specific nodes, list their IDs in cluster_ids:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/meta-analysis/analyze \
  -d '{
    "table": "neurons",
    "metric": "mesh_volume",
    "group_by": "brain_region",
    "scope": "federation",
    "cluster_ids": ["uuid-1", "uuid-2"]
  }'

Omit cluster_ids to query all healthy clusters.
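When a grouped metric is computed on several nodes, the partial results have to be merged with care. This page does not say which statistic the meta-analysis computes or how the hub combines node results. The sketch below assumes a mean of mesh_volume per brain_region and that each node returns a (sum, count) pair per group. Averaging the per-node means directly would weight a node with one neuron the same as a node with thousands.

```python
from collections import defaultdict


def merge_partial_means(partials):
    """Combine per-node partial aggregates into a global mean per group.

    partials: one dict per node, mapping group -> (sum, count).
    Returns group -> mean over all rows on all nodes.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for node_result in partials:
        for group, (total, count) in node_result.items():
            totals[group][0] += total
            totals[group][1] += count
    return {group: total / count for group, (total, count) in totals.items() if count > 0}


if __name__ == "__main__":
    node_1 = {"mushroom_body": (30.0, 3), "antennal_lobe": (8.0, 2)}
    node_2 = {"mushroom_body": (20.0, 1)}
    print(merge_partial_means([node_1, node_2]))
```

Here mushroom_body merges to 50.0 / 4 = 12.5. Naively averaging the node means (10.0 and 20.0) would give 15.0.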

Data Plane: Arrow Flight

For large result sets and non-SQL workloads (such as graph analysis and analytics), the hub delegates the work to each node’s internal Arrow Flight server:

  • Hub sends a Flight DoGet request to the node’s advertised Flight endpoint (default port 50052)
  • Results stream back as Arrow IPC record batches
  • The hub merges batches from multiple nodes before returning to the client
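The fan-out and merge pattern can be sketched without a network. In this hypothetical sketch, fetch stands in for one Flight DoGet. With pyarrow, a real fetch would be roughly pyarrow.flight.connect(f"grpc://{host}:50052").do_get(ticket).read_all(), but the ticket contents are not documented here. Batches are modelled as lists of row dicts instead of Arrow record batches. The per-node deadline and the choice to skip failed nodes, rather than fail the whole query, are assumptions that echo the limitations listed below.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def fan_out(nodes, fetch, timeout_s=30.0):
    """Call fetch(node) on every node in parallel and merge what returns in time.

    fetch stands in for a Flight DoGet against one node and must return an
    iterable of batches; here a batch is modelled as a list of row dicts.
    Returns (merged_rows, failed_nodes). Nodes that raise or miss the deadline
    are reported as failed instead of failing the whole query.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(fetch, node): node for node in nodes}
    done, _ = wait(futures, timeout=timeout_s)

    merged, failed = [], []
    # Iterate in submission order so the merged output is deterministic.
    for future, node in futures.items():
        if future in done and future.exception() is None:
            for batch in future.result():
                merged.extend(batch)
        else:
            failed.append(node)

    # Return without waiting for stragglers; their results are discarded.
    # cancel_futures requires Python 3.9+.
    pool.shutdown(wait=False, cancel_futures=True)
    return merged, failed
```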

Limitations

| Constraint | Detail |
|---|---|
| Latency | Cross-cluster queries add network round-trip time per node |
| Schema compatibility | Nodes must be at a compatible schema version; incompatible nodes are excluded |
| Node health | Only Healthy and Degraded nodes receive queries; Unreachable nodes are skipped |
| Delegation timeout | Default 30s (FEDERATION_DELEGATION_TIMEOUT_SECS); long-running queries may need async jobs |
| No cross-node joins | Each node executes independently; joins happen only against local data |
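The health, schema, and timeout rules can be expressed as a small filter. This is a sketch under stated assumptions. The NodeInfo type is hypothetical, and so is treating "compatible schema version" as equality of an integer version, because the real compatibility rule is not described here. The FEDERATION_DELEGATION_TIMEOUT_SECS name and its 30-second default come from the limitations listed above.

```python
import os
from dataclasses import dataclass

# Statuses that still receive delegated queries; Unreachable nodes are skipped.
ELIGIBLE_STATUSES = {"Healthy", "Degraded"}


@dataclass
class NodeInfo:
    name: str
    status: str          # "Healthy", "Degraded" or "Unreachable"
    schema_version: int  # hypothetical: compatibility modelled as an integer version


def delegation_timeout():
    """Read FEDERATION_DELEGATION_TIMEOUT_SECS, defaulting to 30 seconds."""
    return float(os.environ.get("FEDERATION_DELEGATION_TIMEOUT_SECS", "30"))


def eligible_nodes(nodes, hub_schema_version):
    """Keep nodes that are reachable enough and on a compatible (here: equal) schema."""
    return [
        n for n in nodes
        if n.status in ELIGIBLE_STATUSES and n.schema_version == hub_schema_version
    ]
```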

Best Practices

  • Use async jobs (POST /v1/jobs) for large federation queries to avoid HTTP timeouts
  • Check federation status before running large queries to know which nodes are available
  • Prefer meta-analysis endpoints for cross-dataset aggregation — they handle fan-out efficiently
  • Monitor benchmarks to track federation query performance over time