Kubernetes & Helm
Production deployment on Kubernetes using Helm charts.
Charts Overview
| Chart | Description |
|---|---|
| syndb-hub | Hub deployment (API, UI, depends on syndb-clickhouse) |
| syndb-federation-node | Federation node (syndb-node, depends on syndb-clickhouse) |
| syndb-clickhouse | Shared ClickHouse subchart (used by both hub and node) |
| syndb-etl | ETL batch jobs (download, prepare, import, graph-precompute) |
| nautilus | Umbrella chart for the NRP Nautilus cluster deployment |
Charts are located under infrastructure/helm/.
Hub Deployment
The hub chart deploys the full SynDB stack. Key values:
syndb-clickhouse:
  clusterName: syndb-hub
  shardRegions:
    - name: dc1
      region: dc1
      replicas: 3
api:
  image:
    repository: docker.io/caniko/syndb-api
    tag: "0.10.47"
  flightPort: 50051
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
ui:
  image:
    repository: docker.io/caniko/syndb-ui
    tag: "0.10.47"
The chart also creates a remote_servers.xml ConfigMap for ClickHouse cluster topology.
Meilisearch on Nautilus
The Nautilus umbrella chart now deploys Meilisearch as an internal-only
production dependency for /v1/search/fulltext.
- Deployment shape: single-replica StatefulSet
- Service type: ClusterIP
- Default storage: rook-ceph-block
- Default volume: 20Gi
- Public ingress: none
- Shared secret: syndb-api-secrets.meilisearch_api_key
The API and the reconcile CronJob both receive:
- MEILISEARCH_URL=http://syndb-meilisearch:7700
- MEILISEARCH_API_KEY from syndb-api-secrets
Meilisearch itself receives the same secret as MEILI_MASTER_KEY, with
MEILI_NO_ANALYTICS=true.
Reconcile job
Nautilus also deploys an hourly CronJob that runs:
syndb data search reconcile
using the lightweight oci-syndb-cli image. This is the repair mechanism for
index drift and missed write-side updates.
Rollout order
For a production cutover:
- land the code and image changes
- update syndb-api-secrets so it contains meilisearch_api_key
- deploy the Nautilus chart
- wait for /health to report configured Meilisearch
- run one manual reconcile job
- verify /v1/search/fulltext through the public API
The manual one-shot reconcile command inside the supported devshell is:
nix develop . -c env \
POSTGRES_HOST=<host> \
POSTGRES_READ_HOST=<read-host> \
POSTGRES_PORT=<port> \
POSTGRES_USERNAME=<user> \
POSTGRES_PASSWORD=<password> \
POSTGRES_PATH=<database> \
MEILISEARCH_URL=http://syndb-meilisearch:7700 \
MEILISEARCH_API_KEY=<key> \
cargo run -p cli --features dataset -- dataset search reconcile
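The "wait for /health" step in the rollout order could be scripted as a small retry loop. This is a minimal sketch, not part of SynDB: wait_until is a hypothetical helper, and the commented usage line assumes an API_URL variable and a /health body that mentions Meilisearch, neither of which is confirmed here.

```shell
# Hypothetical helper: retry a command until it succeeds.
# $1: maximum attempts, $2: seconds to sleep between attempts, rest: command to run.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Stop as soon as the probe command exits 0.
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Assumed usage (URL and response shape are placeholders):
# wait_until 30 10 sh -c 'curl -fsS "$API_URL/health" | grep -qi meilisearch'
```

Keeping the probe as an arbitrary command means the same loop can gate the later fulltext verification step with a different check.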
Node Deployment
Deploy a federation node at your institution:
syndb-clickhouse:
  clusterName: syndb-node
  shardRegions:
    - name: dc1
      region: dc1
      replicas: 2
nodeApi:
  enabled: true
  image: syndb-api-rust:latest
  flightPort: 50052
  libp2pPort: 4001
  hubMultiaddrs: "/ip4/<hub-ip>/udp/4001/quic-v1"
  federationPassword: "<shared-secret>"
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
When nodeApi.enabled=true, the chart deploys:
- A Deployment running syndb-node with Flight (TCP) and libp2p (UDP) ports
- A Service exposing both ports
- Environment variables auto-populated from values (cluster name, endpoints, passwords)
In Kubernetes, mDNS is disabled, so use hubMultiaddrs for explicit hub discovery.
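The hubMultiaddrs value follows the /ip4/<hub-ip>/udp/<port>/quic-v1 shape shown in the values above. As a small illustration, the hypothetical hub_multiaddr helper below fills in that template; the address 203.0.113.10 is a documentation-range placeholder, not a real hub.

```shell
# Hypothetical helper: build a QUIC multiaddr for the hub.
# $1: hub IPv4 address, $2: libp2p UDP port (defaults to 4001, as in the values above).
hub_multiaddr() {
  printf '/ip4/%s/udp/%s/quic-v1\n' "$1" "${2:-4001}"
}

hub_multiaddr 203.0.113.10   # prints /ip4/203.0.113.10/udp/4001/quic-v1
```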
ETL Jobs
ETL runs through the syndb-etl chart values, primarily downloadJobs, prepareJobs, seed, and graphPrecompute:
syndb-etl:
  image:
    repository: docker.io/caniko/syndb-etl
    tag: "0.10.47"
  flight:
    enabled: true
    serverUrl: "http://syndb-api-service:80"
    port: "50051"
  downloadJobs:
    - pipeline: hemibrain
      emptyDirSizeLimit: 8Gi
      downloadResources:
        requests: { cpu: "500m", memory: "512Mi" }
        limits: { cpu: "600m", memory: "614Mi" }
  prepareJobs:
    - pipeline: hemibrain
      emptyDirSizeLimit: 25Gi
  graphPrecompute:
    enabled: true
Important: a Kubernetes Job's pod template is immutable once the Job exists. Before running helm upgrade with changed resource values, delete failed or running ETL jobs:
nix develop . -c kubectl delete job -n syndb -l app=syndb-etl --field-selector status.successful!=1
Skip override semantics: when syndb ops k8s nautilus apply receives explicit syndb-etl.skipPipelines[...] flags, SynDB now unions them with both config/etl-skip.ron and the live skip set derived from current ETL Jobs. Manual skip flags are additive; they do not replace the detected live skip set.
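The additive behaviour can be pictured as a plain set union. The sketch below is illustrative only: union_skips and the pipeline names other than hemibrain are hypothetical, and it does not reproduce how SynDB actually reads config/etl-skip.ron or inspects live Jobs.

```shell
# Hypothetical helper: merge space-separated pipeline lists into one
# sorted, de-duplicated, comma-separated skip set.
union_skips() {
  printf '%s\n' "$@" | tr ' ' '\n' | sed '/^$/d' | LC_ALL=C sort -u | paste -sd, -
}

manual="pipeline-a"            # stand-in for explicit skipPipelines flags (made-up name)
config="hemibrain"             # stand-in for entries in config/etl-skip.ron
live="hemibrain pipeline-b"    # stand-in for the set derived from current ETL Jobs

union_skips "$manual" "$config" "$live"   # prints hemibrain,pipeline-a,pipeline-b
```

Because the result is a union, passing a manual flag can only grow the skip set; a pipeline detected as live-skipped stays skipped.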
emptyDir warning: emptyDir volumes default to tmpfs here (Kubernetes medium: Memory) and count against the pod's memory cgroup limit. Add the expected emptyDir data size to the memory limit.
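As a rough worked example of that sizing rule, take the hemibrain download job values shown earlier: a 614Mi container memory limit and an 8Gi emptyDir. Assuming the volume is tmpfs-backed and could fill completely, the limit would need to cover both. The helper names below are hypothetical, and the parser handles only Mi and Gi suffixes.

```shell
# Hypothetical helper: convert a Kubernetes quantity in Mi or Gi to MiB.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *) echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# Memory limit (MiB) that leaves room for the process plus a full tmpfs volume.
# $1: process memory limit, $2: emptyDir size limit.
required_limit_mib() {
  echo $(( $(to_mib "$1") + $(to_mib "$2") ))
}

required_limit_mib 614Mi 8Gi   # prints 8806 (614 + 8192)
```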
Applying Changes
nix develop . -c cargo run -p cli --features dev -- ops k8s nautilus apply
Or manually:
nix develop . -c helm upgrade --install syndb-nautilus infrastructure/helm/nautilus/ \
-n syndb --create-namespace \
-f infrastructure/helm/nautilus/values.yaml
Pending Helm Releases
SynDB now refuses to apply when syndb-nautilus is already in one of Helm’s
pending states (pending-install, pending-upgrade, pending-rollback).
This prevents a generic:
another operation (install/upgrade/rollback) is in progress
from landing after ETL reset work has already started.
If the pending revision is newer than 10 minutes, treat it as possibly active and inspect it first:
nix develop . -c helm status syndb-nautilus -n syndb
nix develop . -c helm history syndb-nautilus -n syndb
If the pending revision is older than 10 minutes, treat it as stale and roll back to the newest deployed revision before retrying the apply.
Current example from April 19, 2026:
- revision 293 was stuck in pending-upgrade
- Helm reported last_deployed = 2026-04-19T18:51:43.666197216+02:00
- the newest deployed revision was 291
Recovery:
nix develop . -c helm rollback syndb-nautilus 291 -n syndb
nix develop . -c cargo run -p cli --features dev -- ops k8s nautilus apply
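The 10-minute rule above can be written down as a small decision function. This is a sketch under stated assumptions, not SynDB's implementation: pending_action is a hypothetical name, the caller is assumed to have already extracted the release status and the age of the pending revision in seconds, and a revision that is exactly 600 seconds old is treated as stale because the text leaves that boundary open.

```shell
# Hypothetical helper mirroring the 10-minute rule.
# $1: Helm release status, $2: age of the pending revision in seconds.
pending_action() {
  case "$1" in
    pending-install|pending-upgrade|pending-rollback)
      if [ "$2" -lt 600 ]; then
        echo inspect    # possibly active: check helm status and helm history first
      else
        echo rollback   # stale: roll back to the newest deployed revision
      fi
      ;;
    *)
      echo apply        # no pending operation: safe to apply
      ;;
  esac
}

pending_action pending-upgrade 120    # prints inspect
pending_action pending-upgrade 3600   # prints rollback
```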
QueryFabric Rollout
The QueryFabric cutover adds two PostgreSQL metadata invariants that the API now enforces at startup:
- every saved query must have query_text
- every pending query job must have sql_plan
Use the SynDB devshell and either run the checks manually:
nix develop . -c syndb test queryfabric-full
nix develop . -c syndb test queryfabric-rollout
or use the convenience wrapper:
nix develop . -c syndb ops k8s nautilus deploy queryfabric
test-queryfabric-rollout checks the PostgreSQL environment described by the
current POSTGRES_* / POSTGRES_READ_HOST variables and performs the same
saved-query backfill step the API runs at startup. For production, point those
variables at the target metadata database before running the preflight.
deploy-bump-queryfabric is a safe wrapper over deploy-bump: it runs the full
local QueryFabric + SynDB validation path first, then the target-DB preflight,
and only then publishes images and upgrades Helm on trunk.