SyQL Query Language
SyQL (SynDB Query Language) is a declarative query language for neuroanatomical data. It resolves dataset metadata into optimized ClickHouse SQL, handles access control, and submits queries to the async job system.
Requires Academic verification.
Quick Start
SyQL follows familiar SQL syntax. The simplest query:
FROM neurons LIMIT 10
A more typical query:
SELECT neuron_id, cable_length, cell_type
FROM neurons
WHERE species = 'rat' AND cable_length > 100
ORDER BY cable_length DESC
LIMIT 1000
Query Structure
A full SyQL query can include:
[WITH cte_name AS (...), ...]
SELECT [DISTINCT] columns
FROM table [AS alias]
[JOIN table [AS alias] ON conditions]
[WHERE predicates]
[GROUP BY columns]
[HAVING predicates]
[ORDER BY columns [ASC|DESC]]
[LIMIT n [OFFSET m]]
[SCOPE local|remote|federation]
[DOWNLOAD arrow|parquet|csv]
SELECT is optional — FROM neurons LIMIT 10 is equivalent to SELECT * FROM neurons LIMIT 10.
Tables
Base Compartment Tables
These are the primary data tables:
| Table | Primary Key | Description |
|---|---|---|
neurons | neuron_id | Neuron morphology and metadata |
neuron_relations | relation_id | Directed non-synaptic neuron-to-neuron relations |
synapses | synapse_id | Synaptic connections between neurons |
axons | axon_id | Axonal segments |
dendrites | dendrite_id | Dendritic segments |
dendritic_spines | spine_id | Dendritic spines |
pre_synaptic_terminals | terminal_id | Pre-synaptic terminals |
vesicles | vesicle_id | Synaptic vesicles |
mitochondria | mitochondria_id | Mitochondria |
Every table includes dataset_id, created_at, and metadata columns.
Key Columns
neurons — neuron_id, name, brain_structure, polarity, cell_type, cell_class, cell_subclass, species, majority_neurotransmitter, gaba_avg, acetylcholine_avg, glutamate_avg, octopamine_avg, serotonin_avg, dopamine_avg, tyramine_avg, betaine_avg, cable_length, is_tree, n_branches, n_skeletons, n_trees, surface_area, max_axis_length, volume, voxel_volume, voxel_radius, mesh_volume, mesh_surface_area, mesh_area_volume_ratio, mesh_sphericity, centroid_x, centroid_y, centroid_z, s3_mesh_location, s3_swb_location
neuron_relations — relation_id, pre_neuron_id, post_neuron_id, relation_type, neurotransmitter, strength, relation_count
synapses — synapse_id, pre_neuron_id, post_neuron_id, synapse_type, neurotransmitter, strength, synapse_count, centroid_x, centroid_y, centroid_z
Use POST /v1/syql/plan to inspect the resolved plan before execution. Current plan and explain responses include the logical plan, SQL preview or compiled SQL, optional rewrite target, rewrite advisories, optional federation scatter and gather SQL, optional query_id, and optional typed result_schema.
Materialized Views
Materialized views store pre-aggregated data for fast analytics:
| View | Use Case |
|---|---|
mv_dataset_summary | Row counts per compartment per dataset |
mv_neuron_morphometrics | Morphometric averages/stddev per dataset |
mv_neuron_stats | Neuron stats grouped by dataset + cell type |
mv_neuron_out_degree | Outgoing connections per neuron |
mv_neuron_in_degree | Incoming connections per neuron |
mv_platform_neuron_stats | Global (platform-wide) neuron statistics |
mv_synapse_connectivity | Synapse statistics per dataset |
mv_synapse_stats | Synapse count/strength aggregates |
mv_spatial_density | Spatial binning (bin_x, bin_y, bin_z) |
mv_neurotransmitter_profile | Neurotransmitter concentrations |
mv_vesicle_distribution | Vesicle count/volume distributions |
mv_mitochondria_stats | Mitochondria volume/count statistics |
mv_vesicle_diameter_histogram | Vesicle diameter binning |
mv_mitochondria_volume_histogram | Mitochondria volume binning |
mv_nt_by_region | Neurotransmitter by brain region |
MV columns that store intermediate aggregation state are automatically finalized — e.g., row_count becomes countMerge(row_count) in the compiled SQL. You don’t need to handle this yourself.
Precomputed Tables
Graph analysis results stored by the ETL pipeline:
| Table | Description |
|---|---|
precomputed_graph_summary | Network-level statistics (density, reciprocity, clustering, motifs) |
precomputed_degree_histogram | Degree distribution (in/out) |
precomputed_celltype_connectivity | Cell-type-to-cell-type connectivity matrix |
precomputed_bottleneck_neurons | Articulation point annotations |
precomputed_clique_detail | Maximal cliques with cell-type composition |
precomputed_dual_network | Chemical vs. electrical subnetwork metrics |
precomputed_developmental_metrics | Per-stage developmental metrics |
Views
| View | Description |
|---|---|
vw_celltype_connectivity | Cell-type connectivity (non-materialized view) |
Filtering (WHERE)
Data Column Filters
Standard SQL comparison operators:
WHERE cable_length > 100
WHERE cell_type = 'pyramidal'
WHERE cell_type != 'unknown'
WHERE cable_length BETWEEN 50 AND 200
WHERE cell_type IN ('pyramidal', 'interneuron', 'stellate')
WHERE name LIKE '%mushroom%'
WHERE cell_class IS NULL
WHERE cell_class IS NOT NULL
Boolean Logic
Combine predicates with AND, OR, NOT, and parentheses:
WHERE (cable_length > 100 AND volume < 500)
OR (cell_type = 'pyramidal' AND NOT cell_class IS NULL)
Metadata Filters
These special columns filter by dataset metadata — they resolve against PostgreSQL and restrict which dataset_id values are included:
| Column | Aliases | Resolves to |
|---|---|---|
species | — | dataset.animal_species |
brain_region | brain_structure | dataset_brain_region junction |
license | data_license | dataset.data_license |
microscopy | microscopy_name | dataset.microscopy_name |
cluster_name | — | federated_cluster.name |
SELECT neuron_id, cable_length
FROM neurons
WHERE species = 'mouse' AND brain_region = 'mushroom_body'
LIMIT 1000
FROM neurons WHERE species IN ('mouse', 'rat') LIMIT 1000
Expression Filters
Arithmetic and function calls work in WHERE:
WHERE cable_length * 2 > 500
WHERE SQRT(volume) < 100
WHERE ABS(centroid_x - centroid_y) < 10
Subquery Filters
WHERE neuron_id IN (SELECT pre_neuron_id FROM synapses WHERE strength > 5)
WHERE neuron_id NOT IN (SELECT neuron_id FROM axons)
SELECT Expressions
Columns and Aliases
SELECT neuron_id, cable_length AS length, cell_type
FROM neurons
LIMIT 100
Arithmetic
SELECT neuron_id,
mesh_volume / mesh_surface_area AS volume_to_area,
cable_length * 1000 AS cable_length_nm
FROM neurons
LIMIT 100
Operators: +, -, *, /, %
CASE Expressions
SELECT neuron_id,
CASE
WHEN cable_length > 1000 THEN 'long'
WHEN cable_length > 100 THEN 'medium'
ELSE 'short'
END AS size_class
FROM neurons
LIMIT 100
Simple form:
CASE cell_type
WHEN 'pyramidal' THEN 'excitatory'
WHEN 'interneuron' THEN 'inhibitory'
ELSE 'other'
END
DISTINCT
SELECT DISTINCT cell_type, brain_structure
FROM neurons
Aggregate Functions
| Function | Description |
|---|---|
COUNT(*) | Count rows |
COUNT(column) | Count non-null values |
COUNT(DISTINCT column) | Count unique values |
SUM(column) | Sum |
AVG(column) | Mean |
MIN(column) | Minimum |
MAX(column) | Maximum |
STDDEV_POP(column) | Population standard deviation |
VAR_POP(column) | Population variance |
QUANTILE(p)(column) | Quantile at level p (0.0–1.0) |
MEDIAN(column) | Median (alias for QUANTILE(0.5)) |
CORR(col1, col2) | Pearson correlation (two columns) |
SELECT dataset_id,
COUNT(*) AS neuron_count,
AVG(cable_length) AS avg_cable,
STDDEV_POP(cable_length) AS std_cable,
MEDIAN(mesh_volume) AS median_volume
FROM neurons
GROUP BY dataset_id
GROUP BY and HAVING
Group rows and filter groups:
SELECT cell_type, COUNT(*) AS n, AVG(cable_length) AS avg_cable
FROM neurons
WHERE dataset_id = '...'
GROUP BY cell_type
HAVING COUNT(*) > 10
ORDER BY avg_cable DESC
HAVING supports arithmetic, function calls, and subqueries:
HAVING STDDEV_POP(cable_length) < 50
HAVING COUNT(*) > (SELECT COUNT(*) / 100 FROM neurons WHERE dataset_id = '...')
Scalar Functions
Numeric
| Function | Description |
|---|---|
ABS(x) | Absolute value |
ROUND(x [, precision]) | Round |
FLOOR(x) | Round down |
CEIL(x) / CEILING(x) | Round up |
SQRT(x) | Square root |
POWER(x, y) / POW(x, y) | Exponentiation |
GREATEST(a, b, ...) | Maximum of arguments |
LEAST(a, b, ...) | Minimum of arguments |
Conditional
| Function | Description |
|---|---|
IF(cond, then, else) | Ternary conditional |
IFNULL(value, default) | Null coalescing |
COALESCE(a, b, ...) | First non-null value |
NULLIF(a, b) | Returns null if a = b |
String
| Function | Description |
|---|---|
LENGTH(s) | String length |
LOWER(s) | Lowercase |
UPPER(s) | Uppercase |
TRIM(s) | Strip whitespace |
CONCAT(a, b, ...) | Concatenate strings |
SUBSTRING(s, pos [, len]) / SUBSTR(...) | Extract substring |
Type Casting
| Function | Description |
|---|---|
TOFLOAT64(x) | Cast to Float64 |
TOINT32(x) | Cast to Int32 |
TOINT64(x) | Cast to Int64 |
TOUINT64(x) | Cast to UInt64 |
TOSTRING(x) | Cast to String |
Window Functions
Window functions compute values across a set of rows related to the current row.
Syntax
function(...) OVER (
[PARTITION BY expr, ...]
[ORDER BY expr [ASC|DESC], ...]
[ROWS|RANGE BETWEEN start AND end]
)
Ranking Functions
SELECT neuron_id,
cable_length,
RANK() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS dense_rank,
ROW_NUMBER() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS row_num
FROM neurons
Offset Functions
SELECT neuron_id,
cable_length,
LAG(cable_length) OVER (ORDER BY neuron_id) AS prev_cable,
LEAD(cable_length, 2) OVER (ORDER BY neuron_id) AS next_2_cable,
FIRST_VALUE(cable_length) OVER (PARTITION BY dataset_id ORDER BY neuron_id) AS first_cable,
LAST_VALUE(cable_length) OVER (PARTITION BY dataset_id ORDER BY neuron_id) AS last_cable
FROM neurons
Window Frames
SUM(cable_length) OVER (
ORDER BY neuron_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
Frame bounds: UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING.
Frame units: ROWS or RANGE.
Aggregates as Window Functions
Any aggregate function can be used with OVER:
SELECT neuron_id,
cable_length,
AVG(cable_length) OVER (PARTITION BY dataset_id) AS dataset_avg
FROM neurons
JOINs
Supported Join Types
INNER JOIN(or justJOIN)LEFT JOINRIGHT JOINFULL OUTER JOINCROSS JOIN
ON Conditions
ON clauses support equality conditions chained with AND:
SELECT n.neuron_id, n.cable_length, s.strength
FROM neurons AS n
INNER JOIN synapses AS s
ON n.dataset_id = s.dataset_id AND n.neuron_id = s.pre_neuron_id
WHERE n.dataset_id = '...'
LIMIT 1000
Self-Joins
Useful for analyzing reciprocal connections:
SELECT
count() / 2 AS reciprocal_pairs,
toFloat64(count()) / 2.0
/ greatest(toFloat64((SELECT count() FROM synapses WHERE dataset_id = '...')), 1.0)
AS reciprocity
FROM synapses AS s1
INNER JOIN synapses AS s2
ON s1.dataset_id = s2.dataset_id
AND s1.pre_neuron_id = s2.post_neuron_id
AND s1.post_neuron_id = s2.pre_neuron_id
WHERE s1.dataset_id = '...'
AND s1.pre_neuron_id < s1.post_neuron_id
Joining Materialized Views
SELECT o.neuron_id, out_degree, in_degree
FROM mv_neuron_out_degree AS o
FULL OUTER JOIN mv_neuron_in_degree AS i
ON o.dataset_id = i.dataset_id AND o.neuron_id = i.neuron_id
WHERE o.dataset_id = '...' OR i.dataset_id = '...'
GROUP BY o.neuron_id, i.neuron_id
ORDER BY (out_degree + in_degree) DESC
LIMIT 100
CROSS JOIN with Subqueries
Useful for z-score comparisons against global statistics:
SELECT
ds.dataset_id,
(AVG(ds.cable_length) - global.global_avg)
/ greatest(global.global_std, 0.0000000001) AS zscore
FROM neurons AS ds
CROSS JOIN (
SELECT AVG(cable_length) AS global_avg, STDDEV_POP(cable_length) AS global_std
FROM neurons
) AS global
WHERE ds.dataset_id IN ('...')
GROUP BY ds.dataset_id
CTEs (WITH Clauses)
Common Table Expressions let you name intermediate result sets:
WITH all_neurons AS (
SELECT pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
UNION ALL
SELECT post_neuron_id FROM synapses WHERE dataset_id = '...'
)
SELECT
neuron_count,
edge_count,
toFloat64(edge_count) / greatest(toFloat64(neuron_count) * (toFloat64(neuron_count) - 1), 1) AS density
FROM (
SELECT
(SELECT COUNT(DISTINCT neuron_id) FROM all_neurons) AS neuron_count,
count() AS edge_count
FROM synapses
WHERE dataset_id = '...'
) AS t
Later CTEs can reference earlier ones. WITH RECURSIVE is not supported.
UNION ALL
Combine multiple queries:
SELECT 'pre' AS direction, pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
UNION ALL
SELECT 'post' AS direction, post_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
ORDER BY neuron_id
LIMIT 1000
Only UNION ALL is supported. UNION (without ALL), EXCEPT, and INTERSECT are not available. Outer ORDER BY, LIMIT, and OFFSET apply across the combined result.
Ordering and Pagination
ORDER BY cable_length DESC
LIMIT 100
OFFSET 200
ORDER BY supports expressions:
ORDER BY cable_length * 2 DESC
ORDER BY SQRT(volume) ASC
Parameter Binding
Parameterized queries prevent injection and allow reuse.
Positional Parameters
Use ? placeholders — they are assigned 1-based indices left to right:
{
"query": "SELECT neuron_id, cable_length FROM neurons WHERE cable_length > ? AND volume < ? LIMIT ?",
"params": [100, 999.5, 1000]
}
Named Parameters
Use :name placeholders:
{
"query": "SELECT neuron_id FROM neurons WHERE cell_type = :ct AND species = :species LIMIT 1000",
"named_params": {"ct": "pyramidal", "species": "rat"}
}
Parameters can be used in WHERE, SELECT expressions, HAVING, ORDER BY, and function arguments.
Output Format (DOWNLOAD)
Control the result format:
SELECT neuron_id, cable_length FROM neurons LIMIT 1000 DOWNLOAD csv
| Format | Description |
|---|---|
arrow | Apache Arrow (default) |
parquet | Apache Parquet |
csv | CSV |
Federation Scope (SCOPE)
Control where the query executes:
SELECT COUNT(*) FROM neurons GROUP BY dataset_id SCOPE federation
| Scope | Description |
|---|---|
local | Execute on the local database (default) |
remote | Execute on a remote federated cluster |
federation | Scatter/gather across all federated nodes |
When using federation scope, the hub automatically decomposes the query into scatter SQL (sent to each node) and gather SQL (merged locally). See Cross-Cluster Queries.
Automatic MV Rewriting
SyQL automatically rewrites queries to use materialized views when possible. For example:
SELECT AVG(cable_length) FROM neurons GROUP BY dataset_id
This is automatically rewritten to query mv_neuron_morphometrics instead of scanning the full neurons table — significantly faster for large datasets.
Use the explain endpoint to see whether your query was rewritten and which MV was selected. The response includes advisories explaining why alternative MVs were rejected.
API Workflow
SyQL has a three-stage pipeline:
| Stage | Endpoint | What it does |
|---|---|---|
| Plan | POST /v1/syql/plan | Parse → validate → resolve metadata → return logical plan |
| Explain | POST /v1/syql/explain | Plan + compile to SQL → return compiled query and advisories |
| Execute | POST /v1/syql/exec | Plan + compile + submit to job queue → return job ID |
Use plan to validate syntax and inspect the resolved schema. Use explain to preview the generated SQL before committing to execution. Use exec when you’re ready to run.
Plan
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/plan \
-d '{"query": "SELECT neuron_id, cable_length FROM neurons WHERE species = '\''mouse'\'' LIMIT 100"}'
Returns the parsed logical plan: resolved tables, columns, filters, metadata, and result schema.
Explain
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/explain \
-d '{"query": "SELECT AVG(cable_length) FROM neurons GROUP BY dataset_id"}'
Returns:
- The compiled ClickHouse SQL (with any MV rewrites applied)
- Query advisories and any selected MV rewrite target
- Optional federation
scatter_sqlandgather_sql - Optional
query_idand typedresult_schema
Execute
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/exec \
-d '{"query": "SELECT neuron_id, cable_length FROM neurons WHERE species = '\''mouse'\'' ORDER BY cable_length DESC LIMIT 1000"}'
Returns a job_id. Track and download results via the Jobs System.
Cancel
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://api.syndb.xyz/v1/syql/cancel \
-d '{"query_id": "..."}'
The query_id may be returned by plan, explain, or execute when the query can be cancelled through ClickHouse.
Examples
Neuron morphometrics per dataset
SELECT dataset_id,
COUNT(*) AS n,
AVG(cable_length) AS avg_cable,
STDDEV_POP(cable_length) AS std_cable,
AVG(mesh_volume) AS avg_volume,
AVG(mesh_sphericity) AS avg_sphericity
FROM neurons
GROUP BY dataset_id
ORDER BY n DESC
Top connected neurons
SELECT o.neuron_id, out_degree, in_degree,
(out_degree + in_degree) AS total_degree
FROM mv_neuron_out_degree AS o
FULL OUTER JOIN mv_neuron_in_degree AS i
ON o.dataset_id = i.dataset_id AND o.neuron_id = i.neuron_id
WHERE o.dataset_id = '...'
GROUP BY o.neuron_id, i.neuron_id
ORDER BY total_degree DESC
LIMIT 50
Z-score comparison across datasets
SELECT
toString(ds.dataset_id) AS dataset_id,
'cable_length' AS metric,
(ds.dataset_avg - global.global_avg)
/ greatest(global.global_std, 0.0000000001) AS zscore
FROM (
SELECT dataset_id, AVG(cable_length) AS dataset_avg
FROM neurons
WHERE dataset_id IN ('uuid1', 'uuid2')
GROUP BY dataset_id
) AS ds
CROSS JOIN (
SELECT AVG(cable_length) AS global_avg, STDDEV_POP(cable_length) AS global_std
FROM neurons
) AS global
Network reciprocity
SELECT
count() / 2 AS reciprocal_pairs,
toFloat64(count()) / 2.0
/ greatest(toFloat64((SELECT count() FROM synapses WHERE dataset_id = '...')), 1.0)
AS reciprocity
FROM synapses AS s1
INNER JOIN synapses AS s2
ON s1.dataset_id = s2.dataset_id
AND s1.pre_neuron_id = s2.post_neuron_id
AND s1.post_neuron_id = s2.pre_neuron_id
WHERE s1.dataset_id = '...'
AND s1.pre_neuron_id < s1.post_neuron_id
Graph density with CTEs
WITH all_neurons AS (
SELECT pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
UNION ALL
SELECT post_neuron_id FROM synapses WHERE dataset_id = '...'
)
SELECT
neuron_count,
edge_count,
avg_strength,
toFloat64(edge_count)
/ greatest(toFloat64(neuron_count) * (toFloat64(neuron_count) - 1), 1)
AS density
FROM (
SELECT
(SELECT COUNT(DISTINCT neuron_id) FROM all_neurons) AS neuron_count,
count() AS edge_count,
avg(strength) AS avg_strength
FROM synapses
WHERE dataset_id = '...'
) AS t
Ranking neurons within a dataset
SELECT neuron_id, cable_length,
RANK() OVER (ORDER BY cable_length DESC) AS rank
FROM neurons
WHERE dataset_id = '...'
LIMIT 100
Parameterized query
{
"query": "SELECT neuron_id, cable_length, cell_type FROM neurons WHERE cell_type = :ct AND cable_length > :min_cable ORDER BY cable_length DESC LIMIT :n",
"named_params": {"ct": "pyramidal", "min_cable": 500, "n": 100}
}
Unsupported Features
These SQL features are intentionally not supported:
INSERT,UPDATE,DELETE,CREATE,DROP— SyQL is read-onlyWITH RECURSIVE— recursive CTEsUNION(without ALL),EXCEPT,INTERSECT— onlyUNION ALLis availableGROUPSwindow frame unit — onlyROWSandRANGE- Arbitrary ClickHouse functions — only whitelisted functions are allowed
Saved Queries
Frequently used SyQL queries can be saved for reuse. See Saved Queries.