Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

SyQL Query Language

SyQL (SynDB Query Language) is a declarative query language for neuroanatomical data. It resolves dataset metadata into optimized ClickHouse SQL, handles access control, and submits queries to the async job system.

Requires Academic verification.

Quick Start

SyQL follows familiar SQL syntax. The simplest query:

FROM neurons LIMIT 10

A more typical query:

SELECT neuron_id, cable_length, cell_type
FROM neurons
WHERE species = 'rat' AND cable_length > 100
ORDER BY cable_length DESC
LIMIT 1000

Query Structure

A full SyQL query can include:

[WITH cte_name AS (...), ...]
SELECT [DISTINCT] columns
FROM table [AS alias]
  [JOIN table [AS alias] ON conditions]
[WHERE predicates]
[GROUP BY columns]
[HAVING predicates]
[ORDER BY columns [ASC|DESC]]
[LIMIT n [OFFSET m]]
[SCOPE local|remote|federation]
[DOWNLOAD arrow|parquet|csv]

SELECT is optional — FROM neurons LIMIT 10 is equivalent to SELECT * FROM neurons LIMIT 10.


Tables

Base Compartment Tables

These are the primary data tables:

TablePrimary KeyDescription
neuronsneuron_idNeuron morphology and metadata
neuron_relationsrelation_idDirected non-synaptic neuron-to-neuron relations
synapsessynapse_idSynaptic connections between neurons
axonsaxon_idAxonal segments
dendritesdendrite_idDendritic segments
dendritic_spinesspine_idDendritic spines
pre_synaptic_terminalsterminal_idPre-synaptic terminals
vesiclesvesicle_idSynaptic vesicles
mitochondriamitochondria_idMitochondria

Every table includes dataset_id, created_at, and metadata columns.

Key Columns

neuronsneuron_id, name, brain_structure, polarity, cell_type, cell_class, cell_subclass, species, majority_neurotransmitter, gaba_avg, acetylcholine_avg, glutamate_avg, octopamine_avg, serotonin_avg, dopamine_avg, tyramine_avg, betaine_avg, cable_length, is_tree, n_branches, n_skeletons, n_trees, surface_area, max_axis_length, volume, voxel_volume, voxel_radius, mesh_volume, mesh_surface_area, mesh_area_volume_ratio, mesh_sphericity, centroid_x, centroid_y, centroid_z, s3_mesh_location, s3_swb_location

neuron_relationsrelation_id, pre_neuron_id, post_neuron_id, relation_type, neurotransmitter, strength, relation_count

synapsessynapse_id, pre_neuron_id, post_neuron_id, synapse_type, neurotransmitter, strength, synapse_count, centroid_x, centroid_y, centroid_z

Use POST /v1/syql/plan to inspect the resolved plan before execution. Current plan and explain responses include the logical plan, SQL preview or compiled SQL, optional rewrite target, rewrite advisories, optional federation scatter and gather SQL, optional query_id, and optional typed result_schema.

Materialized Views

Materialized views store pre-aggregated data for fast analytics:

ViewUse Case
mv_dataset_summaryRow counts per compartment per dataset
mv_neuron_morphometricsMorphometric averages/stddev per dataset
mv_neuron_statsNeuron stats grouped by dataset + cell type
mv_neuron_out_degreeOutgoing connections per neuron
mv_neuron_in_degreeIncoming connections per neuron
mv_platform_neuron_statsGlobal (platform-wide) neuron statistics
mv_synapse_connectivitySynapse statistics per dataset
mv_synapse_statsSynapse count/strength aggregates
mv_spatial_densitySpatial binning (bin_x, bin_y, bin_z)
mv_neurotransmitter_profileNeurotransmitter concentrations
mv_vesicle_distributionVesicle count/volume distributions
mv_mitochondria_statsMitochondria volume/count statistics
mv_vesicle_diameter_histogramVesicle diameter binning
mv_mitochondria_volume_histogramMitochondria volume binning
mv_nt_by_regionNeurotransmitter by brain region

MV columns that store intermediate aggregation state are automatically finalized — e.g., row_count becomes countMerge(row_count) in the compiled SQL. You don’t need to handle this yourself.

Precomputed Tables

Graph analysis results stored by the ETL pipeline:

TableDescription
precomputed_graph_summaryNetwork-level statistics (density, reciprocity, clustering, motifs)
precomputed_degree_histogramDegree distribution (in/out)
precomputed_celltype_connectivityCell-type-to-cell-type connectivity matrix
precomputed_bottleneck_neuronsArticulation point annotations
precomputed_clique_detailMaximal cliques with cell-type composition
precomputed_dual_networkChemical vs. electrical subnetwork metrics
precomputed_developmental_metricsPer-stage developmental metrics

Views

ViewDescription
vw_celltype_connectivityCell-type connectivity (non-materialized view)

Filtering (WHERE)

Data Column Filters

Standard SQL comparison operators:

WHERE cable_length > 100
WHERE cell_type = 'pyramidal'
WHERE cell_type != 'unknown'
WHERE cable_length BETWEEN 50 AND 200
WHERE cell_type IN ('pyramidal', 'interneuron', 'stellate')
WHERE name LIKE '%mushroom%'
WHERE cell_class IS NULL
WHERE cell_class IS NOT NULL

Boolean Logic

Combine predicates with AND, OR, NOT, and parentheses:

WHERE (cable_length > 100 AND volume < 500)
   OR (cell_type = 'pyramidal' AND NOT cell_class IS NULL)

Metadata Filters

These special columns filter by dataset metadata — they resolve against PostgreSQL and restrict which dataset_id values are included:

ColumnAliasesResolves to
speciesdataset.animal_species
brain_regionbrain_structuredataset_brain_region junction
licensedata_licensedataset.data_license
microscopymicroscopy_namedataset.microscopy_name
cluster_namefederated_cluster.name
SELECT neuron_id, cable_length
FROM neurons
WHERE species = 'mouse' AND brain_region = 'mushroom_body'
LIMIT 1000
FROM neurons WHERE species IN ('mouse', 'rat') LIMIT 1000

Expression Filters

Arithmetic and function calls work in WHERE:

WHERE cable_length * 2 > 500
WHERE SQRT(volume) < 100
WHERE ABS(centroid_x - centroid_y) < 10

Subquery Filters

WHERE neuron_id IN (SELECT pre_neuron_id FROM synapses WHERE strength > 5)
WHERE neuron_id NOT IN (SELECT neuron_id FROM axons)

SELECT Expressions

Columns and Aliases

SELECT neuron_id, cable_length AS length, cell_type
FROM neurons
LIMIT 100

Arithmetic

SELECT neuron_id,
       mesh_volume / mesh_surface_area AS volume_to_area,
       cable_length * 1000 AS cable_length_nm
FROM neurons
LIMIT 100

Operators: +, -, *, /, %

CASE Expressions

SELECT neuron_id,
       CASE
         WHEN cable_length > 1000 THEN 'long'
         WHEN cable_length > 100 THEN 'medium'
         ELSE 'short'
       END AS size_class
FROM neurons
LIMIT 100

Simple form:

CASE cell_type
  WHEN 'pyramidal' THEN 'excitatory'
  WHEN 'interneuron' THEN 'inhibitory'
  ELSE 'other'
END

DISTINCT

SELECT DISTINCT cell_type, brain_structure
FROM neurons

Aggregate Functions

FunctionDescription
COUNT(*)Count rows
COUNT(column)Count non-null values
COUNT(DISTINCT column)Count unique values
SUM(column)Sum
AVG(column)Mean
MIN(column)Minimum
MAX(column)Maximum
STDDEV_POP(column)Population standard deviation
VAR_POP(column)Population variance
QUANTILE(p)(column)Quantile at level p (0.0–1.0)
MEDIAN(column)Median (alias for QUANTILE(0.5))
CORR(col1, col2)Pearson correlation (two columns)
SELECT dataset_id,
       COUNT(*) AS neuron_count,
       AVG(cable_length) AS avg_cable,
       STDDEV_POP(cable_length) AS std_cable,
       MEDIAN(mesh_volume) AS median_volume
FROM neurons
GROUP BY dataset_id

GROUP BY and HAVING

Group rows and filter groups:

SELECT cell_type, COUNT(*) AS n, AVG(cable_length) AS avg_cable
FROM neurons
WHERE dataset_id = '...'
GROUP BY cell_type
HAVING COUNT(*) > 10
ORDER BY avg_cable DESC

HAVING supports arithmetic, function calls, and subqueries:

HAVING STDDEV_POP(cable_length) < 50
HAVING COUNT(*) > (SELECT COUNT(*) / 100 FROM neurons WHERE dataset_id = '...')

Scalar Functions

Numeric

FunctionDescription
ABS(x)Absolute value
ROUND(x [, precision])Round
FLOOR(x)Round down
CEIL(x) / CEILING(x)Round up
SQRT(x)Square root
POWER(x, y) / POW(x, y)Exponentiation
GREATEST(a, b, ...)Maximum of arguments
LEAST(a, b, ...)Minimum of arguments

Conditional

FunctionDescription
IF(cond, then, else)Ternary conditional
IFNULL(value, default)Null coalescing
COALESCE(a, b, ...)First non-null value
NULLIF(a, b)Returns null if a = b

String

FunctionDescription
LENGTH(s)String length
LOWER(s)Lowercase
UPPER(s)Uppercase
TRIM(s)Strip whitespace
CONCAT(a, b, ...)Concatenate strings
SUBSTRING(s, pos [, len]) / SUBSTR(...)Extract substring

Type Casting

FunctionDescription
TOFLOAT64(x)Cast to Float64
TOINT32(x)Cast to Int32
TOINT64(x)Cast to Int64
TOUINT64(x)Cast to UInt64
TOSTRING(x)Cast to String

Window Functions

Window functions compute values across a set of rows related to the current row.

Syntax

function(...) OVER (
  [PARTITION BY expr, ...]
  [ORDER BY expr [ASC|DESC], ...]
  [ROWS|RANGE BETWEEN start AND end]
)

Ranking Functions

SELECT neuron_id,
       cable_length,
       RANK() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS rank,
       DENSE_RANK() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS dense_rank,
       ROW_NUMBER() OVER (PARTITION BY dataset_id ORDER BY cable_length DESC) AS row_num
FROM neurons

Offset Functions

SELECT neuron_id,
       cable_length,
       LAG(cable_length) OVER (ORDER BY neuron_id) AS prev_cable,
       LEAD(cable_length, 2) OVER (ORDER BY neuron_id) AS next_2_cable,
       FIRST_VALUE(cable_length) OVER (PARTITION BY dataset_id ORDER BY neuron_id) AS first_cable,
       LAST_VALUE(cable_length) OVER (PARTITION BY dataset_id ORDER BY neuron_id) AS last_cable
FROM neurons

Window Frames

SUM(cable_length) OVER (
  ORDER BY neuron_id
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total

Frame bounds: UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING.

Frame units: ROWS or RANGE.

Aggregates as Window Functions

Any aggregate function can be used with OVER:

SELECT neuron_id,
       cable_length,
       AVG(cable_length) OVER (PARTITION BY dataset_id) AS dataset_avg
FROM neurons

JOINs

Supported Join Types

  • INNER JOIN (or just JOIN)
  • LEFT JOIN
  • RIGHT JOIN
  • FULL OUTER JOIN
  • CROSS JOIN

ON Conditions

ON clauses support equality conditions chained with AND:

SELECT n.neuron_id, n.cable_length, s.strength
FROM neurons AS n
INNER JOIN synapses AS s
  ON n.dataset_id = s.dataset_id AND n.neuron_id = s.pre_neuron_id
WHERE n.dataset_id = '...'
LIMIT 1000

Self-Joins

Useful for analyzing reciprocal connections:

SELECT
    count() / 2 AS reciprocal_pairs,
    toFloat64(count()) / 2.0
      / greatest(toFloat64((SELECT count() FROM synapses WHERE dataset_id = '...')), 1.0)
      AS reciprocity
FROM synapses AS s1
INNER JOIN synapses AS s2
    ON s1.dataset_id = s2.dataset_id
    AND s1.pre_neuron_id = s2.post_neuron_id
    AND s1.post_neuron_id = s2.pre_neuron_id
WHERE s1.dataset_id = '...'
    AND s1.pre_neuron_id < s1.post_neuron_id

Joining Materialized Views

SELECT o.neuron_id, out_degree, in_degree
FROM mv_neuron_out_degree AS o
FULL OUTER JOIN mv_neuron_in_degree AS i
    ON o.dataset_id = i.dataset_id AND o.neuron_id = i.neuron_id
WHERE o.dataset_id = '...' OR i.dataset_id = '...'
GROUP BY o.neuron_id, i.neuron_id
ORDER BY (out_degree + in_degree) DESC
LIMIT 100

CROSS JOIN with Subqueries

Useful for z-score comparisons against global statistics:

SELECT
    ds.dataset_id,
    (AVG(ds.cable_length) - global.global_avg)
        / greatest(global.global_std, 0.0000000001) AS zscore
FROM neurons AS ds
CROSS JOIN (
    SELECT AVG(cable_length) AS global_avg, STDDEV_POP(cable_length) AS global_std
    FROM neurons
) AS global
WHERE ds.dataset_id IN ('...')
GROUP BY ds.dataset_id

CTEs (WITH Clauses)

Common Table Expressions let you name intermediate result sets:

WITH all_neurons AS (
    SELECT pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
    UNION ALL
    SELECT post_neuron_id FROM synapses WHERE dataset_id = '...'
)
SELECT
    neuron_count,
    edge_count,
    toFloat64(edge_count) / greatest(toFloat64(neuron_count) * (toFloat64(neuron_count) - 1), 1) AS density
FROM (
    SELECT
        (SELECT COUNT(DISTINCT neuron_id) FROM all_neurons) AS neuron_count,
        count() AS edge_count
    FROM synapses
    WHERE dataset_id = '...'
) AS t

Later CTEs can reference earlier ones. WITH RECURSIVE is not supported.


UNION ALL

Combine multiple queries:

SELECT 'pre' AS direction, pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
UNION ALL
SELECT 'post' AS direction, post_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
ORDER BY neuron_id
LIMIT 1000

Only UNION ALL is supported. UNION (without ALL), EXCEPT, and INTERSECT are not available. Outer ORDER BY, LIMIT, and OFFSET apply across the combined result.


Ordering and Pagination

ORDER BY cable_length DESC
LIMIT 100
OFFSET 200

ORDER BY supports expressions:

ORDER BY cable_length * 2 DESC
ORDER BY SQRT(volume) ASC

Parameter Binding

Parameterized queries prevent injection and allow reuse.

Positional Parameters

Use ? placeholders — they are assigned 1-based indices left to right:

{
  "query": "SELECT neuron_id, cable_length FROM neurons WHERE cable_length > ? AND volume < ? LIMIT ?",
  "params": [100, 999.5, 1000]
}

Named Parameters

Use :name placeholders:

{
  "query": "SELECT neuron_id FROM neurons WHERE cell_type = :ct AND species = :species LIMIT 1000",
  "named_params": {"ct": "pyramidal", "species": "rat"}
}

Parameters can be used in WHERE, SELECT expressions, HAVING, ORDER BY, and function arguments.


Output Format (DOWNLOAD)

Control the result format:

SELECT neuron_id, cable_length FROM neurons LIMIT 1000 DOWNLOAD csv
FormatDescription
arrowApache Arrow (default)
parquetApache Parquet
csvCSV

Federation Scope (SCOPE)

Control where the query executes:

SELECT COUNT(*) FROM neurons GROUP BY dataset_id SCOPE federation
ScopeDescription
localExecute on the local database (default)
remoteExecute on a remote federated cluster
federationScatter/gather across all federated nodes

When using federation scope, the hub automatically decomposes the query into scatter SQL (sent to each node) and gather SQL (merged locally). See Cross-Cluster Queries.

Automatic MV Rewriting

SyQL automatically rewrites queries to use materialized views when possible. For example:

SELECT AVG(cable_length) FROM neurons GROUP BY dataset_id

This is automatically rewritten to query mv_neuron_morphometrics instead of scanning the full neurons table — significantly faster for large datasets.

Use the explain endpoint to see whether your query was rewritten and which MV was selected. The response includes advisories explaining why alternative MVs were rejected.


API Workflow

SyQL has a three-stage pipeline:

StageEndpointWhat it does
PlanPOST /v1/syql/planParse → validate → resolve metadata → return logical plan
ExplainPOST /v1/syql/explainPlan + compile to SQL → return compiled query and advisories
ExecutePOST /v1/syql/execPlan + compile + submit to job queue → return job ID

Use plan to validate syntax and inspect the resolved schema. Use explain to preview the generated SQL before committing to execution. Use exec when you’re ready to run.

Plan

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/plan \
  -d '{"query": "SELECT neuron_id, cable_length FROM neurons WHERE species = '\''mouse'\'' LIMIT 100"}'

Returns the parsed logical plan: resolved tables, columns, filters, metadata, and result schema.

Explain

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/explain \
  -d '{"query": "SELECT AVG(cable_length) FROM neurons GROUP BY dataset_id"}'

Returns:

  • The compiled ClickHouse SQL (with any MV rewrites applied)
  • Query advisories and any selected MV rewrite target
  • Optional federation scatter_sql and gather_sql
  • Optional query_id and typed result_schema

Execute

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/exec \
  -d '{"query": "SELECT neuron_id, cable_length FROM neurons WHERE species = '\''mouse'\'' ORDER BY cable_length DESC LIMIT 1000"}'

Returns a job_id. Track and download results via the Jobs System.

Cancel

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://api.syndb.xyz/v1/syql/cancel \
  -d '{"query_id": "..."}'

The query_id may be returned by plan, explain, or execute when the query can be cancelled through ClickHouse.


Examples

Neuron morphometrics per dataset

SELECT dataset_id,
       COUNT(*) AS n,
       AVG(cable_length) AS avg_cable,
       STDDEV_POP(cable_length) AS std_cable,
       AVG(mesh_volume) AS avg_volume,
       AVG(mesh_sphericity) AS avg_sphericity
FROM neurons
GROUP BY dataset_id
ORDER BY n DESC

Top connected neurons

SELECT o.neuron_id, out_degree, in_degree,
       (out_degree + in_degree) AS total_degree
FROM mv_neuron_out_degree AS o
FULL OUTER JOIN mv_neuron_in_degree AS i
    ON o.dataset_id = i.dataset_id AND o.neuron_id = i.neuron_id
WHERE o.dataset_id = '...'
GROUP BY o.neuron_id, i.neuron_id
ORDER BY total_degree DESC
LIMIT 50

Z-score comparison across datasets

SELECT
    toString(ds.dataset_id) AS dataset_id,
    'cable_length' AS metric,
    (ds.dataset_avg - global.global_avg)
        / greatest(global.global_std, 0.0000000001) AS zscore
FROM (
    SELECT dataset_id, AVG(cable_length) AS dataset_avg
    FROM neurons
    WHERE dataset_id IN ('uuid1', 'uuid2')
    GROUP BY dataset_id
) AS ds
CROSS JOIN (
    SELECT AVG(cable_length) AS global_avg, STDDEV_POP(cable_length) AS global_std
    FROM neurons
) AS global

Network reciprocity

SELECT
    count() / 2 AS reciprocal_pairs,
    toFloat64(count()) / 2.0
      / greatest(toFloat64((SELECT count() FROM synapses WHERE dataset_id = '...')), 1.0)
      AS reciprocity
FROM synapses AS s1
INNER JOIN synapses AS s2
    ON s1.dataset_id = s2.dataset_id
    AND s1.pre_neuron_id = s2.post_neuron_id
    AND s1.post_neuron_id = s2.pre_neuron_id
WHERE s1.dataset_id = '...'
    AND s1.pre_neuron_id < s1.post_neuron_id

Graph density with CTEs

WITH all_neurons AS (
    SELECT pre_neuron_id AS neuron_id FROM synapses WHERE dataset_id = '...'
    UNION ALL
    SELECT post_neuron_id FROM synapses WHERE dataset_id = '...'
)
SELECT
    neuron_count,
    edge_count,
    avg_strength,
    toFloat64(edge_count)
      / greatest(toFloat64(neuron_count) * (toFloat64(neuron_count) - 1), 1)
      AS density
FROM (
    SELECT
        (SELECT COUNT(DISTINCT neuron_id) FROM all_neurons) AS neuron_count,
        count() AS edge_count,
        avg(strength) AS avg_strength
    FROM synapses
    WHERE dataset_id = '...'
) AS t

Ranking neurons within a dataset

SELECT neuron_id, cable_length,
       RANK() OVER (ORDER BY cable_length DESC) AS rank
FROM neurons
WHERE dataset_id = '...'
LIMIT 100

Parameterized query

{
  "query": "SELECT neuron_id, cable_length, cell_type FROM neurons WHERE cell_type = :ct AND cable_length > :min_cable ORDER BY cable_length DESC LIMIT :n",
  "named_params": {"ct": "pyramidal", "min_cable": 500, "n": 100}
}

Unsupported Features

These SQL features are intentionally not supported:

  • INSERT, UPDATE, DELETE, CREATE, DROP — SyQL is read-only
  • WITH RECURSIVE — recursive CTEs
  • UNION (without ALL), EXCEPT, INTERSECT — only UNION ALL is available
  • GROUPS window frame unit — only ROWS and RANGE
  • Arbitrary ClickHouse functions — only whitelisted functions are allowed

Saved Queries

Frequently used SyQL queries can be saved for reuse. See Saved Queries.