Data Standards

SynDB implements open data standards to ensure interoperability, discoverability, and long-term preservation of neuroscience datasets.

FAIR Data Principles

SynDB aligns with the FAIR principles for scientific data management:

Findable: Datasets are indexed by Meilisearch full-text search. Each dataset is assigned a persistent UUID. Metadata is exposed via JSON-LD for search engine discovery.
Accessible: A RESTful API with an OpenAPI specification provides structured access. Arrow Flight enables high-throughput data transfer. Authentication uses standardized PASETO tokens.
Interoperable: Metadata is serialized as JSON-LD using Schema.org vocabulary. Controlled vocabularies draw from OBO Foundry ontologies. Data is exported in Apache Parquet and Apache Arrow formats.
Reusable: Licenses are stored as machine-readable SPDX identifiers. Provenance tracking, version history, and auto-generated citations support reproducibility.

Metadata Standards

SynDB dataset metadata follows established web standards:

Schema.org: Dataset metadata uses the Schema.org Dataset type, enabling discovery by Google Dataset Search and other aggregators.
JSON-LD: Metadata is serialized as JSON-LD – a linked data format that embeds semantic context in standard JSON. Access via GET /v1/neurodata/datasets/{id}/metadata.jsonld.
DCAT: Metadata vocabulary is aligned with the W3C Data Catalog Vocabulary (DCAT) for catalog interoperability.
Dublin Core: Core metadata terms (title, creator, date, rights) follow Dublin Core conventions.
SynDB Connectomics Data Profile: Required profile for ontology-backed dataset metadata, DataCite relation types, JSON-LD export, and archival metadata bundles.
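As a rough illustration of the JSON-LD export, the sketch below builds the metadata URL from the endpoint listed above and extracts a few Schema.org Dataset properties from a response body. The host `https://syndb.example`, the helper names `metadata_url` and `summarize`, and the sample document are assumptions made for illustration. They are not actual SynDB output, and a real response may carry different or additional properties.

```python
import json
import uuid

# Assumed host for illustration; substitute your SynDB deployment.
BASE_URL = "https://syndb.example"


def metadata_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the JSON-LD metadata URL for a dataset UUID."""
    # uuid.UUID raises ValueError for malformed input and normalises case.
    canonical = str(uuid.UUID(dataset_id))
    return f"{base_url}/v1/neurodata/datasets/{canonical}/metadata.jsonld"


def summarize(jsonld_text: str) -> dict:
    """Extract a few Schema.org Dataset properties from a JSON-LD document."""
    doc = json.loads(jsonld_text)
    if doc.get("@type") != "Dataset":
        raise ValueError("expected a Schema.org Dataset document")
    return {key: doc.get(key) for key in ("name", "identifier", "license")}


# Hypothetical response body, not real SynDB output.
SAMPLE = json.dumps({
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example connectome",
    "identifier": "123e4567-e89b-12d3-a456-426614174000",
    "license": "https://spdx.org/licenses/CC-BY-4.0.html",
})

if __name__ == "__main__":
    print(metadata_url("123e4567-e89b-12d3-a456-426614174000"))
    print(summarize(SAMPLE))
```

Validating the identifier as a UUID before building the URL catches malformed IDs locally instead of relying on an error response from the server.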

Citation Formats

SynDB generates citations in multiple formats via GET /v1/neurodata/datasets/{id}/citation?format=<fmt>:

Format	Use Case	Specification
BibTeX	LaTeX documents	`.bib` entries
RIS	Reference managers (Zotero, EndNote, Mendeley)	Tagged text format
APA	Inline text citations	APA 7th edition
CSL-JSON	Programmatic citation processing	Citation Style Language data model
CFF	Software/dataset citation files	CITATION.cff format
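The citation endpoint can be addressed with a small URL builder, sketched below. The host and the lowercase `format` values (`bibtex`, `ris`, `apa`, `csl-json`, `cff`) are assumptions derived from the table above; the spellings the API actually accepts may differ, so check the OpenAPI specification before relying on them.

```python
from urllib.parse import urlencode

# Assumed host for illustration; substitute your SynDB deployment.
BASE_URL = "https://syndb.example"

# Assumed query values, one per row of the table above.
CITATION_FORMATS = ("bibtex", "ris", "apa", "csl-json", "cff")


def citation_url(dataset_id: str, fmt: str, base_url: str = BASE_URL) -> str:
    """Build the citation URL for a dataset in the requested format."""
    if fmt not in CITATION_FORMATS:
        raise ValueError(f"unsupported citation format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{base_url}/v1/neurodata/datasets/{dataset_id}/citation?{query}"


if __name__ == "__main__":
    for fmt in CITATION_FORMATS:
        print(citation_url("123e4567-e89b-12d3-a456-426614174000", fmt))
```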

License Identifiers

SynDB uses SPDX license identifiers internally. When you select a license during dataset creation, it is stored as an SPDX expression (e.g., ODC-BY-1.0, CC-BY-4.0). This enables machine-readable license detection and compatibility checking.
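To show what machine-readable license handling can look like, here is a minimal sketch that resolves SPDX identifiers to their SPDX license-list pages and combines the obligations of several licenses. The `LICENSES` table and both helper functions are hypothetical and deliberately simplified. This is not SynDB's compatibility checker, and it handles single identifiers only, not compound SPDX expressions.

```python
# Hand-written facts about a few SPDX-listed licenses (illustrative subset).
LICENSES = {
    "CC0-1.0": {"attribution": False, "share_alike": False},
    "CC-BY-4.0": {"attribution": True, "share_alike": False},
    "CC-BY-SA-4.0": {"attribution": True, "share_alike": True},
    "ODC-BY-1.0": {"attribution": True, "share_alike": False},
}


def spdx_url(license_id: str) -> str:
    """Return the SPDX license-list page for a known identifier."""
    if license_id not in LICENSES:
        raise ValueError(f"unknown SPDX identifier: {license_id!r}")
    return f"https://spdx.org/licenses/{license_id}.html"


def combined_obligations(license_ids: list[str]) -> dict:
    """Union of obligations when data under several licenses is combined."""
    for license_id in license_ids:
        if license_id not in LICENSES:
            raise ValueError(f"unknown SPDX identifier: {license_id!r}")
    return {
        "attribution": any(LICENSES[i]["attribution"] for i in license_ids),
        "share_alike": any(LICENSES[i]["share_alike"] for i in license_ids),
    }


if __name__ == "__main__":
    print(spdx_url("ODC-BY-1.0"))
    print(combined_obligations(["CC0-1.0", "CC-BY-4.0"]))
```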

See the license selection guide for help choosing a license.

Data Formats

Format	MIME Type	Used For
Apache Parquet	`application/vnd.apache.parquet`	Dataset export and `DOWNLOAD parquet` in SyQL
Apache Arrow IPC	`application/vnd.apache.arrow.stream`	Job results, Flight data transfer
CSV	`text/csv`	`DOWNLOAD csv` in SyQL, ontology bulk import

Arrow IPC and Parquet files can be read with pandas, Polars, DuckDB, or any Arrow-compatible library.
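One way to handle the three formats in client code is to dispatch on the MIME type, as in the sketch below. The function name `read_payload` is hypothetical. The Parquet and Arrow branches assume the `pyarrow` package is installed and import it lazily, and the CSV branch uses only the standard library.

```python
import csv
import io

PARQUET = "application/vnd.apache.parquet"
ARROW_STREAM = "application/vnd.apache.arrow.stream"
CSV = "text/csv"


def read_payload(payload: bytes, mime_type: str):
    """Decode a downloaded payload according to its MIME type."""
    if mime_type == PARQUET:
        import pyarrow.parquet as pq  # requires the pyarrow package

        return pq.read_table(io.BytesIO(payload))
    if mime_type == ARROW_STREAM:
        import pyarrow.ipc as ipc  # requires the pyarrow package

        # The stream MIME type corresponds to the Arrow IPC streaming format.
        return ipc.open_stream(payload).read_all()
    if mime_type == CSV:
        text = payload.decode("utf-8")
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported MIME type: {mime_type!r}")


if __name__ == "__main__":
    rows = read_payload(b"id,weight\n1,0.5\n", CSV)
    print(rows)
```

The Parquet and Arrow branches return a `pyarrow.Table`, which can be converted with `to_pandas()` when pandas is available. The CSV branch returns a list of dictionaries with string values.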

External Integrations

SynDB metadata is designed to interoperate with the following registries, archives, and search services:

Platform	Integration
DataCite	DOI registration and metadata schema alignment (DataCite Metadata Schema 4.5)
DANDI Archive	Complementary neurophysiology data archive
OpenNeuro	Complementary neuroimaging data archive
Google Dataset Search	Automatic discovery via Schema.org/JSON-LD metadata
