Data Standards

SynDB implements open data standards to ensure interoperability, discoverability, and long-term preservation of neuroscience datasets.

FAIR Data Principles

SynDB aligns with the FAIR principles for scientific data management:

  • Findable: Datasets are indexed by Meilisearch full-text search. Each dataset is assigned a persistent UUID. Metadata is exposed via JSON-LD for search engine discovery.
  • Accessible: A RESTful API with an OpenAPI specification provides structured access. Arrow Flight enables high-throughput data transfer. Authentication uses standardized PASETO tokens.
  • Interoperable: Metadata is serialized as JSON-LD using Schema.org vocabulary. Controlled vocabularies draw from OBO Foundry ontologies. Data is exported in Apache Parquet and Apache Arrow formats.
  • Reusable: Licenses are stored as machine-readable SPDX identifiers. Provenance tracking, version history, and auto-generated citations support reproducibility.

Metadata Standards

SynDB dataset metadata follows established web standards:

  • Schema.org: Dataset metadata uses the Schema.org Dataset type, enabling discovery by Google Dataset Search and other aggregators.
  • JSON-LD: Metadata is serialized as JSON-LD, a linked data format that embeds semantic context in standard JSON. Retrieve it via GET /v1/neurodata/datasets/{id}/metadata.jsonld.
  • DCAT: Metadata terms are aligned with the W3C Data Catalog Vocabulary (DCAT) for catalog interoperability.
  • Dublin Core: Core metadata terms (title, creator, date, rights) follow Dublin Core conventions.
  • SynDB Connectomics Data Profile: Required profile for ontology-backed dataset metadata, DataCite relation types, JSON-LD export, and archival metadata bundles.
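
As a rough illustration of consuming the JSON-LD endpoint, the sketch below builds the metadata URL for a dataset and pulls a few Schema.org fields out of a response body. The host `https://syndb.example`, the helper names `metadata_url` and `summarize`, and the hand-written `SAMPLE` document are all assumptions for illustration; the fields SynDB actually returns may differ.

```python
import json
import uuid

# Hypothetical host; substitute the address of your SynDB deployment.
BASE_URL = "https://syndb.example"


def metadata_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the JSON-LD metadata URL for a dataset UUID."""
    # uuid.UUID raises ValueError for malformed identifiers.
    canonical = str(uuid.UUID(dataset_id))
    return f"{base_url}/v1/neurodata/datasets/{canonical}/metadata.jsonld"


def summarize(jsonld_text: str) -> dict:
    """Extract a few Schema.org Dataset fields from a JSON-LD document."""
    doc = json.loads(jsonld_text)
    types = doc.get("@type", [])
    if isinstance(types, str):
        types = [types]  # JSON-LD allows a single type or a list of types
    if "Dataset" not in types:
        raise ValueError("expected a Schema.org Dataset document")
    return {
        "name": doc.get("name"),
        "identifier": doc.get("identifier"),
        "license": doc.get("license"),
    }


# Hand-written sample, not real SynDB output.
SAMPLE = json.dumps({
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example connectome",
    "identifier": "123e4567-e89b-12d3-a456-426614174000",
    "license": "https://spdx.org/licenses/CC-BY-4.0",
})

print(metadata_url("123e4567-e89b-12d3-a456-426614174000"))
print(summarize(SAMPLE))
```

The URL can be fetched with any HTTP client, supplying credentials as your deployment requires.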

Citation Formats

SynDB generates citations in multiple formats via GET /v1/neurodata/datasets/{id}/citation?format=<fmt>:

| Format | Use Case | Specification |
|---|---|---|
| BibTeX | LaTeX documents | .bib entries |
| RIS | Reference managers (Zotero, EndNote, Mendeley) | Tagged text format |
| APA | Inline text citations | APA 7th edition |
| CSL-JSON | Programmatic citation processing | Citation Style Language data model |
| CFF | Software/dataset citation files | CITATION.cff format |
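
A minimal sketch of building a citation request URL follows. The host and the helper `citation_url` are hypothetical, and the lowercase `format` values are guesses based on the format names listed here; consult the OpenAPI specification for the values the server actually accepts.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the address of your SynDB deployment.
BASE_URL = "https://syndb.example"

# Assumed query values, one per listed format. These tokens are not
# confirmed by the documentation.
CITATION_FORMATS = {"bibtex", "ris", "apa", "csl-json", "cff"}


def citation_url(dataset_id: str, fmt: str, base_url: str = BASE_URL) -> str:
    """Build the citation endpoint URL for a dataset and output format."""
    if fmt not in CITATION_FORMATS:
        raise ValueError(f"unsupported citation format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{base_url}/v1/neurodata/datasets/{dataset_id}/citation?{query}"


print(citation_url("123e4567-e89b-12d3-a456-426614174000", "bibtex"))
```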

License Identifiers

SynDB uses SPDX license identifiers internally. When you select a license during dataset creation, it is stored as an SPDX expression (e.g., ODC-BY-1.0, CC-BY-4.0). This enables machine-readable license detection and compatibility checking.

See the license selection guide for help choosing a license.
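
To give a sense of what machine-readable license detection can look like, here is a simplified sketch that lists the license identifiers referenced in an SPDX expression. It is not a full SPDX parser and is not SynDB's implementation: the helper `license_ids` is hypothetical, operators are matched in uppercase only, and the exception name that follows `WITH` is skipped rather than validated.

```python
import re

# Characters allowed in SPDX identifiers, plus ":" for DocumentRef prefixes.
TOKEN = re.compile(r"[A-Za-z0-9.+:-]+")


def license_ids(expression: str) -> list:
    """Return the distinct license identifiers in an SPDX expression."""
    ids = []
    skip_next = False
    for token in TOKEN.findall(expression):
        if skip_next:
            # The token after WITH names an exception, not a license.
            skip_next = False
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR") and token not in ids:
            ids.append(token)
    return ids


print(license_ids("CC-BY-4.0"))
print(license_ids("(CC-BY-4.0 OR ODC-BY-1.0)"))
```

A compatibility check could then compare the returned identifiers against a project's list of accepted licenses.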

Data Formats

| Format | MIME Type | Used For |
|---|---|---|
| Apache Parquet | application/vnd.apache.parquet | Dataset export and DOWNLOAD parquet in SyQL |
| Apache Arrow IPC | application/vnd.apache.arrow.stream | Job results, Flight data transfer |
| CSV | text/csv | DOWNLOAD csv in SyQL, ontology bulk import |

Arrow IPC and Parquet files can be read with pandas, Polars, DuckDB, or any Arrow-compatible library.

External Integrations

SynDB metadata is designed to interoperate with these neuroscience data ecosystems:

| Platform | Integration |
|---|---|
| DataCite | DOI registration and metadata schema alignment (DataCite Metadata Schema 4.5) |
| DANDI Archive | Complementary neurophysiology data archive |
| OpenNeuro | Complementary neuroimaging data archive |
| Google Dataset Search | Automatic discovery via Schema.org/JSON-LD metadata |