Data Standards
SynDB implements open data standards to ensure interoperability, discoverability, and long-term preservation of neuroscience datasets.
FAIR Data Principles
SynDB aligns with the FAIR principles for scientific data management:
- Findable: Datasets are indexed by Meilisearch full-text search. Each dataset is assigned a persistent UUID. Metadata is exposed via JSON-LD for search engine discovery.
- Accessible: A RESTful API with an OpenAPI specification provides structured access. Arrow Flight enables high-throughput data transfer. Authentication uses standardized PASETO tokens.
- Interoperable: Metadata is serialized as JSON-LD using Schema.org vocabulary. Controlled vocabularies draw from OBO Foundry ontologies. Data is exported in Apache Parquet and Apache Arrow formats.
- Reusable: Licenses are stored as machine-readable SPDX identifiers. Provenance tracking, version history, and auto-generated citations support reproducibility.
Metadata Standards
SynDB dataset metadata follows established web standards:
- Schema.org: Dataset metadata uses the Schema.org Dataset type, enabling discovery by Google Dataset Search and other aggregators.
- JSON-LD: Metadata is serialized as JSON-LD, a linked data format that embeds semantic context in standard JSON. Access via GET /v1/neurodata/datasets/{id}/metadata.jsonld.
- DCAT: Vocabulary alignment with the W3C Data Catalog Vocabulary for catalog interoperability.
- Dublin Core: Core metadata terms (title, creator, date, rights) follow Dublin Core conventions.
- SynDB Connectomics Data Profile: Required profile for ontology-backed dataset metadata, DataCite relation types, JSON-LD export, and archival metadata bundles.
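As a minimal sketch of how a client might consume the JSON-LD endpoint listed above, the following Python builds the metadata URL and pulls a few Schema.org fields out of a response. The base URL, the helper names, and the sample document are illustrative assumptions, not SynDB's documented response; only the path /v1/neurodata/datasets/{id}/metadata.jsonld comes from this page. Authentication is omitted.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical host; replace with your SynDB deployment.
BASE_URL = "https://syndb.example.org"


def metadata_url(dataset_id, base_url=BASE_URL):
    """Build the JSON-LD metadata URL for a dataset ID."""
    safe_id = quote(dataset_id, safe="")
    return f"{base_url.rstrip('/')}/v1/neurodata/datasets/{safe_id}/metadata.jsonld"


def fetch_metadata(dataset_id, base_url=BASE_URL, timeout=10.0):
    """Fetch and decode the JSON-LD document (performs a network request)."""
    request = Request(
        metadata_url(dataset_id, base_url),
        headers={"Accept": "application/ld+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(doc):
    """Pick out a few common Schema.org Dataset properties, if present."""
    return {
        "type": doc.get("@type"),
        "name": doc.get("name"),
        "license": doc.get("license"),
    }


# Made-up sample for illustration; real SynDB output may contain other fields.
SAMPLE = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example connectome",
  "identifier": "00000000-0000-0000-0000-000000000000",
  "license": "https://spdx.org/licenses/CC-BY-4.0"
}
""")

summary = summarize(SAMPLE)
```

Calling fetch_metadata with a real dataset UUID would return a dictionary that can be passed to summarize in the same way as the sample.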
Citation Formats
SynDB generates citations in multiple formats via GET /v1/neurodata/datasets/{id}/citation?format=<fmt>:
| Format | Use Case | Specification |
|---|---|---|
| BibTeX | LaTeX documents | .bib entries |
| RIS | Reference managers (Zotero, EndNote, Mendeley) | Tagged text format |
| APA | Inline text citations | APA 7th edition |
| CSL-JSON | Programmatic citation processing | Citation Style Language data model |
| CFF | Software/dataset citation files | CITATION.cff format |
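The citation endpoint can be addressed with a small URL builder such as the sketch below. The host name and the exact format tokens (bibtex, ris, apa, csl-json, cff) are assumptions inferred from the table; check the OpenAPI specification for the values the server actually accepts.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; replace with your SynDB deployment.
BASE_URL = "https://syndb.example.org"

# Assumed tokens for the `format` query parameter (not confirmed by this page).
CITATION_FORMATS = {"bibtex", "ris", "apa", "csl-json", "cff"}


def citation_url(dataset_id, fmt, base_url=BASE_URL):
    """Build the citation URL for a dataset in the requested format."""
    if fmt not in CITATION_FORMATS:
        raise ValueError(f"unsupported citation format: {fmt!r}")
    safe_id = quote(dataset_id, safe="")
    query = urlencode({"format": fmt})
    return f"{base_url.rstrip('/')}/v1/neurodata/datasets/{safe_id}/citation?{query}"


url = citation_url("00000000-0000-0000-0000-000000000000", "bibtex")
```

Validating the format on the client side gives an early, readable error instead of a failed request.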
License Identifiers
SynDB uses SPDX license identifiers internally. When you select a license during dataset creation, it is stored as an SPDX expression (e.g., ODC-BY-1.0, CC-BY-4.0). This enables machine-readable license detection and compatibility checking.
See the license selection guide for help choosing a license.
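To illustrate what machine-readable license detection can look like, the sketch below extracts the license identifiers from an SPDX expression. It is a simplified tokenizer written for this example, not SynDB's implementation: it does not validate identifiers against the SPDX license list or check expression grammar.

```python
import re

# SPDX expression operators; WITH is followed by an exception ID, not a license.
OPERATORS = {"AND", "OR", "WITH"}


def license_ids(expression):
    """Return the license identifiers in an SPDX expression, in order."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    ids = []
    skip_next = False
    for token in tokens:
        upper = token.upper()
        if upper in OPERATORS:
            # The token after WITH names an exception, so skip it.
            skip_next = upper == "WITH"
            continue
        if skip_next:
            skip_next = False
            continue
        # A trailing "+" means "or any later version"; drop it for lookup.
        ids.append(token.rstrip("+"))
    return ids


single = license_ids("CC-BY-4.0")
compound = license_ids("(MIT OR Apache-2.0) AND CC-BY-4.0")
```

A compatibility check could then compare the returned identifiers against an allow-list maintained by the consuming project.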
Data Formats
| Format | MIME Type | Used For |
|---|---|---|
| Apache Parquet | application/vnd.apache.parquet | Dataset export and DOWNLOAD parquet in SyQL |
| Apache Arrow IPC | application/vnd.apache.arrow.stream | Job results, Flight data transfer |
| CSV | text/csv | DOWNLOAD csv in SyQL, ontology bulk import |
Arrow IPC and Parquet files can be read with pandas, Polars, DuckDB, or any Arrow-compatible library.
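As one possible way to load a downloaded file, the sketch below picks a pyarrow reader based on the MIME types in the table above. It assumes pyarrow is installed and that the file has already been saved locally; the function names are illustrative.

```python
PARQUET = "application/vnd.apache.parquet"
ARROW_STREAM = "application/vnd.apache.arrow.stream"
CSV = "text/csv"


def normalize_mime(content_type):
    """Strip parameters such as '; charset=utf-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()


def read_table(path, content_type):
    """Read a local file into a pyarrow.Table according to its MIME type."""
    mime = normalize_mime(content_type)
    if mime == PARQUET:
        import pyarrow.parquet as pq
        return pq.read_table(path)
    if mime == ARROW_STREAM:
        import pyarrow.ipc as ipc
        # The Arrow IPC streaming format is read record batch by record batch.
        with open(path, "rb") as source:
            return ipc.open_stream(source).read_all()
    if mime == CSV:
        import pyarrow.csv as pacsv
        return pacsv.read_csv(path)
    raise ValueError(f"unsupported content type: {content_type!r}")
```

The returned table can be converted with table.to_pandas() for pandas, or passed to polars.from_arrow(table) for Polars.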
External Integrations
SynDB metadata is designed to interoperate with the following data platforms and archives:
| Platform | Integration |
|---|---|
| DataCite | DOI registration and metadata schema alignment (DataCite Metadata Schema 4.5) |
| DANDI Archive | Complementary neurophysiology data archive |
| OpenNeuro | Complementary neuroimaging data archive |
| Google Dataset Search | Automatic discovery via Schema.org/JSON-LD metadata |