Upload
Note
Prerequisites
This article assumes that you understand how data is stored on SynDB. If you are uncertain, we recommend reading the overview article first.
Uploading to SynDB is a multistep process and requires an understanding of the SynDB dataset model.
The process
Preparation
We recommend following this guide in the exact order given, so that each step builds on the previous one.
Terms and conditions
You must accept the terms and conditions before uploading data. The terms include:
- A statement that the data is not false or misleading
- Redistribution rights
- A data licensing agreement under the license of your choice (see the guide on picking a license); the current default is CC BY 4.0.
Data structuring
SynDB standardizes data to facilitate uploads. Your imaging metrics must be in a tabular data format, for instance .xlsx, .csv, or .parquet. Read more about data structuring in the contributor’s guide.
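Before preparing an upload, it can help to sanity-check that a file really is tabular. The sketch below is not part of SynDB. It uses only the Python standard library, and the function names and the set of suffixes are assumptions taken from the examples in this guide, not an authoritative list of accepted formats.

```python
import csv
import io
from pathlib import Path

# Assumption: these are only the example extensions named in this guide,
# not an authoritative list of what SynDB accepts.
EXAMPLE_TABULAR_SUFFIXES = {".xlsx", ".csv", ".parquet"}


def looks_tabular(path: str) -> bool:
    """Return True if the file extension is one of the example tabular formats."""
    return Path(path).suffix.lower() in EXAMPLE_TABULAR_SUFFIXES


def ragged_rows(csv_text: str) -> list[int]:
    """Return row numbers (header is row 1) whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to compare against.
        return []
    return [
        row_no
        for row_no, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
```

A CSV whose rows all have the same number of fields as the header yields an empty list. Any row numbers returned point at rows that would break a rectangular table.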
Login
Once you enter the upload page, you will be prompted to log in to your SynDB account if you are not already logged in. You must also verify your academic status by logging in to your institution’s account.
The upload
You can upload data using the CLI, the web UI, or a mix of both. The UI is usually the simplest path for a first upload, while the CLI is better for reproducible and scripted ingestion.
1. Assign IDs and correlate relations
Each SynDB unit requires a unique ID, assigned before it is uploaded to the platform. The web UI does this automatically; the CLI does not. When a dataset contains multiple SynDB tables, those tables are expected to be related to each other.
Warning
Dataset integrity
Uploading unrelated SynDB tables under the same dataset is not allowed, as it may lead to undefined behaviour.
For example, you cannot upload a table of neurons and a table of synapses under the same dataset unless each synapse is related to a neuron in that table of neurons.
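To make the neuron and synapse requirement concrete, here is a minimal sketch that assigns UUIDs to rows and then lists any synapse that points at no known neuron. It is an illustration only: the column names `id` and `neuron_id` and both helper functions are hypothetical, and the real column names are defined in the data structuring guide.

```python
import uuid


def assign_ids(rows: list[dict], id_column: str = "id") -> list[dict]:
    """Return copies of the rows, each with a fresh random UUID in id_column."""
    return [{**row, id_column: str(uuid.uuid4())} for row in rows]


def orphaned(children: list[dict], parents: list[dict],
             parent_ref: str, id_column: str = "id") -> list[dict]:
    """Return the child rows whose parent_ref does not match any parent ID."""
    parent_ids = {p[id_column] for p in parents}
    return [c for c in children if c.get(parent_ref) not in parent_ids]


# Hypothetical toy data: two neurons and three synapses,
# one of which references a neuron that does not exist.
neurons = assign_ids([{"label": "n1"}, {"label": "n2"}])
synapses = assign_ids([
    {"neuron_id": neurons[0]["id"]},
    {"neuron_id": neurons[1]["id"]},
    {"neuron_id": "not-a-known-neuron"},
])

unrelated = orphaned(synapses, neurons, parent_ref="neuron_id")
print(f"{len(unrelated)} synapse(s) have no matching neuron")
```

With the toy data above, exactly one synapse is reported. A dataset in that state would violate the integrity rule and should be fixed before upload.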
Web UI
The web UI will automatically assign UUIDs to each SynDB unit. Parent-child relations are checked against the current SynDB table hierarchy during validation; see the data structuring guide for the current dataset model and naming rules.
CLI
The CLI flow is explicit and reproducible:
- Create the dataset metadata record and note the returned dataset ID.
syndb data new \
--label "My connectome release" \
--animal "Drosophila melanogaster" \
--microscopy EM \
--table 1 \
--table 6 \
--brain-structure "mushroom body" \
--license CC_BY
- Prepare raw tabular files into a validated parquet upload directory.
syndb data prepare \
--input-dir raw_dataset \
--output-dir prepared_dataset
- Validate the prepared parquet files before upload.
syndb data validate --input-dir prepared_dataset
- Upload the prepared dataset through Arrow Flight.
syndb data upload \
--input-dir prepared_dataset \
--dataset-id <syndb-dataset-id>
This CLI flow mirrors the current validator and upload path used by the rest of the platform.
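For scripted ingestion, the prepare, validate, and upload steps above can be chained from Python. This is a sketch under stated assumptions: it reuses only the commands and flags shown in this guide, the function names are hypothetical, and it assumes that `syndb` exits with a non-zero status when a step fails. The dataset ID is passed in by hand, since the output format of `syndb data new` is not described here.

```python
import subprocess


def build_commands(raw_dir: str, prepared_dir: str, dataset_id: str) -> list[list[str]]:
    """Return the prepare, validate, and upload invocations in the order they run."""
    return [
        ["syndb", "data", "prepare", "--input-dir", raw_dir, "--output-dir", prepared_dir],
        ["syndb", "data", "validate", "--input-dir", prepared_dir],
        ["syndb", "data", "upload", "--input-dir", prepared_dir, "--dataset-id", dataset_id],
    ]


def run_pipeline(raw_dir: str, prepared_dir: str, dataset_id: str) -> None:
    """Run each step in turn and stop at the first failure.

    Assumption: a failing syndb step returns a non-zero exit code, in which
    case check=True raises subprocess.CalledProcessError.
    """
    for argv in build_commands(raw_dir, prepared_dir, dataset_id):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("raw_dataset", "prepared_dataset", "<syndb-dataset-id>")` with the ID noted from the first step would run the three commands in sequence. Keeping command construction separate from execution makes the sequence easy to inspect or log before anything is uploaded.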
2. Selecting or creating the SynDB dataset metadata
As mentioned in the overview article, every dataset has metadata defined by the data owner during the upload. You can either select an existing dataset or create a new one.
3. Confirm and upload
Before the upload starts, you will be prompted to confirm the dataset and the data you are uploading. Once you confirm, the upload will start. It should be relatively quick.
Delete owned datasets
You may delete datasets that you own at any time. This removes the dataset and all data associated with it. Deletion is permanent and cannot be undone.