Preserving and Publishing Data

Author: Carolin Odebrecht

The preservation and publication of CLS data represents the crucial step in the research data lifecycle for data users because it makes the results of the previous steps available for re-use. Publication enables data creators to publish data sets (corpora, collections, editions) in order to follow good research practices that require validation, reproducibility, and citability of research data (cf. FAIR Data and Research Data Lifecycle). Data preservation is typically done by publication in a trustworthy repository.

Two aspects are important when publishing corpora: first the data set as such (Section Data Set) and the choice of the publication platform (Section Publication Platform). With the help of a to-do-list for data publication (Odebrecht and Biskup 2023) we aim to provide a helpful start for CLS data publications.

Data Set

Identify which version of a data set (title, version of the collection/corpus/edition, metadata) is eligible for publication. This data set needs to be completed by a README file containing the following information:

Authors, contributors
Contact person
(References to) documentation(s)
LICENSE statement
Short introduction to data and software workflow
Software and format dependencies
Reference(s) to supplemental material(s)

A README is a type of brief documentation in plaint text format (e.g. README.txt) that is directly assigned to the data set and contains the necessary explanations and references for data (re-)use.

Publication Platform for Digital Edition

For digital editions, the interaction between data sets, visualisation, and exploration / filter mechanism are important. Therefore, data sets of a digital editions are often published not in data repositories alone but on specific search and visualisation platforms. Digital editions platforms are findable for example by Greta Franzini’s catalogue or by Patrick Sahle’s catalogue.

The blog post of Marta Błaszczyńska and Bartłomiej Szleszyński discuss in more depth issues concerning digital scholarly editions and FAIR.

Licensing

Each data publication needs a licence statement that ensures transparency for data (and software) re-use scenarios. With a licence, creators grant right to use their work. Importantly, if there is no right of use, there is no (re-)use.

For CLS data, copy right regulations often play a crucial role. Copy rights might be regulated on the national, European, or international level. The OpenAire project provides a blog post about how to license my data. For re-using copy-right protected data, (Andresen et al. 2023) provide a workflow.

References

Andresen, Melanie, Markus Gaertner, Janina Jacke, Nora Ketschik, and Axel Pichler. 2023. “Urheberrechtlich Geschützte Texte Nachnutzen – Der XSample-Workflow,” March. https://doi.org/10.5281/ZENODO.7715448.

Consortium, Textgrid. 2006. “TextGrid: A Virtual Research Environment for the Humanities.” Göttingen. http://textgrid.de/.

Gouzi, Françoise, Laure Barbot, Matej Durco, Sally Chambers, and Toma Tasovac. 2024. “DARIAH Data Policy,” January. https://doi.org/10.5281/ZENODO.10409009.

Harrower, Natalie et als. 2020. “Sustainable and FAIR Data Sharing in the Humanities: Recommendations of the ALLEA Working Group E-Humanities.” https://doi.org/10.7486/DRI.TQ582C863.

Odebrecht, Carolin, and Till Biskup. 2023. “ToDo-Liste Für Die Publikation von Forschungsdaten.” https://doi.org/10.5281/ZENODO.7674307.

Preserving and Publishing Data

Data Set

Publication platform

Generic Repository

Publication Platform for Digital Edition

Licensing

References