Reusing Data

Author: Michał Mrugalski

The reuse of data lies at the core of FAIRness and Open Science. Reusing data goes both ways: while reusing somebody else’s data, researchers are encouraged to make their own data reusable.

Whether looking for data or publishing one’s own, a researcher is well advised to check whether a repository (potentially) containing data is certified according to a recognized standard such as the CoreTrustSeal.

The Registry of Research Data Repositories (re3data, https://www.re3data.org/) is an intuitive place to start searching for a suitable repository in which to deposit or store your data. Searching by subject on FAIRsharing (https://fairsharing.org/) is likewise worthwhile for a research project; its registry contains items classified as “Humanities and Social Sciences.”

Another major source of data, which may also serve as a repository for your results, is the so-called GLAM sector (Galleries, Libraries, Archives, and Museums), also known as Cultural Heritage Institutions (CHI).

The Heritage Data Reuse Charter was developed by several European organizations (APEF, CLARIN, DARIAH, Europeana, E-RIHS) and European projects (Iperion-CH, PARTHENOS) with the goal of “designing a common environment that will enable all the relevant actors to work together to connect and improve access to heritage data” (Tóth-Czifra and Romary 2020, 2). The authors situate the Charter’s principles with respect to FAIR:

“These principles are fully compliant with and map onto the FAIR principles and can be taken as their optimization for cultural heritage data exchange settings” (Tóth-Czifra and Romary 2020, 6).

These values materialize in concrete situations involving researchers and institutions; such situations demand that the requirements for data access and reuse, as well as individual accountability for all decisions, be specified. Accordingly, the paper “The Heritage Data Reuse Charter: From Principles to Research Workflows” (Tóth-Czifra and Romary 2020) concludes with an Annex containing a questionnaire for researchers, CHIs, and technical partners that addresses these principles and the corresponding responsibilities.

It is recommended that researchers add the Charter’s principles to their Data Management Plans.

One of the most disputed aspects of data reuse is entitlement, that is, questions of data ownership and the licenses and other legal aspects that follow from it. Tóth-Czifra et al. (2023, 56ff.) discuss these issues in an engaging way, including the differences between the Anglo-Saxon and Continental legal traditions concerning intellectual property.

Moreover, these legal issues affect researchers during both data creation and publication. One way of handling restricted material is to work with it in closed environments, where the data can be analyzed without being released. Vivli, RAIRD, Corpuscle, Project Data Sphere, and INESS are a few examples of such virtual analysis portals. Another instance of how licensing and legal problems shape research practice is the work with derived text formats, which allow corpora of copyrighted texts to be shared for text and data mining without reproducing the protected texts themselves (Schöch et al. 2020).

These limitations also apply to researchers who want to make their own data openly available. A good starting point is the CESSDA access categories for qualitative and quantitative data and/or the CLARIN licensing framework for language data. With four license elements that combine into six possible licenses, plus CC0, which amounts to a waiver of rights (Harrower et al. 2020, 23), the Creative Commons (CC) system is complicated. Consulting a license selector can therefore be a good solution.
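The count of six CC licenses follows from two rules: Attribution (BY) is part of every license, and NoDerivatives (ND) cannot be combined with ShareAlike (SA), because SA only governs adaptations, which ND prohibits. The Python sketch below enumerates the combinations under these two rules and adds a toy selector that maps three yes/no questions to a license code. The functions `cc_licenses` and `select_license` are hypothetical illustrations; they do not reproduce the logic of any existing license selector, and they say nothing about whether a given license suits your data.

```python
from itertools import product

# Optional CC license elements; BY (Attribution) is part of every CC license.
OPTIONAL = ("NC", "ND", "SA")


def cc_licenses() -> list[str]:
    """Enumerate the valid CC license codes (CC0 is a waiver, not included)."""
    licenses = []
    for flags in product((False, True), repeat=len(OPTIONAL)):
        chosen = [name for name, on in zip(OPTIONAL, flags) if on]
        # ND forbids adaptations, so SA (which only governs adaptations) cannot join it.
        if "ND" in chosen and "SA" in chosen:
            continue
        licenses.append("CC " + "-".join(["BY", *chosen]))
    return licenses


def select_license(allow_commercial: bool, allow_adaptations: bool,
                   share_alike: bool = False) -> str:
    """Hypothetical selector: map three yes/no answers to one CC license code."""
    elements = ["BY"]
    if not allow_commercial:
        elements.append("NC")
    if not allow_adaptations:
        elements.append("ND")  # share_alike is ignored: there is nothing to share alike
    elif share_alike:
        elements.append("SA")
    return "CC " + "-".join(elements)


print(cc_licenses())
print(select_license(allow_commercial=False, allow_adaptations=True, share_alike=True))
```

Every answer the selector can return is one of the six enumerated licenses, which is the property a real selector also has to guarantee.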

One of the most important aspects of data reuse is the appropriate citation of data. This can be a time-consuming process that involves managing persistent identifiers (PIDs) and tying together data from many sources. It is no wonder, therefore, that organizations such as DataCite work to facilitate the process; DataCite has formulated a set of recommendations, the DataCite Best Practice Guide. A set of principles for citing data was developed by the Data Citation Synthesis Group (Data Citation Synthesis Group 2014). Such citations ought to be both machine-readable and intelligible to humans.
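To illustrate how a single citation can serve both humans and machines, the Python sketch below stores one record built from the mandatory properties of the DataCite Metadata Schema (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) and renders it once as a citation string and once as JSON. The class and field names, the exact citation pattern, and the example record with its DOI are hypothetical; the DataCite documentation remains the authority on the recommended citation format and on the official metadata serializations.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DataCitation:
    """Simplified citation record based on DataCite's mandatory properties (illustrative)."""
    identifier: str              # DOI without the resolver prefix
    creators: list[str]          # names in "Family, Given" form
    title: str
    publisher: str
    publication_year: int
    resource_type: str = "Dataset"

    def human_readable(self) -> str:
        # Citation pattern loosely modeled on DataCite's recommendation (an assumption).
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.publication_year}): {self.title}. "
                f"{self.publisher}. {self.resource_type}. https://doi.org/{self.identifier}")

    def machine_readable(self) -> str:
        # The same fields serialized as JSON so that software can parse them.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Hypothetical example record; the DOI does not resolve to a real dataset.
citation = DataCitation(
    identifier="10.1234/example-corpus",
    creators=["Doe, Jane", "Roe, Richard"],
    title="Example Corpus of Novels",
    publisher="Example Repository",
    publication_year=2024,
)
print(citation.human_readable())
print(citation.machine_readable())
```

Keeping one structured record and deriving both forms from it avoids the two versions drifting apart, which is one reason PIDs and structured metadata sit at the center of data citation.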

Take-home message: Data reuse is a two-way process that prompts researchers, from the outset of a research project, to shape their data for reuse while at the same time recycling other people’s data. Cooperation with stakeholders and other researchers, alongside questions of entitlement, is thus at the heart of the research process in CLS, which turns out to be cooperative beyond the boundaries of a research team and its institutional affiliations.

References

Data Citation Synthesis Group. 2014. “Joint Declaration of Data Citation Principles.” Force11. https://doi.org/10.25490/A97F-EGYK.
Harrower, Natalie, et al. 2020. “Sustainable and FAIR Data Sharing in the Humanities: Recommendations of the ALLEA Working Group E-Humanities.” https://doi.org/10.7486/DRI.TQ582C863.
Schöch, Christof, Frédéric Döhl, Achim Rettinger, Evelyn Gius, Peer Trilcke, Peter Leinen, Fotis Jannidis, Maria Hinzmann, and Jörg Röpke. 2020. “Abgeleitete Textformate: Text und Data Mining mit urheberrechtlich geschützten Textbeständen.” https://doi.org/10.17175/2020_006.
Tóth-Czifra, Erzsébet, Marta Błaszczyńska, Francesco Gelati, Femmy Admiraal, Mirjam Blümm, Erik Buelinckx, Vera Chiquet, et al. 2023. “Research Data Management for Arts and Humanities: Integrating Voices of the Community.” Zenodo. https://doi.org/10.5281/ZENODO.8059626.
Tóth-Czifra, Erzsébet, and Laurent Romary. 2020. “The Heritage Data Reuse Charter: From Principles to Research Workflows.” https://shs.hal.science/halshs-02475692/document.