Andrew Piper

1. Could you describe your research project, which required gathering and disseminating a large amount of data?

I study storytelling in numerous forms with a special focus on contemporary fiction.

2. How do you discover data that is relevant for your research and which factors help you to assess its quality and trustworthiness?

This is a hard question to answer briefly. It is decided on a case-by-case basis, drawing on domain knowledge. No single answer suffices.

3. What are the scholarly workflows that turn source material into data (extraction, transformation, unifying in a repository, etc.)? How do you develop a shared understanding about data with your collaborators and stakeholders?

The workflow is: identify an ideal data set; collect a sample, either manually or in an automated fashion (scraping); clean; analyse; clean again; then begin processing the data for any subsequent analytical tasks. Again, this can't be standardised. It depends on the research question, case by case.

4. What is the effect of legal or regulatory limitations on your research design and execution, as well as on your data sharing procedures? What were your relations with data providers and/or copyright holders?

Legal limitations are the single biggest inhibitor to my research. Contemporary publishing has very strong IP protections, which limit what researchers can do and study. Imagine having a library where you can't actually use its data! That's our current situation.

5. Do you release your datasets together with your research findings? If yes, in what formats / standards and which repositories? What kind of metadata is used?

Yes. I use Figshare or Dataverse. Data formats depend on the project: mostly full-text data, or derived data in tables, plus metadata.

6. How can you facilitate mutual understanding of each other's data within your discipline? Do you have shared vocabularies, ontologies or metadata standards available?

There are no standards; it would be good to develop some. There are, however, standard formats:
- Full text (the gold standard, rarely achieved)
- Derived data in the form of word frequencies or other features (POS tags, etc.)
- Metadata that points to a digital library such as HathiTrust
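As a rough illustration of the "derived data" format mentioned above, here is a minimal Python sketch, assuming plain-text input and a deliberately simple regex tokenizer. The function names and the `doc_id` column are hypothetical; `doc_id` stands in for an identifier that would point to a digital library record, so that word counts can be shared without redistributing the full text.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens (simple, illustrative tokenizer)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


def to_derived_table(doc_id: str, text: str) -> str:
    """Return a tab-separated table of (doc_id, word, count) rows, most frequent first.

    doc_id is a hypothetical pointer to an external library record, not a real ID scheme.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["doc_id", "word", "count"])
    for word, count in word_frequencies(text).most_common():
        writer.writerow([doc_id, word, count])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_derived_table("example-001", "The cat saw the other cat."))
```

Only counts leave the pipeline here; richer features such as POS tags would need a proper NLP toolkit rather than a regex.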

7. Have you ever approached a research support agent (such as a data steward, librarian, or archivist) and requested or received their assistance? Could you name them? How can cultural heritage professionals (archivists, librarians, museologists, etc.) support your work?

No one has ever satisfactorily solved my problems. We do everything ourselves.

8. Have you ever used tools supporting the principles of Open Science, FAIR or CARE in your Data Management Plan, such as automated FAIR metrics evaluation tools (FAIR enough, the FAIR evaluator, data management platform, or others)?

Not currently.

9. Have you ever found it difficult to replicate findings obtained by another researcher or to reuse the data that served as the foundation for those conclusions? What was the main reason behind irreproducibility?

Yes! The main culprit is the unavailability of the data.

10. Are you aware of anyone who has attempted to replicate your findings? Was the endeavor a success?

No, I'm not aware of any.

11. In your view, has the institutional mindset around sharing data evolved over time? In particular, how did you find the attitudes of publishers, libraries and policy-makers?

Improving at a snail’s pace.