Planning and Designing Data

Author: Carolin Odebrecht

Firstly, CLS data typically require a careful corpus design that reflects, for example, the preservation conditions of the sources, the identification of their genre (see CLS INFRA’s Survey of Methods; Schöch, Dudar, and Fileva 2023) and register (Biber and Conrad 2019), authorship attribution (Schöch, Dudar, and Fileva 2023), and the evaluation or classification of textual material with regard to, e.g., cultural, social, and literary contexts, traditions, and canonicity in specific CLS domains. In this section, we present three types of corpus design: Designed Corpus, Opportunistic/Growing/Dynamic Corpus, and Digital Edition.

Secondly, designing the preparation and especially the annotation of CLS data appears to be particularly demanding (see Section Preparing and Enriching Data).

Thirdly, the availability of resources and their terms of use must be clarified at the initial stage of planning and designing (see Section Preserving and Publishing Data).

CLS data in general require an intensive corpus design, including but not limited to:

eligibility: definition of the CLS data that is the research object
scope: parameters which refer either to literature itself (corpus-internal) or to contextual information not directly connected to the literature (corpus-external)
amount, proportion and sampling: parameters defining the amount and the composition of a data collection with regard to the design parameters and metadata
metadata: often interrelated with the corpus design parameters
domain-adaptability: reflection about representativeness, bias, and canonicity

These design criteria operationalise the decisions made by the corpus creators in relation to the corpus-internal and corpus-external aspects of the literary data, e.g., authorship, work, genre, period, bibliographical metadata, publication history and availability, text length, and language. The following examples illustrate how these criteria might be implemented. However, the examples are neither a representative nor a normative selection.
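Before turning to the examples, the criterion "amount, proportion and sampling" can be made concrete with a minimal sketch in Python. It compares the actual composition of a collection with target proportions for one design parameter. The field name `genre`, the sample records, and the target shares are hypothetical and are not taken from any corpus discussed here.

```python
from collections import Counter


def composition(records, parameter):
    """Return the share of each value of `parameter` in the collection."""
    counts = Counter(record[parameter] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def deviations(actual, target):
    """Return actual share minus target share for every targeted value."""
    return {value: actual.get(value, 0.0) - share
            for value, share in target.items()}


# Hypothetical metadata records; a real corpus would read these from its metadata.
records = [
    {"title": "A", "genre": "novel"},
    {"title": "B", "genre": "novel"},
    {"title": "C", "genre": "drama"},
    {"title": "D", "genre": "poetry"},
]

# Hypothetical design decision: half novels, a quarter each drama and poetry.
target = {"novel": 0.5, "drama": 0.25, "poetry": 0.25}

actual = composition(records, "genre")
print(deviations(actual, target))
```

A positive value indicates that a category is over-represented relative to the design, a negative value that it is under-represented. Whether such deviations matter depends on the research question and on the reflection about representativeness, bias, and canonicity.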

Digital Edition

The Faust-Edition (Bohnenkamp, Henke, and Jannidis n.d.) is an example of a digital edition with a common design focus. It collects and presents the manuscripts and the text-critically relevant printings of Faust that were published during Goethe’s lifetime in order to make the analysis of the work’s genesis possible. Central to this digital edition is that the textual variants can be analysed individually, in the context of other variants, and in the context of the entire genesis of the work.

eligibility: every piece of text which is a variant of the work Faust and is written by Goethe
scope: the lifetime of the author
amount, proportion and sampling: everything that is preserved in an archive or collection
metadata: bibliographic metadata
domain-adaptability: research on the drama Faust, methodical research on digital editions, multitextual visualisation and analysis
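The eligibility and scope criteria listed above could be expressed as a simple filter over witness metadata. The following Python sketch is an illustration under stated assumptions, not the Faust-Edition's actual data model: the `Witness` class, its fields, and the sample records are hypothetical placeholders. Only the scope boundaries, Goethe's lifetime (1749 to 1832), follow from the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Witness:
    """A textual witness (manuscript or print); all fields are hypothetical."""
    siglum: str
    work: str
    author: str
    year: int


# Scope: the lifetime of the author (Goethe, 1749-1832).
SCOPE_START, SCOPE_END = 1749, 1832


def is_eligible(witness: Witness) -> bool:
    """Eligibility: a variant of Faust written by Goethe, within the scope."""
    return (
        witness.work == "Faust"
        and witness.author == "Goethe"
        and SCOPE_START <= witness.year <= SCOPE_END
    )


# Placeholder records for illustration only; the sigla are invented.
candidates = [
    Witness("W1", "Faust", "Goethe", 1808),   # within the author's lifetime
    Witness("W2", "Faust", "Goethe", 1850),   # posthumous print, out of scope
    Witness("W3", "Egmont", "Goethe", 1788),  # different work, not eligible
]

eligible = [w for w in candidates if is_eligible(w)]
print([w.siglum for w in eligible])
```

Because the edition aims to include everything that is preserved, no sampling step follows the filter: every eligible witness becomes part of the collection.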

References

Biber, Douglas, and Susan Conrad. 2019. Register, Genre, and Style. 2nd ed. Cambridge University Press. https://doi.org/10.1017/9781108686136.

Bohnenkamp, Anne, Silke Henke, and Fotis Jannidis. n.d. “Faust. Historisch-Kritische Edition.” Digital Edition. Frankfurt am Main / Weimar / Würzburg. Accessed March 7, 2024. https://faustedition.net/.

Burnard, Lou, Christof Schöch, and Carolin Odebrecht. 2021. “In Search of Comity: TEI for Distant Reading.” Journal of the Text Encoding Initiative, no. 14 (March). https://doi.org/10.4000/jtei.3500.

Fischer, Frank, Ingo Börner, Mathias Göbel, Angelika Hechtl, Christopher Kittel, Carsten Milling, and Peer Trilcke. 2019. “Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama,” July. https://doi.org/10.5281/ZENODO.4284002.

Odebrecht, Carolin, Lou Burnard, and Christof Schöch. 2021. “European Literary Text Collection (ELTeC): April 2021 Release with 14 Collections of at Least 50 Novels.” Zenodo. https://doi.org/10.5281/ZENODO.4662444.

Röttgermann, Julia. 2023. “Collection de Romans Français Du Dix-Huitième Siècle (1751-1800) / Collection of Eighteenth Century French Novels 1751-1800.” Zenodo. https://doi.org/10.5281/ZENODO.10404966.

Ruiz Fabo, Pablo, and Helena Bermúdez Sabel. 2023. “Pruizf/Disco: Version 5.0.” Zenodo. https://doi.org/10.5281/ZENODO.1012567.

Schöch, Christof, Julia Dudar, and Evgeniia Fileva. 2023. “CLS INFRA D3.2: Series of Five Short Survey Papers on Methodological Issues (= Survey of Methods in Computational Literary Studies).” Zenodo. https://doi.org/10.5281/ZENODO.7892112.
