Monica Berti (University of Leipzig)
Christopher W. Blackwell (Furman University)
Gregory R. Crane (Tufts University & University of Leipzig)
D. Neel Smith (College of the Holy Cross)
Alexandra Trachsel (University of Hamburg)
Bridget Almas (Perseus Project, Tufts University)
Alison Babeu (Perseus Project, Tufts University)
Gabriel Bodard (King’s College London)
Hugh Cayless (Duke University)
David Dubin (University of Illinois at Urbana-Champaign)
Bruce Robertson (Mount Allison University)
Greta Franzini (University of Leipzig & UCL Centre for Digital Humanities)
Simona Stoyanova (University of Leipzig & King’s College London)
Center for Hellenic Studies, Harvard University
Humboldt Chair of Digital Humanities, University of Leipzig
Perseus Project, Tufts University
The Humboldt Chair of Digital Humanities at the University of Leipzig is pleased to announce a new effort within the Open Philology Project: the Leipzig Open Fragmentary Texts Series (LOFTS). In the first phase of LOFTS we invite public discussion as we finalize its goals, technological methods and editorial practices. LOFTS has been presented at the AIUCD 2013 Conference, at the Digital Classics panel at the 2014 APA Annual Meeting (“Getting Started with Digital Classics”) and at the second Workshop “Digital Humanities and Social Sciences”, and will be discussed at the Intertextuality Workshop at the Fondation Hardt and at the 2014 NEH Workshop “Publishing Text for a Digital Age”. A workshop at Leipzig in July 2014 (“Open Philology – Historical languages in an open, global society”) will finalize the details of LOFTS. This announcement provides an initial, high-level description of the plan for LOFTS and is intended to provoke open discussion before final decisions are made.
The Leipzig Open Fragmentary Texts Series is a new effort to establish open editions of ancient works that survive only through quotations and text reuses in later texts (i.e., those pieces of information that humanists call “fragments”). In the field of textual evidence, fragments are not portions of an original larger whole, but the result of a work of interpretation conducted by scholars who extract and collect information pertaining to lost works embedded in other surviving texts. These fragments take a great variety of forms, ranging from verbatim quotations to vague allusions and translations, which preserve only a more or less shadowy image of the original depending on how close they come to a literal citation.
Print editions of fragmentary works present excerpts detached from their contexts and from the textual data about those contexts. In effect, they are annotated indices into the sources that they cite. Moreover, editions of fragmentary works are fundamentally hypertexts, and the goal of this project is to produce a dynamic infrastructure that fully represents the relationships between sources, citations, and annotations about them. In a true digital edition, fragments are not only linked directly to the source text from which they are drawn, but can also be precisely aligned to multiple editions of it. Digital fragments are thus contextualized annotations about reused authors and works. As new versions of (or scholarship on) the source text emerge in a standard, machine-actionable form, these new findings are automatically linked to the digital fragments.
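The linking model just described can be sketched in a few lines of Python. This is a minimal illustration, not the LOFTS implementation: it assumes that a fragment is stored only as a work-level reference plus a passage citation, and that each edition is a simple dictionary from passage citations to text. The URNs (`tlg0000`, `.ed1`, `.ed2`) and the functions `editions_of` and `resolve` are hypothetical.

```python
from dataclasses import dataclass

# Illustrative corpus: two editions of one source work, each mapping a passage
# citation to its text. "tlg0000" and the ".ed1"/".ed2" version ids are placeholders.
CORPUS = {
    "urn:cts:greekLit:tlg0000.tlg001.ed1": {"1.1": "alpha beta gamma delta", "1.2": "epsilon zeta"},
    "urn:cts:greekLit:tlg0000.tlg001.ed2": {"1.1": "alpha beta gamma", "1.2": "epsilon zeta eta"},
}


@dataclass(frozen=True)
class Fragment:
    """A fragment stored as a pointer into a source work, not as copied text."""
    fragment_id: str
    work_urn: str  # work-level URN, independent of any particular edition
    passage: str   # citation of the passage that preserves the fragment


def editions_of(work_urn, corpus):
    """URNs of every edition (version) of the given work present in the corpus."""
    return [urn for urn in corpus if urn.startswith(work_urn + ".")]


def resolve(fragment, corpus):
    """Extract the cited passage (a dynamic excerpt) from every edition that contains it."""
    return {
        urn: corpus[urn][fragment.passage]
        for urn in editions_of(fragment.work_urn, corpus)
        if fragment.passage in corpus[urn]
    }


if __name__ == "__main__":
    frag = Fragment("F1", "urn:cts:greekLit:tlg0000.tlg001", "1.1")
    print(resolve(frag, CORPUS))
    # A newly published edition is picked up without touching the fragment.
    extended = {**CORPUS, "urn:cts:greekLit:tlg0000.tlg001.ed3": {"1.1": "alpha beta"}}
    print(len(resolve(frag, extended)))  # 3
```

Because the fragment points at the work rather than at one printed edition, adding a new edition to the corpus is enough for `resolve` to return an additional excerpt; the fragment record itself never changes.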
LOFTS has two goals: 1) to digitize paper editions of fragmentary works and link them to their source texts; 2) to produce born-digital editions of fragmentary works. To achieve these goals, LOFTS editions primarily consist of:
- TEI XML versions of paper editions of fragmentary works.
- Dynamic excerpts from source texts: Digitized paper editions of fragmentary works are linked to the source texts that they cite and their metadata are annotated in the source texts. The result is the production of dynamic excerpts that can be extracted from source texts.
- Multiple alignments with multiple editions: Digitized paper editions of fragmentary works are aligned with the source editions they use and with other editions of the same source texts.
- Contextualized annotations about fragmentary authors and works: LOFTS editors annotate the source texts directly. These annotations mark all those elements of the source text that reveal the presence of a quotation or reuse of another text (e.g., names of fragmentary authors, titles or descriptions of the content of fragmentary works, verba dicendi, etc.).
- Standard textual annotations: These include not only variants in the source text but also morpho-syntactic analyses and named entity identification. Where these annotations are not already available for the source text, the fragmentary text provides them for the sections that it cites. Where these are available, the fragmentary text may suggest alternate interpretations (e.g., selecting a different reading, an alternate morpho-syntactic analysis or prosopographic judgment).
- Syntactic reuse analysis: Text reuse works not only at a word level, but also at a syntactic one, because reusing a text means not only quoting and readapting words in a new context, but also reproducing syntactic features. Treebank grammar techniques are used to annotate the syntactic structure of sources that preserve quotations of lost texts in order to detect possible syntactic reuses.
- Alignments with extant sources: Where one work quotes another extant work (e.g., Athenaeus quoting Homer), word-level alignments between the two sources are provided. Such alignments make it possible to check the reliability and precision of quotations produced by ancient authors. This model also assumes that a work that paraphrases, cites or quotes an extant work may preserve independent and superior data not available in the transmission of the quoted work. Arabic translations of Greek authors, for example, can depend on, and shed light upon, Greek manuscripts far older than those that currently survive.
- Metadata on each word that is, or is judged to be, either a direct quotation from, or close paraphrase of, another work. Where a version of the original does not survive, these metadata include an estimate of the confidence that the surviving word was a direct quotation from the source text or a paraphrase.
- Translations of lost works: Where a text survives only because it has been translated into another language (e.g., a Greek text translated into Arabic) and where we have comparable translations (e.g., other Greek texts by the same author translated into Arabic), we use the translations of the surviving works to show what original words could lie behind the translation of the lost text. Syntactic annotations may also help reconstruct the syntax of the original lost text, as in Arabic and Syriac sources that preserve the syntax of the original Greek text.
- Translation alignments: Translations of fragments published in digitized print editions are aligned to source texts, and new translations in multiple languages are produced by new editors of fragments.
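Word-level alignment between a quotation and an extant source can be approximated with Python's standard `difflib.SequenceMatcher`. The sketch below is an illustration under stated assumptions, not the method LOFTS uses: it compares whitespace-separated tokens exactly (no normalization of accents, elision or morphology), and the function `align_quotation` and its two scores are hypothetical. The extant text is Iliad 1.1; the quotation is invented for the example.

```python
from difflib import SequenceMatcher


def align_quotation(quoting, quoted):
    """Align a quotation word by word with the extant text it quotes.

    Returns (pairs, precision, coverage):
      pairs      maps each word index in the quotation to the index of the
                 identical word in the extant text, or None if unmatched;
      precision  share of quotation words reproduced verbatim;
      coverage   share of extant-text words that the quotation reproduces.
    """
    a, b = quoting.split(), quoted.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = {i: None for i in range(len(a))}
    # get_matching_blocks() yields (i, j, size) runs of identical words,
    # ending with a zero-size sentinel that the inner loop skips.
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            pairs[i + k] = j + k
    matched = sum(1 for j in pairs.values() if j is not None)
    precision = matched / len(a) if a else 0.0
    coverage = matched / len(b) if b else 0.0
    return pairs, precision, coverage


# Extant text: Iliad 1.1. The "quotation" omitting one word is invented.
ILIAD_1_1 = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"
QUOTATION = "μῆνιν ἄειδε θεὰ Ἀχιλῆος"

if __name__ == "__main__":
    pairs, precision, coverage = align_quotation(QUOTATION, ILIAD_1_1)
    print(pairs)                # {0: 0, 1: 1, 2: 2, 3: 4}
    print(precision, coverage)  # 1.0 0.8
```

The two scores separate two different kinds of unreliability: a quotation can be word-perfect where it runs (precision 1.0) while silently dropping words (coverage 0.8). A production aligner would also need orthographic normalization and handling of reordered words; exact token matching is used here only to keep the example short and deterministic.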
LOFTS uses both XML and RDF, and can be fully represented in either format:
- LOFTS uses the EpiDoc subset of the Text Encoding Initiative as its XML tagset.
- LOFTS uses the CTS/CITE Architecture, developed by researchers at Harvard’s Center for Hellenic Studies, to extend the Functional Requirements for Bibliographic Records (FRBR) Data Model down to the word level. Use of the CTS/CITE Architecture allows LOFTS to represent every word in every version of every text with its own unique URN. LOFTS can thus be serialized in a format that is compatible with the Europeana Data Model, with every distinctly citable word in LOFTS as an individual object with its own metadata (e.g., variants, morpho-syntactic analysis, named entity alignment).
- LOFTS uses the PROV-O ontology to represent the provenance of each distinct statement. A statement may be a narrative discussion or a single annotation. Sources can include one or more human authors, an automated system (e.g., a syntactic analyzer) or a combination of both (e.g., one or more humans reviewing and correcting automatically generated syntactic analyses).
- LOFTS uses the Systematic Assertion Model (SAM) to identify the contingent aspect of the underlying resources as things which are subject to interpretation and which were in existence prior to their use as data in our analysis.
- LOFTS uses the Open Annotation (OA) data model to share concrete serializations of the analysis in the form of annotations.
- LOFTS publications will include a snapshot representation of all content and linked data at the time of publication. This snapshot will be an HTML5 presentation of the publication that can stand on its own. It is not intended to duplicate or invalidate the use of URIs and linked data structures for the data indexed by the publication, but rather to mitigate the risk that those URIs may not remain permanently accessible.
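The word-level URNs and provenance-aware annotations described in this list can be illustrated with a short Python sketch. It assumes the CTS subreference convention `@token[n]` for addressing individual words, and serializes one annotation as JSON-LD using terms from the Open Annotation (`oa:`), PROV-O (`prov:`) and Content in RDF (`cnt:`) vocabularies. The helper names, the `example.org` identifiers and the morphological label are hypothetical, and the edition identifier in the URN is used only for illustration; this is not the actual LOFTS serialization.

```python
import json
from collections import Counter

CONTEXT = {
    "oa": "http://www.w3.org/ns/oa#",
    "prov": "http://www.w3.org/ns/prov#",
    "cnt": "http://www.w3.org/2011/content#",
}


def word_urns(passage_urn, text):
    """Mint one URN per word with the CTS "@token[n]" subreference,
    where n counts occurrences of that token so far in the passage."""
    seen = Counter()
    urns = []
    for token in text.split():
        seen[token] += 1
        urns.append(f"{passage_urn}@{token}[{seen[token]}]")
    return urns


def annotation(anno_id, target_urn, text, editor, tool=None):
    """One Open Annotation record with a textual body. Provenance is
    expressed with PROV-O attribution to a person and, optionally, to the
    software agent whose output that person reviewed."""
    agents = [{"@id": editor, "@type": "prov:Person"}]
    if tool is not None:
        agents.append({"@id": tool, "@type": "prov:SoftwareAgent"})
    return {
        "@context": CONTEXT,
        "@id": anno_id,
        "@type": "oa:Annotation",
        "oa:hasTarget": {"@id": target_urn},
        "oa:hasBody": {"@type": "cnt:ContentAsText", "cnt:chars": text},
        "prov:wasAttributedTo": agents,
    }


if __name__ == "__main__":
    # Illustrative passage URN for Iliad 1.1.
    passage = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1"
    urns = word_urns(passage, "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")
    print(urns[0])  # urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1@μῆνιν[1]
    anno = annotation(
        "http://example.org/annotations/1",
        urns[0],
        "noun, feminine accusative singular",
        "http://example.org/people/editor-1",
        tool="http://example.org/tools/parser-1",
    )
    print(json.dumps(anno, ensure_ascii=False, indent=2))
```

Counting occurrences per token keeps each URN unique even when a word repeats within a passage, which is what lets every word carry its own metadata as a separate object.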
All data in LOFTS is available under a Creative Commons Attribution-ShareAlike license. Because LOFTS is a meta-text (essentially an annotated index into existing editions), the source texts it cites must also be available under a Creative Commons license. LOFTS is based upon the following open corpora:
- Editions that are fully in the public domain: These are editions whose editors died at least 70 years ago and whose entire contents (including introduction, textual notes, appendices, etc.) are in the public domain. The Open Greek and Latin Project (OGL) has set out to provide at least one fully public domain edition of every major Greek and Latin work that survives through c. 600 CE and of critical later sources (e.g., the Suda, Scholia, etc.), expanding the amount of Greek and Latin available under a CC license in TEI XML from c. 20 million to 150 million words. OGL aims to provide (1) a TEI XML transcript of the reconstructed text, (2) a transcript of the textual notes with minimal TEI encoding, and (3) a page image of the original source text. All OGL texts are designed to be available as Linked Open Data, with CTS/CITE URNs for each word in each version of each text.
- Reconstructed texts that are in the public domain: Germany provides copyright protection to scholarly editions for 25 years after publication, and the European Union has recommended copyright protection of up to 30 years for scholarly editions. It has been argued, however, that this limited copyright covers only the reconstructed text and that ancillary materials (such as the textual notes at the bottom of the page) are distinct creative works, protected for the life of the author plus 70 years. In this case, OGL aims to provide (1) a TEI XML transcript of the reconstructed text, (2) an index of the variants cited on any given page of an edition (but not the textual notes themselves), and (3) an image of the part of the original page that contains the reconstructed text, without the textual notes or other elements that are claimed to fall outside the limited copyright for scholarly editions.
- Indices of reconstructed texts and accompanying textual notes to which European law provides copyright protection: Here OGL provides an index of significant differences between the copyrighted texts and the open texts. The index allows readers to assess how, how often and where restricted texts differ from open texts. It covers both editorial choices in the reconstructed texts and variants in the textual notes. The model in this case is the extensive review of a new edition that sets out to list its distinctive editorial choices.
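An index of differences between an open and a restricted edition could be generated along the following lines. This minimal sketch assumes both editions are available to the indexer as passage-to-text mappings and reports every token-level difference using `difflib.SequenceMatcher.get_opcodes()`; deciding which differences count as significant remains an editorial judgment that the code does not attempt. `difference_index` and the placeholder texts are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def difference_index(open_ed, restricted_ed):
    """List where and how a restricted edition departs from an open one.

    Both editions map passage citations to text. Each entry records the
    passage ("where"), the kind of difference ("how": replace, delete or
    insert) and the differing words from each edition.
    """
    index = []
    for passage, open_text in open_ed.items():
        if passage not in restricted_ed:
            continue
        a, b = open_text.split(), restricted_ed[passage].split()
        matcher = SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                index.append((passage, tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return index


# Placeholder texts, invented for illustration.
OPEN = {"1.1": "alpha beta gamma", "1.2": "delta epsilon"}
RESTRICTED = {"1.1": "alpha beta delta", "1.2": "delta epsilon zeta"}

if __name__ == "__main__":
    index = difference_index(OPEN, RESTRICTED)
    for entry in index:
        print(entry)
    # "How often": tally of difference types across the whole index.
    print(Counter(tag for _, tag, _, _ in index))
```

Each entry reproduces only the differing words, so the index records divergences without republishing the restricted text as a whole.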
Where OGL has not yet provided the necessary textual data, LOFTS editors will provide the textual data that they feel is necessary. In practice, this may lead to editions that look, in sections, very much like traditional editions of fragmentary authors. The excerpts that LOFTS editors create are available as open data and as part of an extensible authoring environment, where others can build on this initial work and develop comprehensive coverage of works or editions not yet available under an open license.