The Open Greek and Latin (OGL) project at the “Alexander von Humboldt Chair for DH” aims to represent “every source text produced in Classical Greek or Latin from antiquity through the present”. But how would we or the users measure the success of this enterprise? What metrics can we employ? The number of authors and works included or the coverage (expressed in percentage or any other ratio) of digitized texts over the totality of surviving materials are obvious answers that come to everybody’s mind; yet, when dealing with ancient texts, these concepts are more ambiguous than they appear.
One aspect that is often overlooked is that the vast majority (if not all) of ancient texts exist in many different versions. The great majority of literary works survive in manuscripts that were copied in different periods and from different originals and may therefore contain variant readings on many passages. In addition, the scholars and the editors of those texts have taken different positions on how to choose the correct variants or, for instance, on how to reconstruct the missing part of a broken inscription or what letter to read in a damaged papyrus. Their editions vary on many significant points from one to another.
Thus, it is not just the interpretation of an ancient work that is controversial. Very often the reconstruction of the text that is printed as “the original” in a book is extremely problematic and open to discussion. Readers who have some familiarity with Greek tragedy will know that the formula pathei mathos (“learning through suffering”) encapsulates the laws that Zeus has set for mortals in the vision of the Chorus of Aeschylus’ Agamemnon (lines 176-8). Fewer will know that the subtleties of the language make the exact interpretation of that pivotal passage very hard. Fewer still know that those problems also affect the reconstruction of the Greek text: editors are divided between the reading tōi pathei (“learning through the suffering”?), transmitted by the manuscripts, and the modern correction ton pathei, proposed by an 18th-century scholar (Schütz 1780) in order to ease the syntax.
|Author||Nr. of Editions|
Table 1 reports the number of different editions that we have collected for three major authors of Greek literature during the pilot project described in full detail below. It gives a sufficient (yet by no means complete!) idea of the scale of this phenomenon. For Aeschylus alone, whose extant works amount to seven complete tragedies (one of which is suspected to be spurious) and a number of often short fragments, we have compiled a preliminary list of 101 editions to be digitized. Figure 1 visualizes the number of editions that contain all of an author’s works or only a single play.
The two mentioned sources of variation, interpretation and text, mirror one another to a very important degree. The study of textual variation in ancient literature can serve as a gateway to accessing aspects of cultural history, at least since the beginning of modern textual criticism in the 15th century. To limit the scope of a digital collection to one single edition amidst this flux of constant re-editing means to severely limit our understanding of one important dimension of the meaning and history of ancient texts.
Some may be tempted to rely on concepts such as “authority” to select one single edition of the ancient text. However, those concepts provide only an illusory solution. For the “authority” of a single edition is transient by nature.
We therefore decided not to obliterate the history of textual variation, but to turn it instead into a distinguishing feature of OGL. As Boschetti has pointed out, every collection of ancient texts involves at least two dimensions: “breadth”, i.e. the number of surviving ancient texts included, and “depth”, i.e. the number of different editions of the same text. OGL is therefore determined to cover both the “depth” and the “breadth” of the supported texts, starting with some of the most important and debated texts of ancient literature.
The rest of this post explains how, and more importantly why, the list summarized in Table 1 was put together.
The Greek Tragedies: a case study of “depth” and “breadth”
The extant dramas of Aeschylus, Sophocles and Euripides, on which our knowledge of Greek tragedy is largely based, are a very good example of the “bidimensionality” of ancient texts, on account of the lasting influence that these often enigmatic works have exerted throughout the centuries. In the quest to reconstruct an “original” version of the surviving plays, questions of language and grammar intermingle with the history and the meaning of institutions such as the state or the family. For this reason, tragedies make an excellent test case to exemplify how OGL intends to combine breadth and depth.
In partnership with the “Dresdner Digitalisierungszentrum” of the Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden (SLUB) we decided to conduct a digitization project on the multiple editions of Aeschylus, Sophocles and Euripides. The digitized editions of the tragedians will be included, along with those texts that were digitized in a previous partnership, in the collection of the SLUB Digitale Sammlungen named “Open Philology Project” (OPP).
Our collection of editions of Aeschylus, Sophocles and Euripides will ultimately constitute a modern chapter in the history of a “tragedy multitext”.
Potential use cases
Famous examples of variants among the medieval manuscripts or modern editors that reveal conflicting interpretations, or that can be used to exemplify modern approaches to Greek culture, could easily be multiplied. Is the Chorus of the Thebans saying that Antigone, the last ray of hope for the ruling house, has been cut down by “the blood-stained dust” (konis, transmitted by the manuscripts) or by “the blood-stained knife” (kopis, conjectured by Jortin and printed by many) of the infernal gods (Sophocles, Antigone 601)? Is Odysseus baffled because he doesn’t know where the enemy that he is tracking down is (hopou, transmitted by some manuscripts) or whose prints (hotou, other manuscripts) he is tracking (Soph. Ajax 33)?
A collection with the depth that we intend to give to OGL will allow users to do much more than read or browse through the controversial passages that are already known.
Firstly, the OGL collection is part of a constellation of digital tools that will enable sophisticated analyses of the corpus. For instance, by interacting with a treebanking environment such as Arethusa, users could visualize and formalize how the variant readings impact the syntax or the linguistic interpretation of a sentence. Bamman et al., for example, provide a discussion of multiple treebanks for the pathei mathos sentence in the Agamemnon.
More importantly, text-alignment technologies can be used to systematically compare multiple editions of the same work and extract all the differences between them. iAligner, the software developed at the Alexander von Humboldt Chair for DH, is specifically designed to work with multiple versions of ancient texts. Besides comparing OCR outputs and supporting the correction of errors, it can perform exactly this kind of comparison between editions.
Figure 2 shows the output of a comparison between four editions of Euripides, Bacchae, line 21 (along with a summary of the options selected for the software) using iAligner. The editions compared here are those of Wecklein (1898), Nauck (1901), Murray (1902) and Diggle (1994). The image makes it easy to see what solutions and what kinds of intervention the editors have adopted in order to solve the problems in this passage. One editor chose to transpose line 20 after line 22, thus creating a mismatch that results in the whole of line 21 differing from that of the other editions. The others disagree on the adoption of the conjecture takei in place of the transmitted kakei.
Naturally, though the screenshot shows just one line, the greatest potential of this approach is in its applicability to the scale of the whole tragedy, or even a whole corpus of plays. Using iAligner and a collection of digitized multiple editions it will be possible to get first-hand, comprehensive data on editorial variation in critical editions.
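To give a rough idea of what extracting the differences between two editions involves, here is a minimal sketch in Python. It uses the standard library's `difflib` rather than iAligner, whose actual algorithm and options are not described here, and the function name `word_variants` is purely illustrative. The sample strings are the two transliterated readings of the pathei mathos passage mentioned above.

```python
from difflib import SequenceMatcher


def word_variants(edition_a: str, edition_b: str):
    """Return the word-level differences between two versions of a passage.

    Each result is a tuple (operation, words_in_a, words_in_b), where the
    operation is 'replace', 'delete' or 'insert' as reported by difflib.
    """
    words_a = edition_a.split()
    words_b = edition_b.split()
    # autojunk=False: do not let difflib discard frequent words as "junk".
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return variants


# Transliterated toy input: manuscript reading vs. modern correction.
print(word_variants("tōi pathei mathos", "ton pathei mathos"))
```

A real pipeline would also need to normalize accents, punctuation and line numbering before comparing, which this sketch deliberately leaves out.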
How many words in a Greek tragedy are controversial and how many words do not vary at all? How does that ratio change from work to work or from author to author? Can we assess how many of these changes relate just to the orthography of words or the meter of a line, and how many impact the syntax or the meaning of a passage? Or, by crossing our data with the output of scripts such as Matteo Romanello’s canonical-citation extractor (Romanello 2015), is there a correlation between the degree of variance that a passage shows in editions and the number of citations to it in the scholarly literature? These are all examples of research questions that would be supported by a digital collection that makes the “depth” in the textual history one of its key dimensions.
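The first of these questions can be made concrete with a small Python sketch. It is only one possible way of counting, under stated assumptions: one edition is taken as the base, every other edition is compared with it word by word using `difflib`, and a base word counts as "contested" if at least one other edition replaces or omits it. Words that other editions add are ignored, and the name `variation_ratio` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def variation_ratio(base: str, others: Sequence[str]) -> float:
    """Share of words in the base edition that at least one other edition changes."""
    base_words = base.split()
    if not base_words:
        return 0.0
    contested = [False] * len(base_words)
    for other in others:
        matcher = SequenceMatcher(a=base_words, b=other.split(), autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            # 'replace' and 'delete' both touch words of the base edition;
            # 'insert' only adds words and is ignored in this sketch.
            if tag in ("replace", "delete"):
                for i in range(i1, i2):
                    contested[i] = True
    return sum(contested) / len(base_words)


# Transliterated toy input: one of three words differs between the versions.
ratio = variation_ratio("tōi pathei mathos", ["ton pathei mathos", "tōi pathei mathos"])
print(f"{ratio:.2f}")
```

Computed per play or per author, such a ratio would allow the comparisons asked about above; distinguishing orthographic from syntactic changes would require further linguistic annotation.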
Interested readers can check this presentation to get a sense of what type of research questions and approaches iAligner is designed to support.
Collecting the editions
The first task in this effort, which has been going on since April 15th 2017, is to identify the editions of the tragedies to be digitized at the Dresdner Zentrum.
The OGL and SLUB staff have agreed to concentrate their efforts on those volumes that are held by either the University Library of Leipzig or the SLUB library in Dresden. Though the Greek tragedies have been printed continuously since the 16th century, we set the lower chronological boundary to 1800, so that we would include only modern editions, whose Greek fonts can be processed by state-of-the-art OCR technologies for polytonic Greek.
As with the previous OPP collection, the digitized versions of the tragedies published after 1922 whose editor(s) died after 1943 will reproduce only the Greek text; critical apparatuses and other editorial works (notes, introductions, translations) that are subject to copyright will be blanked out.
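The rule just stated can be written down as a simple check. The following Python sketch is an illustration only: the two cut-off years come from the paragraph above, while the function name and two interpretive choices are assumptions. First, an edition with several editors is restricted as soon as any one of them died after 1943. Second, an unknown year of death (`None`) is treated conservatively as falling after the cut-off.

```python
from typing import Optional, Sequence

PUBLICATION_CUTOFF = 1922  # editions published after this year may be restricted
DEATH_CUTOFF = 1943        # ...if an editor died after this year


def greek_text_only(publication_year: int,
                    editor_death_years: Sequence[Optional[int]]) -> bool:
    """Return True if only the Greek text should be reproduced.

    Assumption: None means the year of death is unknown or the editor is
    still alive, and is treated as 'after the cut-off'.
    """
    if publication_year <= PUBLICATION_CUTOFF:
        return False
    return any(death is None or death > DEATH_CUTOFF
               for death in editor_death_years)


# Hypothetical values, not taken from the actual list of editions.
print(greek_text_only(1950, [1960]))  # published and editor died after the cut-offs
print(greek_text_only(1900, [1960]))  # published before 1922
```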
Other guidelines were established during the work and scrutiny of the materials. In particular:
- when an edition was re-issued several times (e.g. Dain and Mazon 1955), we give priority to the original publication; whenever this is not possible, we include the re-issue held by either Leipzig or Dresden;
- when a work was re-edited (with changes) by a single editor several times, we decide on a case-by-case basis. Ideally, we would include all the successive editions, so as to represent the whole history of each scholar’s work; in practice, that would increase the number of volumes beyond the budget of the project. Therefore, we list the successive editions only in a limited number of cases that are known for their importance in the history of classical philology (e.g. Hermann 1852 and 1859). In other cases, we opt for the most recent edition, the one held by one of the two libraries, or the one within the chronological boundaries of our work (e.g. Dawe 1984-1985, whose third edition of 1996 would have been outside our scope);
- when a work was republished after a revision by another scholar or scholars (such as the series of Sophocles’ plays edited by Schneidewin, then revised by Nauck, and finally by Radermacher/Bruhn), we include all the versions;
- commentaries are included only if they also print an original text. Some important works (such as Dodds 1960) are thus excluded, as they reproduce the Greek text of a previous edition.
Our survey is mainly based on the catalogues of the two libraries and on some repertories or bibliographies of Greek tragedy. Most notably, we consulted the bibliographies in Wartelle 1978, Lesky 1979, Saïd 1988, Finglass 2007 and 2011, as well as the entries on Aeschylus, Sophocles and Euripides in “Supplement I – Volume 2 : Dictionary of Greek and Latin Authors and Texts” in the New Pauly Online.
Data about the collected editions
Our final list collects the metadata of 306 volumes stored in the libraries of Leipzig (135) and Dresden (171).
The metadata are available in a BibTeX file. Apart from the standard set of information (publisher, city of publication, year, edition etc.), every record in the list includes some special metadata that are relevant for our purposes, such as:
- the library holding the volume (Leipzig or Dresden)
- URL of the record in the library
- Editor’s year of death (if any and if known)
- VIAF ID of the editor, which can be used to obtain further biographical information
- A unique identification number (PPN) of each record in the “Online-Katalog des Südwestdeutschen Bibliotheksverbundes (SWB)”
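As an illustration of how such records might be handled once loaded from the BibTeX file, here is a minimal Python sketch. The field names are hypothetical and do not claim to match the keys actually used in the file, and every sample value is an invented placeholder rather than a real entry.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EditionRecord:
    """One volume of the list; all field names are illustrative assumptions."""
    key: str                          # citation key of the record
    year: int                         # year of publication
    library: str                      # "Leipzig" or "Dresden"
    catalogue_url: str                # URL of the record in the library
    editor_death_year: Optional[int]  # None if unknown or not applicable
    viaf_id: Optional[str]            # VIAF ID of the editor, if available
    ppn: str                          # identification number in the SWB catalogue


def volumes_per_library(records: Iterable[EditionRecord]) -> Counter:
    """Count how many volumes each library holds."""
    return Counter(record.library for record in records)


# Placeholder records only; none of these values comes from the real list.
sample = [
    EditionRecord("placeholder-1", 1850, "Leipzig",
                  "https://example.org/record/1", 1900, None, "PPN-PLACEHOLDER-1"),
    EditionRecord("placeholder-2", 1905, "Dresden",
                  "https://example.org/record/2", None, None, "PPN-PLACEHOLDER-2"),
    EditionRecord("placeholder-3", 1930, "Dresden",
                  "https://example.org/record/3", 1950, None, "PPN-PLACEHOLDER-3"),
]
print(volumes_per_library(sample))
```

Run over the full list, the same tally would be expected to reproduce the split between the two libraries reported above.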
Although our BibTeX file represents a significant collection of metadata on many volumes, it is important to note that it was never intended to serve as a comprehensive bibliography of the editions of the three tragedians. Many important editions (even if they were published within the chronological boundaries of our enquiry) are not included, either because the books are not held by either of the two libraries, or because they are already part of the SLUB OPP collection (e.g. Murray 1937).
The distribution of the editions per ancient author is detailed in Table 1 and Figure 1 (above). As can be seen, the collection is roughly balanced between the three major authors, even though Euripides has a far higher number of surviving plays than the other two (and a number of plays whose fragments are sufficiently extensive to allow for a separate complete edition: see e.g. Diggle 1970). This balance is a fair reflection of the history of scholarship on the three poets.
Apart from the volumes containing the complete works of each of the three poets, the first two plays of Aeschylus’ most famous trilogy (the Oresteia, comprising Agamemnon, Choephoroe and Eumenides) stand out. Sophocles’ play dealing with the same myth (the Electra) and his famous Antigone are also well represented. Once digitized, these editions will empower a data-driven approach to the history of the textual reconstruction and interpretation of some of the most influential texts of ancient literature, along the lines discussed above.
The chronological span of the collected editions is visualized in Figure 3.
Our collection provides good coverage of all the years included in the proposed timeframe, with a significant peak around the beginning of the 20th century; thus, our list of editions provides a good representation of all the most important periods of modern classical scholarship.
The national distribution of the editions is harder to assess from the collected metadata. Nevertheless, it could be roughly estimated by considering the place of publication of the volumes, as in Figure 4.
In that sense, our collection reflects a marked bias towards German and English scholarship, which is only partially due to the role played by major publishers such as Teubner, Oxford and Cambridge University Press. Unfortunately, Italian and French scholarship are rather poorly represented in the two libraries; some important volumes for the history of the editions of tragedy (e.g. Di Benedetto 1965) are altogether missing from both institutions. Expanding the representation of other national traditions will most likely be a priority for the future of our project.
 Romanello, Matteo. 2015. From Index Locorum to Citation Network: an Approach to the Automatic Extraction of Canonical References and its Applications to the Study of Classical Texts. PhD Thesis. King’s College London. https://kclpure.kcl.ac.uk/portal/en/persons/matteo-romanello(f98ba73f-7cf3-478d-867d-c351794e628b)/theses.html