Summary of a workshop attended by Greta Franzini. Authored and posted by Greta Franzini. Photo of Florence: Greta’s own.
On Monday 28th April I attended a COST Action workshop in Florence entitled Towards a Medieval Latin Digital Library – A “Medioevo Europeo”. The workshop invited scholars from different countries and backgrounds to talk about their digital libraries and databases in an effort to better understand what’s available on the web today and, more importantly, how we can join forces to make our collections more useful and usable.
The workshop was led by Professor Agostino Paravicini Bagliani, whose foreword introduced the COST Action working group responsible for promoting interoperability between medieval databases and textual corpora. During the morning session each guest presented his/her own database so as to set the scene for the afternoon discussion, where participants defined next steps towards an international collaboration.
Clemens Radl (München), MGH Digital
Clemens works on the Monumenta Germaniae Historica, a corpus whose development began in 1826. To date, it contains 400 volumes of medieval Latin text as well as other relevant materials, including Middle High German, Icelandic and Greek texts. The corpus is not limited to modern Germany but has a European scope, with texts dating from 500 to 1500. It features both digital critical editions and scans of the original volumes.
MGH provides HTML versions of the texts, which do not enhance the text in any way but nevertheless provide a digital version upon which further work can build. The HTML text contains a number of OCR errors, which the project is currently reviewing and correcting. MGH focuses more on layout than on the text itself, as the volumes the team is working with feature complex layout structures that pose significant OCR obstacles.
MGH has digitised more texts than it currently showcases through its website. Copyright issues prevent the team from making all of its texts available online.
Moving forward, the project seeks to make more texts available in XML and to provide users with a revision history of every file so as to better visualise ongoing changes.
Maurizio Lana (Vercelli), digilibLT
Maurizio introduced us to digilibLT, a digital library of Medieval Latin texts. Running on XTF, digilibLT offers full access to the reconstructed text of editions of Medieval Latin texts but not to the critical commentaries, as these are protected by copyright. Every text is available under a CC BY-NC-SA licence, can be downloaded in various formats (TEI, TXT, PDF, EPUB) and is enriched with contextual information and bibliographies so as to allow citizen scholars to familiarise themselves better with the works. digilibLT believes all of its digital texts are true editions inasmuch as additional enhancement work of the source editions is performed during digitisation.
While the project would like to extend the functionality of its database, lack of funds is currently preventing any further development.
Eva Sediki (Zürich), Corpus Corporum
Eva gave us a tour of Corpus Corporum, a large repository of Latin texts housed in the University of Zürich. The project aims at becoming the platform for Latin texts and is seeking collaboration as a means of achieving this goal. To date, it contains 120 million words of Latin. The corpus exploits and integrates Perseus tools, public domain dictionaries and Helmut Schmid’s TreeTagger to provide translations and linguistic information for every word. The TEI XML texts can be easily searched thanks to the MySQL database and Apache server the project runs on.
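The kind of word search such corpora offer can be illustrated with a minimal keyword-in-context (KWIC) sketch over a TEI document. This is not Corpus Corporum's actual code (which runs on MySQL/Apache); the function name, window size and sample text are illustrative assumptions, using only the Python standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def kwic(tei_xml: str, query: str, window: int = 3):
    """Return each occurrence of `query` in the TEI <body>,
    with `window` words of context on either side."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    # itertext() walks nested elements and yields all inner text.
    words = " ".join(body.itertext()).split()
    hits = []
    for i, w in enumerate(words):
        if w.strip(".,;:").lower() == query.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, w, right))
    return hits

# Hypothetical minimal TEI document, for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>Gallia est omnis divisa in partes tres.</p></body></text>
</TEI>"""
print(kwic(sample, "divisa"))
```

A production system would of course index the corpus once (e.g. in MySQL full-text tables) rather than re-parse the XML per query.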
While the source of most of the texts in Corpus Corporum remains to be clarified, moving forward the project will make all of its contents, including linguistic annotations, available to download in PDF and XML formats, and will enable syntactic searches.
Alain Meurant (Louvain-la-Neuve), Itinera Electronica
Itinera Electronica is a large database of Latin texts in French translation. The project shares many similarities with Corpus Corporum in that each word is a hyperlink and allows for multiple searches across the entire database. Like Corpus Corporum, every word search returns that word in context and provides a breakdown of the work (chapter, book, etc.). Each text has a French translation (some are out of copyright, for others the project obtained the rights) as well as concordances, word frequencies and linguistic ‘tables’.
The project is continuously working on improving and adding more texts to this great resource.
Emiliano degl’Innocenti (Firenze), Biblioteca Digitale SISMEL-ENTMI
Emiliano presented the newly developed SISMEL digital library (development started in 2013), containing 70 digital texts previously published in the ENTMI series and, like digilibLT, based on the California Digital Library’s eXtensible Text Framework. XTF was chosen because of its flexibility in terms of the types of documents it can ingest: Microsoft Word, PDF, TEI, HTML, etc. The project is working towards the addition of more than 350,000 pages, 70 critically edited texts in digital format, and more than 500 printed volumes dating from the 16th to the 20th centuries. All these materials were digitised by the Biblioteca Digitale Italiana and come with metadata.
The first phase does not include OCR scanning. Moreover, XTF is a good solution for a first implementation of this system but cannot handle sophisticated services. For this reason, the project is already thinking about a second phase of development which will see the adoption of a different infrastructure integrating TRAME (Texts and Manuscript Transmission of the Middle Ages in Europe), an improved interface, as well as the addition of even more manuscript collections.
Finally, the project will also be soon incorporating the Digital Editions of Inventories and Catalogues of Medieval Italian Libraries (more than 5800 records and 16,000 digital images).
Erwin Rauner (Augsburg), Analecta Hymnica
Erwin presented his project on medieval Latin poetry, the Analecta Hymnica Medii Aevi Digitalia. For those of you who don’t know, the Analecta Hymnica is a 55-volume compendium of Latin poetry of the Medieval church, a wonderful resource for historians of liturgy and music. Erwin’s project not only reproduces a digital version of this monumental work, but also provides its own scans, background information, and advanced search capabilities to help users navigate hundreds and hundreds of pages’ worth of verse.
Jan Koláček (Praha), Global Chant – Cantus Database
Jan introduced us to Cantusindex.org, a database network connecting six online databases of medieval chant manuscripts in Europe and Canada. Following a description of each database, Jan explained how Cantusindex.org organises its content, whereby chants are indexed and each assigned a unique Cantus ID (used to refer to manuscript databases). What’s more, the project is connectable and interactive, with registered users contributing new chants which immediately receive an automatically generated Cantus ID. The database is built in Drupal 7 and shares data via cURL.
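The essence of that indexing scheme can be sketched as a small registry: each distinct chant gets exactly one stable identifier, and contributing the same chant again returns the existing ID rather than minting a new one. This is a toy illustration, not the Cantus project's code; the `a0001`-style numbering and incipit-based keying are assumptions for the example.

```python
class ChantIndex:
    """Toy registry: one stable ID per chant incipit (hypothetical scheme)."""

    def __init__(self):
        self._ids = {}   # normalised incipit -> assigned ID
        self._next = 1   # counter for the next ID to mint

    def register(self, incipit: str) -> str:
        # Normalise whitespace and case so near-identical submissions collide.
        key = " ".join(incipit.lower().split())
        if key not in self._ids:
            self._ids[key] = f"a{self._next:04d}"
            self._next += 1
        return self._ids[key]

index = ChantIndex()
print(index.register("Puer natus est nobis"))   # new chant: minted ID
print(index.register("Puer natus est nobis"))   # same chant: same ID
```

In the real system the registry would live behind the Drupal site, with other databases resolving IDs over HTTP.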
Jean-Philippe Genet (Paris), PALM
PALM (Plateforme d’Analyse Linguistique Médiévale) is a platform and a library (Méditext) of medieval sources dating from the 12th to the 16th centuries. To date, the library contains 300-400 political texts in English, French and Latin. The database allows users to analyse the text and compare different translations of the same work. Furthermore, one can semi-automatically normalise and lemmatise texts, and download them for further annotation. All texts are open and free with the exception of a limited few, which have been reserved for students of the Université Paris I-Panthéon-Sorbonne (PALM’s home) and their assignments. While originally conceived as a scholar-led project, PALM is now calling for and working towards the integration of external contributions.
Francesco Stella (Siena), ALIM
Francesco presented ALIM, a project collaboration between the Universities of Siena, Verona, Palermo, Venezia Ca’ Foscari, and Napoli Suor Orsola Benincasa. As stated on the project website, “ALIM makes openly available all Latin texts produced in Italy during the Middle Ages”. Users can view the texts online or download them as .zip files. While currently offering only an HTML version of these texts, ALIM is planning on adding an English interface as well as linguistic analysis tools, such as Lexicon. Furthermore, Francesco envisages the adoption of an intra-textual digital library model, and the integration with Documenta Catholica Omnia and our very own Perseus Digital Library. The tools will enable popular statistical searches as well as new kinds of explorations, such as the comparison between texts (lexical proximities between texts), including the study of texts belonging to anonymous authors.
Looking ahead, ALIM would like to rid itself of the proprietary database it currently runs on (IBM Notes) and include newly published texts (especially those published outside Italy, for copyright reasons). The questions the ALIM team is currently addressing are: How to maintain the archive after 2016, the year the funding runs out? How to create representative samples of each category of texts, and which texts should be chosen as representative?
Tim Geelhaar (Frankfurt), Computational Historical Semantics
Launched on 28th April 2014, the Computational Historical Semantics project is a database that gathers Latin textual collections already available online and allows for sophisticated searches across images and texts belonging to these collections. Whilst showcasing a variety of texts, the project’s main focus lies within the Patrologia Latina. Because its contents are gathered by web scraping, the database contains many errors, which the team hopes to rectify via the error report form provided.
Computational Historical Semantics bridges the gap between computational humanities and classics. The goal of the project is, amongst others, to understand how meaning is produced and how languages shaped the past.
The afternoon session (following a delicious Florentine lunch!) consisted of a round table discussion about the ways in which collaboration could take shape and optimise these wonderful and yet overlapping efforts. How can we work together towards the consolidation of our collections, and how can we make them more useful and usable? A rich debate yielded a unanimous decision: the databases should not be merged into a single resource but, rather, crawled by a TRAME-style European search engine whose job is to: A) tell users which texts are available where, and B) inform project investigators as to what has already been digitised, thus avoiding unnecessary duplication of content. All project representatives recognised the value of being able to navigate to individual projects from a central European hub and were instructed to go home and review their technological infrastructures as a first step towards this goal.
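The "which texts are available where" service agreed upon in the discussion amounts to a union index over the projects' catalogues. A minimal sketch, assuming each project can expose a simple list of its titles (the project names and holdings below are illustrative, not real catalogue data):

```python
from collections import defaultdict

def build_union_index(catalogues):
    """Merge per-project title lists into one lookup:
    normalised title -> list of projects holding that text."""
    index = defaultdict(list)
    for project, titles in catalogues.items():
        for title in titles:
            index[title.strip().lower()].append(project)
    return index

# Hypothetical holdings, for illustration only.
catalogues = {
    "digilibLT":       ["De architectura", "Etymologiae"],
    "Corpus Corporum": ["Etymologiae", "Summa Theologiae"],
}
index = build_union_index(catalogues)
print(index["etymologiae"])   # texts held by more than one project
```

Titles appearing under more than one project flag exactly the duplication the search engine is meant to surface, both to users looking for a text and to investigators planning new digitisation.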
While Emiliano degl’Innocenti, a skilful humanist and computer scientist, took on the initial crawling task to test how each and every database responds to external requests, a coordinator or group of coordinators is needed to pave the way. Who should take the lead? What can Open Philology and Perseus contribute? How can Europeana help? These and many other questions give us plenty of food for thought.