Authored and posted by Greta Franzini.
We’re really proud to announce that EpiDoc XML versions of the monumental Corpus Scriptorum Ecclesiasticorum Latinorum (CSEL) are now being added to the Open Greek and Latin Project’s GitHub repository! We are in the process of digitising the public domain volumes of CSEL; you can find the volumes with which we are beginning at http://www.roger-pearse.com/weblog/2009/10/24/list-of-csel-volumes-at-google-books/.
The Latin text was OCR-ed, corrected (to 99% accuracy) and encoded according to our specifications by the French data entry company Jouve. CSEL is the first in a line of texts Jouve is currently helping us digitise. Each XML file is available under a Creative Commons Attribution-ShareAlike 4.0 International License and contains a link to the Archive.org scan it was taken from.
An accuracy of 99% means that there are still plenty of data entry errors to be fixed. Similarly, our basic CTS-compliant EpiDoc markup is waiting to be further enriched. The raw text was annotated by operators with no knowledge of Latin or Greek, so a lot can, and should, be done to improve the XML.
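To give a sense of scale, here is a rough back-of-the-envelope sketch of what 99% accuracy can mean in practice. It assumes the figure is measured per character, which is our assumption for illustration only. The function name `expected_errors` and the 2,000-character page are hypothetical.

```python
def expected_errors(char_count: int, accuracy: float = 0.99) -> int:
    """Estimate residual errors, ASSUMING accuracy is measured per character."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # The error rate is the complement of the accuracy.
    return round(char_count * (1.0 - accuracy))


# Hypothetical page of 2,000 characters (not a measured CSEL figure).
errors_per_page = expected_errors(2000)
print(errors_per_page)
```

Under these assumptions, even a single page would carry a couple of dozen errors, which is why every correction counts.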
So come and help us out! Feel free to download, modify, improve and share this work with friends and colleagues. The more, the merrier!