Global Philology: Big Textual Data


Florilegia: Big Textual Data Workshop, July 10-11, 2017

Preliminary Programme

Monday, July 10 – Raum 402

Paulinum, Augustusplatz 10, 4th Floor


09:00-11:30 Kick-Off Talks

09:00-09:20 Thomas Koentges and Gregory R. Crane (Universität Leipzig and Tufts University): Welcome

09:30-10:30 Introduction and Initial Discussion

10:30-11:30 David Smith: Exploiting Relational Structure in Large Text Corpora


11:30-12:00 Coffee Break

12:00-14:30 How-Tos

12:00-12:45 Benjamin Kiessling: OCR of Different Languages

12:45-13:30 Alicia Gonzalez: Pushing Annotations of Different Languages to Annis


13:30-14:30 Lunch

14:30-16:30 Deep Learning and Topic Modelling

14:30-15:30 Oliver Hellwig: A Deep Learning Approach to Tokenization of Sanskrit Texts

15:30-16:30 Paul Dilley (and Thomas Koentges): Iowa Corpus and Topic Modelling


16:30-17:00 Coffee Break

17:00-18:00 End-of-Day Discussion


Tuesday, July 11 – Raum 402 (Paulinum, Augustusplatz 10, 4th Floor)


09:00-11:00 Corpus Infrastructure and Resources

09:00-10:00 Thomas Koentges: Let’s Talk About .cex

10:00-10:30 Frederik Baumgardt: Perseid’s Plokamos: Of Changing Corpora and Annotations

10:30-11:00 Patrick J. Burns: External Resources for Corpus Approaches


11:00-11:30 Coffee Break

11:30-13:30 Corpus Building and Presentation

11:30-12:00 Cliff Wulfman: Blue Mountain

12:00-12:30 Matt Munson: CapiTainS, the CHS, and First1KGreek

12:30-13:00 Neven Jovanović: Croatiae Auctores Latini (CroaLa) – a Neo-Latin Corpus for Fun and Profit

13:00-13:30 Tyler Neill: Sanskrit Text Corpora and the Nyāyabhāṣya Digital Critical Edition


13:30-14:30 Lunch

14:30-16:30 Big Textual Data and Text Reuse

14:30-15:30 Donald Sturgeon: Text Tools for

15:30-16:30 Paul Vierthaler: Working with Imperial Chinese Corpora: Studying Document Similarity and Text Reuse


16:30-17:00 Coffee Break

17:00-18:00 Final Discussion and Future Plans