Blog

Our News!

Call for Papers: Classical Philology goes digital. Working on textual phenomena of ancient texts (Potsdam, February 16-17, 2017)

Call for papers

Workshop
Classical Philology goes digital. Working on textual phenomena of ancient texts. University of Potsdam, February 16-17, 2017

Digital technologies continue to change our daily lives, including the way scholars work. As a result, the Classics, too, are currently subject to constant change. Having established itself as an important field in the scientific landscape, Digital Humanities (DH) research provides a number of new possibilities to scholars who deal with analyses and interpretations of ancient works. Greek and Latin texts become digitally available and searchable (editing, encoding); they can be analyzed to find certain structures (text-mining); and they can be provided with metadata (annotation, linking, textual alignment), e.g. following traditional commentaries to explain terms, vocabulary or syntactic relationships (in particular tree-banking), for intra- and intertextual linking as well as for connections with research literature. An important keyword here is therefore ‘networking’: the potential for Classical Philology to collaborate with the Digital Humanities in creating useful tools for textual work is so great that a clear overview is difficult to obtain. Moreover, this scientific interest is by no means unilateral: collaboration is also very important for the Digital Humanities as a way of (further) developing and testing digital methods.

This is exactly where the proposed workshop comes in: representing several academic disciplines and institutions, scholars will come together to talk about their projects. We have invited Digital Humanists who have experience with the special issues of Classical Philology and can present the methods and potentials of their research (including the AvH Chair of DH / Leipzig, the CCeH, the DAI and DARIAH-DE). In order to enable intensive and efficient work on the various ideas and projects, the workshop is aimed at philologists whose research interests focus on certain phenomena of ancient texts, e.g. similes or quotations, and who want to examine more closely how such phenomena are presented and used, including questions of intertextuality and text reuse. The aim of extracting and annotating textual data such as similes poses the same type of practical philological problems for Classicists. The workshop therefore provides insight in two main ways: first, in an introductory theoretical section, DH experts will present keynote lectures on specific topics such as encoding, annotating, linking and text-mining; second, the focus of the workshop will be to discuss project ideas with the DH experts, to explore and explain possibilities for digital implementation, and ideally to offer a platform for potential cooperation. The focus is explicitly on working together to explore ideas and challenges, based also on concrete practical examples.

This main section will be divided into two sessions based on methods from the Digital Humanities; according to their main focus, projects will be assigned to one of the following groups:
1. producing digital data: computational analysis of ancient texts, detecting textual elements;
2. commenting on texts: annotation and linking.
It is entirely possible that some themes will be more or less important for the different research goals.

The keynotes and project presentations will be classified into the following sessions:
I. DH keynote speaker
The workshop begins with keynotes held by invited DH specialists who have expertise in the special issues of Digital Classics. The aim of these lectures is to describe possibilities for implementing information technology for philological purposes, taking into account the specific challenges of ancient texts, their conditions and transmission. By demonstrating best-practice examples, the speakers will provide initial ideas as to what is useful and possible. This session serves as an introduction to the two following sessions that are focused on the discussion of specific projects.

II. Project presentations
1) Producing digital data: computational analysis of ancient texts, detecting textual elements
Projects within Session 1 will mainly deal with the question of how specific textual elements that have a more or less fixed structure in a text may be systematically detected: How might the conventional readings of texts and the manual search in various textual resources be combined with automated analyses? How might text-mining and natural language processing techniques be used to supplement a reading? The DH experts will provide insight into such topics as the possibilities of named entity recognition and collections of textual elements in semantically linked datasets that leverage formal ontologies. Networking with already existing resources for ancient texts as well as with similar current projects will be discussed. Questions relating to editing a text, especially how a text can be presented and preserved for online research, may briefly be mentioned. However, the main focus here is on the extraction of information.
2) Commenting on texts: annotation and linking
Session 2 includes projects that focus on providing a text with metadata. How might the different parts of a textual element, e.g. specific terms and the syntactic or semantic sentence structure, be explained by annotation? Which open standards for annotating a text may be wisely used? What kind of linking is possible, not only with the primary source text, but also with research literature and lexical entities, for instance? Participants will also talk about how the resulting resources could be used as real research tools for users, e.g. for a comprehensive search of particular terms.

The presentations, as well as the discussions, will be given in German or English. In keeping with this specific interest in textual philology, the projects sought should deal with certain types of textual elements that have a more or less fixed structure, e.g. figurative language, quotations or special terms. The purpose should be to analyze texts with a focus on these forms and to annotate and align passages. The discussions, therefore, will address how to extract and annotate such data, i.e. how to work with them in a digital environment. The Classical Philology department at the University of Potsdam is very well equipped for this kind of joint project. The presentations should not exceed 15 minutes. As the focus of the workshop is on the ensuing discussion, 30 minutes are scheduled for collaborative exchange after each lecture. Contributions should be submitted by May 15th, 2016, in the form of a short abstract (max. 300 words) along with a brief biography. Digital Humanists are also invited to submit further proposals for lectures in the DH section, which should not exceed 30 minutes in length.

The workshop will take place at the University of Potsdam from February 16th to 17th, 2017.

Important dates:
15/05/16 deadline for abstracts
30/05/16 notification of authors
16-17/02/17 workshop in Potsdam

Organization:
Dr. Karen Blaschka, Klassische Philologie, Universität Potsdam
Dr. Monica Berti, AvH Chair of DH, Universität Leipzig

Contact:
Dr. Karen Blaschka
Klassische Philologie
Universität Potsdam
Am Neuen Palais 10
14469 Potsdam

Dr. Monica Berti
Alexander von Humboldt-Lehrstuhl für Digital Humanities
Institut für Informatik
Universität Leipzig
Augustusplatz 10
04109 Leipzig

Mail to:
karen.blaschka@uni-potsdam.de

 

Digital humanities enhanced. Challenges and prospects of Ancient Studies. A retrospect on the DH-conference in November 2015 in Leipzig

We are very happy to publish the report that Julia Jushaninowa has written about DHEgypt2015 (Altertumswissenschaften in a Digital Age: Egyptology, Papyrology and Beyond – Leipzig, November 4-6, 2015):

Digital humanities enhanced. Challenges and prospects of Ancient Studies. A retrospect on the DH-conference in November 2015 in Leipzig by Julia Jushaninowa (PDF).

Unlocking the Digital Humanities – lecture series by Tufts and Leipzig, also webcast

Unlocking the Digital Humanities

http://tiny.cc/k8ad9x

An Open Research Series organized by the Tufts Department of Classics and by the Alexander von Humboldt Chair of Digital Humanities at the University of Leipzig.

Talks will take place in Eaton Hall on the Medford Campus of Tufts University and in Paulinum 402 at the University of Leipzig. All talks will be broadcast as Google Hangouts and published on YouTube.
The URLs for the Google Hangouts and for the YouTube recordings will be posted at http://tiny.cc/k8ad9x.

Part 1. Introducing Digital Humanities

What is digital humanities? Why does it matter to you? All humanities disciplines welcome.

29 Feb, 12–1:00pm, Eaton 202
Language, Digital Philology and the Humanities in a Global Society.
Gregory Crane, Winnick Family Chair and Professor of Classics, Tufts University; Alexander von Humboldt Professor of Digital Humanities, University of Leipzig

2 Mar, 12–1:00pm, Eaton 202
Digital Humanities: Everything you wanted to know but haven’t yet asked.
Thomas Koentges, Assistant Professor of Digital Humanities, University of Leipzig

7 Mar, 12–1:00pm, Eaton 202
Combining Qualitative and Quantitative Research Methods.
Thomas Koentges, Assistant Professor of Digital Humanities, University of Leipzig.
Melinda Johnston, prev. Cartoon Specialist, National Library of New Zealand

Part 2. Digital Humanities Showcase

Ask the experts! Hear and discuss use-cases of recent DH research and teaching.

10 Mar, 4:00-5:00pm, Eaton 123
Valid and Verified Undergraduate Research.
Christopher Blackwell, Forgione University Professor, Furman University
Marie-Claire Beaulieu, Assistant Professor, Tufts University

14 Mar, 12:00-1:00pm, Eaton 202
eLearning and Computational Language Research.
Thomas Koentges, Assistant Professor of Digital Humanities, Leipzig

4 Apr, 12:00-1:00pm, Eaton 202
Rediscovery of Postclassical Latin and European Culture.
Neven Jovanovic, Associate Professor of Latin, University of Zagreb
Petra Sostaric, Lecturer, University of Zagreb

11 Apr, 12:00-1:00pm, Eaton 202
Visualizing Literary and Historical Social Networks.
Ryan Cordell, Assistant Professor of English, Northeastern University

11 Apr, 5:00-6:00pm, Eaton 123
From Archive to Corpus: Bottom-Up Bibliography for Millions of Books.
David A. Smith, Assistant Professor, College of Computer and Information Science, Northeastern University

25 Apr, 12:00-1:00pm, Eaton 202
Spatial and Chronological Patterns in Historical Texts.
Maxim Romanov, Postdoctoral Researcher, Digital Humanities, University of Leipzig

27 Apr, 12:00-1:00pm, Eaton 202
Digital Art History.
Chiara Pidatella, Lecturer in Art History, Tufts

2 May, 12:00-1:00pm, Eaton 202
Representing Influence: writing about text reuse when everything is online.
Ioannis Evrigenis, Professor of Political Science, Tufts University
Monica Berti, Assistant Professor of Digital Humanities, University of Leipzig

For information, contact Thomas Koentges (thomas.koentges@tufts.edu) at Tufts or Matt Munson (munson@dh.uni-leipzig.de) at Leipzig.

Topic Modelling of Historical Languages in R

Topic Modelling of Historical Languages in R

By Thomas Köntges

This is a quick note and introduction to topic-modelling historical languages in R, intended to supplement three publications forthcoming in 2016: one for the AMPHORAE issue of the Melbourne Historical Journal; one for Alexandria: The Journal of National and International Library and Information Issues (currently under review); and one for DH2016. This blog entry also summarises some points I have made in several talks in the past few months about topic-modelling historical languages (including my talk at the Analyzing Text Reuse at Scale / Working with Big Humanities Data workshop during the DH Leipzig Workshop Week in December 2015). It is therefore intended as a short summary of some of the more important points made previously; in contrast to the specific applications covered in each of the articles, it provides an overview of the subject.

My work on topic-modelling did not start in Leipzig; rather, it was part of a project I worked on during my time as a research associate at the Victoria University of Wellington (VUW), New Zealand, in 2013: the Digital Colenso Project. Back then I thought that there was only one ideal number of topics for each corpus, and I used Martin Ponweiser’s harmonic-mean method (see chapters 3.7.1 and 4.3.3 of his master’s thesis) to attempt to determine this ideal. Although this approach was useful, albeit slow, for the Digital Colenso Project, I now think that this assumption was wrong, because the ideal topic granularity depends more on the research question and use-case of applying topic modelling to a certain corpus than on the data itself. To clarify this, I will showcase the results of my topic-modelling research during my 2015 visiting fellowship at VUW. This research was undertaken in collaboration with staff at the Alexander Turnbull Library, National Library of New Zealand (ATL), and was subsequently applied to the Open Philology Project (OPP) of the Department of Digital Humanities at the University of Leipzig, Germany.

After a brief introduction to the research projects in Wellington and Leipzig and to topic-modelling itself, this blog entry will summarize the limitations of topic-modelling, with special emphasis on how to determine an ideal number of topics, and briefly discuss morphosyntactic normalization and the use of stop words. It will then suggest a researcher-focused method of addressing these limitations and challenges. I will then briefly demonstrate its applicability to the different use-cases at ATL and OPP, which deal with very different fields and languages, including English, Latin, Ancient Greek, Classical Arabic, and Classical Persian. I will finish by stressing how digital humanities research results and practices can be improved by enabling humanities researchers, who focus on more traditional and qualitative analyses of corpora, to use the quantitative method of topic-modelling as a macroscope and faceting tool.

During my research stay at VUW, I worked with the Research Librarian for cartoons at ATL, Dr Melinda Johnston, on a mixed-methods analysis of the reactions of cartoonists and New Zealand print publications to the Canterbury earthquakes of 2010 and 2011. ATL is part of the National Library of New Zealand, an institution that is interested in making the country’s cultural heritage more accessible to a digital audience and to researchers. Within this short project I attempted to automate the detection and analysis of earthquake-related material in the cartoon descriptions created by ATL and in over 100,000 abstracts produced by Index New Zealand (INNZ); all items were published between September 2010 and January 2014. The INNZ data could be retrieved as a double-zipped XML file from INNZ’s webpage, and ATL’s item descriptions could be queried using the Digital New Zealand (DNZ) API. During the project it became apparent how a topic-modelling approach could considerably speed up the finding and faceting of earthquake-related descriptions and abstracts.

The results were so impressive that I decided to apply the approach to Latin and Greek literature in Leipzig’s OPP. OPP has a text collection of over 60 million Greek and Latin words and has recently begun to add Classical Persian and Arabic texts. One of OPP’s core interests is to produce methods that can swiftly generate results on big data and that can compete with more traditional approaches. OPP is maintained and organized using eXist-db, the CTS/CITE architecture developed by the Homer Multitext Project, and additional web-based tools and services (e.g. Morpheus, a Greek and Latin morpho-syntactic analyser, and GitHub). This structure enables researchers to use a CTS API to retrieve their desired text corpora or specific texts. In a first evaluation run of the topic-modeller, 30,000 Classical Arabic biographies were used. At both research institutions, OPP and ATL, researchers applied more qualitative methods to complement the process and evaluate results. Because those evaluations were promising, the topic-modelling process will be showcased in what follows.
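
To give a flavour of this retrieval step, a request along the following lines fetches a single passage over a CTS API in R. This is a minimal sketch: the endpoint URL below is a placeholder and the URN only an example of the CTS identifier scheme, so both would need to be replaced with the actual service and text identifiers used in a given project.

library(RCurl)   # for getURLContent(), as used elsewhere in this post

## Placeholder endpoint and example URN; substitute the actual CTS service and text identifiers.
cts_endpoint <- "http://example.org/api/cts"
passage_urn  <- "urn:cts:latinLit:phi0448.phi001.perseus-lat2:1.1"

## A CTS GetPassage request returns the passage as TEI/XML, which can then be stripped of
## markup and fed into the normalisation and topic-modelling steps described below.
request_url <- paste(cts_endpoint, "?request=GetPassage&urn=", passage_urn, sep = "")
passage_xml <- getURLContent(request_url)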

Topic modelling is “a method for finding and tracing clusters of words (called “topics” in shorthand) in large bodies of texts”. A topic can be described as a recurring pattern of co-occurring words. Topic models are probabilistic models that typically assume a fixed number of topics in the corpus. The simplest, and probably one of the most frequently applied, topic models is latent Dirichlet allocation (LDA). The success and results of LDA rely on a number of a priori-set variables: for instance, the number of topics assumed in the corpus, the number of iterations of the modelling process, the decision for or against morpho-syntactic normalisation of the research corpus, and how stop words are implemented in the process. Furthermore, its interpretation is often influenced by how the topics are graphically represented and how the words of each topic are displayed.
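
To make these a priori choices concrete, the following minimal sketch fits an LDA model with J. Chang’s lda package (the same library used by the application described below); the toy documents and the values chosen for K, the iterations and the priors are arbitrary illustrations.

library(lda)

## Toy corpus; in practice these would be the normalised citable units of the research corpus.
docs <- c("arma virumque cano troiae qui primus ab oris",
          "gallia est omnis divisa in partes tres",
          "arma gravi numero violentaque bella parabam")

lex <- lexicalize(docs, lower = TRUE)    # build word indices and the vocabulary

## A priori choices that shape the result: number of topics (K), iterations, alpha and eta priors.
K   <- 2
fit <- lda.collapsed.gibbs.sampler(lex$documents, K, lex$vocab,
                                   num.iterations = 500, alpha = 0.1, eta = 0.1)

top.topic.words(fit$topics, num.words = 5, by.score = TRUE)   # inspect the top terms per topic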

While C. Sievert has already found a very convincing solution for the graphical representation, the a priori variables are often out of the hands of the traditional qualitative researcher, and any bigger modifications have to be implemented by a computer-savvy researcher. Yet topic-modelling is often not an end in itself; rather, it is a tool used to help answer a specific humanities research question or to facet large text corpora so that further methods can be applied to a much smaller selection of texts. For example, one could use the theta-values in R to find the topic similarity between two paragraphs, in order to determine text reuse or to find similar sentences for teaching purposes:

### Determine the topic similarity of citable units by comparing
### the mean difference of the theta-values for each topic.
### first_element()/last_element() are assumed to be helpers (defined elsewhere in the
### full script) that return the first and last entry of the supplied pair of unit IDs.

is_similar <- function(x) {
  # rows of theta.frame for the two citable units, compared with all.equal()
  check <- all.equal(theta.frame[which(theta.frame[, 1] == first_element(unlist(x))), ],
                     theta.frame[which(theta.frame[, 1] == last_element(unlist(x))), ])
  # average the "Mean relative difference" values reported for the topic columns
  result <- mean(as.numeric(sub(".*?difference: (.*?)", "\\1", check)[3:length(check)]))
  return(result)
}  # produces NA if a unit is compared with itself
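
For instance, assuming the first column of theta.frame holds the citable-unit identifiers (the URNs below are purely illustrative), the function can be called on a pair of identifiers:

## hypothetical pair of citable units; returns the mean difference of their theta-values
is_similar(list("urn:cts:greekLit:tlg0003.tlg001:1.1", "urn:cts:greekLit:tlg0003.tlg001:1.22"))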

Or find example paragraphs in the corpus that belong to a certain topic:

## Find exemplar sentence
topic_number   = 1   # topic to inspect (column topic_number + 1 of theta.frame)
exemplar_level = 1   # 1 = the citable unit with the highest theta-value for that topic
# order the citable units by their theta-value for the chosen topic and return the text (column 2)
corpus[which(corpus[,1] == theta.frame[order(theta.frame[,topic_number+1], decreasing=TRUE),][exemplar_level,1]), 2]

Traditional researchers often have to continue working with topic-modelling results, but may not always be aware of the bias that the a priori-set variables have introduced into the selection process. One possible way to bridge this gap between researcher and method is to involve the qualitative researcher earlier by giving them agency in the topic-modelling process. To do this, I have used R and the web-application framework Shiny to combine J. Chang’s LDA and C. Sievert’s LDAvis libraries with DNZ/CTS API requests and language-specific handling of the text data, creating a graphical user interface (GUI) that enables researchers to find, topic-model, and browse texts in the collections of ATL, OPP, and INNZ. They can then export the corpus and model they have produced, so that they can apply qualitative methods to a precise facet of a large text corpus, rather than to the whole corpus itself, which contains texts that are irrelevant for answering their specific research question.

On the left side of the GUI, the researcher will be able to set the following variables: a) search term(s) or CTS-URN(s); b) the source collection; c) certain stop word lists or processes; d) additional stop words; e) the number of topics; and f) the number of terms shown for each topic in the visualization. The application then generates the necessary API requests, normalizes the text as desired by the researcher, applies J. Chang’s LDA library, and finally presents a D3 visualisation of the topics, their relationship to each other, and their terms, using C. Sievert’s LDAvis and dimension reduction via Jensen-Shannon divergence and principal components as implemented in LDAvis. The researcher can then directly and visually evaluate the success of their topic-modelling and use the settings on the left as if they were setting a microscope to focus on certain significant relationships of word co-occurrences within the corpus. Once they have focused their research tool, they can then export visualisations, topics, and their corpus for further research. For further clarification, a few example use-cases will be described in what follows.
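
A skeletal sketch of such an interface is given below, using Shiny together with LDAvis’ Shiny bindings (visOutput()/renderVis()); fetch_corpus() and fit_lda() are hypothetical placeholders standing in for the API requests, the normalisation step, and the call to the lda package.

library(shiny)
library(LDAvis)

## Sketch of the interface described above (an illustration, not the production app).
## fetch_corpus() and fit_lda() are hypothetical placeholders for the DNZ/CTS API requests,
## the normalisation step, and the call to J. Chang's lda package.

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput("query", "Search term(s) or CTS-URN(s)"),
      selectInput("source", "Source collection", c("ATL", "OPP", "INNZ")),
      selectInput("stoplist", "Stop word list", c("none", "English", "Latin", "Greek")),
      textInput("extra_stops", "Additional stop words (comma-separated)"),
      sliderInput("k", "Number of topics", min = 2, max = 50, value = 15),
      sliderInput("terms", "Terms shown per topic", min = 5, max = 50, value = 30)
    ),
    mainPanel(
      visOutput("ldavis")   # LDAvis' Shiny output binding
    )
  )
)

server <- function(input, output) {
  output$ldavis <- renderVis({
    corpus <- fetch_corpus(input$query, input$source)                     # placeholder: API requests
    model  <- fit_lda(corpus, input$k, input$stoplist, input$extra_stops) # placeholder: lda call
    ## createJSON() applies the dimension reduction (Jensen-Shannon divergence + PCA)
    ## mentioned above and prepares the data for the D3 visualisation.
    createJSON(phi = model$phi, theta = model$theta,
               doc.length = model$doc.length, vocab = model$vocab,
               term.frequency = model$term.frequency, R = input$terms)
  })
}

shinyApp(ui, server)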

As stated above, one application was building a finding and analytical aid for cartoon descriptions and INNZ abstracts related to the Canterbury earthquakes of 2010 and 2011. One of the original research questions was to what extent cartoonists reacted differently to the Canterbury earthquakes than print journalists did. To answer this, over 100,000 abstracts and over 30,000 descriptions of born-digital cartoons had to be evaluated regarding their potential to answer the research question. ATL uses Library of Congress (LOC) based subject headings for its collection items. At first, we thought these could be used efficiently to generate ideal selections of descriptions that fit the research question, but we found that LDA topic-modelling is not only quicker (among other things because it needs less human intervention) but also better able to identify trends over time: for example, it showed a much more realistic topic development across texts that dealt with the initial destruction and texts about the rebuild, while a LOC subject-heading-based analysis required much manual labour and yielded a different (unexpectedly wrong) result. A more thorough discussion can be found in the upcoming Alexandria article.

The other applications of the topic-modelling tools were further use-cases in which OPP corpora were processed: one Latin, one Greek, and one Arabic (please find the most recent, better documented approach for Greek here and for Latin here). In these more morphologically complex languages, special emphasis had to be placed on the influence of morpho-syntactic normalisation, that is, the reduction of different instances of the same word to a common morphological base (the so-called “dictionary form” of a word). This normalisation by reduction can contribute immensely to the success of topic modelling. The degree of its impact, however, is language dependent; more specifically, it depends on the kind of information loss that occurs during normalisation in a given language and on which tools are available to reduce that language’s morphological complexity. For instance, while it is useful for Ancient Greek and Latin to normalise by reducing morphological complexity, because the frequency of a term then becomes more detectable and usable (see Greek topic-modelling results for Thucydides and also for Caesar’s De Bello Gallico), there are reasons why it might be better not to normalise Classical Arabic text in the same way: Classical Arabic verb forms (genus verbi) express the gender (sexus) of the subject of the sentence, making it possible to easily detect, for example, biographies of women using topic-modelling (a link to Dr Maxim Romanov’s blog will be posted here once it’s published, but also see figure 1).

Figure 1. LDA topic-model generated from around 30,000 Classical Arabic biographies. Topic 20 “Biographies of women” is selected.

Currently, the R script uses the Perseids Morphological Service API to find a vector of possible lemmata for each word:

## Look up a vector of possible lemmata for a word form via the morphological service.
## Assumes library(RCurl) is loaded and that morpheusURL, corpus_words and the helper
## XMLpassage2() (which extracts the lemmata from the XML response) are defined earlier
## in the script.
parsing <- function(x) {
  word_form <- x
  URL <- paste(morpheusURL, word_form, "&lang=grc&engine=morpheusgrc", sep = "")
  message(round((match(word_form, corpus_words) - 1) / length(corpus_words) * 100, digits = 2),
          "% processed. Checking ", x, " now.")

  ## Retrieve the response; retry once after a short pause, then fall back to the original form.
  URLcontent <- tryCatch({
    getURLContent(URL)
  }, error = function(err) {
    tryCatch({
      Sys.sleep(0.1)
      message("Try once more")
      getURLContent(URL)
    }, error = function(err) {
      message("Return original value: ", word_form)
      return(word_form)
    })
  })

  if (URLcontent == "ServerError") {
    lemma <- x
    message(x, " is ", lemma)
    return(lemma)
  } else {
    if (is.null(XMLpassage2(URLcontent))) {
      lemma <- x
      message(x, " is ", lemma)
      return(lemma)
    } else {
      lemma <- tryCatch({
        XMLpassage2(URLcontent)
      }, error = function(err) {
        message(x, " not found. Return original value.")
        lemma <- "NotFound1"
        message(x, " is ", lemma)
        return(lemma)
      })

      ## Clean up the candidate lemmata: strip homograph numbers, lower-case, deduplicate.
      lemma <- gsub("[0-9]", "", lemma)
      lemma <- tolower(lemma)
      lemma <- unique(lemma)
      if (all(nchar(lemma) == 0)) {   # nothing usable returned: keep the original form
        lemma <- x
        message(x, " is ", lemma)
        return(lemma)
      } else {
        message(x, " is ", lemma)
        return(lemma)
      }
    }
  }
}

The script then chooses, for each word form, the candidate lemma that is most frequent in the corpus. Because vocabulary repeats within the same corpus and because different morphological versions of the same word have different vectors of potential lemmata, this solution works well for many classical authors. That said, it is very slow! The slowest parts of the R script are retrieving the text itself and retrieving the lemma vectors for each word. The former could be addressed either by a local installation of eXist-db and the Perseus/OPP texts, or by modifying the API to allow requesting the whole text of an author in one call. The latter could be addressed by a local installation of Morpheus or, even better, by new parsers that are finite state transducers, like Harry Schmidt’s Parsley written in Go or Neel Smith’s adaptation for Greek. Currently, however, a researcher depends on morphological services that are delivered via individual API requests. The success of sending thousands of requests to the same API service therefore depends not only on the topic-modelling script itself, but also on the stability of the connection to the service. Connections can time out, for instance, and error handling and subsequent testing of the corrected corpus are needed to ensure the success of the morphological normalisation and the data quality of the newly generated parsed corpus.
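
A minimal sketch of this selection step might look as follows; the shape of the input (a named list mapping each word form to its vector of candidate lemmata, e.g. built with parsing() above) is an assumption made for illustration.

## Assumed shape: 'candidates' is a named list, word form -> character vector of possible lemmata,
## e.g. built by applying parsing() to every word form in the corpus.
lemma_counts <- table(unlist(candidates))    # how often each candidate lemma occurs corpus-wide

pick_lemma <- function(word_form) {
  cands <- candidates[[word_form]]
  if (is.null(cands) || length(cands) == 0) return(word_form)   # fall back to the original form
  cands[which.max(lemma_counts[cands])]                         # keep the most frequent candidate
}

lemmatised <- vapply(names(candidates), pick_lemma, character(1))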

Here is a possible way of comparing the data-quality of the morphologically normalised corpus with the original corpus (see the parsing function above for handling possible connection problems):

### Compare length of corpus and corpus_parsed:
### for every citable unit, check that the parsed text has the same number of tokens as the original.
### corpus and corpus_parsed are assumed to be two-column (identifier, text) structures.

length_corpus <- length(corpus) / 2
test_corpus_length <- vector()
for (i in 1:length_corpus) {
  test_corpus_length[i] <-
    length(unlist(strsplit(as.character(unlist(corpus_parsed[i, 2])), "[[:space:]]+")[1])) ==
    length(unlist(strsplit(as.character(unlist(corpus[i, 2])), "[[:space:]]+")[1]))
}
table_corpus_length <- table(test_corpus_length)   # overview: how many units match
bug_report <- which(test_corpus_length == FALSE)   # indices of units that need inspection

For both tasks, error handling and the testing of data quality, most researchers who are just interested in creating a topic model of a corpus, or who see topic-modelling only as a step towards further research (e.g. social network analysis, text reuse) or as a tool (e.g. finding sentences similar to passages already read, to test students’ language comprehension), would welcome help or a solution where this is already done for them. Offering those kinds of services has frequently been discussed in our team and in the extended Digital Classics universe, and it is not unlikely that researchers will have those resources in the future, especially because this is, at least in the research of historical languages, a finite problem.

In any case, as the reader can see, it is an exciting time to work on topic-modelling historical languages, given that almost every month there are better tools and APIs that can be used to speed up code, and given the push from traditional researchers to better understand topic-modelling and to use its results in their research and teaching. I hope that this short blog post has shown a few examples and contributed to a better understanding of how topic-modelling, in all its complexity, can be opened up to, and its development positively influenced by, more traditional researchers with few computer skills, in turn enabling them to answer specific research questions based on large text corpora.

The Big Humanities, National Identity and the Digital Humanities in Germany

Gregory Crane
June 8, 2015

Alexander von Humboldt Professor of Digital Humanities
Universität Leipzig (Germany)

Professor of Classics
Winnick Family Chair of Technology and Entrepreneurship
Tufts University (USA)

Summary

Alexander von Humboldt Professors are formally and explicitly “expected to contribute to enhancing Germany’s sustained international competitiveness as a research location”. And it is as an Alexander von Humboldt Professor of Digital Humanities that I am writing this essay. Two busy years of residence in Germany have allowed me to make at least some preliminary observations, but most of my colleagues in Germany have spent their entire careers here, often in fields where they have grown up with their colleagues around the country. I offer initial reflections rather than conclusions and write in order to initiate, rather than to finish, discussions about how the Digital Humanities in Germany can be as attractive outside of Germany as possible. The big problem that I see is the tension between the aspiration to attract more international research talent to Germany and the necessary and proper task of educating the students of any given nation in at least one of their national languages, as well as in their national literatures and histories. The Big Humanities — German language, literature and history — drive Digital Humanities in Germany (as they do in the US and every other country with which I am familiar).

In my experience, however, the best way to draw new talent into Germany is to develop research teams that run in English and capitalize on a global investment in the use of English as an academic language — our short term experience bears out the larger pattern, in which a large percentage of the students who come to study in Germany enjoy their stay, develop competence in the language and stay in Germany. Big Humanities in Germany, however, bring with them the assumption that work will be done in German and have a corresponding — and entirely appropriate — national and hence inwardly directed focus.

But if it makes sense to have a German Digital Humanities, that also means that Germany may have its own national infrastructure to which only German-speaking developers may contribute: 77% of the Arts and Humanities publications in Elsevier’s Scopus publication database are in English, very few developers outside the German-speaking world learn German, and the Big Humanities in the English-speaking world tend to cite French as their second language (only 0.3% of the citations in the US Proceedings of the Modern Language Association pointed to German, while the Transactions of the American Philological Association, with 10% of its citations pointing to German, made the most use of German scholarship).

The best way to have a sustainable digital infrastructure is to have as many stakeholders as possible and, ideally, to be agile enough to draw on funding support from different sources, including (especially including) international sources of funding. We also need to decide what intellectual impact we wish German investments in Digital Humanities to have outside of the German-speaking world, and to address the related question of how the Digital Humanities can expand the role that German language, literature and culture play beyond the German-speaking world.

Details and the full text are available here.

Seven reasons why we need an independent Digital Humanities

Gregory Crane
Professor of Classics and Winnick Family Chair of Technology and Entrepreneurship
Tufts University

Alexander von Humboldt Professor of Digital Humanities
Open Access Officer
University of Leipzig

April 28, 2015

Draft white paper available at http://goo.gl/V9Ddjq

This paper describes two issues: the need for an independent Digital Humanities, and the opportunity to rethink, within a digital space, the ways in which Humanists can contribute to society and redefine the social contract upon which they depend.

The paper opens by articulating seven cognitive challenges that the Humanities can, in some cases only, and in other cases much more effectively, combat insofar as we have an independent Digital Humanities: (1) we assume that new research will look like research that we would like to do ourselves; (2) we assume that we should be able to exploit the results of new methods without having to learn much and without rethinking the skills that at least some senior members of our field must have; (3) we focus on the perceived quality of Digital Humanities work rather than the larger forces and processes now in play (which would only demand more and better Digital Humanities work if we do not like what we see); (4) we assume that we have already adapted new digital methods to existing departmental and disciplinary structures and assume that the rate of change over the next thirty years will be similar to, or even slower than, what we experienced in the past thirty years, rather than recognizing that the next step will be for us to adapt ourselves to exploit the digital space of which we are a part; (5) we may support interdisciplinarity, but the Digital Humanities provide a dynamic and critically needed space of encounter not only between established humanistic fields but also between the humanities and a new range of fields including, but not limited to, the computer and information sciences (and thus I use the Digital Humanities as a plural noun, rather than a collective singular); (6) we lack the cultures of collaboration and of openness that are increasingly essential for the work of the humanities and that the Digital Humanities have proven much better at fostering; (7) we assert all too often that a handful of specialists alone define what is and is not important rather than understanding that our fields depend upon support from society as a whole and that academic communities operate in a Darwinian space.

The Digital Humanities offer a marginal advantage in this seventh and most critical point because the Digital Humanities (and the funders which support them) have a motivation to think about and articulate what they contribute to society. The question is not whether the professors in the Digital Humanities or traditional departments of Literature and History do scholarship of higher quality. The question is why society supports the study of the Humanities at all and, if so, at what level and in what form. The Digital Humanities are important because new digital media and automated methods enable all of us in the Humanities to reestablish the social contracts upon which we always must depend for our existence.

The Digital Humanities provide a space in which we can attack the three fundamental constraints that limited our ability to contribute to the public good: (1) the distribution problem, (2) the library problem, and (3) the comprehension problem. First, all Humanities have the power to solve the distribution problem by insisting upon Open Access (and Open Data) as essential elements of modern publication. Here the Digital Humanities arguably provide a short-term example of leadership because of the greater prevalence of open publication. The second challenge has two components. On the one hand, we need to rethink how we document our publications with the assumption that our readers will, sooner or later, have access to digital libraries of the primary and secondary sources upon which we base our conclusions. At the same time, developing comprehensive digital libraries requires a tremendous amount of work, including fundamental research on document analysis, optical character recognition, and text mining, as well as analysis of the economics and sociology of the Humanities. Third, the comprehension problem challenges us to think about how we can make the sources upon which we base our conclusions intellectually accessible — what happens when people in Indonesia confront a text in Greek, or viewers in America view a Farsi sermon from Tehran, artifacts of high art from Europe or of religious significance from Sri Lanka, a cantata of Bach or music played on an Armenian duduk?

The basic questions that we ask in the Humanities will not change. We will still, as Livy pointed out in the opening to his History of Rome, confront the human record in all its forms, ask how we got from there to where we are now and then where we want to go. And we may still, like Goethe, decide that the best thing about the past is simply how much enthusiasm it can kindle within us. But the speed and creativity with which we answer the distribution, library and comprehension problems determines the degree to which our specialist research can feed outwards into society and serve the public good.
The more we labor to open up our work — even the most specialized work — and to articulate its importance, the better we ourselves understand what we are doing and why. Non-specialists include other professional researchers as well as the general public. We may think that we are giving up, in practice if not in law, something of our perceived (and always only conditional and always short-term) disciplinary autonomy, but, in so doing, we win the freedom to serve, each of us according to the possibilities of our individual small subfields within the humanities, the intellectual life of society.