Embracing Computer Corpora in the Language Learning Classroom and Using It in Your Classroom

Sally Durand (University of Illinois at Chicago, USA)
Among the many facets of Computer Assisted Language Learning (CALL), studies using computer language corpora have risen to considerable prominence in research agendas. The author argues that corpora are useful tools for practicing teachers. However, the myth that “corpus-based research is too complicated to be useful for teachers” (Conrad, 2009) persists in pedagogical contexts. This chapter strives to dispel that myth, first by synthesizing a wealth of research and its accompanying pedagogical applications. Second, it shares specific pedagogical activities for bringing corpus data into classroom teaching. These corpus-informed classroom strategies provide concrete examples that will help TESOL/TEFL teachers make their coursework authentic and therefore more meaningful to language learners.
What is a corpus? What is corpus-based research? When I gave my first presentation on using corpora in the classroom at Illinois’ TESOL conference in early 2014, two attendees answered my opening question, “What do corpora in the classroom mean?” with: “Our bodies, using the body in the classroom?” While my presentation title and topic may have misled those two guests, the word corpus does, in fact, mean body in Latin. A corpus is a collection of written texts and transcribed speech compiled for the purpose of linguistic analysis. Corpus linguistics has roots in theology: religious scribes scoured their Bibles, coding certain words or themes and listing their locations together in a kind of index, in other words, compiling a corpus. Imagine how those early scholars would have benefited from our modern technology. In An Introduction to Corpus Linguistics, Kennedy (1998) attributes the emergence of modern corpus linguistics to the capabilities of computerized databases. As we will see shortly, interest in corpus data has spread beyond linguists to pedagogical researchers and teachers. While linguists study corpus data to better understand language itself, researchers in corpus-informed or corpus-based language learning examine corpus data to understand how language is learned and taught. This pedagogically oriented area of inquiry, made possible by computers, is the epitome of Computer Assisted Language Learning.
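The scribes’ index is essentially what modern software calls a concordance. As a rough illustration, and not a description of any particular corpus tool, the Python sketch below builds a key-word-in-context (KWIC) display: every occurrence of a search word is listed with a few words of context on either side. The function name `kwic`, its naive tokenizer, and the sample sentences are hypothetical choices made only for this example.

```python
import re


def kwic(text: str, keyword: str, width: int = 4) -> list[str]:
    """Return key-word-in-context lines with `width` tokens on each side of every hit.

    Hypothetical helper for illustration; real concordancers handle
    punctuation, lemmas, and multi-word searches far more carefully.
    """
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Hypothetical sample text standing in for a corpus.
sample = ("The results depend on the data. Students often depend on "
          "dictionaries, but a corpus shows how words depend on context.")

for line in kwic(sample, "depend", width=3):
    print(line)
```

Lining up the hits in one column makes a pattern such as “depend on” visible at a glance, which is the kind of observation a teacher might ask learners to make from concordance lines.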

Computer technology has made corpus research possible, because a computer can store, code, categorize, and retrieve massive amounts of information. The field began with a small group of dedicated enthusiasts, but by the end of the twentieth century it had become a major research strand in both linguistics and language learning. Over the last two decades, educators have found ways to exploit the potential of corpora (the plural of corpus), channeling research data into pedagogical applications for language instruction. Research using corpus data in English language learning has become a broad field drawing on a wealth of corpora. This abundance and breadth of corpora and corpus-based research studies may contribute to the myth that it is all just too complicated for teachers (Conrad, 2009). Some teachers believe that the technology is too advanced or that the data are incomprehensible. These sentiments are common and easy to understand; the sheer size of a given database alone can be overwhelming. In discussing classroom corpus use with peers, I have found that a few teachers are unaware that corpora exist. Many are vaguely or somewhat familiar with the concept and generally aware of the growing interest and research in corpus-informed language learning, but most have little or no hands-on experience working with corpora. It has been noted in the field that academic preparation programs lack training in corpus linguistics (Granger, 2009; Hasko, 2013).

Many research studies use prominent corpora such as the British National Corpus (BNC) and the Corpus of Contemporary American English, usually referred to as COCA (Davies, 2008). These very large corpora each contain a hundred million words or more, drawn from sources ranging from news broadcasts to literary fiction. They allow researchers to analyze language features and use across variables such as medium and register; for example, one can compare how a word is used in spoken versus written language. Corpora also include professional and academic content, facilitating research in academic disciplines and subfields and enabling comparisons between general English use and English for specialized or specific purposes. Corpora of student writing, such as the Michigan Corpus of Upper-level Student Papers (MICUSP) from the University of Michigan, and learner corpora, such as the International Corpus of Learner English (ICLE), collect learners’ actual work and so provide insight into learners’ needs and their actual habits of language use. The features of corpora seem almost limitless, as does their pedagogical potential. For example, along with generating frequency lists from millions of words, certain corpora can chart changes in usage patterns over time or geographical variation across world Englishes.
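Frequency lists and spoken-versus-written comparisons rest on a simple calculation: count a word, then normalize the count by corpus size (conventionally per million words) so that sub-corpora of different sizes can be compared fairly. The Python sketch below is only an illustration under stated assumptions: the helper names `tokenize`, `frequency_list`, and `per_million` are hypothetical, the tokenizer is deliberately naive, and the two tiny samples merely stand in for spoken and written sub-corpora. Large corpora such as the BNC and COCA are typically searched through their own web interfaces rather than with code like this.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens: list[str], n: int = 5) -> list[tuple[str, int]]:
    # Raw counts of the n most frequent word forms (ties keep first-seen order).
    return Counter(tokens).most_common(n)


def per_million(tokens: list[str], word: str) -> float:
    # Normalized frequency: occurrences of `word` per 1,000,000 tokens.
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens) * 1_000_000


# Hypothetical toy samples, far too small to support real conclusions.
spoken = tokenize("Well, I mean, it's kind of like, you know, really good.")
written = tokenize("The findings indicate that the intervention was effective.")

print(frequency_list(written, 3))
for word in ("like", "the"):
    print(f"{word}: spoken {per_million(spoken, word):,.0f} / "
          f"written {per_million(written, word):,.0f} per million words")
```

Normalizing rather than comparing raw counts matters because sub-corpora rarely have the same size; a word that appears more often in a larger sample is not necessarily more typical of that register.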

