Corpora and Concordancers

Charles Hall (The University of Memphis, USA)
DOI: 10.4018/978-1-61350-447-5.ch004


At the heart of almost all ANLP is the corpus. This chapter provides an overview of the history and development of the corpus and crucial criteria that define the modern corpus. It ends with a discussion of the most basic analytical tool for corpus linguistics, the concordancer.
Chapter Preview


In 2011, the Centre for English Corpus Linguistics (CECL) at the University of Louvain (Belgium), among the oldest modern institutions dealing with corpus work, will celebrate its 20th anniversary. However, the history of corpus research and concordancers actually predates that center by almost 800 years.

One of the first research tools developed for corpus work was the written concordance. In contrast to an index, which normally lists only important topics or names, a written concordance lists the location of every occurrence of every word in the corpus. The first recorded concordance was of the Vulgate Bible, completed in the 13th century by Dominicans. Although their original purpose was not linguistic, Bible concordances later became essential to 19th-century efforts to identify authorship, for example in Genesis. Researchers were able to use lexical evidence to support the documentary hypothesis (Wellhausen, 1905) that the first books of the Bible had several different writers.
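The difference between an index and a concordance can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a reconstruction of any historical or published tool: the function name build_concordance, the two-line sample text, and the choice to record locations as (line number, word position) pairs are all hypothetical. It maps every word, not just selected topics or names, to each place where it occurs.

```python
import re
from collections import defaultdict


def build_concordance(lines):
    """Map every word to a list of (line_number, word_position) pairs.

    Hypothetical helper for illustration: words are lowercased and split
    on anything that is not a letter or apostrophe.
    """
    concordance = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        words = re.findall(r"[a-z']+", line.lower())
        for position, word in enumerate(words, start=1):
            concordance[word].append((line_no, position))
    return dict(concordance)


# A tiny illustrative "corpus" (assumed sample text, two lines).
sample = [
    "In the beginning God created the heaven and the earth",
    "And the earth was without form and void",
]

conc = build_concordance(sample)

# Every occurrence of every word is recorded, including function words.
for word in ("earth", "the", "and"):
    print(word, conc[word])
```

Because function words such as "the" and "and" are recorded alongside content words, a listing of this kind supports the sort of lexical comparison described above, which a topical index cannot.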

In the 19th century, individuals would spend many years of their lives preparing written concordances of the works of individual authors, such as the complete concordance of Milton’s work by Cleveland (1867). Other scholars could use these works to investigate language patterns; however, the printed concordances were both unwieldy and subject to human error in compilation. Indeed, these last two factors were crucial in limiting the use of corpus research. Two developments were essential before contemporary corpus linguistics could become widespread and accepted: the development of the computer and an awareness of the need for empirical data in language analysis.
