Language Exercise Generation: Emulating Cambridge Open Cloze

Alexey Malafeev (National Research University Higher School of Economics, Moscow, Russia)
This paper presents an approach to the automatic generation of open cloze exercises from arbitrary English text. The exercise format is similar to the open cloze test used in Cambridge English certificate exams (FCE, CAE, CPE). The method also makes it possible to adjust the difficulty of the resulting exercises to better suit specific proficiency levels. Three experiments were conducted to evaluate the usefulness of the machine-generated exercises, to compare them with authentic Cambridge English tests and to study the method's difficulty-setting capabilities. The experiments showed that the generation method was quite effective. With some customization, the method can be applied to generating similar exercises for other languages.
Vocabulary and grammar exercises are widely used in teaching English as a foreign language (TEFL), but creating them manually is time-consuming and expensive. In response, many methods for the automated generation of language exercises have been proposed in the past two decades. These solutions rely on various NLP tools and techniques and can produce different types of exercises. Automatically generated teaching materials that focus on grammar, vocabulary or reading comprehension may be especially relevant in view of current trends in education, namely blended learning (Graham, 2006) and computer-assisted language learning (CALL) (Levy, 1997). Indeed, because of its low cost and greater variety, automatically generated content could offer a more positive learning experience than conventional textbooks, which usually provide a very limited number and range of exercises.

This paper presents an effective approach to the generation of text-based open cloze exercises similar to those used in Cambridge English certificate exams (FCE, CAE and CPE). The method used does not rely on many sophisticated NLP tools, yet it is powerful enough to generate realistic and useful exercises. In fact, as shown in the evaluation section of this paper, experienced EFL instructors find it somewhat difficult to tell the difference between exercises generated with the help of the method presented here and authentic Cambridge English tests. With this method, it is also possible to adjust the difficulty of the resulting exercises to better suit specific proficiency levels.

In the most general sense, the cloze is a test of language ability and/or reading comprehension, which is created by removing certain words from a text. The gaps are to be filled in with appropriate words. In the open cloze, the test-taker is to guess the suitable words from the context, without seeing any multiple-choice options. It may therefore be a challenging task, requiring a deep understanding of language structure (“Cambridge English: Advanced Handbook for Teachers,” 2012; Lee, 2008).

Although the open cloze may be of different varieties (Lee, 2008), the approach presented in this paper is aimed at emulating the open cloze test used in Cambridge English certificate exams (FCE, CAE and CPE). In this test, “[t]he focus of the gapped words is either grammatical, such as articles, auxiliaries, prepositions, pronouns, verb tenses and forms; or lexicogrammatical, such as phrasal verbs, linkers and words within fixed phrases” (“Cambridge English: Advanced Handbook for Teachers,” 2012).
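To make the exercise format concrete, the following is a minimal sketch of gapping function words in a text. It is an illustration only and not the generation method evaluated in this paper: the word list `FUNCTION_WORDS`, the function name `make_open_cloze` and the `min_distance` spacing parameter are all assumptions introduced for this example.

```python
import re

# Hypothetical, deliberately small inventory of function words.
# A real system would need a far richer, carefully selected list.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "at", "to", "by", "with", "for",
    "is", "are", "was", "were", "have", "has", "had", "which", "that", "it",
}


def make_open_cloze(text, min_distance=4):
    """Replace function words with numbered gaps.

    At least `min_distance` words are kept between consecutive gaps
    (an assumed heuristic, so that each gap retains some context).
    Returns the gapped text and the list of removed words (the key).
    """
    # Split into alternating runs of word characters and non-word characters,
    # so that joining the tokens reproduces the original spacing.
    tokens = re.findall(r"\w+|\W+", text)
    gapped, answers = [], []
    words_since_gap = min_distance  # allow a gap at the first eligible word
    for token in tokens:
        is_word = token[0].isalnum()
        eligible = is_word and token.lower() in FUNCTION_WORDS
        if eligible and words_since_gap >= min_distance:
            answers.append(token)
            gapped.append(f"({len(answers)}) ____")
            words_since_gap = 0
        else:
            gapped.append(token)
            if is_word:
                words_since_gap += 1
    return "".join(gapped), answers


exercise, key = make_open_cloze("The cat sat on the mat in the hall.", min_distance=2)
print(exercise)
print(key)
```

In this sketch, a larger `min_distance` yields fewer gaps with more surrounding context, which is one simple way such a generator could be tuned; the difficulty adjustment actually used by the method is not implied by this example.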

The reasons for choosing the Cambridge open cloze as the target exercise type are the following:

  1. Cambridge English certificate exams are well-established and highly regarded, and they tend to emphasize a close relationship between teaching and testing (Chalhoub-Deville & Turner, 2000). Open cloze exercises are a useful stimulus in integrated reading, writing and vocabulary instruction (Lee, 2008).

  2. This exercise type largely focuses on using function words in English, an analytic language. These may be difficult for learners to master and often require extensive practice. This is especially true for learners whose mother tongue (L1) differs from English in its structure.

Depending on whether the generated activity is to be used for practice or assessment of learners’ proficiency, it may be called an exercise or a test, respectively. For simplicity, in this paper we will refer to the generated activities as exercises.

The method discussed in this paper is part of a larger system called Exercise Maker, which is freely available for non-commercial use. The system is aimed at generating vocabulary and grammar exercises of various types based on real-life texts. Being able to use arbitrary texts (e.g. news articles, blog entries or film reviews) for generating exercises gives the user considerable freedom in choosing interesting and relevant material. Research has shown that learners’ motivation can be improved by tailoring texts to their interests (Heilman et al., 2010).

