The User-Language Paraphrase Corpus

Philip M. McCarthy, Danielle S. McNamara
DOI: 10.4018/978-1-61350-447-5.ch006


The corpus in this challenge comprises 1,998 target-sentence/student-response text pairs, or protocols. The protocols have been evaluated by extensively trained human raters; however, unlike established paraphrase corpora that judge paraphrases as simply true or false, the User-Language Paraphrase Corpus evaluates protocols along 10 dimensions of paraphrase characteristics on a six-point scale. Along with the protocols, the database comprising the challenge includes 10 computational indices that have been used to assess these protocols. The challenge posed to researchers is to describe and assess their own approach (computational or statistical) to evaluating, characterizing, and/or categorizing any, some, or all of the paraphrase dimensions in this corpus. The purpose of establishing such evaluations of user-language paraphrases is to enable ITSs to provide users with accurate assessment, and subsequently facilitative feedback, comparable to that of one or more trained human raters. Thus, these evaluations will help to develop the field of natural language assessment and understanding (Rus, McCarthy, McNamara, & Graesser, 2008a).
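To make the data layout concrete, the structure described above can be sketched as a simple record type: each protocol pairs a target sentence with a student response and carries human ratings on the paraphrase dimensions. This is a minimal illustrative sketch only; the field names and the dimension name used here are assumptions, not the corpus's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Protocol:
    """One target-sentence/student-response pair with its human ratings."""
    target: str               # target sentence shown to the student
    response: str             # student's attempted paraphrase
    ratings: Dict[str, int]   # paraphrase dimension -> rating on the 1-6 scale


def mean_rating(protocols: List[Protocol], dimension: str) -> float:
    """Average human rating for one paraphrase dimension across protocols."""
    values = [p.ratings[dimension] for p in protocols if dimension in p.ratings]
    return sum(values) / len(values)


# Illustrative use with a hypothetical dimension name.
pair_a = Protocol("The heart pumps blood.", "Blood is pumped by the heart.",
                  {"semantic_completeness": 5})
pair_b = Protocol("The heart pumps blood.", "The heart is an organ.",
                  {"semantic_completeness": 2})
print(mean_rating([pair_a, pair_b], "semantic_completeness"))  # 3.5
```

A computational approach to the challenge would then be evaluated by comparing its scores against such per-dimension human ratings across the 1,998 protocols.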
Chapter Preview

The Seven Major Problems With Evaluating User-Language

Although a wide variety of tools and approaches have assessed edited, polished texts with considerable success, research on the computational assessment of textual relatedness in ITS user-language has been less common and is less developed. As ITSs become more common, the need for accurate yet fast evaluation of user-language becomes more pressing. Meeting this need, however, is challenging, due at least in part to seven characteristics of user-language that complicate its evaluation:

Text Length

User-language is often short, typically no longer than a sentence. Established textual-relatedness indices such as latent semantic analysis (LSA; Landauer et al., 2007) operate most effectively over longer texts, where issues of syntax and negation wash out by virtue of an abundance of commonly co-occurring words. Over shorter lengths, such approaches tend to lose accuracy, with performance generally correlating with text length (Dennis, 2007; McCarthy et al., 2007; McNamara et al., 2006; Penumatsa et al., 2004; Rehder et al., 1998; Rus et al., 2007; Wiemer-Hastings, 1999). The result is that longer responses tend to be judged more favorably in an ITS environment. Consequently, a long (but wrong) response may receive more favorable feedback than one that is short (but correct).
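The length bias can be demonstrated even with a much simpler measure than LSA. The sketch below uses plain bag-of-words cosine similarity (a stand-in for richer vector-space measures; the example sentences are invented for illustration): a long, off-topic response can share more surface words with the target than a short, correct paraphrase does, and so receive the higher score.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (no weighting,
    no dimensionality reduction; illustrative stand-in for LSA-style scores)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


target = "the heart pumps blood through the body"
short_correct = "blood is pumped by the heart"
long_wrong = ("the body and the heart and the blood are parts "
              "of the anatomy that the body uses")

# The longer (but wrong) response overlaps the target on many frequent
# words ("the", "body", "heart", "blood") and outscores the short
# (but correct) paraphrase.
print(cosine(target, long_wrong) > cosine(target, short_correct))  # True
```

Over longer texts, such surface overlap tends to track genuine relatedness; over single sentences, as here, it rewards length and shared function words instead.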
