Maximizing ANLP Evaluation: Harmonizing Flawed Input

Adam Renner (The University of Memphis, USA), Philip M. McCarthy (The University of Memphis, USA), Chutima Boonthum-Denecke (Hampton University, USA) and Danielle S. McNamara (Arizona State University, USA)
DOI: 10.4018/978-1-60960-741-8.ch026


A continuing problem for applied natural language processing (ANLP) is that its input tends to be far less controlled than the language examined in conventional natural language processing (NLP) studies. In particular, faulty assessment of misspelled words can produce ineffective or misleading feedback. This chapter describes the Harmonizer system for addressing the problem of user input irregularities (e.g., typos). The Harmonizer is specifically designed for Intelligent Tutoring Systems (ITSs) that use NLP to provide assessment and feedback based on the user's typed input. Our approach is to “harmonize” similar words to the same form in the benchmark, rather than correcting them to dictionary entries. This chapter describes the Harmonizer and evaluates its performance using various computational approaches on unedited input from high school students in the context of an ITS (i.e., iSTART). Our results indicate that various metric approaches to NLP (such as word-overlap cohesion scores) are moderately affected when student errors are filtered by the Harmonizer. Given the prevalence of typing errors in the sample, the study substantiates the need to “clean” typed input in comparable NLP-based learning systems. The Harmonizer provides this capability and is easy to implement, with light processing requirements.
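The core idea of harmonizing user tokens toward the benchmark, rather than correcting them against a full dictionary, can be illustrated with a minimal sketch. This is not the chapter's actual algorithm; it simply maps each user token to its most similar benchmark token when the similarity clears a threshold, using Python's standard-library `difflib` (the function name `harmonize` and the `cutoff` value are illustrative assumptions).

```python
import difflib

def harmonize(user_tokens, benchmark_tokens, cutoff=0.8):
    """Illustrative sketch: map each user token to the most similar
    benchmark token, leaving it unchanged when no benchmark token is
    close enough. Similarity is difflib's SequenceMatcher ratio."""
    harmonized = []
    for tok in user_tokens:
        # get_close_matches returns benchmark tokens with ratio >= cutoff,
        # best match first; an empty list means "leave the token as typed".
        matches = difflib.get_close_matches(tok, benchmark_tokens,
                                            n=1, cutoff=cutoff)
        harmonized.append(matches[0] if matches else tok)
    return harmonized

# Typos close to a benchmark word are pulled onto that word; other
# tokens pass through untouched.
print(harmonize(["the", "strucure", "of", "the", "cel"],
                ["structure", "cell", "membrane"]))
# → ['the', 'structure', 'of', 'the', 'cell']
```

Because matching is restricted to the benchmark vocabulary, a misspelling is only ever rewritten into a word that the downstream overlap metric can actually use, which is the practical point of harmonization over general-purpose spell correction.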
Chapter Preview


ITSs often assess user language via matching principles. For instance, user input is compared to a pre-selected benchmark response (e.g., an ideal answer, the solution to a problem, a misconception, or a target sentence/text) by measuring content word overlap or semantic similarity (McNamara et al., 2007). Systems that use this principle include AutoTutor (Graesser et al., 1999), Why2-Atlas (VanLehn et al., 2007), and iSTART (McNamara, Levenstein, & Boonthum, 2004). Although ITSs vary widely in their goals and composition, their feedback systems ultimately rely upon comparing one text against another and evaluating their degree of similarity. Similarity assessments may falter when dealing with user language, which is usually unedited and rife with typographical errors and poor grammar. For instance, a word in a target sentence that a user intended to type may not be matched properly if the user misspells it. The evident implication is that, for traditional NLP tools to be robust to ITS user language, the input needs to be “cleaned” prior to assessment. In other words, editing is required.
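The content-word-overlap principle described above can be sketched in a few lines. This is a simplified illustration, not the metric used by any of the cited systems: it scores the proportion of benchmark content words that appear in the user's input, with a small hand-picked stop-word list standing in for a real one (the function name `word_overlap` and the stop-word set are assumptions).

```python
def word_overlap(user_text, benchmark_text,
                 stop_words=frozenset({"the", "a", "an", "of", "to", "and"})):
    """Illustrative word-overlap score: the fraction of benchmark
    content words (non-stop-words) that also occur in the user input."""
    # Lowercase and strip trailing punctuation so surface variation
    # does not block a match.
    user = {w.lower().strip(".,;!?") for w in user_text.split()}
    bench = {w.lower().strip(".,;!?") for w in benchmark_text.split()} - stop_words
    return len(bench & user) / len(bench) if bench else 0.0

print(word_overlap("plants turn sunlight into energy",
                   "Plants convert sunlight into energy."))
# → 0.8  (4 of the 5 benchmark content words are matched)
```

A single misspelling, such as "sunlite" for "sunlight", would drop this score even though the student clearly knew the word, which is exactly the failure mode that motivates cleaning the input before assessment.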
