1. Introduction
Rating essays is a costly, laborious and time-consuming task, especially in China given the large number of students. Statistics show that the number of college students in China had soared to twenty-six million by 2013 (Bureau of Statistics of China, 2013), including more than ten million engineering students, making up the largest population of English as a Second Language (ESL) learners worldwide. Since 1987, writing has been an important component of College English testing in China, and essay writing constitutes the fourth part of these tests. Because essay writing is inherently subjective and creative, trained English teachers rate the essays manually. However, manual rating is time-consuming, and the resulting scores are prone to the raters' subjective judgment, leading to inconsistent and unreliable scores under the influence of fatigue, deadlines or biases.
Computer aided assessment (CAA) has become an important educational technology (Clark & Byl, 2007) since it reduces teacher workloads (Peat, Franklin, & Lewis, 2001), provides timely feedback to students (Sheard & Carbone, 2000), reduces educational material development and delivery costs (Jefferies, 2000), and facilitates the growth of online education (White, 2000). Research in computer-based essay scoring, referred to as automatic essay scoring (AES), has been a real and viable alternative and complement to human scoring for more than 40 years (Shermis & Burstein, 2003). AES systems do not actually read or understand essays as humans do. Whereas human raters may directly evaluate various intrinsic features, such as diction, fluency and grammar, in order to produce an essay score, AES systems rely on a statistical scoring model, which combines these features to approximate a final machine-generated score of the essay. In general, the task of automated grading can be viewed as a regression problem in which the objective is to find a set of features that represent the essays and serve as inputs to the regression methods. Regression algorithms are used to estimate the weight of each term (i.e., feature) in the regression equation so that prediction performance is optimized with respect to the actual values of the variable to be predicted/explained by the model.
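The regression formulation above can be sketched in a few lines of code. The feature names and all numeric values below are purely illustrative assumptions, not drawn from any real essay corpus; the point is only to show how feature weights are estimated by least squares and then combined into a machine-generated score.

```python
import numpy as np

# Hypothetical feature matrix: one row per essay, with illustrative
# textual features (word count, average word length, error count).
X = np.array([
    [320, 4.1, 12],
    [450, 4.8,  6],
    [280, 3.9, 15],
    [510, 5.2,  3],
    [390, 4.5,  9],
], dtype=float)
# Illustrative human-assigned scores for the five essays.
y = np.array([50.5, 73.0, 42.5, 84.0, 62.5])

# Estimate the weight of each feature (plus an intercept) by ordinary
# least squares: minimize ||X1 @ coef - y||^2.
X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coef[:-1], coef[-1]

def predict(features):
    """Machine-generated score: weighted sum of features plus intercept."""
    return float(np.dot(features, w) + b)

print(predict([400, 4.6, 8]))   # score for a new, unseen essay
```

In a real AES system the feature vector would of course be far richer and extracted automatically by computational linguistic tools, but the scoring step reduces to exactly this weighted combination.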
Many AES systems, such as e-rater and PEG (Attali & Burstein, 2006; Page, 1966; Warschauer & Ware, 2006), are based on a multiple linear regression model with predefined textual features extracted using computational linguistic tools. Another approach to AES is based on the Latent Semantic Analysis technique (Landauer, McNamara, Dennis, & Kintsch, 2007), as in the Intelligent Essay Assessor (Foltz, Streeter, Lochbaum, & Landauer, 2013; Landauer, Laham, & Foltz, 2003) and IntelliMetric (Elliot, 2003; Rudner, Garcia, & Welch, 2006). However, this approach requires a large training corpus for each specific essay prompt. More recently, McNamara et al. (2015) proposed a hierarchical classification approach to automated essay scoring. In this study, we extend the traditional linear regression model to a non-linear regression model for automated essay scoring, since the quality of ESL writing does not have a linear relationship with the textual features.
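To make the contrast with the linear model concrete, the sketch below fits a non-linear regressor to a feature whose relationship with the score saturates (a pattern a linear model cannot capture). The specific model here, kernel ridge regression with a Gaussian kernel, is only one possible choice of non-linear regressor, used for illustration; it is not necessarily the model adopted in this study, and the saturating curve is a synthetic stand-in for real score data.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian similarity between every pair of feature vectors.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, gamma=0.5, alpha=1e-3):
    # Solve (K + alpha I) c = y; predictions are kernel-weighted sums
    # over the training essays, which yields a non-linear fit.
    K = rbf_kernel(X, X, gamma)
    c = np.linalg.solve(K + alpha * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(np.atleast_2d(np.asarray(Z, float)), X, gamma) @ c

# Synthetic example: score quality saturates as the feature grows,
# a non-linear relationship a straight line would systematically miss.
X = np.linspace(0.0, 3.0, 30).reshape(-1, 1)
y = np.tanh(X[:, 0])
predict = fit_kernel_ridge(X, y)
print(float(predict([[1.5]])[0]))   # close to tanh(1.5), unlike a linear fit
```

A linear model fit to the same data would over-predict low scores and under-predict the saturated region, which is the motivation stated above for moving beyond linear regression.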