An Automatic Method to Extract Online Foreign Language Learner Writing Error Characteristics

Brendan Flanagan (Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan) and Sachio Hirokawa (Research Institute for Information Technology, Kyushu University, Fukuoka, Japan)
Copyright: © 2018 | Pages: 16
DOI: 10.4018/IJDET.2018100102

Abstract

This article contends that the profile of a foreign language learner can contain valuable information about possible problems they will face during the learning process, and could be used to help personalize feedback. A particularly important attribute of a foreign language learner is their native language background, as it defines their existing language knowledge. Native language identification serves two purposes: to classify a learner's unknown native language, and to identify characteristic features of native language groups that can be analyzed to generate tailored feedback. Fundamentally, this problem can be thought of as the process of identifying characteristic features that represent the application of a learner's native language knowledge in the use of the language that they are learning. In this article, the authors approach the problem of identifying characteristic differences and classifying native languages from the perspective of 15 automatically predicted writing errors made by online language learners.

Introduction

Online language learning is gaining popularity, with many sites offering forms of mutual correction. This involves pairing language learners from different native language backgrounds, each of whom corrects the other’s writing as a native speaker of the language being learned. This situation differs from a traditional classroom setting in that it lacks a teacher to guide the language learning process, and is essentially a student-to-student relationship. Previous research (Flanagan, 2013) has proposed the use of automatically generated quizzes as a method for reflecting on past errors made by online language learners. However, a method of detecting errors is required so that a learner can practice on errors similar to those they have made. Also, the profile of a foreign language learner can contain valuable information about possible problems they will face during the learning process, and could be used to help personalize feedback. A particularly important attribute of a foreign language learner is their native language background, as it defines their existing language knowledge. Native Language Identification (NLI) is the process of determining the native language of a foreign language learner by analyzing a piece of their writing. Fundamentally, this problem can be thought of as the process of identifying characteristic features that represent the application of a learner’s native language knowledge in the use of the language that they are learning. Previous research has shown that learners from different native language backgrounds have different characteristics in their use of a foreign language (Swan, 2001).

Recently, research into the automation of NLI has been gaining in popularity, and there are several practical applications of the process, such as providing targeted feedback on detected and potential errors in learner writing based on known problems for native language groups, and forensic linguistic author profiling, where the native language of the author can be an important feature for investigation (Tetreault, 2013).

In this paper, we approach the problem of identifying characteristic differences and classifying learner native languages from the perspective of writing errors. The motivation is that learner writing can contain words, in particular nouns, that have a strong relationship with the learner’s native language. While these words can be a good indicator of the learner’s native language, their use depends heavily on the subject or theme of the writing and has less to do with the language learning process. For example, the nouns used by a learner writing a personal diary differ from those used in an essay on a subject that requires specialist nouns, such as computer science or mathematics. Analysis of learner writing errors is less dependent on the subject of the writing, as the target of analysis is writing error concepts rather than the actual words of the learners’ writing.

A set of 15 predicted writing error scores, made from the normalized output of 15 different Support Vector Machine (SVM) classifiers trained in previous research (Flanagan, 2013), is used as the basis of this analysis. We refer to these predicted writing error scores as a 15-dimension error prediction vector. Preliminary investigation by clustering will be used to show the differences in co-occurring writing errors between native language groups. The error prediction vector will then be used as input to an SVM to classify a learner’s native language. As a naïve baseline, we will also classify the native language using all words, and compare its effectiveness with that of the proposed method. In the final section of this paper, we will examine the influence of words that have strong cultural or nationalistic associations, such as nouns representing people, places, food, and religion. A method of removing words that are characteristic of a native language will be proposed. This method will then be applied to filter out cultural or nationalistic words from the corpus to provide an alternative “non-biased” baseline for critical evaluation of the proposed error prediction vector method.
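
The construction of a 15-dimension error prediction vector could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the raw classifier scores are assumed to be given as plain lists, min-max scaling is an assumed choice of normalization (the text says only that the outputs are normalized), and the function names `min_max_normalize` and `group_profiles` are hypothetical. Averaging the vectors of each native language group is shown only as a simple way to inspect group differences; it does not reproduce the clustering or SVM classification described above.

```python
from collections import defaultdict

N_ERROR_TYPES = 15  # one score per writing error classifier, as in the text


def min_max_normalize(raw_scores):
    """Scale each error dimension to [0, 1] across all documents.

    raw_scores: list of 15-element lists holding hypothetical raw outputs
    of the 15 error classifiers, one row per learner document.
    Min-max scaling is an assumption made for this sketch.
    """
    for row in raw_scores:
        if len(row) != N_ERROR_TYPES:
            raise ValueError("expected one score per error classifier")
    # Transpose so that each column holds one error type across documents.
    columns = list(zip(*raw_scores))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    vectors = []
    for row in raw_scores:
        vectors.append([
            # A constant column carries no information, so map it to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return vectors


def group_profiles(vectors, native_languages):
    """Average the error prediction vectors of each native language group."""
    grouped = defaultdict(list)
    for vec, lang in zip(vectors, native_languages):
        grouped[lang].append(vec)
    return {
        lang: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lang, vecs in grouped.items()
    }


if __name__ == "__main__":
    # Invented scores for three documents; real values would come from
    # the 15 trained error classifiers.
    raw = [[0.0] * 15, [2.0] * 15, [1.0] * 15]
    labels = ["ja", "zh", "ja"]  # hypothetical native language labels
    vectors = min_max_normalize(raw)
    for lang, profile in group_profiles(vectors, labels).items():
        print(lang, profile[:3])
```

Because each dimension corresponds to an error type rather than to a word, vectors built this way would be comparable across documents on different topics, which is the property the proposed method relies on.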
