In contrast to the assessment of reading, listening, or writing skills, the assessment of oral proficiency has seen a growing preference for interactive task formats as the context in which to evaluate a learner's spoken ability. Tasks that require learners to engage in authentic, meaningful interactions with their peers or the teacher, however, deserve attention for their potential benefits as well as their unintended consequences. This is because in interactive tasks a learner's performance is not the sole display of individual competence, as is the case with writing or listening tests: The presence of an interlocutor introduces a social dimension, raising issues of test validity and reliability and ultimately affecting test design and scoring systems. The goal of this chapter is therefore to discuss the challenges and implications of assessing oral proficiency when a social dimension is added to the picture, and to contribute to a better understanding of how the joint construction of speaking performance can affect not only test development but also students' scores.
Introduction
Particularly since the mid-1970s, research in applied linguistics has seen a growing interest in how second language learners acquire and develop the ability to use language appropriately and competently in their social interactions in the target language (e.g. Bachman, 1990; Canale & Swain, 1980; Gardner & Wagner, 2004; Gass & Selinker, 2008; Hall, Hellermann, & Pekarek-Doehler, 2011; Hymes, 1972; Kasper & Wagner, 2014; Kramsch, 1986). This interest is reflected in second language assessment research, which in turn has focused on investigating how the complex dynamics of social interactions impact the assessment of language proficiency, in particular oral proficiency (Lazaraton, 2014; McNamara & Roever, 2006; O'Sullivan, 2012; Tsagari & Banerjee, 2016).
In contrast to the assessment of reading, listening, or writing skills, the assessment of oral proficiency has received a great deal of attention because of a growing preference for interactive task formats as the context in which to evaluate a learner's spoken ability (Katz, 2014; Luoma, 2004; McNamara & Roever, 2006). This preference stems from an increased understanding that the main goal of teaching spoken language is to help learners develop their ability to interact successfully in the target language, not only in a wide variety of social contexts but also with a wider range of interlocutors. As a result, a learner's performance in those tasks should no longer be seen as the sole display of individual competence, as is the case with tests of writing, reading, or listening ability, skills that are often tested in isolation (Fulcher & Davidson, 2007; McNamara, 1997; O'Sullivan, 2012; Underhill, 1987). The very presence of an interlocutor (be this person an interviewer, a rater, or a conversation partner) introduces a social dimension to the task, allowing a range of factors to affect a learner's performance, whether positively or negatively. For example, during an interview, the social distance that may exist, or may be perceived to exist, between a teacher and a student, and the intrinsic power dynamics that characterize this type of interaction, can have an impact on how successful learners are in demonstrating what they can do with the language. During group discussions, another common interactive task, more talkative participants may receive a better grade despite not necessarily showing higher linguistic proficiency. Other factors, such as participants' overall language proficiency, native languages, cultural identities, personalities, and ages, seem to affect not only individual performance in joint tasks but, perhaps most importantly, the performance of other learners.
The question then becomes whether it is necessary, or even feasible, to account for all such factors when assessing oral language proficiency. Because meaning is co-constructed in real time by participants engaged in a particular interaction, it is affected by factors that go beyond their actual linguistic and content knowledge. This raises questions about the validity and reliability of the interactional tasks used to elicit samples of language and of the rubrics used to score them (Brown, 2003; Foot, 1999; McNamara & Roever, 2006).