1. Introduction
Learning assessments are important for monitoring the progress of students throughout the teaching process. Beyond providing an individual screening of each student, large-scale learning assessments allow educators to obtain rich information about classes, schools, and regions across the country. Through the reports of large-scale assessments, decision-makers (e.g., heads of governmental institutions or presidents of private schools) can detect problems at an early stage, set consistent learning goals, and draw up plans for improving learning on a wider scale (Suskie, 2018).
In the digital era, many local and large-scale learning assessments are conducted through technological tools. Examples include language proficiency exams (e.g., TOEFL iBT) and university entrance exams (e.g., GRE). These assessments often achieve their goals through a combination of multiple-choice questions, which are well structured and easy for computers to process, and open-response questions, which the student answers freely. Open-response questions may take the form not only of texts but also of audio or video recorded during the exam. Multiple-choice questions can be quickly graded by an algorithm, whereas open-response questions are commonly evaluated by a human professional. The unstructured nature of open-response questions requires refined approaches to automatically grade a student's performance, which explains why many decision-makers opt to hire human professionals.
There has been considerable development in the literature on the automatic assessment of unstructured data. For texts, Natural Language Processing (NLP) techniques can, for example, automatically grade short answers (Sijimol, 2018) and essays (Rokade, 2018). Similar progress has been made for audio through Automatic Speech Recognition (ASR) techniques, e.g., extracting the phones pronounced in a recording and identifying the age and gender of the speakers (Safavi, 2018). Thus, technology has the potential to help large-scale assessments on two fronts. First, it acts as a facilitator from a logistical perspective, avoiding the use of some materials, the need to transport exams from one place to another, and the manual analysis of the exams. Second, it performs the role of a grader, automatically assessing responses through simple and refined techniques, speeding up final reports and considerably decreasing overall costs.
A large-scale learning assessment can be designed to tackle one or multiple parts of the teaching process. However, an important aspect of literacy is often neglected by learning assessments: reading fluency, i.e., how fluently students read texts in their own native language. Reading fluently shows that the speaker has a good comprehension of the text (Rasinski, 2017), and fluency is crucial for every individual to continue their studies, learn a profession, and ultimately live in society (Dias, 2016). Therefore, educators have become increasingly interested in proposing ways to collect and analyze information about the reading fluency of their students during the stages of reading development within large-scale learning assessments.
Oral reading fluency assessments, referred to more simply as fluency assessments, add complexity to the already non-trivial open-response questions of common learning assessments. Two examples of such challenges are establishing a comprehensive methodology to grade reading skills, which should rely on objective criteria rather than subjective interpretation, and handling multiple accents within the same assessment, which increases the need for credible professionals since people may be biased toward their own accent. In a large-scale setting, these assessments also generate a large number of audio recordings to be evaluated, which implies higher costs than other learning assessments, especially when hiring a specialized workforce. Nevertheless, assessing reading fluency is of tremendous significance, since it can preemptively help diagnose local and widespread problems (e.g., the difficulty of an individual in reading fluently, or the failure of teaching methodologies in certain schools or regions across the country) and direct future funding for materials or teacher training.