Evaluating the Assessment Qualities of Teacher-Created Tests

Lauren M. Rein (University of Northern Iowa, USA)
DOI: 10.4018/978-1-5225-6986-2.ch013


Due to programmatic pressures and professional reasons, language teachers and program administrators are becoming more interested in the analytical side of test evaluation. Program-wide tests are often maintained by program administrators, while teacher-created tests too often remain in the classroom and are never effectively analyzed for their assessment qualities. This chapter describes several practical and approachable methods for planning and evaluating teacher-created tests without complicated or advanced statistical measures. It argues for the harmonious coexistence of standardized exams and classroom assessments built around student learning outcomes. After outlining well-known assessment foundations and defining terminology, the chapter describes the practical necessity of aligning tests with outcomes in order to meet programmatic standards. Finally, it provides a variety of qualitative measures for evaluating teacher-created tests, along with a small selection of basic statistical measures.
Chapter Preview


Classroom Assessment

Language teaching and language testing existed on the periphery of academia and research until the late twentieth century, when investigation in the field increased, especially in the direction of qualitative testing (Lazaraton, 2004; Richards, 2009). Traditionally, when assessment researchers described methods for analyzing testing qualities, language tests fell into the paradigms of norm-referenced and criterion-referenced testing. Norm-referenced testing (NRT), such as the TOEFL or IELTS tests, seeks to compare the test-taker to a norm, that is, to the distribution of scores across all test-takers. NRT is large-scale testing in which a test-taker’s skills are evaluated independently of any classroom instruction. Test-takers are placed on a scale and given a score that reflects their position relative to the other test-takers.

Criterion-referenced testing (CRT), on the other hand, measures a test-taker’s knowledge in reference to specific skills or a standard of performance, and it is rooted in classroom instruction. In other words, classroom teachers have a list of objectives to teach, and they must evaluate how students are progressing toward mastery of these objectives by using a variety of classroom assessments. Classroom teachers continuously evaluate student performance in achieving objectives and use classroom assessment, both formative and summative, to decide what to teach next. Summative and formative assessment are described more extensively in Fulcher and Davidson (2007). Brown and Hudson (2002) also introduced an alternative to criterion-referenced testing: objective-referenced testing. Objective-referenced testing (ORT) measures students’ achievement of the specific objectives of the class. These tests are designed to evaluate the specific learning objectives as pre-determined by the course designer. See their book for a more detailed discussion of the differences among NRT, CRT, and ORT (along with other testing terminology).
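
The contrast between the two paradigms can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the scores, the cut score of 80, and the function names percentile_rank and meets_criterion are all hypothetical and do not come from the chapter or from any published test. It reads the same raw score in two ways, once by position among the other test-takers (a norm-referenced reading) and once against a fixed standard of performance (a criterion-referenced reading).

```python
from bisect import bisect_left


def percentile_rank(score, all_scores):
    """Norm-referenced reading: percentage of test-takers scoring strictly below."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)  # count of scores lower than `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: has the test-taker reached the standard?"""
    return score >= cut_score


# Hypothetical raw scores for eight test-takers (invented for illustration).
scores = [55, 62, 70, 70, 78, 85, 91, 94]

# Hypothetical mastery standard; a real cut score would come from the
# program's objectives, not from this sketch.
CUT_SCORE = 80

for s in scores:
    rank = percentile_rank(s, scores)
    status = "met" if meets_criterion(s, CUT_SCORE) else "not met"
    print(f"score={s:3d}  percentile rank={rank:5.1f}  criterion {status}")
```

In this invented data set, a score of 78 sits in the middle of the group yet falls short of the assumed standard, which is the kind of difference the two paradigms are meant to capture: the first reading changes whenever the group changes, while the second depends only on the standard.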

