Using Automated Procedures to Generate Test Items That Measure Junior High Science Achievement

Mark Gierl (University of Alberta, Canada), Syed F. Latifi (University of Alberta, Canada), Hollis Lai (University of Alberta, Canada), Donna Matovinovic (CTB/McGraw-Hill, USA) and Keith A. Boughton (CTB/McGraw-Hill, USA)
DOI: 10.4018/978-1-4666-9441-5.ch022

The purpose of this chapter is to describe and illustrate a template-based method for automatically generating test items. This method can be used to produce large numbers of high-quality items both quickly and efficiently. To highlight the practicality and feasibility of automatic item generation, we demonstrate the application of this method in the content area of junior high school science. We also describe the results from a study designed to evaluate the quality of the generated science items. Our chapter is divided into four sections. In section one, we describe the methodology. In section two, we illustrate the method using items generated for a junior high school physics curriculum. In section three, we present the results from a study designed to evaluate the quality of the generated science items. In section four, we conclude the chapter and identify one important area for future research.
Chapter Preview


Profound global and economic changes are shaping how we develop and deliver educational tests. These changes can be traced to the growing emphasis on knowledge services, information, and communication technologies. To thrive in this new environment, countries require skilled workers who can solve complex problems, adapt to novel situations, and collaborate effectively with others. Educational tests, once developed almost exclusively to satisfy demands for accountability and outcomes-based assessment, are now expected to provide teachers and students with frequent, detailed feedback to directly support the teaching and learning of these new 21st century skills (Black & Wiliam, 1998, 2010).

Two conditions now exist that, when taken together, will permit us to begin to develop and deliver the kinds of tests that would provide teachers and students with frequent and timely feedback on a diverse range of 21st century knowledge and skills. The first condition stems from the dramatic change in how students access technology and use the Internet. For example, MediaSmart, a not-for-profit organization that collects and disseminates information about digital literacy in Canada, recently conducted a national survey of 5436 Canadian students from Grades 4 to 11. The report “Life Online: Young Canadians in a Wired World” (Steeves, 2014) provides a comprehensive snapshot of how students access and use technology. Many striking findings were reported. For instance, 99% of the students in the survey accessed the Internet outside of school. The point of access has shifted from desktop home computers, which were the most common method in the MediaSmart 2005 survey, to portable personal computers such as laptops, tablets, and smartphones, which are the most common method today. More than half of the Grade 4 students surveyed said that they accessed the Internet using a portable computer. Twenty-four percent of these students owned their own portable computer. Internet access and portable computer use increased sharply as students progressed through school. More than 70% of the Grade 11 students surveyed accessed the Internet using a portable computer. Eighty-five percent of these students owned their own portable devices. These findings demonstrate that Internet access is common, even among elementary students, and that portable computing is viable because many students own their own computers.

The second condition stems from the rapid integration of technology and educational testing. The importance of technology in testing was first described by Bennett more than a decade ago when he claimed that no topic would become more central to innovation and future practice in educational testing than computers and the Internet (see Bennett, 2001). Since Bennett made this claim, there has been a gradual but steady migration from paper- to computer-based testing at the K-12 and post-secondary education levels, as well as among licensure and certification agencies, to the point where computerized testing could now be characterized as a common practice. One reason for this migration is feasibility. Simply put, educational testing is no longer feasible when delivered in a paper-based format because it is a resource-intensive process. The printing, scoring, and reporting of paper-based tests require tremendous effort, expense, and human intervention. Moreover, as the demand for testing continues to escalate, the cost of developing, administering, and scoring paper-based tests will also increase. One solution that curtails some of these costs is to adopt a computerized testing system. When tests are administered by computers over the Internet and scored automatically, educators are liberated from the costly and time-consuming processes associated with disseminating, scanning, and scoring paper-based tests.

Key Terms in this Chapter

Formative Assessment Principles: Principles covering any assessment-related activities that yield constant and specific feedback to modify teaching and improve learning. These activities can include testing on demand, providing students with instant feedback, and permitting testing in different locations and at different times.

Variants: Generated items produced from the same item model that appear different from one another.

Item Model: A template, a mould, or a rendering that highlights the features in an item that must be manipulated to generate new items.

Automatic Item Generation: A process of using item models to generate test items with the aid of computer technology.

Cognitive Model: A representation that highlights the knowledge, skills, and problem-solving processes students require to answer test items.

Cosine Similarity Index: A measure of similarity between two vectors of co-occurring texts computed using the cosine of the angle between the two vectors in a multidimensional space of unique words.

Isomorphs: Generated items produced from the same item model that appear similar to one another.

Elements: Variables in the item model that can be modified to create new test items.
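
To make several of the terms above concrete, the following Python sketch shows one way an item model with elements could be used to generate items, and how a cosine similarity index could be computed between two generated items. It is only an illustration under stated assumptions, not the procedure used in the chapter: the physics stem, the element names and values, the helper functions `generate_items` and `cosine_similarity`, and the simple whitespace tokenization are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical item model: a stem template plus the elements that may vary.
STEM = ("A {object} with a mass of {mass} kg accelerates at {accel} m/s^2. "
        "What is the net force acting on it?")
ELEMENTS = {
    "object": ["cart", "sled"],
    "mass": [2, 5],
    "accel": [3, 4],
}


def generate_items(stem, elements):
    """Fill the stem with every combination of element values."""
    names = list(elements)
    items = []
    for values in product(*(elements[name] for name in names)):
        binding = dict(zip(names, values))
        items.append({
            "stem": stem.format(**binding),
            # Keyed answer for this illustrative model: F = m * a (newtons).
            "key": binding["mass"] * binding["accel"],
        })
    return items


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts.

    Assumption: words are lowercased and split on whitespace only.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


items = generate_items(STEM, ELEMENTS)
print(len(items))
print(items[0]["stem"])
print(cosine_similarity(items[0]["stem"], items[-1]["stem"]))
```

In this sketch every generated item shares most of its wording, so the similarity between any two of them is high, which is closer to the definition of isomorphs than of variants. An item model whose elements change more of the stem would be expected to yield lower similarity values.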
