Evaluating Computerized Adaptive Testing Systems

Anastasios A. Economides (University of Macedonia, Greece) and Chrysostomos Roupas (University of Macedonia, Greece)
DOI: 10.4018/978-1-60566-238-1.ch011

Many educational organizations are trying to reduce the cost of exams, the workload, scoring delays, and human error. They also aim to increase the accuracy and efficiency of testing. Most examination organizations now use Computerized Adaptive Testing (CAT) for large-scale testing. This chapter investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into 3 dimensions: Educational, Technical and Economical. The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer the examinee any advanced support or functionality. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee can choose his preferred media type. Feedback could be improved by giving the examinee more information, or by providing information whenever the examinee wishes.
Chapter Preview


The increasing number of students, the need for effective and fast student testing, multimedia-based testing, self-paced testing, immediate feedback, and accurate, objective and fast scoring push many organizations to use Computer-Based Testing (CBT) or Computer Assisted Assessment (CAA) tools (Brown, 1997). But this is not enough. Current learning theories lead towards student-centred and personalized learning. There is also increased interest in reducing cheating, reducing the examinee's anxiety, challenging but not frustrating the examinees, and providing immediate and continuous guidance to the examinee based on his knowledge, proficiency, ability and performance. Thus, many organizations are moving further towards computerized adaptive testing (CAT) tools (e.g. GMAT, GRE, MCSE, TOEFL). CAT is a special case of CBT: a computer-based interactive method for assessing the level of a student's knowledge, proficiency, ability or performance using questions tailored to that specific student. The CAT system selects questions appropriate for the level of the specific student from a pool of pre-calibrated items. Wainer (1990) indicated that two of the benefits of CATs over CBTs are higher efficiency and increased student motivation due to the higher level of interaction provided. CAT can estimate the student's level in a shorter time than other testing methods. CAT is based on either Item Response Theory (IRT) or Decision Theory (Welch & Frick, 1993; Wainer, 1990; Rudner, 2002). It is a valid and reliable testing method.
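
The item-selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any system evaluated in the chapter. It assumes the one-parameter logistic (Rasch) IRT model, a pool whose item difficulties have already been calibrated, and the widely used maximum-information selection rule. The function names and the example pool are hypothetical.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that an examinee with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: P(1 - P).
    It peaks when the item difficulty equals the ability."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def select_next_item(theta: float, pool: list[float], administered: set[int]) -> int:
    """Return the index of the not-yet-administered item that is most
    informative at the current ability estimate (maximum-information rule)."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda i: item_information(theta, pool[i]))

# Hypothetical calibrated pool: item difficulties on the logit scale.
pool = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(select_next_item(-1.8, pool, set()))  # low estimate -> easiest item, index 0
print(select_next_item(1.2, pool, {3}))     # item 3 already used -> index 4
```

Under the Rasch assumption, maximum information simply means choosing the unused item whose difficulty is closest to the current ability estimate, which is why a low-proficiency examinee is shown easy questions and a high-proficiency examinee difficult ones.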

A CAT system tailors the test to the proficiency of the individual examinee. It adjusts the test by presenting easy questions to a low-proficiency examinee and difficult questions to a high-proficiency examinee. The score of each examinee therefore depends not only on the percentage of questions answered correctly but also on the difficulty level of these questions. Even if both examinees answer the same percentage of questions correctly, the high-proficiency examinee gets a higher score because he correctly answers more difficult questions. Because each test is tailored to the individual examinee, far more information is gained from the examinee's response to each item than in a conventional test (Young et al., 1996). The main advantage of a CAT is efficiency (Straetmans & Eggen, 1998). IRT-based CAT has been shown to significantly reduce testing time without sacrificing reliability of measurement (Weiss & Kingsbury, 1984). CAT has also been shown to need fewer questions and less time than paper-and-pencil tests to estimate the examinee's level accurately (Jacobson, 1993; Carlson, 1994; Wainer, 1990; Wainer et al., 2000).

However, Lilley, Barker & Britton (2004) argued that the stop condition of a CAT can create a negative atmosphere amongst examinees, which could result in the rejection of the CAT altogether. Examinees might consider the fairness of the assessment jeopardised if the set of questions is not the same for all participants. Examinees also expressed concern about not being able to return to review and modify previous responses. Olea et al. (2000) showed that allowing answer review decreases the examinee's anxiety and increases both the number of correct responses and the estimated ability level of the examinee. Similarly, Wise and Kingsbury (2000) pointed out that when examinees are allowed to change answers, their anxiety is likely to decrease and their scores are likely to improve.
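
The scoring argument (same percentage correct, different difficulty, different score) and the idea of a stop condition can be made concrete with a short sketch. It again assumes the Rasch model, estimates ability by maximum likelihood with Newton-Raphson, and uses a standard-error threshold as one common form of stop condition. The thresholds `max_se=0.3` and `max_items=30`, the item difficulties, and all function names are hypothetical illustration values, not taken from the chapter or from any evaluated system.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(difficulties, responses, theta=0.0, iterations=50):
    """Maximum-likelihood ability estimate under the Rasch model (Newton-Raphson).
    responses holds 1 for correct and 0 for incorrect answers."""
    if len(set(responses)) < 2:
        # All-correct or all-wrong patterns have no finite ML estimate.
        raise ValueError("need at least one correct and one incorrect response")
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(responses, probs))  # dlogL/dtheta
        information = sum(p * (1.0 - p) for p in probs)           # -d2logL/dtheta2
        theta += gradient / information
    return theta

def standard_error(theta, difficulties):
    """Standard error of the estimate: 1 / sqrt(test information)."""
    info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)

def should_stop(se, n_items, max_se=0.3, max_items=30):
    """Hypothetical stop condition: stop once the estimate is precise enough
    or the item budget is used up."""
    return se <= max_se or n_items >= max_items

# Same 75% correct pattern on easy versus hard items.
pattern = [1, 1, 1, 0]
easy = [-2.0, -1.5, -1.0, -0.5]
hard = [0.5, 1.0, 1.5, 2.0]
theta_easy = estimate_ability(easy, pattern)
theta_hard = estimate_ability(hard, pattern)
print(theta_hard - theta_easy)  # about 2.5: every hard item is 2.5 logits harder
print(should_stop(standard_error(theta_hard, hard), len(hard)))  # False: 4 items are too few
```

Both examinees answer three of four items correctly, yet the one who faced the harder items receives an estimate 2.5 logits higher, matching the claim that difficulty, not just percentage correct, drives the score. A stop rule of this kind is what makes test length vary between examinees, which is one source of the fairness concerns reported above.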
Lilley & Barker (2003) showed that learners with different cognitive styles are not disadvantaged by CAT. CAT also has the potential to offer a more consistent and accurate measurement of examinees' abilities than traditional CBTs do. Georgouli (2004) proposed an intelligent agent for self-assessment that adapts its material to the needs of the individual learner, whether for studying or for testing. In addition to the examinee's achievement in the test, a CAT system could also consider his personality characteristics (Triantafillou et al., 2007a). Taking into consideration the examinee's domain knowledge, background experience, preferences, personal data and mental model would lead to efficient CATs (Triantafillou et al., 2007b).