Resolving the Paradox of Overconfident Students with Intelligent Methods

Denis Smolin, Sergey Butakov
Copyright: © 2015 |Pages: 14
DOI: 10.4018/978-1-4666-6276-6.ch010


The chapter presents a case study of using data mining tools to solve the puzzle of inconsistency between students' in-class performance and their results on the final tests. Classical test theory cannot explain such inconsistency, while the classification tree generated by one of the well-known data mining algorithms provided a reasonable explanation, which was confirmed by course exit interviews. The experimental results can serve as a case study of implementing Artificial Intelligence-based methods to analyze course results. Such analyses equip educators with an additional tool for closing the loop between assessment results and course content and arrangements.
Chapter Preview


The introduction briefs the reader on the problem of objective course material assessment, the flaws in the standard statistical solutions to this problem, and the potential areas where approaches from the Artificial Intelligence (AI) domain could be useful.

The problem of assessing student knowledge and skills is as old as education itself. All educators need a tool to check the efficiency and effectiveness of their teaching materials and of the approach used in the classroom. Various test styles have been around for centuries, but they always raise the question of whether they are appropriate for checking student comprehension of the subject matter.

The area of student testing is much wider than one might assume. A test, as a measurement tool, and a testing algorithm, as a method of interpreting test results, are very generic models of any measurement. They are applicable not only to measuring student skills and knowledge, but also to almost any other activity. Testing consists of two major steps: data collection and interpretation. Interpretation implies some decision making, which is a large domain where AI methods play an important role. On one hand, classical testing is based on mathematical statistics, and AI uses many statistical methods. On the other hand, modern tests of knowledge and skills are much more complex than their classical ancestors. AI equips the interpretation phase of testing with appropriate processing methods. These methods give fruitful results that are impossible to get with classical algorithms. This paper demonstrates an example of such an AI method.

The interpretation phase of the testing process is covered by two main theories in the field:

  • Classical Test Theory (CTT): Closely related to mathematical statistics. For example, Spearman correlation was initially developed for psychological testing (Traub, 1997).

  • Item Response Theory (IRT): A more recent theory that expands on CTT and is claimed to be more precise for computer-based testing (Thissen & Mislevy, 2000).
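
As a rough illustration of the kind of statistic CTT relies on, the sketch below computes a Spearman rank correlation between in-class scores and final test scores. It is a minimal example under stated assumptions, not the authors' procedure: the function names (`average_ranks`, `pearson`, `spearman`) and the six student scores are hypothetical and chosen only to show how a single inconsistent student affects the coefficient.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six students (illustrative only, not chapter data).
in_class = [55, 62, 70, 78, 85, 93]
final_test = [58, 60, 74, 75, 88, 40]  # last student: strong in class, weak on the test

rho = spearman(in_class, final_test)
print(round(rho, 3))
```

With the last student removed, the two rankings in this made-up sample agree exactly, so one overconfident student is enough to pull the coefficient far from 1. The coefficient signals that an inconsistency exists but says nothing about its cause.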

Currently, classical test theory, which is over 100 years old, is the more popular of the two. The question of the comparative efficiency of these theories remains open and requires detailed examination. Instead of attempting to invalidate the entire theory, a more productive approach is to investigate possible inconsistencies in it: so-called paradoxes, that is, cases the theory cannot explain. When applying CTT to student knowledge and skills, one can easily find two paradoxes: a student with poor knowledge receiving a high exam score, and a student with presumably good knowledge of the subject matter failing a major test.

CTT answers these questions with the concept of test quality, defined through statistical evidence. A test is of high quality when it has high validity and reliability. There are different methods for evaluating validity and reliability. The majority of these methods are based on correlation coefficients, which require a representative sample. In practice, it can be difficult to collect enough data for such a sample. A test with proven statistical characteristics is called standardized and is considered an unbiased measurement instrument. Such a test is correct for almost all students. All other tests are assumed to be of indefinite quality. In some cases this can itself be considered a paradox, because the quality of a test can be established by other methods. A paradox may also occur when a student with presumably good knowledge fails a high-quality test. This rarely happens, and CTT has excluded these cases from its scope by arguing that there is not enough information to evaluate them. For example, we may be unaware of all the personal circumstances affecting the student at the time of the exam.
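
To make the correlation-based notion of reliability concrete, the sketch below estimates split-half reliability with the Spearman-Brown correction, one common CTT technique. The chapter does not say which reliability method the authors used, so this is an assumed example: the function names and the six rows of right/wrong item scores are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def split_half_reliability(item_scores):
    """Estimate test reliability from per-student lists of item scores.

    The test is split into two halves by alternating items, the half
    totals are correlated, and the Spearman-Brown formula
    r_full = 2 * r_half / (1 + r_half) projects the result to full length.
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(first_half, second_half)
    return 2 * r_half / (1 + r_half)


# Hypothetical right/wrong (1/0) answers: six students, four items.
# Illustrative only; a real estimate needs a far larger sample.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]

reliability = split_half_reliability(answers)
print(round(reliability, 3))
```

With only six students, changing a single answer shifts the estimate noticeably, which illustrates why such coefficients need a representative sample before a test can be treated as standardized.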
