Psychometric Post-Examination Analysis in Medical Education Training Programs

Emanuele Fino, Bishoy Hanna-Khalil
DOI: 10.4018/978-1-7998-1468-9.ch012


Assessment in medical education has changed dramatically over the last two decades. The current global call for medical practitioners has encouraged medical schools to open their doors and expand their curricula, generating an increasing demand for guidance on assuring and improving the quality of training programs and systems. This chapter provides the reader with an overview of psychometric post-examination analysis. The authors' view is that such analyses are strategic educational assets that can help medical educators to understand and evidence the extent to which assessment data and their interpretation reflect the achievement of learning objectives, and the validity of the assessment methods implemented in medical education programs.
Chapter Preview


Assessment in medical education has changed radically over the last two decades (Norcini & McKinley, 2007; Swanwick, Forrest, & O’Brien, 2019). Traditional examination methods, such as written papers based on open-ended questions and oral examinations, are being replaced with standardized written tests of applied knowledge and objective structured clinical examinations. These changes are driven by the intention of making assessment more reliable, valid, and acceptable. Recent developments in psychometric theory and access to advanced computational technology have favored such changes, allowing educators to improve the methods and tools used to assess students through their academic progression and transition into the medical profession.

The use of psychological measurement in medical education has a long history, originating from the need of educators to make high-stakes decisions and provide multiple stakeholders (students, regulatory bodies, health systems, communities) with results in which measurement error is minimized. The vast majority of assessment methods and techniques will ultimately produce indirect observations of latent constructs (Finch & French, 2018), which are assumed to represent part of the fundamental toolkit of knowledge, competences, and skills that a doctor is required to develop (e.g. functional knowledge of basic sciences, diagnostic reasoning, and communication skills). Unfortunately, educational measures differ from measurements of the attributes of physical objects in that they entail a higher degree of variation or error, which in turn needs to be inspected and properly addressed before assessment results can be considered reliable and valid, and before educators can make confident decisions. This makes the pursuit of precision and the calibration of assessment methods costly and effort-dependent, yet necessary to satisfy decision-making requirements (Bond & Fox, 2015; Jackson, Jamieson, & Khan, 2007).

Psychometrics provides a set of evidence-based theoretical test models that can help to validate the fundamental assumption that a test score, usually expressed in numeric terms, represents a certain degree of a medical student’s capacity within a defined domain of scientific knowledge or clinical skills and competencies (Kline, 2014; Norman, 2016). Such models are necessary to clarify and evidence the relation between the measured or observed score and the underlying latent construct that the test is aiming to measure (De Champlain, 2010), and to provide clear, evidence-based recommendations to educators with regard to a process that will determine academic progression and ultimately medical licensing.

Assessment in medical education occurs within a situated system of learning practices and in relation to a curriculum, and the pursuit of a constructive alignment of curriculum and assessment is key in this process (Orsmond & Merry, 2017). The process is meant to be cyclic and guided by the definition and implementation of clear, specific learning outcomes informing all further steps, from the choice of core content to learning, teaching, and assessment methods (Harden, 2007). Assessment will generally serve the purpose of verifying whether, and to what extent, medical students have attained the desired learning outcomes. Tavakol and Dennick (2011, p. 448) distinguish between two major phases of the assessment process, namely measurement and evaluation. They define measurement as the “process of assigning a numerical value in order to assess the magnitude of the phenomenon being measured”, while, drawing upon Tyler's (1949) classic definition, they conceptualize evaluation as “the process of determining to what extent the educational objectives are being realized”. It is at this point that psychometric post-examination analysis comes into play, with the fundamental aim of evaluating and interpreting the characteristics of a test (e.g. central location and dispersion of scores, internal consistency, construct validity) in relation to the results observed in a group of examinees and the pre-specified learning outcomes of a medical program.

Key Terms in this Chapter

Reliability: The internal consistency of a test and the degree of generalizability of test scores to the ‘universe of admissible observations’.

Facility: The ease of a test item or a test overall, i.e. the proportion of test-takers in the whole cohort who answer an item correctly.

Discrimination: The extent to which a test item or a test overall allows differentiation between top performers and bottom performers.

Objective Structured Clinical Examination: A test of clinical skills that requires learners to rotate through a number of stations, at each of which they are examined by one or more examiners on a clinical task, sometimes involving real or simulated patients.

Central Location: Central or typical values measured in a distribution of test scores.

Standard Error of Measurement: The observed degree of measurement variation around a hypothetical ‘true’ score, derived from test results.

Dispersion: Degree of variability measured in a distribution of test scores.

Validity: The extent to which a test effectively measures the construct that it is supposed to measure.

Applied Knowledge Test: A test of basic and functional knowledge, assessing the ‘knows’ level of George Miller’s pyramid of competences in medical education.
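
Several of the terms above (facility, discrimination, reliability, standard error of measurement) correspond to simple classical test theory statistics. The following is a minimal sketch in Python, not the authors' own procedure. It assumes dichotomously scored items (1 = correct, 0 = incorrect) and uses a small hypothetical response matrix; the upper-lower grouping with a 27% fraction, Cronbach's alpha as the internal-consistency estimate, and all function names are illustrative assumptions that do not come from this preview.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical data: one row per examinee, one column per item (1 = correct).
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]


def total_scores(responses):
    """Total test score of each examinee."""
    return [sum(row) for row in responses]


def facility(responses):
    """Per-item proportion of the cohort answering correctly."""
    n = len(responses)
    return [sum(item) / n for item in zip(*responses)]


def discrimination(responses, fraction=0.27):
    """Upper-lower index: item facility among top scorers minus bottom scorers.

    The 27% group size is a common convention, assumed here for illustration.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top, bottom = ranked[:k], ranked[-k:]
    return [mean(t) - mean(b) for t, b in zip(zip(*top), zip(*bottom))]


def cronbach_alpha(responses):
    """Internal consistency; undefined if all total scores are identical."""
    k = len(responses[0])
    item_variance_sum = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance(total_scores(responses))
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def standard_error_of_measurement(responses):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    sd = pstdev(total_scores(responses))
    return sd * (1 - cronbach_alpha(responses)) ** 0.5


if __name__ == "__main__":
    print("Facility:", facility(RESPONSES))
    print("Discrimination:", discrimination(RESPONSES))
    print("Cronbach's alpha:", cronbach_alpha(RESPONSES))
    print("SEM:", standard_error_of_measurement(RESPONSES))
```

Population variances are used throughout so that the item and total-score variances are computed on the same basis. With real examination data, other choices (for example, a point-biserial correlation for discrimination, or generalizability coefficients for reliability) may be more appropriate.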
