Abstract
The clinical utility of a measure involves its ability to support a wide range of decisions that enhance its pragmatism and use. Although several statistics are part of this feature, one centerpiece of this concept is the ability of an instrument to provide cutoff scores that can accurately discriminate between groups that consist of patients and non-patients. This latter aspect leads to such concepts as sensitivity, specificity, positive and negative predictive values and likelihood ratios, accuracy, and receiver operating characteristic curves. This chapter addresses these topics from two perspectives. First, because these features of clinical utility are encompassed as a subfield of statistical decision theory, the authors provide a historical review that links null hypothesis significance testing (NHST), signal detection theory (SDT), and psychological testing. Second, a real-data approach is used to demonstrate these concepts. Additionally, a free software program was developed to present these concepts.
Introduction
Psychometric measures are widely used in neuropsychological assessments. These types of tests are generally designed to capture psychological phenomena in a standardized way, whereby respondents can be asked to solve challenging or puzzling questions, read sentences and indicate their agreement, memorize words or numbers within a certain period of time, or maintain attention to a particular stimulus while inhibiting distractors. These examples also highlight part of the historical evolution of psychological measures, which began with intelligence tests, followed by personality assessments, attitude scales, and other complex cognitive and executive tasks.
All responses obtained through these instruments are represented by numbers and are assumed to express underlying mental processes that cannot themselves be assessed directly. The responses are observable and are often called “observed” or “manifest” variables. Conversely, mental processes are only inferred from these manifest variables rather than directly observed, so they are often called “latent variables.” The use of manifest variables therefore rests on the assumption that they are caused by latent mental processes: any variation in the type or level of a response is treated as a dependent variable that is influenced by the latent process, which plays the role of an independent variable.
For example, in a personality assessment, when a respondent endorses (e.g., “I strongly agree”) items such as “I am the life of the party,” “I enjoy being with people,” or “I can make new friends easily,” the most probable conclusion is that the respondent is an extravert rather than an introvert. The same reasoning applies when concluding that a depressed participant tends to agree with such items as “I feel helpless sometimes” and “Little excites me these days” (Retzlaff et al., 2002). Several psychometric analyses were developed to support this claim, and further technical reports can be found elsewhere (Anunciação, 2018).
The process of administering psychological tests follows an objective approach, including a fixed or structured procedure, which minimizes subjectivity or bias on the part of the professional who administers the measure. This set of conditions seeks to guarantee that the results obtained for the participant will be the same regardless of who administers the test. Numerical results that are obtained through a testing procedure consequently materialize the relationship between the unobserved mental process and the observed behavior.
The use of psychological tests is considered one of the most reliable modes of evaluation, but some theoretical and pragmatic challenges tend to arise. Although the quantities derived from psychometric tests rest on both theory and empirical data, they are not expressed in a standard unit of measurement, and the results have no meaning per se. For example, mass (measured in kilograms, kg) and time (measured in seconds, s) are measures that remain the same whenever, wherever, and by whomever they are used. In contrast, the results of psychological tests have limited use and consequently limited importance outside their own systems.
Therefore, the interpretation and consequent use of these results depend on a frame of reference in which additional measures or information are available. This need is responsible for strengthening the link between psychometrics and statistics. The field of statistics provides the mechanics by which raw scores produced by the test scoring procedures can be transformed into derived scores that allow descriptions, comparisons, and a clearer interpretation of performance on the test. Such statistical measures as means, standard deviations, and percentiles are widely used in psychological testing because they produce vital information that links the group level and the individual level.
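The transformation of a raw score into a derived score can be illustrated with a minimal Python sketch. It is not taken from the chapter or its software: the function names, the normative sample, and the choice of a z-score and a percentile rank (counted as "at or below") are assumptions made only for illustration.

```python
from statistics import mean, stdev


def z_score(raw, norm_scores):
    """Standardize a raw score against a normative sample (sample SD)."""
    return (raw - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw, norm_scores):
    """Percentage of the normative sample scoring at or below `raw`."""
    at_or_below = sum(1 for s in norm_scores if s <= raw)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical normative sample of raw test scores (illustrative only).
norm_sample = [8, 10, 12, 12, 14, 15, 16, 18, 20, 25]

raw = 20
print(f"z = {z_score(raw, norm_sample):.2f}")
print(f"percentile rank = {percentile_rank(raw, norm_sample):.0f}")
```

A raw score of 20 says little by itself; the derived scores locate it relative to the reference group, which is the frame of reference the paragraph above describes.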
Because psychological testing produces useful and informative results across many different scenarios, a growing number of clinicians use psychometric tools in their daily routine to clarify and inform the diagnosis of a particular clinical condition, disease, or mental disorder. This clinical use brings the need to use, or even build, statistical measures and models that can explore, describe, and differentiate typical and clinical groups and/or produce scores that can be used for the same purpose.
Key Terms in this Chapter
Negative Predictive Value (NPV): The percentage of individuals with a negative result who actually do not have the clinical condition of interest. Both PPV and NPV depend on the prevalence of a particular condition or disease in the population.
Specificity: The probability that the test result is negative for those participants without the clinical condition. A test that has high specificity produces few false positive results, but raising specificity by shifting the cutoff point tends to increase the false negative rate. Specificity is also known as the True Negative Rate (TNR). For a given test, sensitivity and specificity are inversely related as the cutoff point varies.
Liberal Decision Criterion: A cutoff point that maximizes the sensitivity of a test. This criterion is also named lenient.
Clinical Test: A procedure performed to detect the presence of a specific clinical condition or disease. Screening tests are a particular case of tests that offer rapid results. These tests should be easily accessible to target groups, easy to administer, and not expensive. Tests with high sensitivity are often used to screen for disease, as tests with low sensitivity fail to identify many patients with a particular clinical condition or disease.
ROC Curve: A descriptive graph that shows the relationship between sensitivity (True Positive Rate, TPR; y-axis) and the False Positive Rate (FPR = 1 - specificity; x-axis) for every possible cutoff point of a test.
Conservative Decision Criterion: A cutoff point that maximizes the specificity of a test. This criterion is also named strict.
Sensitivity: The probability that the test result is positive for those participants with a clinical condition or a disease. A test that has high sensitivity produces few false negative results, but raising sensitivity by shifting the cutoff point tends to increase the false positive rate. Sensitivity is also known as the True Positive Rate (TPR).
Confusion Matrix: A table in a specific format that is used in classification analysis. In this table, two (or more) variables are jointly analyzed. Rows are used to present the test result whereas columns present the clinical condition status or a gold standard result.
Cutoff Point: A threshold score such that values falling on either side of it are classified differently, enabling discrimination between contrasting groups.
Positive Predictive Value (PPV): The percentage of individuals with a positive result who actually have the clinical condition of interest. It is therefore a measure of how believable a positive test result is.
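The key terms above can be tied together in a short Python sketch. It is an illustration under stated assumptions, not the chapter's software: the function names, the counts, and the scores are hypothetical. It also assumes that higher scores indicate the condition, that a score at or above the cutoff counts as a positive result, and that no denominator is zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy statistics from a 2x2 confusion matrix.

    tp / fp: positive test results with / without the condition.
    fn / tn: negative test results with / without the condition.
    Assumes all denominators are non-zero (e.g., fp > 0 for LR+).
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV via Bayes' theorem, showing their dependence on prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def roc_points(scores, has_condition):
    """(cutoff, FPR, TPR) for every observed score used as a cutoff."""
    n_pos = sum(has_condition)
    n_neg = len(has_condition) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        pairs = list(zip(scores, has_condition))
        tp = sum(1 for s, c in pairs if c and s >= cutoff)
        fp = sum(1 for s, c in pairs if not c and s >= cutoff)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points


# Hypothetical counts: 50 patients and 150 non-patients.
metrics = diagnostic_metrics(tp=45, fp=20, fn=5, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test, two prevalence values: PPV falls sharply when the condition is rare.
ppv_common, _ = predictive_values(0.9, 0.9, prevalence=0.50)
ppv_rare, _ = predictive_values(0.9, 0.9, prevalence=0.01)
print(f"PPV at 50% prevalence: {ppv_common:.3f}; at 1%: {ppv_rare:.3f}")

# Hypothetical scores and condition status (1 = has the condition).
scores = [1, 2, 3, 4, 5, 6]
status = [0, 0, 1, 0, 1, 1]
for cutoff, fpr, tpr in roc_points(scores, status):
    print(f"cutoff >= {cutoff}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Under these assumptions, a low cutoff corresponds to a liberal criterion (high sensitivity, more false positives) and a high cutoff to a conservative one (high specificity, more false negatives). Plotting the FPR and TPR pairs would trace the ROC curve.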