Engaging Faculty in Examining the Validity of Locally Developed Performance-Based Assessments

Kathy J. Bohan, Cynthia A. Conn, Suzanne L. Pieper
Copyright: © 2019 | Pages: 39
DOI: 10.4018/978-1-5225-8353-0.ch004


Locally developed performance-based assessment instruments must provide evidence of validity and reliability supporting their intended interpretation and use. Accrediting bodies, such as the Council for the Accreditation of Educator Preparation (CAEP), require educator preparation providers (EPPs) to provide this evidence in their accreditation self-study. However, faculty may not have the expertise to conduct an effective examination of their assessments. This chapter describes a process for gathering evidence to build a validity argument for locally developed performance-based assessments. Grounded in measurement theory, the Validity Inquiry Process (VIP) guides faculty through a reflective practice approach toward making defensible claims about the use of results from locally developed performance-based assessments. Using this process, faculty can have greater confidence in using their performance-based assessments to provide feedback to their students, to offer assurances of program quality, and to identify areas for improvement.
Chapter Preview


Some state departments of education and educator preparation providers (EPPs) in the United States are adopting standardized performance-based assessments, such as edTPA. Other EPPs have discretion to use proprietary or EPP-created (i.e., locally developed) assessments. For example, the Council for the Accreditation of Educator Preparation (CAEP), a national accrediting agency, allows EPPs to use both proprietary and locally developed assessments as evidence of teacher candidate competency when building a case that the EPP is meeting CAEP Standards. Proprietary instruments are “created and/or administered by states, research organizations, or commercial test organizations. Typically information about the design of the assessments, and their validation, scoring, and other attributes, is available” (CAEP, 2018, p. 115). With locally developed assessments, “EPPs take responsibility for design, administration and validation of these assessments” (CAEP, 2018, p. 109). Both proprietary and locally developed performance-based assessment instruments offer a more relevant measure of students’ knowledge, skills, and dispositions, and are thus more representative of real-world teaching than multiple-choice tests and other traditional types of assessments (Darling-Hammond, 2010). Because performance-based assessments are generally embedded in coursework, students are actively engaged in their own learning (Suskie, 2009). However, developing effective performance-based assessment instruments that can consistently provide useful formative and summative feedback for students and for program improvement is difficult.

CAEP also asks EPPs to provide evidence for the validity and consistency of data used in their accreditation self-study. Specifically, the CAEP Handbook: Initial-Level Programs (2018) states, “for all EPP-created evidence measures, providers should demonstrate the quality of the data, including their validity and reliability in the context of the CAEP Standards” (p. 9). This relates to CAEP Standard 5.2, “the provider’s quality assurance system relies on relevant, cumulative and actionable measures, and produces empirical evidence that interpretations of data are valid and consistent” (CAEP, 2018, p. 22). CAEP reviews locally developed assessments using the CAEP Evaluation Framework for EPP-Created Assessments (2017).

As performance-based assessments become more prevalent in higher education, challenges have emerged. Faculty are not always familiar with procedures for writing learning outcomes, aligning outcomes through a curriculum map and assessment plan, and developing high-quality performance-based assessments. Faculty often question the reliability and validity of assessment results. They also may be unsure of how to use results for program improvement. Additionally, faculty may oppose accountability efforts dictated by administration or external entities, or may not believe they have the time to prioritize assessment responsibilities within their overall workload (Ewell, 2002; Hutchings, 2011). However, well-designed performance-based assessments provide students with a “transparent” understanding of the purpose, the alignment to outcomes or standards, the expected tasks, and the criteria established for rating responses (Winkelmes et al., 2015, p. 5). Green and Hutchings (2018) propose better connecting assessment with teaching and learning, which can be accomplished through “conversations, collaborations, and habits that support ongoing improvements” (p. 39). This chapter describes a facilitated process and provides specific guidance to support teacher education faculty in this work.

Key Terms in this Chapter

Assessment Plan: In conjunction with the mapping of the program’s curriculum, faculty identify course- or program-level assessment types and purposes, along with timelines for implementing the plan.

Absolute Agreement: For inter-rater agreement, the percentage of times two or more raters give the exact same score on ratings of performance. To compute, divide the number of ratings that are the same for a criterion by the total number of ratings for the criterion. A result of 75% or higher is considered an acceptable level of agreement. Evidence of absolute agreement is more critical for high-stakes decisions.

Warrant: In a validity argument, the backing for a claim about an assessment, substantiated by empirical evidence and rationales.

Adjacent Agreement: The percentage of times two or more raters give a score within one level of each other on ratings of performance. Evidence of a combination of absolute and adjacent agreement may be acceptable for low-stakes decisions.
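As a minimal sketch of the two agreement measures defined above, the percentages can be computed directly from paired rater scores. The function name and the rubric scores below are hypothetical, used only for illustration:

```python
def agreement_rates(rater_a, rater_b):
    """Return (absolute %, adjacent-or-better %) agreement between two
    raters' scores on one criterion.

    Absolute agreement: identical scores / total ratings.
    Adjacent agreement here is counted inclusively: scores within one
    rubric level of each other (which includes exact matches).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("Raters must score the same set of performances")
    n = len(rater_a)
    absolute = sum(a == b for a, b in zip(rater_a, rater_b)) / n * 100
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n * 100
    return absolute, adjacent

# Hypothetical scores from two raters on a 1-4 rubric for 8 candidates
a = [3, 4, 2, 3, 1, 4, 2, 3]
b = [3, 3, 2, 4, 1, 4, 3, 3]
absolute, adjacent = agreement_rates(a, b)
print(f"Absolute: {absolute}%, Adjacent-or-better: {adjacent}%")
# Absolute: 62.5%, Adjacent-or-better: 100.0%
```

In this hypothetical case, absolute agreement falls below the 75% threshold noted above, so the scores would not support a high-stakes decision on their own; the combination of absolute and adjacent agreement, however, may be acceptable for low-stakes uses.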

Calibration: The process of establishing agreement among raters of an assessment, either by aligning ratings with those of experts or by reaching consensus among two or more raters.

Claim: In a validity argument, an assertion or statement made about the assessment.

Curriculum Map: In conjunction with program assessment planning, a sequenced outline of the courses and any additional program components.

Validity Argument: A statement of the plausibility of the claim(s) made for the interpretation and use of an assessment as backed up by warrants based on evidence.
