Human Evaluation in Large Language Model Testing: Assessing the Quality of AI Model Output

H. Dharmendra (Department of Commerce, Christ University, India), G. Raghunandan (Department of Commerce, Christ University, India), A. N. Sindhu (Mount Carmel College (Autonomous), India), C. Samanvitha (Department of Commerce, Christ University, India), N. Nethravathi (Acharya Institute of Technology, India), and Dinesh Elango (School of Business, American University of Phnom Penh, Cambodia)
Copyright: © 2025 | Pages: 22
DOI: 10.4018/979-8-3693-5380-6.ch022

Abstract

Large language models (LLMs) excel at language tasks, but evaluating them effectively is difficult. Automated metrics help, yet human evaluation remains crucial for aspects such as clarity, relevance, and ethics. This chapter explores the methods and challenges of human evaluation of LLMs, including factors such as fairness and user experience. The authors discuss a sample evaluation method and highlight ongoing efforts toward robust evaluation to ensure responsible LLM development. Finally, they explore the use of LLMs in cybersecurity, showcasing both their potential and their challenges.
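
To make the idea of human evaluation concrete, the sketch below shows how ratings on the rubric dimensions named in the abstract (clarity, relevance, ethics) might be collected and summarized. The 1-5 rating scale, the data layout, the annotator names, and the agreement measure are illustrative assumptions, not the authors' protocol from the chapter.

```python
# Minimal sketch of aggregating human ratings of LLM outputs.
# Rubric dimensions follow the chapter's abstract; everything else
# (scale, data shape, agreement check) is assumed for illustration.

from collections import defaultdict
from statistics import mean

# Hypothetical ratings: each annotator scores each model response
# on a 1-5 scale per rubric dimension.
RATINGS = {
    "response_001": {
        "annotator_a": {"clarity": 4, "relevance": 5, "ethics": 5},
        "annotator_b": {"clarity": 3, "relevance": 5, "ethics": 4},
    },
    "response_002": {
        "annotator_a": {"clarity": 2, "relevance": 3, "ethics": 5},
        "annotator_b": {"clarity": 2, "relevance": 4, "ethics": 5},
    },
}


def dimension_means(ratings):
    """Average each rubric dimension across annotators and responses."""
    scores = defaultdict(list)
    for annotators in ratings.values():
        for per_dim in annotators.values():
            for dim, score in per_dim.items():
                scores[dim].append(score)
    return {dim: mean(vals) for dim, vals in scores.items()}


def percent_agreement(ratings, dim, tolerance=1):
    """Share of responses where the two annotators' scores for `dim`
    differ by at most `tolerance` points -- a crude reliability check."""
    agreements = []
    for annotators in ratings.values():
        a, b = (per_dim[dim] for per_dim in annotators.values())
        agreements.append(abs(a - b) <= tolerance)
    return sum(agreements) / len(agreements)


if __name__ == "__main__":
    print("Mean score per dimension:", dimension_means(RATINGS))
    for dim in ("clarity", "relevance", "ethics"):
        print(f"Agreement on {dim}: {percent_agreement(RATINGS, dim):.2f}")
```

In practice, a per-dimension agreement check like this is typically a first pass before more formal inter-annotator statistics; it flags dimensions (often ethics or relevance) where annotator guidelines need tightening before the scores are trusted.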