Using Machine Learning to Locate Evidence More Efficiently: New Roles for Academic Research Librarians


Michelle A. Cawley
DOI: 10.4018/978-1-7998-9702-6.ch008


Evidence that machine learning can assist article selection and minimize the manual screening burden for scholarly research has been documented in the peer-reviewed literature for more than 20 years. Despite this robust evidence and continual technological advances, uptake among research teams has been slow. This chapter discusses the benefits of applying machine learning (ML) and other automation tools to bibliographic data and argues that academic librarians are well positioned to partner with research teams on this application of ML. It gives an overview of the automation approaches used at UNC's Health Sciences Library (HSL), along with detailed accounts of cases in which HSL librarians partnered with research teams to locate evidence more efficiently. Finally, it discusses likely barriers to uptake of this technology among academic librarians and possible solutions.


AI-enabled approaches, including ML, have been developed, tested, and validated to minimize manual screening of search results for large, complex research questions. This application of ML has been documented in the peer-reviewed literature across multiple domains for several decades (Aphinyanaphongs et al., 2005; Bannach-Brown et al., 2019; Bekhuis & Demner-Fushman, 2012; Cohen et al., 2006; Mostafa & Lam, 2000; O'Mara-Eves et al., 2015; Shemilt et al., 2014; Thomas et al., 2021; Varghese et al., 2018; Wallace et al., 2010), yet application of this technology by research libraries has been nearly non-existent. U.S. federal agencies, including the U.S. Environmental Protection Agency (U.S. EPA), have successfully applied ML-enabled approaches to their large-scale risk assessments for several years (Cawley et al., 2020). These approaches make it possible to locate relevant evidence in a large set of search results without relying entirely on keywords.

Applications of AI-enabled technology to bibliographic data may include:

  • Clustering or unsupervised learning to assist with search strategy development and to identify a pocket of search results within a large result set to then review manually.

  • Supervised clustering, a form of semi-supervised learning, to prioritize literature to screen manually with the ability to predict recall.

  • Machine learning or supervised learning to predict the probability of an individual article being relevant to a particular research question.
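The third application above, predicting the probability that an individual article is relevant, can be sketched with a small Naïve Bayes classifier. This is an illustration only, not the workflow used at HSL or U.S. EPA: the function names, the whitespace tokenizer, and the four training titles are all hypothetical, and a real screening project would train on hundreds or thousands of labeled records.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count words per class from (text, is_relevant) pairs.

    Assumes at least one relevant and one irrelevant example is present.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_probability(model, text):
    """Return the estimated P(relevant | text) under multinomial Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Hypothetical labeled titles: True = relevant to the research question.
training = [
    ("arsenic exposure drinking water cancer risk", True),
    ("arsenic toxicity in rodents dose response", True),
    ("library budget planning for serials", False),
    ("football season ticket sales report", False),
]
model = train_naive_bayes(training)
p_likely = relevance_probability(model, "arsenic dose response in drinking water")
p_unlikely = relevance_probability(model, "serials budget report")
print(p_likely > p_unlikely)
```

Sorting unscreened records by this probability, highest first, is one simple way to prioritize manual screening.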

Key Terms in this Chapter

Recall: A measure of how many relevant results were captured out of all possible relevant results. High recall is necessary in comprehensive searches such as those conducted for systematic reviews.

Artificial Intelligence (AI): A computer system able to perform tasks that normally require human intelligence.

Supervised Learning: Type of machine learning that uses a large set of training data (i.e., labeled data) to build the model. Naïve Bayes, k-nearest neighbors, and support vector machines are examples of supervised learning algorithms. Supervised learning is what is generally meant by the term machine learning.

Precision: A measure of how many of the records retrieved by a search are relevant rather than false positives. High-precision searches return a higher percentage of relevant results.

Unsupervised Learning: Type of machine learning, such as clustering, that does not require training data; k-means, nonnegative matrix factorization, and latent Dirichlet allocation are examples of clustering algorithms.

Semi-Supervised Learning: Type of machine learning that uses a small amount of labeled data to train the model.

Machine Learning: A branch of AI that is generally defined as using machines to complete human tasks; often used interchangeably with the term AI.
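The recall and precision entries above can be made concrete with a short calculation. The sketch below uses the standard information-retrieval formulas (recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved), which is one reading of the definitions given here. The function name and the record IDs are hypothetical.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one search.

    retrieved_ids: records the search returned.
    relevant_ids:  all records that are truly relevant (the gold standard).
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returns 5 records, 2 of which are
# among the 4 truly relevant records.
precision, recall = precision_recall([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(precision, recall)  # 0.4 0.5
```

A comprehensive search for a systematic review would aim to push recall toward 1.0, usually at the cost of lower precision and therefore more records to screen.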
