Massive Digital Libraries (MDLs)

Andrew Philip Weiss (California State University – Northridge, USA)
Copyright: © 2018 | Pages: 11
DOI: 10.4018/978-1-5225-2255-3.ch454


Massive Digital Library (MDL) is a term coined to describe a class of digital libraries that aggregate mass-digitized print books and monographs and rival brick-and-mortar libraries in size. Specific examples of MDLs, including Google Books, HathiTrust, the Digital Public Library of America (DPLA), and the Internet Archive, among others, are presented. The issues raised by MDLs include the following: the mass-aggregation of digital content and the ability to maintain the accuracy and veracity of source material; copyright, Fair Use, and the mass-digitization of materials not in the Public Domain; and disparities in the level of diversity, especially with regard to Spanish-language, Japanese-language, and Hawaii-Pacific materials. Finally, the impact of MDLs on Digital Humanities is investigated, especially with regard to the Google Books digital corpus and the Google Ngram Viewer.
Chapter Preview


To provide a clearer framework for analyzing the growth of digital libraries, Weiss and James have proposed the term Massive Digital Libraries (MDLs), based on the size, scope, and increasing scalability of digitized book collections. Such MDLs rival the size, breadth, and depth of a physical library’s print holdings and often reach a scale typical of library consortia collections (Weiss and James, 2013a, 2013b, 2014, 2015; Weiss, 2016).

The concept has its roots in late 2004, when Google made its “resounding announcement” that it would digitize millions of the world's books, including works still under copyright protection, and place them all online (Jeanneney, 2005). Jean-Noël Jeanneney, then head of the Bibliothèque nationale de France, interpreted Google’s planned project as a wake-up call for European countries. Failure to catch up to the American company, he argued, would result in significant problems for non-American organizations.

Twelve years on, it is hard to imagine why Google’s desire to create an online digital library on such a large scale came as such a shock. Yet at the time, Google’s plan caused significant hand-wringing and soul-searching among institutions traditionally charged with producing or preserving cultural artifacts (Jeanneney, 2005; Venkatraman, 2009). In retrospect, the controversy seems almost quaint compared with the current crop of issues, especially the “disruption” of established economic models by Uber, Lyft, Facebook, Twitter, Spotify, Snapchat, e-readers, and others, and the encroachment on civil rights through electronic surveillance and other intrusions on privacy.

A number of mass-digitization projects have grown in the wake of Google’s announcement, including HathiTrust, the Internet Archive, the Digital Public Library of America (DPLA), the California Digital Library, the Texas Digital Library, Gallica, and Europeana. Each of these projects has transcended its roots as a localized digital library, simultaneously adapting to and altering the digital landscape. Together, these MDLs have enabled and contributed to the ascendancy of today’s online culture of mass-digitization.

This chapter will describe the characteristics of Massive Digital Libraries (MDLs) and outline their impact on contemporary information science issues, especially digital collection metadata, copyright, and the diversity of source collections. Traditionally, libraries have been created to serve particular communities defined by geography, intellectual discipline, or specific end users. MDLs on their current trajectories, however, promise, for better and for worse, to transcend such limits.

Key Terms in this Chapter

Massive Digital Libraries (MDLs): A term adopted to describe online, full-text-searchable digital collections built through the mass-digitization of printed books and the mass-aggregation of their metadata. Their collections are defined in part by open access and the use of public domain works, and they characteristically serve diffuse and diverse end-user groups.

Fair Use: A significant limitation on the exclusive rights granted by copyright law, described by some as the breathing room that allows for freedom of expression. In the United States, it is determined by weighing four factors: 1) the purpose and character of the use; 2) the nature of the copyrighted work; 3) the amount and substantiality of the portion used in relation to the work as a whole; and 4) the effect of the use upon the potential market for or value of the copyrighted work.

Digital Humanities (DH): A branch of the humanities that uses digital search, digitized texts, text encoding, and related strategies to bring previously print-bound materials and scholarship into computational analysis.

Mass-Digitization: The practice of quickly and thoroughly digitizing items on a large scale. In the case of MDLs these efforts involve scanning millions of books with multiple institutional partners across national boundaries.

Digital Corpus: The full text of the millions of books digitized by MDLs. Such corpora allow scholars to examine how frequently terms appear across the printed record.

Public Domain: Works that are no longer under copyright protection. In the United States, these tend to be works published before January 1, 1923, and unpublished works created before 1893. U.S. Government publications are also in the public domain.

Ngram Viewer: A tool developed by Google to visualize the Google Books digital corpus. It plots the frequency of terms over time on a graph, helping users understand how common a word or concept was in the corpus at a specific moment in history.
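As a rough illustration of what such a frequency plot measures, the following minimal Python sketch computes the yearly relative frequency of a single word: the number of times it occurs in texts from a given year divided by the total number of words from that year. It assumes a hypothetical toy corpus of (year, text) pairs and a naive lowercase whitespace tokenizer, and the function name `relative_frequencies` is hypothetical. It is not Google's implementation, which also handles multi-word phrases and applies its own tokenization and optional smoothing.

```python
from collections import Counter


def relative_frequencies(corpus, term):
    """Return {year: share of all words from that year equal to `term`}.

    `corpus` is a hypothetical list of (year, text) pairs. Tokenization is a
    naive lowercase whitespace split, an assumption for illustration only.
    """
    term = term.lower()
    totals = Counter()  # total words per year
    hits = Counter()    # occurrences of `term` per year
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token == term)
    # Normalize by each year's word count; skip years with no words at all.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus standing in for digitized books.
toy_corpus = [
    (1900, "the library holds the books"),
    (1900, "books and more books"),
    (1950, "a digital library"),
]

print(relative_frequencies(toy_corpus, "books"))
# → {1900: 0.3333333333333333, 1950: 0.0}
```

Normalizing by each year's total word count keeps years with more surviving or digitized text from dominating the plot simply because of their volume.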
