Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a college campus. However, these notions greatly oversimplify the world of libraries. Most large commercial organizations have dedicated in-house library operations, as do schools; nongovernmental organizations; and local, state, and federal governments. With the increasing use of the World Wide Web, digital libraries have burgeoned, serving a huge variety of different user audiences.
With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining: By ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library’s host institution.
Use of data mining to examine library data might be aptly termed bibliomining. With widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data-mining techniques, which are advanced statistical and visualization methods to locate nontrivial patterns in large datasets. Bibliomining refers to the use of these bibliometric and data-mining techniques to explore the enormous quantities of data generated by the typical automated library.
Forward-thinking authors in the field of library science began to explore sophisticated uses of library data some years before the concept of data mining became popularized. Nutter (1987) explored library data sources to support decision making but lamented that “the ability to collect, organize, and manipulate data far outstrips the ability to interpret and to apply them” (p. 143). Johnston and Weckert (1990) developed a data-driven expert system to help select library materials, and Vizine-Goetz, Weibel, and Oskins (1990) developed a system for automated cataloging based on book titles (see also Morris, 1992, and Aluri & Riggs, 1990). A special section of Library Administration and Management, “Mining your automated system,” included articles on extracting data to support system management decisions (Mancini, 1996), extracting frequencies to assist in collection decision making (Atkins, 1996), and examining transaction logs to support collection management (Peters, 1996).
More recently, Banerjee (1998) focused on describing how data mining works and how to use it to provide better access to the collection. Guenther (2000) discussed data sources and bibliomining applications but focused on the problems with heterogeneous data formats. Doszkocs (2000) discussed the potential for applying neural networks to library data to uncover possible associations between documents, indexing terms, classification codes, and queries. Liddy (2000) combined natural language processing with text mining to discover information in digital library collections. Lawrence, Giles, and Bollacker (1999) created a system to retrieve and index citations from works in digital libraries. Gutwin, Paynter, Witten, Nevill-Manning, and Frank (1999) used text mining to support resource discovery.