In this chapter, the authors describe the development and application of language technology for intelligent information access to the content of digitized cultural heritage collections in the form of Swedish classical literary works. This technology offers sophisticated and flexible support functions to literary scholars and researchers. The authors focus on one kind of text processing technology (named entity recognition) and one research field (literary onomastics), but argue that the techniques involved are quite general and can be further developed in a number of directions. In this way, the authors aim to support the users of digitized literature collections with tools that enable semantic search, browsing, and indexing of texts. In this sense, the authors offer new ways of exploring the large volumes of literary texts being made available through national cultural heritage digitization projects.
Keywords: Language technology; Computational linguistics; Natural language processing; Literary onomastics; Named entity recognition; Corpus linguistics; Corpus annotation; Digital resources; Text technology; Cultural heritage
Defining The Area
Literary onomastics is a field of inquiry where literature is seen through the names appearing in literary texts. Specific topics include the etymology or symbolism of names, the ways in which fictional names make the transition into the real world, and the use and function of names and naming in the works of an individual author, a literary school, genre, or period (Alvarez-Altman & Burelbach, 1987; Svedjedal, 2004; van Dalen-Oskam & van Zundert, 2004; van Dalen-Oskam, 2005).
Literary onomastics is not our field of expertise, but from familiarizing ourselves with the literature in this field we have concluded that it is a well-established and lively area of investigation, and one which presupposes that names can be located and recognized with a minimum of effort. It is also clear that analyzing the use of names requires access to a broad basis of comparison. Ideally, information both within and outside a genre should be readily available, for instance, when studying the use and function of names in late nineteenth-century and early twentieth-century crime fiction. It is, of course, possible to mark up digital texts manually (as described by Flanders, Bauman, Caton, & Cournane, 1998), but this is a very time-consuming, and consequently costly, endeavor. This is where language technology enters the picture, in the form of named entity recognition.