Developing Geodetic Search Interface Through Auto-Generation of Geographic Name Authority Datasets

Parthasarathi Mukhopadhyay, Mondrita Mukhopadhyay
Copyright: © 2022 | Pages: 23
DOI: 10.4018/978-1-7998-8942-7.ch004


This research study attempts to develop a MARC-formatted authority dataset for Indian geo-administrative units, given the inadequate coverage of Indian place names in global authority datasets. It starts with an authenticated place names file in CSV format and applies data wrangling tools and techniques to fetch geospatial data and other related datasets from open access data sources, in order to develop a geographic name authority file for Indian place names with geocoordinate data values. The study then demonstrates how that authority dataset can be implemented in an open-source ILS and how the retrieval features of a library discovery system can be enhanced through a geodetic search interface that utilizes the dataset. The entire methodology is based on open data, open-source software, and open standards.
Chapter Preview


Academic libraries in developing countries are characterized by retrieval silos: OPACs for books and other macro documents, retrieval interfaces for faculty publications and theses in institutional digital repositories, separate retrieval interfaces for subscribed resources from different publishers, and open access resources distributed throughout the world. This situation compels users of academic libraries to run from pillar to post to retrieve the resources they need and to climb a steep learning curve in mastering different retrieval techniques. Moreover, the use of different metadata schemas (such as MARC 21, Dublin Core, and other domain-specific schemas) and of different text retrieval engines by library software makes retrieval more troublesome for the end users of a typical academic library in the developing world. However, the multiple retrieval silos in an academic library of a developing country are gradually being replaced by a single-window search system with the advent of open-source library discovery systems (Adeyemi & Omopupa, 2020; Anuradha, 2018; Creaser et al., 2013). Library OPACs, particularly in academic libraries of India, are increasingly being superseded by library discovery systems that manage a central index for both internal (resources organized locally and retrieved globally) and external (subscribed and open access resources not typically part of a local retrieval system) knowledge objects. These discovery systems, apart from serving as one-stop access to all library materials, may provide many additional utilities such as full-text indexing, a personalized search interface, deduplication and FRBRization of resources, third-party authority search integration, a query forwarding mechanism, and browsing by many selected keys (such as call number, document type, and license type).
The most striking feature of a library discovery system, however, is the possibility of integrating information visualization techniques into the retrieval interface, in contrast to the text-only search system of a library. One such visual search interface is geodetic search. A geodetic search, or GIS-enabled search (or geographic search), is a sub-domain of Geographic Information Systems (GIS) and is essentially a kind of land-information system. In such systems, locations are determined with the help of a mathematical framework for spatial referencing of all land data to recognizable positions on the Earth’s surface (Abresch et al., 2008; Jones & Purves, 2008; Seeger, 1999). A geodetic search mechanism helps users go beyond text-only search and may prove effective in the retrieval of documents and datasets where places or geographic names are the main foci, e.g., travel guides and information resources in the domains of geography, mineralogy, and geology. A geodetic search framework in the bibliographic domain needs to support additional elements such as: a) standard mechanisms to describe and encode geocoordinate values (latitude/longitude or boundary values) of the places dealt with in the document; b) mechanisms for integrating geocoordinate values; c) integrating geocoordinate values with the map service in use; and d) exposing geodetic datasets in the search interface to filter search results (Mukhopadhyay & Mukhopadhyay, 2018). This means that the development of a geodetic search, in addition to the prevalent textual library search, requires: a) bibliographic records with properly encoded geocoordinate values for places; b) a framework to index resources with geocoordinate values; and c) mechanisms to extend it to the end-user interface to support interactive geographic search.
This study sets these three requirements as objectives and aims to develop: a) a geographic name authority dataset for India (geo-administrative divisions), with the required geocoordinate values encoded as per the specification of the MARC 21 format for authority data, by using data carpentry methods; and b) a geodetic search interface in an open-source library discovery system that utilizes that dataset.

Key Terms in this Chapter

Koha: The first and the most feature-rich open-source integrated library system. It is presently in use by libraries of different types and sizes across the globe. Koha supports many globally agreed-upon, domain-specific open standards in the bibliographic data universe.

Library Carpentry: The application of data carpentry tools, techniques, and principles to the management of bibliographic data, including data structuring, data wrangling, data reconciliation, data visualization, and so on.

Geodetic Search: In the context of a library retrieval system, it refers to a map-based search interface that allows users to pinpoint geographical locations in an interactive map display during retrieval of resources. A major prerequisite of geodetic search is the presence of geocoordinate data values in bibliographic records for the places that are dealt with in the records.
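One way to picture the filtering step behind such an interface is a bounding-box intersection test: a record is retrieved when the area encoded in it overlaps the rectangle the user selects on the map. The Python below is a minimal sketch under that assumption. The `BBox` class, the `geodetic_filter` function, and the sample titles and coordinates are all hypothetical, and the sketch ignores boxes that cross the 180° meridian. A discovery system would normally delegate this test to its spatial index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """A bounding box in signed decimal degrees (hypothetical helper type)."""
    west: float
    east: float
    north: float
    south: float


def intersects(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes overlap or touch."""
    overlap_in_longitude = a.west <= b.east and b.west <= a.east
    overlap_in_latitude = a.south <= b.north and b.south <= a.north
    return overlap_in_longitude and overlap_in_latitude


def geodetic_filter(records, query: BBox):
    """Keep the titles of records whose bbox intersects the query rectangle."""
    return [title for title, box in records if intersects(box, query)]


# Illustrative records only: titles and coordinates are made up.
records = [
    ("Record A", BBox(west=88.0, east=89.5, north=27.25, south=21.5)),
    ("Record B", BBox(west=72.0, east=74.0, north=20.0, south=18.0)),
]

# Rectangle a user might draw on the map.
query = BBox(west=87.0, east=88.5, north=23.0, south=22.0)
print(geodetic_filter(records, query))
```

Only records that carry geocoordinate values can take part in this test, which is why the presence of those values is a prerequisite.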

REST/API: An API (Application Programming Interface) is a mechanism or a set of rules that guides intercommunication between applications or devices. REST (Representational State Transfer) is an architectural style for modeling API-based operations for creating, reading, updating, and deleting records.

VuFind: An open-source library discovery system with a considerable worldwide user base that, apart from supporting many advanced features, also provides facilities for developing geodetic search features for end users.

Data Wrangling: The processes of converting messy data into clean data (also known as data remediation, data cleaning, or data munging). The steps and processes of data wrangling may vary from project to project, but they centre on a set of four basic activities: cleaning, enriching, integrating, and transforming raw data into value-added datasets.
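The four activities can be illustrated with a small Python sketch that uses only the standard library. It is not the chapter's workflow, which relies on data carpentry tools such as OpenRefine. The sample CSV, the `COORDINATES` lookup (standing in for an open geodata source, with approximate values for illustration only), and the function names are assumptions.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, mixed case, and a duplicate row.
RAW_CSV = """name,state
  kolkata ,West Bengal
KOLKATA,West Bengal
 new   delhi,Delhi
"""

# Stand-in for an external open data source; coordinates are approximate.
COORDINATES = {"Kolkata": (22.57, 88.36), "New Delhi": (28.61, 77.21)}


def clean_name(name: str) -> str:
    """Cleaning: collapse runs of whitespace and normalize capitalization."""
    return " ".join(name.split()).title()


def wrangle(raw_csv: str):
    """Clean, deduplicate, enrich, and transform rows into dictionaries."""
    seen = set()
    output = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        name = clean_name(row["name"])
        state = row["state"].strip()
        if (name, state) in seen:  # integration: merge duplicate entries
            continue
        seen.add((name, state))
        lat, lon = COORDINATES.get(name, (None, None))  # enriching
        # Transforming: emit a structured record ready for later encoding.
        output.append({"name": name, "state": state, "lat": lat, "lon": lon})
    return output


for record in wrangle(RAW_CSV):
    print(record)
```

Places missing from the lookup keep `None` coordinates, so gaps in the enrichment source stay visible instead of being silently dropped.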

JSON: It stands for JavaScript Object Notation. It is a lightweight data interchange format that supports on-the-fly textual data transfer from one system to another in a Unicode-compliant environment.

Bounding Box: Georeferencing of an area (also expressed as bbox data) with two longitudes (westernmost longitude and easternmost longitude) and two latitudes (northernmost latitude and southernmost latitude). The MARC 21 formats for bibliographic data and authority data support encoding of bbox data in tag 034, and Dublin Core accommodates it in the DC.Coverage.spatial metadata element.
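As a minimal sketch of how bbox values could be prepared for tag 034, the Python below formats four signed decimal-degree values into subfields $d (westernmost longitude), $e (easternmost longitude), $f (northernmost latitude), and $g (southernmost latitude). It uses the hemisphere-prefixed decimal-degree pattern (hddd.dddddd), one of the coordinate forms MARC 21 allows in this field. The function names and the sample coordinates are illustrative assumptions and are not taken from the chapter. Indicators and other subfields are deliberately left out.

```python
def marc_degree(value: float, is_latitude: bool) -> str:
    """Format a signed decimal degree as hemisphere letter + ddd.dddddd."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Width 10 = three degree digits, a decimal point, six decimal digits.
    return f"{hemisphere}{abs(value):010.6f}"


def bbox_to_034_subfields(west: float, east: float, north: float, south: float):
    """Return (subfield code, value) pairs for the four bbox coordinates."""
    return [
        ("d", marc_degree(west, is_latitude=False)),
        ("e", marc_degree(east, is_latitude=False)),
        ("f", marc_degree(north, is_latitude=True)),
        ("g", marc_degree(south, is_latitude=True)),
    ]


# Illustrative coordinates only (not a real administrative boundary).
subfields = bbox_to_034_subfields(west=88.0, east=89.5, north=27.25, south=21.5)
print(" ".join(f"${code}{value}" for code, value in subfields))
```

Returning code/value pairs rather than a finished field string leaves the choice of indicators and serialization (MARCXML, ISO 2709, and so on) to whatever library writes the record.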

OpenRefine: An open-source data carpentry tool (previously Google Refine) available under the BSD license for different platforms (Windows, Linux, Mac).

GREL: General Refine Expression Language (previously known as Google Refine Expression Language) is a simple scripting language that supports data organization, data transformation, and data queries in data carpentry software (e.g., OpenRefine).
