The Importance of Authoritative URI Design Schemes for Open Government Data

Alexei Bulazel, Dominic DiFranzo, John S. Erickson and James A. Hendler (Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA)
DOI: 10.4018/IJPADA.2016040101
A major challenge when working with open government data is managing, connecting, and understanding the links between references to entities found across multiple datasets when these datasets use different vocabularies to refer to identical entities (e.g., one dataset may refer to Microsoft as “Microsoft”, another may refer to the company by its SEC filing number, “0000789019”, and a third may use its stock ticker, “MSFT”). In this paper, the authors propose a naming scheme based on Web URLs that enables unambiguous naming and linking of datasets and, more importantly, of individual data elements across the Web. They further describe their ongoing work to demonstrate the implementation and authoritative management of such schemes through a class of web service they refer to as the “instance hub”. When working with linked government data, whether provided directly by governments through open government programs or through other sources, resolving inconsistencies in naming schemes is particularly important, because different agencies have disparate conventions for referring to the same concepts and entities. Using linked data technologies, the authors have created instance hubs to assist in the management and linking of entity references for collections of categorically and hierarchically related entities. Instance hubs are of particular interest to governments engaged in publishing linked open government data, as they can help data consumers make better sense of published data and can provide a starting point for developing linked data applications. In this paper, the authors present findings from the ongoing development of a prototype instance hub at the Tetherless World Constellation at Rensselaer Polytechnic Institute (TWC RPI). The TWC RPI Instance Hub enables experimentation with, and verification of, proposed URI design schemes for open government data, especially those developed at TWC in collaboration with the United States program.
The authors discuss core principles of the TWC RPI Instance Hub's design and implementation, and summarize how they have used it to demonstrate authoritative entity references across a number of heterogeneous categories commonly found in open government data, including countries, federal agencies, states, counties, crops, and toxic chemicals.
The Importance of Unambiguous Naming

An emerging issue for providers and consumers of open government data is the creation and interconnection of names for common entities referenced in datasets. For example, one dataset may refer to the state of New York as “NY”, while another may use its Federal Information Processing Standards (FIPS) code, “36”. The lack of common naming schemes across datasets hinders developers attempting to build applications or to draw insights from data “mashups,” in which datasets are combined and analyzed together to reveal a broader context.
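To make the linking problem concrete, the following Python sketch joins two toy datasets, one keyed by postal abbreviation and one by FIPS code, by mapping each local code to a shared URI. The base URI `http://example.org/id/us/state/`, the lookup tables, the `link` function, and the record values are hypothetical placeholders for illustration only; they are not the TWC RPI Instance Hub's actual namespace or interface.

```python
# Hypothetical base URI used only for illustration.
BASE = "http://example.org/id/us/state/"

# Dataset A identifies states by postal abbreviation; dataset B by FIPS code.
# Both lookup tables point each local code at the same shared URI.
POSTAL_TO_URI = {"NY": BASE + "New_York", "NJ": BASE + "New_Jersey"}
FIPS_TO_URI = {"36": BASE + "New_York", "34": BASE + "New_Jersey"}


def link(records_a, records_b):
    """Merge two lists of (local_code, value) pairs, keyed by shared URI."""
    by_uri = {}
    for code, value in records_a:
        by_uri.setdefault(POSTAL_TO_URI[code], {})["a"] = value
    for code, value in records_b:
        by_uri.setdefault(FIPS_TO_URI[code], {})["b"] = value
    return by_uri


# Placeholder records standing in for rows of two real datasets.
dataset_a = [("NY", "row-a1"), ("NJ", "row-a2")]
dataset_b = [("36", "row-b1")]

merged = link(dataset_a, dataset_b)
print(merged[BASE + "New_York"])  # {'a': 'row-a1', 'b': 'row-b1'}
```

Once both datasets resolve to the same URI for New York, their records can be combined even though neither dataset uses the other's code.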

The process of finding common references to entities across multiple datasets is further complicated by unintentional naming clashes. For example, “NY” might refer to either the city or the state of New York. It would be wrong to draw inferences about the entire state from data about the city, and vice versa. The name “NY” is not sufficiently unique and does not carry enough information to link multiple datasets together accurately. One solution is to name these common concepts using Uniform Resource Identifiers (URIs), the fundamental identification scheme for resources on the Web.³
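A minimal sketch of why URIs help here, assuming a hypothetical scheme in which each entity gets its own URI under a category path segment (`state` or `city`). The URIs, `LABEL_INDEX`, `resolve`, and `category` are illustrative inventions, not part of any published scheme: a bare label such as “NY” matches two candidates and is rejected, while each URI names exactly one entity and also reveals its type.

```python
# Hypothetical URIs: one per entity, with the entity's category in the path.
STATE_NY = "http://example.org/id/us/state/New_York"
CITY_NY = "http://example.org/id/us/city/New_York_City"

# A plain-text label index: "NY" is ambiguous, "NYC" is not.
LABEL_INDEX = {"NY": {STATE_NY, CITY_NY}, "NYC": {CITY_NY}}


def resolve(label):
    """Return the single URI for a label, or fail if it is ambiguous or unknown."""
    candidates = LABEL_INDEX.get(label, set())
    if len(candidates) != 1:
        raise LookupError(f"{label!r} matches {len(candidates)} entities; use a URI instead")
    return next(iter(candidates))


def category(uri):
    """In this hypothetical scheme, the path segment before the name is the entity type."""
    return uri.rstrip("/").split("/")[-2]


print(resolve("NYC"))      # http://example.org/id/us/city/New_York_City
print(category(STATE_NY))  # state
print(category(CITY_NY))   # city
try:
    resolve("NY")
except LookupError as err:
    print(err)  # 'NY' matches 2 entities; use a URI instead
```

Putting the category into the URI path is only one possible design choice; the essential property is that the state and the city receive distinct identifiers, so data about one can never be silently attributed to the other.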

