Introduction
Business data is high-value data (Policy paper G8 Open Data Charter and Technical Annex, 2013; Spanish Ministry of Finance et al., 2012) and has a high reuse potential both in national and in cross-border settings. Business data may be of different types, such as basic data about a company (e.g. legal name, address, representative, establishment date and company type), company identifiers and annual balance sheets. Despite its great potential (Deloitte Analytics paper, 2013; Yiu, C., 2012), business data – similar to other types of valuable data – is still locked in business registers and company databases.
In this vein, the European Commission published the directive on the interconnection of business registers of the Member States (2012/17/EU) (European Parliament & Council of the European Union, 2012), which calls on European public administrations to open up basic business data, such as the name and legal form of the company, the registered office of the company and the Member State where it is registered, the registration number of the company and information on winding-up or insolvency proceedings.
The real value of Open Government Data (OGD) (Study on Business Models for Linked Open Government Data - BM4LOGD, 2013; Advisory Board on Public Sector Information, 2006; Australian Government & Office of the Australian Information Commissioner, 2011; Pollock R., 2006; Dekkers, M., et al. 2006; Lathrop, D., & Ruma, L., 2010), including business data, is revealed when different datasets are integrated. But data integration is far from straightforward. Three main problems are commonly encountered: different, often incompatible, licenses under which OGD is published; incompatible file formats (Bizer, C., 2009); and semantic interoperability conflicts at both the schema and the data level (Peristeras, V., et al., 2008).
In this work, we focus on the last two. Basic business data coming from different registers is most likely not semantically interoperable and therefore cannot be integrated and reused within and across borders. This is due to the lack of common identifiers and semantics (e.g. vocabularies and controlled lists used for describing the data), the absence of commonly agreed metadata, and the multilingualism issue (Ding, L., et al. 2012; Alani, H., et al., 2007). Additionally, different registers may publish their datasets in different, incompatible file formats, e.g. CSV and PDF (Sunlight Foundation, 2013).
We therefore concentrate on the use of widely-accepted, reusable and extendible vocabularies in publishing basic business data. We employ Linked Data technologies as a means of publishing and retrieving this data in both human- and machine-readable formats (Berners-Lee, T., 2006; Berners-Lee, T., 2009; Heath, T., & Bizer, C., 2011). In this way, Linked Data acts as an enabler of data interoperability and integration, and also provides open standards for data identification and representation (Shadbolt, N., et al., 2012; Sheridan, J., & Tennison, J., 2010).
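To make the idea concrete, the following is a minimal sketch of how one basic business record could be expressed as Linked Data statements in the N-Triples syntax. It is an illustration only and not the approach developed in this work: the namespaces `http://example.org/vocab#` and `http://example.org/company/`, the field names and the sample record are all hypothetical placeholders, and a real deployment would replace the placeholder namespace with terms from an agreed, reusable vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespaces: placeholders, not the vocabulary used by the authors.
VOCAB = "http://example.org/vocab#"
BASE = "http://example.org/company/"


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def company_to_ntriples(record: dict) -> list[str]:
    """Turn one flat company record into N-Triples statements.

    The registration number is used to mint the company URI, so the same
    company receives the same identifier whichever source file it came from.
    """
    # Percent-encode the registration number so the minted URI stays valid.
    subject = f"<{BASE}{quote(str(record['registration_number']), safe='')}>"
    triples = []
    for field, value in record.items():
        predicate = f"<{VOCAB}{field}>"
        triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples


# Invented sample record; the values do not describe a real company.
record = {
    "registration_number": "X0001",
    "legal_name": 'Example "Demo" Ltd',
    "legal_form": "Private limited company",
    "member_state": "EL",
}

for line in company_to_ntriples(record):
    print(line)
```

Because every statement names its subject and property with a URI, records from two registers that use the same identifiers and vocabulary terms can be merged by simple concatenation, which is the interoperability benefit the paragraph above attributes to Linked Data.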