Opportunities and Challenges of Big Data in Public Sector

Anil Aggarwal (University of Baltimore, USA)
Copyright: © 2016 | Pages: 13
DOI: 10.4018/978-1-4666-9649-5.ch016


Data has always been the backbone of modern society. It is generated by individuals, businesses, and governments, and it is used in many citizen-centric applications, including weather forecasting, disease control, and monitoring of undesirable activity. What is changing is the source of data: advances in technology allow data to be generated from any device, at any place, in any form. The challenge is to “understand”, “manage”, and make use of this data. It is well known that governments generate unprecedented amounts of data (e.g., the US census); the question remains whether this data can be combined with technology-generated data for societal benefit. Governments and non-profits, however, work across borders, making data access and integration challenging: rules, customs, and politics must be respected when sharing data across borders. Despite these challenges, big data applications in the public sector are beginning to emerge. This chapter discusses areas of government application and the challenges of developing such systems.
Chapter Preview

Big Data Development Process

The big data system development process is similar to that of other systems, the difference being the scale and nature of the computing involved. The following steps are necessary for a project of this magnitude to succeed:

  • Collect

  • Manage/architecture

  • Process

  • Act

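The four steps above can be sketched as a simple pipeline. This is an illustrative toy, not anything prescribed by the chapter: the record layout, the relevance flag, and all function names are assumptions chosen only to show how collect, manage, process, and act hand data to one another.

```python
# Toy sketch of the collect -> manage -> process -> act pipeline.
# Records are plain dicts standing in for a device feed; all names
# and fields here are illustrative assumptions.

def collect(raw_feed):
    # Keep only records a domain expert has flagged as relevant.
    return [r for r in raw_feed if r.get("relevant")]

def manage(records):
    # Normalize structured and unstructured fields into one schema.
    return [{"id": r["id"], "payload": str(r.get("payload", ""))} for r in records]

def process(records):
    # Derive a simple aggregate (here: payload length per record).
    return {r["id"]: len(r["payload"]) for r in records}

def act(results):
    # Surface the processed output for a decision-maker, largest first.
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

feed = [
    {"id": 1, "payload": "sensor reading A", "relevant": True},
    {"id": 2, "payload": "noise", "relevant": False},
    {"id": 3, "payload": "longer sensor reading B", "relevant": True},
]
ranked = act(process(manage(collect(feed))))
print(ranked)
```

Note that in a real system each stage would run at far larger scale and likely on different infrastructure; the point here is only the hand-off between stages.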
Each of these steps has its challenges. By some estimates, 90% of the data generated by devices is not useful, so the first step is to identify relevant data, which requires functional expertise. Managing the data then requires integrating structured and unstructured data. Several non-relational database models (key-value, document, graph) are emerging that can be used, but processing these structures can be quite challenging. Typical processing relies on massively parallel computation, commonly on a Hadoop-like platform. This requires creating clusters and replicating data across them; the task is to configure clusters so as to balance the load. Once data is stored and defined, it needs to be processed. Traditional languages such as SQL were not designed for this environment, so the challenge is to develop new languages and tools that can help with processing. Once data is processed, the final step is to understand the output and act on it.

The next section describes opportunities in the public sector.
