Overview of Big Data in Healthcare

Mohammad Hossein Fazel Zarandi (Amirkabir University of Technology (Tehran Polytechnic), Iran) and Reyhaneh Gamasaee (Amirkabir University of Technology (Tehran Polytechnic), Iran)
DOI: 10.4018/978-1-5225-2515-8.ch016

Big Data is a now-ubiquitous term for massive data sets whose size, variety, and structural complexity make them difficult to store, analyze, and visualize for further processing. The use of Big Data in health is a new and exciting field. A wide range of use cases for Big Data and analytics in healthcare will benefit best-practice development, outcomes analysis, prediction, and surveillance. Accordingly, this chapter provides an overview of Big Data in healthcare systems, including two applications of Big Data analysis in healthcare: understanding disease outcomes, and the use of Big Data in genetic, biological, and molecular fields. The chapter also reviews the characteristics and challenges of healthcare Big Data analysis, as well as the technologies and software used for it.
Chapter Preview


The last decade has seen significant growth in the amount of data generated and stored in almost all everyday activities, as well as in the capability of technology to analyze and comprehend that data. The massive amount of data generated in healthcare systems is identified as Big Data, and the ability to analyze that data is named Big Data analytics. Big Data helps businesses in various industries become more productive and efficient, and healthcare is one of the industries in which it is most useful. Although health data is not always Big Data, some types of health data sets are categorized as Big Data. These include massive data obtained from high-volume laboratory information systems, electronic medical records (EMRs), biomedical and biometric data, test usage data, and gene expression data. In addition to boosting profit and reducing wasted overhead, healthcare Big Data is being used to predict epidemics, cure disease, improve quality of life, and avoid preventable deaths. Both existing health data and behavioral data could significantly increase opportunities to forecast long-term health conditions. Big Data also supports better diagnostic techniques and disease prevention, enhances access to care, and decreases healthcare costs.

The applications of Big Data are not limited to proposing more efficient healthcare systems. Big Data and analytics are already being applied in the healthcare industry, with very encouraging results. Integrating and digitizing Big Data, and using it effectively, provides many advantages for physician offices, hospitals, and care organizations. These advantages include diagnosing disease at earlier stages, when it can be treated more easily; managing the health conditions of individuals and groups; and detecting healthcare fraud promptly and effectively. Big Data and analytics thus support healthcare systems in achieving their goals. Two main categories of “Big Data in Healthcare” have been reviewed in the literature, as follows.

The first category includes three important issues [IMIA]: (I) Big Data extracted from the health system, such as health and medication history, lab reports, and pathology results, where the analyses are aimed at improving physicians' understanding of disease outcomes and their risk factors, decreasing health system costs, and enhancing efficiency; (II) massive data sets from biological and molecular fields, known as “Omics” data (genomics, proteomics, microbiomics, and metabolomics), where the goal of analysis is to comprehend the mechanisms of diseases and expedite medical treatments; (III) data collected from social media, along with the signs and behaviors of people who use the Internet and software applications to improve their health conditions (Hansen et al., 2014).

The second category includes five important issues according to [Big Data in healthcare]:

(I) Big Data used to manage healthcare delivery costs; (II) Big Data used for clinical decision making; (III) Big Data used for extracting clinical information; (IV) Big Data used for demographic analysis by investigating behavior and consumer category; (V) Big Data used as support information, a type of application that does not fall into any of the preceding categories (Groves et al., 2013; Hermon & Williams, 2014). Several researchers have studied these categories of Big Data in healthcare.

As a first example, Archenaa and Anita (2015) reviewed the role of government in increasing the quality of healthcare systems by reducing their costs, which falls in the first category. The cost reduction comes from analyzing the clinical Big Data of healthcare systems. For instance, by analyzing clinical Big Data and using patients' genetic makeup, diseases are detected at earlier stages and the most suitable medicine is prescribed. This reduces readmission rates and, as a result, costs for patients (Archenaa & Anita, 2015). Moreover, Hay et al. (2013) used statistical techniques such as the boosted regression tree method to predict the risk of disease outbreaks in different geographical locations. That study is also classified in the first category of Big Data analysis in healthcare.

Key Terms in this Chapter

Healthcare Systems: The collection of people, resources, and organizations whose task is to deliver health-related services to patients.

Grid Computing Systems: A distributed computing method used for sharing resources collaboratively.

Graphical Processing Units: Specialized processors that rapidly manipulate memory to accelerate the creation of images for output to a display device.

Distributed Systems: A network of independent computers, connected by a middleware service, that appears to its users as a single system.

Healthcare Analysis: The analysis of healthcare data collected from different health resources, including claims and cost data, clinical data, research and development data, and patient behavior data (Fan et al., 2014).

MapReduce: A programming model used for processing Big Data by running a distributed computation on a cluster.

Big Data: Massive amounts of data characterized by four properties: high volume, velocity, variety, and veracity.
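
To make the MapReduce entry above more concrete, the following is a minimal single-machine sketch of its map, shuffle, and reduce phases. It is illustrative only and does not come from the chapter: the diagnosis labels, the partition layout, and the function names (map_phase, shuffle, reduce_phase, run_job) are all assumptions, and a real deployment would use a cluster framework rather than plain Python lists.

```python
from collections import defaultdict

# Hypothetical toy data: each inner list stands for one partition of records
# held on a different cluster node. The labels are made up for illustration.
PARTITIONS = [
    ["diabetes", "asthma", "diabetes"],
    ["asthma", "hypertension"],
    ["diabetes"],
]


def map_phase(partition):
    """Emit a (key, 1) pair for every record in a single partition."""
    return [(label, 1) for label in partition]


def shuffle(mapped_partitions):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(partitions):
    """Run map, shuffle, and reduce in sequence on the given partitions."""
    # On a real cluster the map calls would run in parallel on separate nodes.
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))


counts = run_job(PARTITIONS)
print(counts)
```

Because each map call touches only its own partition, the work can be spread across machines; only the grouped intermediate pairs need to move between nodes before the reduce step.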
