Leveraging Applications of Data Mining in Healthcare Using Big Data Analytics: An Overview


Mohammad Hossein Tekieh, Bijan Raahemi, Eric I. Benchimol
DOI: 10.4018/978-1-5225-2515-8.ch015


Big data analytics has been introduced as a set of scalable, distributed algorithms optimized for analyzing massive data in parallel. There are many prospective applications of data mining in healthcare. In this chapter, the authors investigate whether health data exhibits the characteristics of big data and, accordingly, whether big data analytics can enhance data mining applications in healthcare. To answer this question, potential applications are divided into four categories, and each category is further divided into sub-categories in a tree structure. The available types of health data are specified, with a discussion of the applicable dimensions of big data for each sub-category. The authors conclude that big data analytics can improve the quality of analysis in particular categories of data mining applications in healthcare, while offering less benefit in other categories.
Chapter Preview


While collecting, storing, and managing large amounts of digitized data are now technically feasible and affordable, useful information is still extracted from only a small portion of the gathered data. To discover more information, powerful analytical tools are needed to process and analyze the collected data, which is currently on the order of petabytes (Han, Kamber, & Pei, 2011). Data analysis algorithms have also been adapted to handle big data collections. In addition, scalable and flexible software technologies have been introduced, and are being improved, to provide a suitable ecosystem for implementing big data algorithms. The combination of these new components, including the technologies, algorithms, and methods, is known as “big data analytics”.

Data mining, as a powerful analytical tool, has been applied over the past decades to large amounts of digitized data collected in various fields, including healthcare. With the introduction of big data analytics, researchers are working to enhance data mining techniques to make the algorithms more scalable and faster. However, whether this enhancement resolves the existing limitations of data analysis studies in healthcare remains unknown. Before claiming that big data analytics is the solution to the limitations of health data analysis, it is necessary to first determine whether all “health data” fit the definition of “big data”.

In this chapter, the authors investigate whether applications of data mining in healthcare can benefit from big data analytics by answering the following questions:

  1. What are the applications of data mining in healthcare?
  2. What are the different types of health data?
  3. What are the characteristics of “big data”?
  4. Is health data a form of “big data”?
  5. Are all types of health data relevant in each application of data mining in healthcare?
  6. To what extent does big data analytics enhance the quality of research in each application of data mining in healthcare?

In the introductory section, the applications of data mining in healthcare are summarized, and the different types of health data and the dimensions of big data are reviewed. Next, the methodology for achieving this objective is presented and discussed in detail. Finally, the chapter concludes by summarizing the answers to the research questions.



Whether healthcare data can be considered “big data” is controversial. The phrase “health data” does not refer to a specific type or source of data. Some health data is gathered for specific research studies, but the majority is collected routinely without pre-defined research questions in mind (Benchimol et al., 2015). Many types of health data are collected routinely using various approaches, which will be presented later in this section. Often, the only characteristic they share is their relation to patient healthcare. Each data type has its own characteristics and is collected for a specific purpose, such as the administration of a healthcare system. Since the majority of health data is not originally collected for research, it is not necessarily suitable for all types of data analysis studies. However, these data can still be valuable sources of information, to which descriptive and predictive analytical tools such as data mining techniques can be applied to conduct novel analyses.

Key Terms in this Chapter

Administrative Data: Data collected routinely in the operation of healthcare administration systems, such as hospital billing systems and insurance claims. This data is generated automatically by the associated system.

Clinical Data: Data collected routinely in the course of providing clinical healthcare services, usually gathered in Electronic Medical Record (EMR) systems. This data is generated by clinicians at the bedside and by laboratory technicians when conducting lab tests.

Public Health: Analyzing the practice of medicine associated with diseases in a population, leading to a set of guidelines that raise public awareness of different health issues.

Population Health: Investigating the causes, trends, and patterns of diseases in a population with the aim of improving the health of the entire population.

Big Data Analytics: The platform for processing and analyzing scalable, fast-streaming, and multi-format data using MapReduce techniques and related technologies, such as Hadoop and Spark, and their associated ecosystems.

Data Mining: Data mining is the process of extracting hidden, implicit, novel, and useful information from large volumes of data. It has emerged as a unique combination of several fields of science and technology, including statistics, database systems, computer programming, machine learning, and artificial intelligence. Data mining spans a wide range of applications in medicine and population health (study of drug implications, disease outbreaks), bioinformatics (protein interactions, gene sequence analysis), engineering (intrusion detection and network security, flow classification, Web mining), business (fraud detection, decision support systems, risk analysis, forecasting market trends), and environmental studies (flood prediction).

Health Data: Data gathered in the course of running a healthcare system, providing healthcare services, or conducting health research. Most health data are generated and collected routinely without a pre-defined research question and are divided into administrative and clinical data.

Big Data: Big data is a collection of data sets so large and complex that they become difficult to process using traditional analytics algorithms. The challenges include capture, creation, storage, search, sharing, analysis, and visualization. The 4 V’s of big data are Volume (large volumes of data), Velocity (speed of data generation), Variety (structured and unstructured formats), and Veracity (trust and integrity).
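
The MapReduce pattern named in the Big Data Analytics definition can be illustrated with a small sketch. The following Python code is a minimal, single-machine illustration under stated assumptions, not the authors' method and not the Hadoop or Spark API: threads stand in for the worker nodes of a cluster, and the record layout (patient ID, diagnosis code) and the sample partitions are hypothetical. It counts diagnosis codes across partitions of administrative data in three phases: map, shuffle, and reduce.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (patient_id, diagnosis_code). Illustrative only.
Record = Tuple[str, str]


def map_partition(records: Iterable[Record]) -> List[Tuple[str, int]]:
    """Map phase: emit (diagnosis_code, 1) for every record in one partition."""
    return [(code, 1) for _patient, code in records]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group intermediate values by key across all partitions."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key (here, a sum)."""
    return key, sum(values)


def map_reduce(partitions: List[List[Record]]) -> Dict[str, int]:
    # Threads stand in for the worker nodes of a real cluster (assumption).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_group(*kv), grouped.items())
        return dict(reduced)


if __name__ == "__main__":
    # Hypothetical claims split into two partitions, as if stored on two nodes.
    partitions = [
        [("p1", "E11"), ("p2", "I10"), ("p3", "E11")],
        [("p4", "I10"), ("p5", "J45"), ("p6", "E11")],
    ]
    print(map_reduce(partitions))  # prints {'E11': 3, 'I10': 2, 'J45': 1}
```

In a real deployment, frameworks such as Hadoop or Spark handle partitioning, shuffling, and fault tolerance across machines; the sketch only mirrors the data flow. Because the reduce step is a sum, which is associative and commutative, partial counts could also be combined within each partition before the shuffle to reduce the data moved between nodes.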
