Prediction of Cancer Disease Using Classification Techniques in Map Reduce Programming Model

M. A. Saleem Durai (VIT University, India), Anbarasi M. (VIT University, India) and Jaiti Handa (VIT University, India)
Copyright: © 2018 | Pages: 20
DOI: 10.4018/978-1-5225-2863-0.ch007

As the volume of data grows over time, the primary issue is how to store and process it and extract useful information from it. Our analysis of classification algorithms and the MapReduce programming model led to the conclusion that the distributed file system and parallel computing attributes of MapReduce are well suited to designing a classifier model. The main reason is parallel processing: the data is divided into parts that are processed in parallel, and the output from each part is then reduced to a single output. In this paper, we study how to use the MapReduce model to build a classifier model. We use a cancer dataset to predict whether a person has cancer, using the Naive Bayes and KNN classification algorithms. We compare them on the basis of computational time and on factors such as sensitivity, specificity, and accuracy. In the end, we are able to compare the two algorithms and say which one works better on the MapReduce programming model.
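The divide, process in parallel, and reduce pattern described above can be sketched in a few lines. This is a minimal illustration only, not the chapter's implementation: it assumes categorical features, uses in-memory lists in place of a distributed file system, and all function names and sample rows are hypothetical. The map step counts the label and feature occurrences that a Naive Bayes classifier needs, the reduce step merges those partial counts, and a helper computes the three evaluation factors named in the abstract.

```python
from collections import Counter
from functools import reduce


def map_split(rows):
    """Map step: count labels and (feature, value, label) triples in one split."""
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for index, value in enumerate(features):
            counts[("feature", index, value, label)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial counts of two splits into one."""
    return left + right


def evaluate(actual, predicted, positive=1):
    """Sensitivity, specificity, and accuracy from paired labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Two hypothetical splits of (features, label) rows; label 1 means "cancer".
splits = [
    [(("high", "yes"), 1), (("low", "no"), 0)],
    [(("high", "no"), 1)],
]
totals = reduce(reduce_counts, map(map_split, splits), Counter())
metrics = evaluate([1, 1, 0, 0], [1, 0, 0, 0])
```

The merged counts in `totals` are the statistics from which Naive Bayes class and feature probabilities would be estimated; a KNN classifier would need a different map step (for example, computing distances per split) and is not shown here.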
Chapter Preview

1. Introduction

“The amount of data that was generated during the last couple of years accounts to 95% of the total data.” With the advancement of new technologies and the increased use of social networking sites, storing data for analysis has become of utmost importance. Until yesterday, the data stored on your company's servers was simply data; then the term Big Data suddenly emerged, and it refers to every bit of data you have stored to date. It includes even the URLs you have been bookmarking. In short, every piece of data, whether structured or unstructured, is collectively big data.

1.1 What Contributes to Big Data?

Data stored in the black box: The black box is deployed in aircraft. It records all conversations of the crew, as well as information about the performance of the aircraft over time.

Data generated by social media: People on social networking sites share pictures, messages, and voice messages. New posts are generated every second, and all of this information needs to be stored.

Stock exchange: The stock market takes a new turn every minute, and all the information about the “buying” and “selling” of shares needs to be tracked and stored. This in turn allows us to understand the market well.

1.2 Uses of Big Data

Understanding the market and customer requirements: This is one of the biggest areas where big data is used. Market researchers can understand the market and help big firms and businesses act according to market trends and customer requirements. For example, it may help Wal-Mart estimate which product to sell, and at what price, according to need and demand in the market.

Improving the quality of healthcare services: Big data can be used to predict certain diseases such as cancer and heart attacks. This in turn helps doctors provide quality services (Ebenezer et al., 2015). It has the power to decode DNA in seconds and to help us understand disease patterns.

Helping sportspersons understand and perform better: Certain tools apply video analytics to understand how each player performs and the patterns each player follows. Such tools have already been used to analyze the performance of players in basketball and tennis.

Weather forecasting and disaster forecasting: Analytics on big data has been used to forecast weather conditions. It has also been used by scientists to predict disasters, which helps save human lives and prepare for such disasters in advance.

Helping develop smart devices and increase their performance: Google's recently developed self-driving car uses big data to operate and to understand what should be done in a given circumstance; it uses big data to identify humans, cars, and buildings and to act according to the situation.

1.3 Big Data Technologies

In order to have accurate analysis and make concrete decisions, we need to know the big data technologies, which in turn reduce operating cost and improve efficiency (Beakta et al., 2015; Lakshmi et al., 2016). Different vendors provide various technologies, including the ones provided by IBM, Amazon, etc. Operational big data systems such as MongoDB fall under this heading and provide operational capabilities. Cloud computing architectures have since come into existence and help with huge computations at higher efficiency and lower cost. Systems like NoSQL take advantage of this.

1.3.1 Analytical Big Data

Analytical big data is the method of analyzing big data, which may include understanding the market, demand, and trends. It helps businesses increase profit and the quality of customer service because they are able to understand customers' needs and demands. Its basic advantage is helping organizations make good business decisions. Data scientists can make sense of huge volumes of transactional data.
