Data Classification and Prediction

Pudumalar S (Thiagarajar College of Engineering, India), Suriya K S (Thiagarajar College of Engineering, India), and Rohini K (Thiagarajar College of Engineering, India)
DOI: 10.4018/978-1-5225-4044-1.ch008

Abstract

This chapter describes how we live in the era of data, where every event in and around us creates a massive amount of data. The greatest challenge facing every data scientist is turning this raw data into meaningful information that solves a business problem. The process of extracting knowledge from a large database is called data mining. Data mining plays a significant role in applications such as health care, education, and agriculture. Data mining models are classified as predictive or descriptive. The predictive model consists of classification, regression, prediction, and time series analysis; the descriptive model consists of clustering, association rules, summarization, and sequence discovery. Classification and prediction are two important areas of predictive modeling in data mining.

Introduction

The greatest challenge facing every data scientist is turning raw data into meaningful information that solves a business problem. Data is the starting point of every data mining process. Raw or collected data cannot be used directly to build business models. Processing adds value to the data and turns it into information: processed data that is stored and managed in a large database. The process of extracting knowledge from a large database is called data mining. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. The major elements of data mining are as follows: 1) extract, transform, and load transaction data onto the data warehouse system; 2) store and manage the data in a multidimensional database system; 3) provide data access to information technology professionals and business analysts; 4) analyze the data with application software; 5) present the data in a useful format, such as a graph or table.

Data mining models are classified as predictive or descriptive. The predictive model consists of classification, prediction, regression, and time series analysis. The descriptive model consists of clustering, summarization, association rules, and sequence discovery. Classification and prediction are two important areas of predictive modeling in data mining. Applications of predictive modeling include customer retention management, cross-selling, direct marketing, and credit approval, which are distinguished by the nature of the variable being predicted.

Why is classification important? The classification problem attempts to learn the relationship between a set of feature variables and a target variable of interest. For example, a bank manager has a massive amount of customer data, consisting of customer details and records of who is applying for a loan. The manager classifies the customer data to identify which customers are risky and which are safe; this is called classification. The classified data are then used to build a pattern that forecasts the future condition of customers; this is called prediction. Nowadays, data classification and prediction hold promise in many fields to improve efficiency and reduce the time complexity of applications. Classification and prediction can be performed only after the data has been pre-processed, which includes data cleaning, replacing missing values, assessing data relevance, data transformation, and data reduction. Most classification algorithms typically have two phases:

  1. Training Phase: In this phase, a training model is constructed from the training instances. Intuitively, this can be understood as a summary mathematical model of the labeled groups in the training data set.

  2. Testing Phase: In this phase, the training model is used to determine the class label (or group identifier) of one or more unseen test instances.
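
The two phases can be illustrated with a minimal sketch in Python. It is not taken from the chapter: it assumes a nearest-centroid classifier as one simple way to "summarize" the labeled groups, and the loan data, the feature choice (income and debt), and the names `TRAINING_DATA`, `train`, and `predict` are all hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical toy data for the bank loan example:
# (annual income, existing debt), both in thousands, with a known class label.
TRAINING_DATA = [
    ((85.0, 5.0), "safe"),
    ((92.0, 10.0), "safe"),
    ((78.0, 8.0), "safe"),
    ((30.0, 25.0), "risky"),
    ((25.0, 30.0), "risky"),
    ((35.0, 28.0), "risky"),
]


def train(instances):
    """Training phase: summarize each labeled group by its centroid."""
    groups = {}
    for features, label in instances:
        groups.setdefault(label, []).append(features)
    # One mean per feature column gives the centroid of each class.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(model, features):
    """Testing phase: return the label whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)

# Unseen test instances whose class labels are not known in advance.
unseen = [(88.0, 7.0), (28.0, 27.0)]
predictions = [predict(model, x) for x in unseen]
print(predictions)  # ['safe', 'risky']
```

Here the model is just one average point per class, which matches the idea of a summary of the labeled groups; real classifiers such as decision trees or naive Bayes build richer summaries but follow the same train-then-test pattern.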
