Integrated Data Mining and Business Intelligence

S M Monzurur Rahman (United International University, Bangladesh), Md Faisal Kabir (United International University, Bangladesh) and Muhammad Mushfiqur Rahman (Samsung R&D Institute Bangladesh Ltd, Dhaka, Bangladesh)
Copyright: © 2014 | Pages: 20
DOI: 10.4018/978-1-4666-5202-6.ch114


Background

Today, the collection and analysis of data is integral to the strategic performance of an organization. When performance does not meet targeted expectations, organizations must be in a position to analyse performance data and gain insight into how the relevant strategies can be improved. They need to find out why they are having problems, what the cause is, and what the optimal approach for improvement would be.

Because of the volume and complexity of data collected across organizations today, manual approaches to analysis are often not effective. The data may be spread across the organisation in a diversity of systems and formats. Analytical systems need to integrate this diversity and provide a comprehensive view of the business. Automated analytic systems are required to help sift through large volumes of data and find interesting patterns that could not be found manually. Data mining provides well-proven techniques to help automate analysis. It helps transform data into actionable information and provides the insight required to improve strategy (Han & Kamber, 2001). What is commonly known as Data Mining is part of a larger process called “Knowledge Discovery in Databases”. According to Piatetsky-Shapiro and Frawley (1991), “Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.

Knowledge Discovery is a cyclic process. The process starts with the selection of key data from a data warehouse, a data mart, or an amalgamation of data from other sources (Edelstein, 1999). When large databases are available, it is often necessary to sample the data to obtain smaller, manageable portions. Usually the data requires some manual pre-processing or transformation prior to analysis. Pre-processing may involve tasks such as cleaning the data, removing noisy or missing data, or grouping data. Transformation usually involves the creation of new derived attributes, e.g. trends or moving averages, but may also involve filtering, ordering, editing, and normalisation. Data visualization is also important at this point to gain a basic understanding of the data. Data mining is the next part of this process. According to Bradley, Fayyad, and Mangasarian (1998): “Data mining is the step in the KDD process concerned with the algorithmic means by which patterns or models (structures) are enumerated from the data under acceptable computational efficiency limitations.” Data Mining can be used in a number of ways:

  • Exploration: The aim is to gain new insight into your business and discover new patterns in your data. For example, you may discover a small segment of customers who are not satisfied, and then find that their locality is not covered well by service technicians.

  • Confirmation: You may have a hunch that a pattern exists and need to test for its existence.

  • Prediction: The aim is to build a model that can predict the likelihood of business events. The model reads in new records and generates a score for that event. Examples include predicting whether a customer will churn next month or respond to a marketing campaign, or predicting when a system is going to fail.

  • What if Analysis: Once you have a predictive model it can be used to explore different scenarios by entering hypothetical data.
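The pre-processing, transformation, prediction, and what-if steps described above can be sketched roughly as follows. This is only a minimal illustration, not the chapter's method: the complaint counts, the attribute names, and the scoring weights are all hypothetical, and a simple logistic score stands in for whatever model a real data mining tool would fit.

```python
import math
from statistics import mean


def clean(values):
    """Pre-processing: drop missing readings (None) from a series."""
    return [v for v in values if v is not None]


def moving_average(values, window):
    """Transformation: derive a moving-average attribute over a fixed window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def min_max_normalise(values):
    """Transformation: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly complaint counts for one customer (None = missing).
complaints = [1, None, 3, 5, None, 3]
cleaned = clean(complaints)
trend = moving_average(cleaned, window=2)
scaled = min_max_normalise(cleaned)

# Hypothetical coefficients, as if learned from historical records.
WEIGHTS = {"recent_complaints": 0.8, "years_as_customer": -0.3}
BIAS = -1.0


def churn_score(record):
    """Prediction: return a score in (0, 1) for the likelihood of churn."""
    z = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


customer = {"recent_complaints": trend[-1], "years_as_customer": 2}
baseline = churn_score(customer)

# What-if analysis: score a hypothetical record with fewer complaints.
scenario = {**customer, "recent_complaints": 1}
what_if = churn_score(scenario)

print(f"baseline score: {baseline:.2f}, what-if score: {what_if:.2f}")
```

Keeping the scoring function separate from the record makes the what-if step trivial: the same model is simply applied to an edited copy of the record, and the two scores are compared.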

Data Mining can be used to help solve the following kinds of problems:

Key Terms in this Chapter

DynaMart: A technology for storing queried data locally or for using data queried directly from a data warehouse or databases.

Validation: The process of assessing how well data mining models perform against real data. It is important to validate data mining models by understanding their quality and characteristics before deploying them into a production environment.

Business Intelligence: Business intelligence (BI) is the ability of an organization to collect, maintain, and organize knowledge. BI technologies provide historical, current, and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing (OLAP), analytics, process mining, complex event processing, business performance management, and benchmarking.

ETL: Extract, Transform and Load (ETL) refers to a process in database usage, and especially in data warehousing, that involves Extracting data from outside sources, Transforming it to fit operational needs, and Loading it into the end target (a database or, more specifically, an operational data store, data mart, or data warehouse).

Data Warehouse: A data warehouse is a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management, such as annual and quarterly comparisons.

Data Mining: Data mining is the process of analyzing data from different perspectives and summarizing it into useful and actionable information. Data mining software is one of a number of analytical tools for analyzing data.

OLAP: Online analytical processing (OLAP) provides multi-dimensional views of various kinds of business activities or data. OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.

Cloud Computing: Cloud computing is a distributed computing model for enabling convenient, on-demand network access to a shared pool of configurable and reliable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal consumer management effort or service provider interaction.
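As a rough illustration of the ETL and OLAP entries above, the sketch below extracts rows from a small CSV text, transforms them, loads them into an in-memory SQLite table, and then runs roll-up, drill-down, and slice queries. The sample data, the table name `sales`, and its columns are assumptions made for the example; the in-memory database merely stands in for a data mart or data warehouse, and plain SQL aggregation stands in for a dedicated OLAP tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from an outside system.
SOURCE_CSV = (
    "year,quarter,region,sales\n"
    "2013,Q1, north,100\n"
    "2013,Q1,South,80\n"
    "2013,Q2,North,120\n"
    "2014,Q1,North,90\n"
)


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy region names and convert fields to numeric types."""
    return [
        (int(r["year"]), r["quarter"], r["region"].strip().title(), float(r["sales"]))
        for r in rows
    ]


def load(rows, db):
    """Load: write the cleaned rows into the target table."""
    db.execute(
        "CREATE TABLE sales (year INTEGER, quarter TEXT, region TEXT, sales REAL)"
    )
    db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    db.commit()


db = sqlite3.connect(":memory:")  # stand-in for a data mart or warehouse
load(transform(extract(SOURCE_CSV)), db)

# Consolidation (roll-up): yearly totals.
roll_up = db.execute(
    "SELECT year, SUM(sales) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Drill-down: break each year into quarters.
drill_down = db.execute(
    "SELECT year, quarter, SUM(sales) FROM sales "
    "GROUP BY year, quarter ORDER BY year, quarter"
).fetchall()

# Slice: fix one dimension (region) and view the rest.
north_slice = db.execute(
    "SELECT year, SUM(sales) FROM sales WHERE region = ? "
    "GROUP BY year ORDER BY year",
    ("North",),
).fetchall()

print(roll_up, drill_down, north_slice)
```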
