Data mining is the process of extracting previously unknown information from large databases or data warehouses and using it to make crucial business decisions. Data mining tools find patterns in the data and infer rules from them. The extracted information can be used to form a prediction or classification model, identify relations between database records, or provide a summary of the databases being mined. Those patterns and rules can be used to guide decision making and forecast the effect of those decisions, and data mining can speed analysis by focusing attention on the most important variables.
Key Terms in this Chapter
Data Visualization: A technology for helping users see patterns and relationships in large amounts of data by presenting the data in graphical form.
Explanatory Variables: Used interchangeably and refers to those variables that explain the variation of a particular target variable; also called driving, or descriptive, or independent variables.
Information Quality Decay: The quality of some data goes down when facts about real world objects change over time, but those facts are not updated in the database.
Neural Networks: Also referred to as artificial intelligence (AI), which utilizes predictive algorithms.
Pattern Recognition: The act of taking in raw data and taking an action based on the category of the data. It is a field within the area of machine learning.
Data Mining: Also known as knowledge discovery in databases (KDD), data mining is the process of automatically searching large volumes of data for patterns. Data mining is a fairly recent and contemporary topic in computing.
Predictive Analysis: The use of data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events.
Information Retrieval: The art and science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images, or data.
Segmentation: Another major group that comprises the world of data mining involving technology that identifies not only statistically significant relationships between explanatory and target variables, but determines noteworthy segments within variable categories that illustrate prevalent impacts on the target variable.
Machine Learning: Concerned with the development of algorithms and techniques which allow computers to “learn.”