InfoScipedia
A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is Over-Fitting

Encyclopedia of Artificial Intelligence
A common problem in machine learning in which a model explains the training data well but is unable to generalize to new inputs. Over-fitting is related to the complexity of the model: a sufficiently complex model can fit any data set perfectly, but as complexity grows, so does the risk of learning random features instead of meaningful causal features.
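The definition above can be illustrated with a minimal sketch. It is not taken from the cited chapter: the k-nearest-neighbour regressor, the sine target function, the noise level, and the sample sizes are all assumptions chosen for illustration. With k = 1 the model is flexible enough to reproduce every training point exactly, which is the "perfect fit" the definition warns about; comparing the printed training and test errors across values of k shows how well each setting generalizes on this particular data.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (built on `train`) over `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def make_data(n, rng, noise=0.3):
    """Noisy samples of sin(2*pi*x) on [0, 1] (hypothetical target function)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0, noise)) for x in xs]


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(30, rng)      # small training set
test = make_data(200, rng)      # unseen data from the same process

# Smaller k means a more flexible (more complex) model.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 15)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

A training error of zero for k = 1 says nothing about performance on new inputs; only the error on held-out data does.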
Published in Chapter:
Functional Dimension Reduction for Chemometrics
Tuomas Kärnä (Helsinki University of Technology, Finland) and Amaury Lendasse (Helsinki University of Technology, Finland)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-59904-849-9.ch100
Abstract
High-dimensional data are becoming more and more common in data analysis. This is especially true in fields that are related to spectrometric data, such as chemometrics. Due to the development of more accurate spectrometers, one can obtain spectra of thousands of data points. Such high-dimensional data are problematic in machine learning because of increased computational time and the curse of dimensionality (Haykin, 1999; Verleysen & François, 2005; Bengio, Delalleau, & Le Roux, 2006). It is therefore advisable to reduce the dimensionality of the data. In the case of chemometrics, the spectra are usually rather smooth and low in noise, so function fitting is a convenient tool for dimensionality reduction. The fitting is obtained by fixing a set of basis functions and computing the fitting weights according to the least squares error criterion. This article describes an unsupervised method for finding a good function basis that is specifically built to suit the data set at hand. The basis consists of a set of Gaussian functions that are optimized for an accurate fitting. The obtained weights are further scaled using a Delta Test (DT) to improve the prediction performance. A Least Squares Support Vector Machine (LS-SVM) model is used for estimation.
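The function-fitting step described in the abstract can be written out as a short worked equation. This is a sketch under stated assumptions, not the chapter's own notation: the symbols p (number of spectral points), q (number of basis functions), the centres t_k, the widths \sigma_k, and the exact Gaussian parameterization are all introduced here for illustration.

```latex
% A spectrum sampled at p points \lambda_1, \dots, \lambda_p is approximated
% by q Gaussian basis functions (q much smaller than p):
\hat{x}(\lambda) = \sum_{k=1}^{q} w_k \, \varphi_k(\lambda),
\qquad
\varphi_k(\lambda) = \exp\!\left( -\frac{(\lambda - t_k)^2}{2\sigma_k^2} \right)

% With the basis fixed, collect \Phi_{jk} = \varphi_k(\lambda_j) into a
% p-by-q matrix. The least squares weights for a spectrum x in R^p are
\mathbf{w} = \arg\min_{\mathbf{w}} \lVert \mathbf{x} - \Phi \mathbf{w} \rVert^2
           = (\Phi^{\top} \Phi)^{-1} \Phi^{\top} \mathbf{x}
```

The closed form assumes that \Phi has full column rank. Each spectrum of p points is then represented by its q weights, which is where the dimensionality reduction comes from.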
Full Text Chapter Download: US $37.50 Add to Cart
More Results
Artificial Neural Networks for Business Analytics
Occurs when a mathematical model describes random error or noise instead of the real underlying relationships within a dataset, which artificially produces desirable goodness of fit metrics for training data, but produces poor metrics for testing data.
Artificial Neural Networks and Data Science
Occurs when a mathematical model describes random error or noise instead of the real underlying relationships within a dataset, which artificially produces desirable goodness of fit metrics for training data, but produces poor metrics for testing data.
AI Methods for Analyzing Microarray Data
A situation where a model learns spurious relationships and as a result can predict training data labels but not generalize to predict future data.
Artificial Neural Networks and Their Applications in Business
Occurs when a mathematical model describes random error or noise instead of the real underlying relationships within a dataset, which artificially produces desirable goodness of fit metrics for training data, but produces poor metrics for testing data.
Global Induction of Classification and Regression Trees
A problem in supervised learning in which a classifier or regressor predicts the training data perfectly but performs much worse on an unseen test set. The problem can emerge when a machine learning algorithm fits noise or is trained on insufficient data, i.e., when it learns from irrelevant facts or generalizes from specific cases.