Outlier Detection in Multiple Linear Regression

Divya D. (Adi Shankara Institute of Engineering & Technology, India) and Bhraguram T.M. (Adi Shankara Institute of Engineering & Technology, India)
Copyright: © 2014 | Pages: 9
DOI: 10.4018/978-1-4666-5202-6.ch158

Chapter Preview



Outlier detection, as a branch of data mining, has many important applications and deserves more attention from the data mining community. Outliers are normally treated as noise that needs to be removed from a dataset (Ben, 2005). Hawkins (1980) defines an outlier as "an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism." Outliers can arise in different situations. Detecting and removing outliers is very important in data mining: errors in large databases can be extremely common, so an important property of a data mining algorithm is robustness with respect to outliers in the database.

Removing outliers is often necessary for a particular algorithm to execute successfully. Many techniques for detecting outliers are fundamentally identical but go by different names chosen by their authors, who describe their approaches as outlier detection, novelty detection, anomaly detection, noise detection, deviation detection, or exception mining (Victoria & Jim, 2004). In clustering, for example, data points that do not belong to any of the clusters are considered outliers, and they need to be removed for the clustering algorithm to execute successfully. In other cases, however, outlier detection may lead to the discovery of important information in the data, because "one person's noise is another person's signal" (Varma & Rajesh, 2011). Outliers may also result from variability that is inherent in the data. A manager's salary could naturally stand out as an outlier because it may be far higher than the other employees' salaries, yet it should not be removed, since it is a genuine part of the company's payroll. Outlier detection can also serve as a data-cleaning step applied before any traditional mining algorithm is run on the data.
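As a minimal sketch of outlier detection used as a data-cleaning step, the Python snippet below flags values outside Tukey's fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR), a common univariate rule. It is our own illustration, not a method taken from this chapter; the salary figures are hypothetical and `iqr_outliers` is a helper name introduced here. In keeping with the payroll example, the code only flags the manager's salary and leaves the decision to remove it to domain judgment.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly salaries; the last entry is the manager's.
salaries = [3000, 3200, 3100, 2900, 3050, 3150, 2950, 12000]
flagged = iqr_outliers(salaries)
print(flagged)  # [12000]

# Flagging is not the same as removing: the manager's salary is genuine data,
# so a cleaning step should drop values only when they are known to be errors.
```

Note that `statistics.quantiles` requires Python 3.8 or later.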

Much of the research in outlier detection has focused on datasets that consist of one type of attribute: either only numerical attributes (or ordinal attributes that can be directly mapped to numerical values) or only categorical attributes. Even for data containing only categorical attributes, it is usually assumed that those attributes can easily be mapped to numerical values. However, there are cases where mapping categorical attributes to numerical ones is difficult (Anna & Michael, 2010).
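To make the distinction concrete, here is a small illustrative Python sketch; the helper names `ordinal_encode` and `one_hot_encode` are hypothetical and the attribute values are made up. An ordinal attribute such as a size rating maps naturally to integers that preserve its order. A nominal attribute such as a city name has no natural order, so a common workaround is one-hot encoding, which avoids inventing distances between categories but adds one dimension per category. This is one illustration of why the mapping is not always straightforward, not necessarily the argument made by the cited authors.

```python
def ordinal_encode(values, order):
    """Map each ordered category to its rank in `order` (0, 1, 2, ...)."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Map each nominal category to a 0/1 indicator vector over the sorted levels."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]

# Ordinal attribute: the order small < medium < large is meaningful.
sizes = ["small", "large", "medium", "small"]
print(ordinal_encode(sizes, ["small", "medium", "large"]))  # [0, 2, 1, 0]

# Nominal attribute: no natural order, so each city gets its own 0/1 column.
cities = ["Kochi", "Chennai", "Kochi"]
levels, rows = one_hot_encode(cities)
print(levels)  # ['Chennai', 'Kochi']
print(rows)    # [[0, 1], [1, 0], [0, 1]]
```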

Today, business is expanding at a rapid pace with changing needs. Business plays a vital role in a country's capital formation and is regarded as the lifeblood of a growing economy, so it is very important to manage it effectively and efficiently. One of the major issues fund managers face today is not just procuring funds but deploying them meaningfully to generate maximum returns. Sources of funds are generally the same across all businesses, so why are some businesses able to do better than the rest? If the logic behind outstanding performance is a viable business idea, why do some companies still fail even with ample funds and the right business idea? (Kirti Madan, 2007). This discussion implies that something beyond great ideas and a good geographic presence determines a business's financial success, and it points to the importance of working capital management (WCM) in determining a firm's success. Working capital is the portion of a company's total capital that is employed in short-term operations.

Key Terms in this Chapter

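Outlier Detection: Identifying observations (points) that deviate markedly from the rest of the data.

High Dimensional Datasets: Datasets having many attributes (dimensions).

Threshold: A limiting value, such as the cutoff beyond which an observation is flagged as an outlier.

Multiple Linear Regression: A regression model that relates a response variable to two or more explanatory variables.

Multivariate Outlier Detection: Outlier detection that considers multiple variables jointly.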

Working Capital Management: Management of current assets and current liabilities.
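Since the chapter's subject is outlier detection in multiple linear regression, a compact sketch may help tie several of these key terms together. The Python/NumPy code below fits an ordinary least squares model with several predictors, computes internally studentized residuals r_i = e_i / (s * sqrt(1 - h_ii)), where h_ii is the leverage taken from the hat matrix, and flags observations whose |r_i| exceeds a threshold. This is a standard textbook diagnostic chosen as our own assumption, not necessarily the method the authors propose. The cutoff of 3 is a common rule of thumb, and the synthetic data and function names are hypothetical. Because a single gross outlier also inflates s, externally studentized residuals or Cook's distance are often preferred in practice.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept column
    n, p = A.shape
    A_pinv = np.linalg.pinv(A)                 # equals (A'A)^-1 A' for full-rank A
    beta = A_pinv @ y                          # OLS coefficients
    resid = y - A @ beta                       # raw residuals e_i
    h = np.diag(A @ A_pinv)                    # leverages h_ii from the hat matrix
    s = np.sqrt(resid @ resid / (n - p))       # residual standard error
    return resid / (s * np.sqrt(1.0 - h))

def regression_outliers(X, y, threshold=3.0):
    """Indices of observations whose |studentized residual| exceeds threshold."""
    r = studentized_residuals(X, y)
    return np.flatnonzero(np.abs(r) > threshold)

# Hypothetical synthetic data: y depends linearly on two predictors plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)
y[10] += 20.0                                  # inject one gross outlier
print(regression_outliers(X, y))               # [10]
```

Lowering `threshold` to 2 flags more points; the right value depends on sample size and on how costly false alarms are.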
