Outlier Detection in Logistic Regression

A. A. M. Nurunnabi (SLG, University of Rajshahi, Bangladesh), A. B. M. S. Ali (CQUniversity, Australia), A. H. M. Rahmatullah Imon (Ball State University, USA) and Mohammed Nasser (University of Rajshahi, Bangladesh)
DOI: 10.4018/978-1-4666-1830-5.ch016


Logistic regression, including model building, decision making from the estimated model, and subsequent analysis, has drawn a great deal of attention since its inception. Logistic regression methods are currently used in epidemiology, biomedical research, criminology, ecology, engineering, pattern recognition, machine learning, wildlife biology, linguistics, business and finance, among other fields. Logistic regression diagnostics have attracted both theoreticians and practitioners in recent years. Detecting and handling outliers is considered an important task in data modelling, because the presence of outliers often leads to misleading modelling results. Traditionally, logistic regression models were used to fit data obtained under experimental conditions. In recent years, however, it has become important to assess the extent of outliers before using the data as input to a logistic model. Logistic regression diagnostics require a higher mathematical level than most other material on the subject, which has held back their study and application despite their necessity. This chapter presents several diagnostic aspects and methods in logistic regression. As in linear regression, logistic regression estimates are sensitive to unusual observations: outliers, high-leverage points, and influential observations. Numerical examples and analysis are presented to demonstrate the most recent outlier diagnostic methods using data sets from the medical domain.
Chapter Preview

Logistic Regression Model Formulation

Regression analysis deals with how the values of the response (dependent) variable change with changes in one or more explanatory (independent) variables. It is appealing because it provides a conceptually simple method for investigating functional relationships among variables (Chatterjee and Hadi, 2006). In any regression problem the key quantity is the mean value of the outcome (dependent or response) variable, given the value of the explanatory (independent) variable(s), $E(Y|X)$. In linear regression, we assume that this mean is expressed as an equation linear in $X$ (or some transformations of $X$ or $Y$), such as $E(Y|X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$. (1) Hence $Y = E(Y|X) + \epsilon$, (2) that is, $Y = X\beta + \epsilon$, (3) where $X$ is an $n \times k$ matrix containing the data for each case with $k = p + 1$, $Y$ is an $n \times 1$ vector of responses, $\beta$ is the vector of regression parameters and $\epsilon$ is the error vector. The main difference between linear regression and logistic regression is that in logistic regression the outcome (response) variable is categorical (binary, ordinal or nominal). In logistic regression, we use the quantity $\pi(X) = E(Y|X)$ to represent the conditional mean of $Y$ given $X$. The specific form of the logistic regression model is $\pi(X) = \dfrac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}$; $0 \le \pi(X) \le 1$ (4) $= \dfrac{e^{Z}}{1 + e^{Z}}$, (5) where $Z = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$. This form gives an S-curve configuration. The well-known 'logit' transformation in terms of $\pi(X)$ is $g(X) = \ln\left[\dfrac{\pi(X)}{1 - \pi(X)}\right] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$. (6) Hence, in logistic regression, the model in Equation (3) stands as
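As a rough numerical illustration of the logistic form described above, the following sketch computes a linear predictor Z, maps it to a probability with exp(Z) / (1 + exp(Z)), and checks that the logit transformation recovers Z. The coefficient and covariate values, and the function names `linear_predictor`, `logistic`, and `logit`, are assumptions chosen for illustration; they do not come from the chapter.

```python
import math


def linear_predictor(beta, x):
    """Return Z = beta_0 + beta_1*x_1 + ... + beta_p*x_p."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def logistic(z):
    """Return exp(z) / (1 + exp(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Return ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical values for illustration only (not taken from the chapter).
beta = [-1.0, 0.5, 2.0]   # beta_0, beta_1, beta_2
x = [2.0, 0.25]           # x_1, x_2

z = linear_predictor(beta, x)   # -1.0 + 0.5*2.0 + 2.0*0.25
pi = logistic(z)                # conditional mean of Y given x under the model

print(f"Z = {z}, pi = {pi:.4f}, logit(pi) = {logit(pi):.4f}")
```

Because the logit is the inverse of the logistic function, `logit(pi)` returns the linear predictor up to floating-point error, which is why the model is linear on the logit scale even though the probability itself follows an S-shaped curve.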
