Predicting Student Retention by Linear Programming Discriminant Analysis

Jaan Ubi, Evald Ubi, Innar Liiv, Kristina Murtazin
Copyright: © 2014 | Pages: 11
DOI: 10.4018/ijtem.2014070104


The goal of the paper is to predict student retention with an ensemble method that combines linear programming (LP) discriminant analysis approaches with bootstrapping and feature salience detection. To perform discriminant analysis, we linearize a fractional programming method using the Charnes-Cooper transformation (CCT) and apply linear programming, and we compare it with an approach that uses deviation variables (DV) to tackle a similar multiple criteria optimization problem. We train a family of discriminatory hyperplanes and base the decision on the average of the histograms created, thereby reducing the variability of predictions. Feature salience detection is performed with the peeling method, which makes the selection based on the proportion of variance explained in the correlation matrix. While the CCT method is superior in detecting true positives, the DV method excels in finding true negatives. The authors obtain optimal results by selecting either all 14 (CCT) or the 8 most important (DV) student study-related and demographic dimensions. They also create an ensemble. According to peeling, a quantitative course and the age at accession are deemed the most important, whereas the two courses resulting in less than 2% of failures are among the least important. A five-fold Kolmogorov-Smirnov test is undertaken to help university staff devise intervention measures.

2. Multiple Criteria Linear Programming (MCLP) Approach for Two-Class Discriminant Analysis

2.1. Problem Description

Linear discriminant analysis is used to find a hyperplane that separates the two sets of students as well as possible; thus we have:
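
As a hedged sketch of the separation condition, and not the article's own notation: assume A_i is the feature row of student i, x the vector of hyperplane weights, b the threshold, and D and G the (hypothetically named) index sets of students who dropped out and graduated. Which class sits on which side of the hyperplane is also an assumed convention. Ideal separation would then read:

```latex
\[
A_i x \le b \quad (i \in D), \qquad
A_i x \ge b \quad (i \in G).
\]
```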


To do so, we find the correct and the erroneous distance of each data point from the hyperplane (see Figure 1). Correct distances are denoted by β and erroneous distances by α, and both are added to each equation. The signs of β and α depend on whether the student actually drops out or graduates. Our objective is thus to simultaneously maximize the sum of the β values and minimize the sum of the α values, and we arrive at the following set of equations (note that for any one student, at most one of α and β is nonzero, depending on the accuracy of the prediction; both are zero if the student lies on the discriminatory hyperplane):
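
One way to write such a system, following the form commonly used for two-class MCLP models, is sketched below. The symbols are assumptions rather than the article's notation: A_i is the feature row of student i, x the hyperplane weights, b the threshold, and D and G the index sets of dropouts and graduates, with graduates assumed to lie on the side A_i x ≥ b.

```latex
\[
\begin{aligned}
\max \; & \sum_i \beta_i, \qquad \min \; \sum_i \alpha_i \\
\text{s.t. } \; & A_i x = b + \alpha_i - \beta_i, && i \in D, \\
& A_i x = b - \alpha_i + \beta_i, && i \in G, \\
& \alpha_i \ge 0, \;\; \beta_i \ge 0 && \text{for all } i.
\end{aligned}
\]
```

Under this convention a correctly placed dropout has A_i x = b − β_i and a misplaced one has A_i x = b + α_i; the signs flip for graduates, which matches the remark that the signs of β and α depend on the student's actual outcome.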

Figure 1. Separating hyperplane determining the correct and erroneous distances of data points. The color of the boundary line indicates whether the person has been successfully identified as either graduating or dropping out (black – successful identification; red – incorrect assessment) and the color inside the star indicates whether the person has really graduated or dropped out (blue – graduated; white – dropped out).
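
To make the two kinds of distance concrete, the following minimal Python sketch computes α and β for one student relative to a fixed hyperplane. The function name `deviations`, its parameters, and the convention that graduates belong on the side where the score is non-negative are all assumptions made for illustration.

```python
def deviations(features, graduated, weights, b):
    """Return (alpha, beta) for one student and the hyperplane weights . features = b.

    Assumed convention: graduates belong on the side where the score is >= 0,
    dropouts on the side where it is <= 0.
    """
    # Signed score of the student relative to the hyperplane.
    score = sum(w * f for w, f in zip(weights, features)) - b
    # Flip the sign for dropouts so that positive always means "correct side".
    signed = score if graduated else -score
    beta = max(signed, 0.0)    # correct distance
    alpha = max(-signed, 0.0)  # erroneous distance
    return alpha, beta


# Hypothetical hyperplane and students, for illustration only.
weights, b = [1.0, 1.0], 1.0
correct_graduate = deviations([3.0, 3.0], True, weights, b)
misplaced_dropout = deviations([3.0, 3.0], False, weights, b)
on_hyperplane = deviations([0.0, 1.0], False, weights, b)
```

The values are measured in units of the score; they equal Euclidean distances only if the weight vector has unit length. At most one of the two values is nonzero for any student, and both vanish for a student lying exactly on the hyperplane.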


Note that some training set cases will indeed be placed on the hyperplane due to the nature of the linear programming methods used. The results still have only a binary interpretation, though, because validation and testing cases virtually never lie on the hyperplane.
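
A deviation-variable LP of this general kind can be sketched with SciPy's `linprog`. This is a simplified illustration under stated assumptions, not the article's DV or CCT model: the two objectives are collapsed into one weighted sum (`w_alpha`, `w_beta`), the weights and threshold are box-bounded to [-1, 1] to keep the LP bounded, and the function name and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_dv_hyperplane(A, y, w_alpha=2.0, w_beta=1.0):
    """Fit weights x and threshold b by a weighted-sum deviation-variable LP.

    A: (m, n) feature matrix; y: 1 = graduated, 0 = dropped out.
    Assumption: w_alpha > w_beta, so raising alpha_i and beta_i together
    never pays off and at most one of them is nonzero at the optimum.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    m, n = A.shape
    # s_i = +1 for graduates (target side A_i x >= b), -1 for dropouts.
    s = np.where(y == 1, 1.0, -1.0)
    # Variable order: x (n), b (1), alpha (m), beta (m).
    # Equality per student: A_i x - b + s_i * alpha_i - s_i * beta_i = 0.
    A_eq = np.hstack([A, -np.ones((m, 1)), np.diag(s), -np.diag(s)])
    b_eq = np.zeros(m)
    # Minimize w_alpha * sum(alpha) - w_beta * sum(beta).
    c = np.concatenate([np.zeros(n + 1),
                        w_alpha * np.ones(m),
                        -w_beta * np.ones(m)])
    # Box bounds on (x, b) are an assumption that keeps the LP bounded.
    bounds = [(-1.0, 1.0)] * (n + 1) + [(0.0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    x = res.x[:n]
    b = res.x[n]
    alpha = res.x[n + 1:n + 1 + m]
    beta = res.x[n + 1 + m:]
    return x, b, alpha, beta


# Hypothetical toy data: three dropouts followed by three graduates.
A_toy = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y_toy = [0, 0, 0, 1, 1, 1]
x_fit, b_fit, alpha_fit, beta_fit = fit_dv_hyperplane(A_toy, y_toy)
scores = np.asarray(A_toy, dtype=float) @ x_fit - b_fit
```

In this toy problem the fitted hyperplane leaves some training dropouts with a score of zero up to solver tolerance, that is, exactly on the hyperplane, which is the behavior the note above describes. New cases would be classified by the sign of `A_new @ x_fit - b_fit`.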
