An Expanded Assessment of Data Mining Approaches for Analyzing Actuarial Student Success Rate

Alan Olinsky (Department of Mathematics, Bryant University, Smithfield, RI, USA), Phyllis Schumacher (Department of Mathematics, Bryant University, Smithfield, RI, USA) and John Quinn (Department of Mathematics, Bryant University, Smithfield, RI, USA)
Copyright: © 2016 | Pages: 23
DOI: 10.4018/IJBAN.2016010102

One way to increase the likelihood that university students graduate within the major they begin with is to attract the type of students who have historically done well in that field of study. This paper expands upon a study that uses data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models, including logistic regression, neural networks and decision trees, are fitted using input variables describing academic attributes of the students. The models are then compared and the best-fitting model is determined. The regression model turns out to be the best predictor; since it is a very well-understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives clear and easily understood output. In addition, the non-predictive method of cluster analysis is applied in order to group these students into distinct classifications based on the values of the input variables. Finally, a new approach to modeling in SAS®, called Rapid Predictive Modeler (RPM), is described and utilized. The RPM also selects the regression model as the best predictor.

Literature Survey

In a previous paper (Schumacher et al., 2010), data mining techniques were applied in a study that investigated the likelihood that incoming college freshmen majoring in Actuarial Mathematics (AM) will graduate in this major. The study applied data mining to an earlier investigation which predicted success using only traditional logistic regression. The original study contained data spanning seven years of incoming university freshmen who started as AM majors in the years 1995-2001 (Smith and Schumacher, 2006).

Data mining applications in education are not limited to higher education. One such investigation is described in Sen et al. (2012), where four techniques (neural networks, support vector machines, decision trees, and logistic regression) were used to predict high school placement test results for 8th graders in Turkey. In that case, the decision tree was the best predictor while logistic regression was the least accurate. However, there have also been many investigations of issues in higher education involving data mining methods. For example, in one comprehensive paper (Davis et al., 2007), predictive models were generated for three important educational concerns: student retention, student enrollment and donor giving. Another study (Herzog, 2006) used logistic regression, decision trees and neural networks to predict student retention and degree completion time for new and transfer students. Similarly, student retention was analyzed through six-year graduation predictive models developed with various data mining techniques (Campbell, 2008). Delen (2010) used four individual models (artificial neural networks, decision trees, support vector machines and logistic regression) along with ensemble techniques to predict student attrition. The data consisted of five years of first-year student enrollment. The support vector machines gave the best prediction, with the decision tree the next most accurate. Meanwhile, Zhang et al. (2010) also applied data mining techniques to investigate student retention in college. They considered three techniques: Naïve Bayes, support vector machines and decision trees. They found that the Naïve Bayes algorithm had the highest prediction accuracy for those students dropping out. In another study of college student retention, Lin (2012) applied various machine learning algorithms to data covering eight years of first-year students. The five most accurate predictive models were shown to be decision trees, of which the best technique for these data was the alternative decision tree (ADT).
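
The studies surveyed above share a common workflow: fit several classifiers to the same student records, score each on held-out data, and report the most accurate one. The following is a minimal sketch of that workflow in plain Python, not the method or software used in any of the cited studies. The data are synthetic, the two features (`math_score`, `gpa`) and the coefficients that generate the labels are invented for illustration, and a depth-one decision tree (a "stump") stands in for a full decision tree.

```python
import math
import random


def make_synthetic_students(n, seed=0):
    """Hypothetical records: ((math_score, gpa), label), label 1 = graduated in major."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        math_score = rng.gauss(0.0, 1.0)  # invented, standardized feature
        gpa = rng.gauss(0.0, 1.0)         # invented, standardized feature
        logit = 1.5 * math_score + 1.0 * gpa  # assumed relationship, illustration only
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(((math_score, gpa), 1 if rng.random() < p else 0))
    return rows


def fit_logistic(train, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w, b, n = [0.0, 0.0], 0.0, len(train)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def fit_stump(train):
    """Depth-one decision tree: the single feature/threshold split with the
    highest training accuracy."""
    best = None
    for j in range(2):
        for x, _ in train:
            t = x[j]
            for sign in (1, -1):
                correct = sum(
                    1 for xi, yi in train
                    if (1 if sign * (xi[j] - t) > 0 else 0) == yi
                )
                if best is None or correct > best[0]:
                    best = (correct, j, t, sign)
    _, j, t, sign = best
    return lambda x: 1 if sign * (x[j] - t) > 0 else 0


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


# Hold out the last 100 records for validation.
data = make_synthetic_students(400, seed=42)
train, valid = data[:300], data[300:]

models = {
    "logistic regression": fit_logistic(train),
    "decision stump": fit_stump(train),
}
scores = {name: accuracy(model, valid) for name, model in models.items()}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: validation accuracy = {score:.3f}")
print("best model:", best_model)
```

Because the synthetic labels are generated from a linear logit, this toy setup favors logistic regression by construction, so the ranking it prints says nothing about real student data. Validation accuracy is also only one possible selection criterion; the studies above report differing winners across data sets and techniques.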
