Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate

Alan Olinsky (Bryant University, USA), Phyllis A. Schumacher (Bryant University, USA) and John Quinn (Bryant University, USA)
DOI: 10.4018/978-1-60960-102-7.ch010


One way to increase the likelihood that students graduate in the major they begin with is to attract the type of students who have historically done well in that field of study. This chapter details a study that uses data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models, including logistic regression, neural networks and decision trees, are fitted. The models are then compared and the best-fitting model is identified. The regression model turns out to be the best predictor, and because it is a very well understood method, it is easy to explain. The decision tree, although its underpinnings are somewhat difficult to explain, gives clear and easily understood output. The resulting model is not only a good one for predicting success in the major; it also makes it possible to counsel students better.
Chapter Preview


In a previous paper (Schumacher et al., in press), data mining techniques were applied in a study that investigated the likelihood that incoming college freshmen majoring in Actuarial Mathematics (AM) would graduate in this major. That study applied data mining to the data from an earlier study, which had predicted success using only traditional logistic regression. The original study contained data spanning seven years of incoming university freshmen who started as AM majors in the years 1995-2001 (Smith and Schumacher, 2006).

Other recent studies have applied data mining techniques to issues within higher education. For example, one comprehensive paper (Davis et al., 2008) generated predictive models for three important educational concerns: student retention, student enrollment and donor giving. Another study (Herzog, 2006) used logistic regression, decision trees and neural nets to predict student retention and degree completion time for new and transfer students. Similarly, student retention was analyzed through six-year graduation predictive models developed with various data mining techniques (Campbell, 2008). Furthermore, data mining methods, including neural nets and random forests, were applied in an investigation (Vandamme et al., 2007) of academic success among first-year college students. Another investigation (Antons and Maltz, 2006) took a data mining approach to predicting the disposition of admitted students as enrollees or nonenrollees. Additional papers involving applications of data mining within a university setting are cited in the previous study (Schumacher et al., in press).

Since the goal of the original study was to predict whether or not a student graduated in the major, this became the target variable. The input variables were gender, math and verbal SAT scores, percentile rank in high school class, and percentile rank on a department mathematics placement test. These variables were chosen from the data that the admissions department collects from incoming students because they were known to be relevant to forecasting a student’s grade point average in their concentration (Smith and Schumacher, 2005). High school rank in class had more missing values than high school GPA, which was also available and could have been used instead. However, rank in class had been shown to be the better predictor of student success in the 2005 study, so it was used in the 2006 and 2010 studies as well.
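
To make the setup above concrete, the following is a minimal sketch of a logistic regression with a binary "graduated in the major" target and the five inputs named in the text. It is not the authors' model or software. The records are randomly generated for illustration only, and the column names, the rule used to generate outcomes, the learning rate and the number of epochs are all assumptions introduced here.

```python
import math
import random

# Hypothetical column names mirroring the inputs listed in the text.
FEATURES = ["gender", "sat_math", "sat_verbal", "hs_rank_pct", "placement_pct"]


def make_synthetic_students(n, seed=0):
    """Generate made-up student records. These are NOT the study's data."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        gender = rng.randint(0, 1)
        sat_math = rng.uniform(450, 800)
        sat_verbal = rng.uniform(400, 800)
        hs_rank = rng.uniform(40, 100)
        placement = rng.uniform(0, 100)
        # Arbitrary illustrative rule (an assumption, not a finding):
        # outcome depends on math SAT and placement percentile only.
        score = 0.01 * (sat_math - 620) + 0.03 * (placement - 50)
        prob = 1.0 / (1.0 + math.exp(-score))
        labels.append(1 if rng.random() < prob else 0)
        rows.append([gender, sat_math, sat_verbal, hs_rank, placement])
    return rows, labels


def standardize(rows):
    """Scale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that a student graduates in the major."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=300):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, rows, labels):
    """Share of records whose 0.5-threshold prediction matches the label."""
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(y)
               for x, y in zip(rows, labels))
    return hits / len(labels)


rows, labels = make_synthetic_students(400)
X = standardize(rows)  # scaled on all rows for brevity
weights, bias = fit_logistic(X[:300], labels[:300])          # training split
holdout_accuracy = accuracy(weights, bias, X[300:], labels[300:])
print(dict(zip(FEATURES, (round(v, 2) for v in weights))),
      round(holdout_accuracy, 2))
```

Because the inputs are standardized, the fitted weights can be compared with one another directly, which is part of why a regression model of this kind is easy to explain when counseling students. In practice a statistics package would normally be used rather than hand-written gradient descent, and missing values such as those in high school rank would need to be handled before fitting.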
