Predicting the Academic Performance of Students Using Utility-Based Data Mining


Sidath R. Liyanage, K. T. Sanvitha Kasthuriarachchi
DOI: 10.4018/978-1-7998-0010-1.ch004


Data mining in education has become an important topic within the field of data mining. Mining educational data encompasses developing models, plotting data, and applying machine learning algorithms to uncover hidden patterns and relationships in educational data, and to perform many other operations that are infeasible with traditional computational tools. This research aims to identify the main factors that influence the academic performance of learners in the tertiary education system in Sri Lanka. To this end, a conceptual framework and an analytical framework of the factors affecting academic performance were constructed. The analytical framework was then validated with data collected from technology learners at a tertiary educational institute.
Chapter Preview


Data mining is devoted to examining large data repositories to generate information and discover knowledge. Data mining techniques are applied to discover patterns in data, organize information, find relationships among data, and derive association rules, tasks that are difficult to carry out with classical search algorithms. Data mining is therefore considered a very useful technique for deriving patterns and making predictions in various real-world settings, including education.

Educational data mining helps to find patterns in data and to make predictions about various aspects of education, in support of decision makers in the education industry (Alejandro Pena-Ayala, 2014). Modeling learners’ performance is an important application of educational data mining. The modeled results, patterns, and relationships can help educationalists make important decisions pertaining to students’ education. Data mining can be applied to identify the factors that affect learners’ performance and the levels of that performance. This knowledge can then be used to determine how courses should be adapted to students’ different learning styles while still fulfilling the learning outcomes of the course. Discovered patterns can also be used to provide feedback to learners, leading to improved student performance.

It has been found that the factors affecting students’ performance can be identified by applying suitable learning analytics methods (Chatti, Dyckhoff, Schroeder and Thüs, 2012). This potential has led to the development of the new domain of Educational Data Mining (EDM). The main objective of using data mining in education is to improve educational practices and teaching and learning materials through more effective decision making, based on the insights produced by EDM. One application is identifying the factors that determine students’ learning by analyzing their behavior. Different interactive course materials can then be used to deliver adaptive learning and to provide personalized learning materials or practices. Analysis of educational records from different perspectives can reveal significant indicators for assessing educational standing and can yield important insights into the interactions among students, teachers, and fellow students.

Educational data is gathered from many different sources, namely databases, online e-learning data, Learning Management System (LMS) data from educational institutes, and survey instruments. The internet allows students to cooperate with the university and carry out educational activities via the web. In distance learning or e-learning, students are able to learn collaboratively by building relationships with the learner community, subject experts, and educational facilitators.

Mining educational data can be useful for analyzing students’ data and deriving predictions (Romero and Ventura, 2013; Romero and Ventura, 2007; Chatti, Dyckhoff, Schroeder and Thüs, 2012). Many studies focus on modeling students, visualizing students’ data, grouping students, analyzing students’ social network usage, and analyzing students’ distance learning data, all of which make decision making easier for course facilitators. Although the prediction of students’ performance is another important application area of EDM, only a few studies have addressed it (Chatti, Dyckhoff, Schroeder and Thüs, 2012). Many similar studies have used statistical methods rather than data mining to analyze performance data.

This study predicts students’ performance by identifying the main factors that affect it, using different data mining algorithms. The Naive Bayes, Decision Tree, Support Vector Machine, and Random Forest algorithms are used to derive the prediction models. Based on the model produced by the most accurate algorithm, the significant variables that determine students’ performance are identified. These variables were then tested for their impact on the target variable. The results of the prediction can support important decisions in the education industry in many ways.
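
The workflow described above (train several classifiers, keep the most accurate one, then inspect which variables matter) can be sketched roughly as follows. This is a minimal illustration, not the chapter's actual implementation: it assumes scikit-learn is available, uses a synthetic data set in place of the student data, and the feature names, the 5-fold cross-validation, and the use of permutation importance to rank variables are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a student data set (NOT the chapter's data).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
# Hypothetical factor names, used only to label the ranking below.
feature_names = [f"factor_{i}" for i in range(X.shape[1])]

# The four algorithm families named in the text, with mostly default settings.
models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each algorithm.
accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
best_name = max(accuracy, key=accuracy.get)

# Refit the most accurate model on all data and rank the variables by
# permutation importance (the drop in accuracy when one column is shuffled).
# For simplicity the importance is computed on the training data itself.
best_model = models[best_name].fit(X, y)
result = permutation_importance(best_model, X, y, n_repeats=5, random_state=0)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in accuracy.items():
    print(f"{name}: accuracy={score:.3f}")
print("most accurate:", best_name)
print("variables by importance:", [name for name, _ in ranking])
```

Permutation importance is used here only because it works with any fitted classifier; a tree-based model could instead expose its own importance scores.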

Key Terms in this Chapter

Attribute Selection: The process of selecting a subset of attributes for building a model.

Classification: The process in which objects are recognized and grouped into classes.

Outliers: Values that lie at an abnormal distance from other values in a random sample from a population.

Data Interpretation: The process of collecting, analyzing, and presenting data.

Case Deletion: A method of handling missing values in which all cases with a missing value are deleted.

Precision: The positive predictive value, that is, the proportion of predicted positives that are actually positive.

Cluster: A grouping of a set of objects such that objects in the same group are similar.

Data Mining: A collection of methods used for the analysis of data.

Knowledge Discovery in Databases: An iterative process of extracting knowledge from raw data.

Educational Data Mining: Application of data mining techniques on educational data.

Kappa Statistic: A measure of inter-rater agreement for qualitative attributes.

K Means Algorithm: A mining algorithm for clustering.

Cross Validation: Division of a data set into training and testing sets and validation of the results obtained on them.
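
A few of the terms above (case deletion, precision, the kappa statistic, and cross validation) can be made concrete with a short sketch. The code below is an illustration in plain Python under stated assumptions: the function names and the pass/fail labels are hypothetical, missing values are represented as None, kappa is computed as Cohen's kappa for two labelings, and cross validation is shown as a simple contiguous k-fold split without shuffling.

```python
from collections import Counter


def case_deletion(rows):
    """Drop every row (case) that contains a missing value (None)."""
    return [row for row in rows if all(value is not None for value in row)]


def precision(y_true, y_pred, positive):
    """Positive predictive value: TP / (TP + FP) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for agreement by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical labels: actual versus predicted outcome for six students.
y_true = ["pass", "pass", "fail", "fail", "pass", "fail"]
y_pred = ["pass", "fail", "fail", "pass", "pass", "fail"]

print("precision (pass):", precision(y_true, y_pred, positive="pass"))
print("kappa:", cohen_kappa(y_true, y_pred))
print("folds:", list(k_fold_indices(len(y_true), k=3)))
print("after case deletion:", case_deletion([[3.2, "A"], [None, "B"], [2.9, "C"]]))
```

In practice each fold's training indices would be used to fit a model and the test indices to score it, with the scores averaged over the k folds.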
