1. Introduction
The internet age, or dot-com age, has seen an enormous transformation in data, with data generation rising exponentially. According to an IBM Marketing Cloud study, the number of internet users is multiplying at a fast rate. In 2014, there were 2.4 billion internet users. By 2016, the number had risen to 3.4 billion. In 2017, 300 million new users joined, bringing the total to 3.8 billion internet users (as of April 2017). These figures indicate that the number of internet users increased by 42% in the past three years (Schultz, 2017). What happens on the internet every day? Social media users produce data daily: 656 million tweets are posted on Twitter, around 4 million hours of content are uploaded to YouTube, and 67,305,600 posts are added to Instagram. In 2015, there were 1.44 billion Facebook users, and by the beginning of 2016 the number had grown to 1.65 billion. At present, the figure has passed 2 billion users (Schultz, 2017).
This huge amount of data, generated every minute, can be turned into meaningful information and used to serve the nation better. Because such a large volume of data is generated and stored in databases, traditional approaches and database tools are no longer adequate for analyzing it. The biggest problems for educational institutes are storing the huge volume of data that is generated and working out how to use this data to improve the intake and retention of students, academic programs, facilities, services and management (Abaidullah, Ahmed, & Ali, 2015; Delavari, Phon-Amnuaisuk, & Beikzadeh, 2008; Goyal & Vohra, 2012). Higher education institutes implement various “conventional and unconventional” strategies based on “qualitative and quantitative” approaches, which keep them away from achieving their quality targets (Abaidullah et al., 2015; Delavari et al., 2008). The approaches used by educational institutions rely mainly on their consistent formats and reports, such as student feedback. These methods fall short in uncovering hidden information such as student performance, admission intake, and students’ weak areas (Abaidullah et al., 2015).
Hidden information in a large dataset is best uncovered by a data analysis methodology known as data mining (Luan, 2002; Han, Pei, & Kamber, 2011). Data mining is now widely used in higher education. Because of its potential usefulness to educational institutes, a growing field known as educational data mining has emerged (Kumar & Chadha, 2011; Romero & Ventura, 2007). The educational data mining community website, www.educationaldatamining.org, defines educational data mining as follows: “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings and using those methods to better understand students, and the settings which they learn in” (Baker & Yacef, 2009). Educational data mining is about making predictions for educational entities such as students, faculty, staff and management, with the objective of delivering quality education to students (Baradwaj & Pal, 2012).