Predicting Academic Performance of Immigrant Students Using XGBoost Regressor

Selvaprabu Jeganathan, Arun Raj Lakshminarayanan, Nandhakumar Ramachandran, Godwin Brown Tunze
DOI: 10.4018/IJITWE.304052

The education sector has been actively addressing the prediction of immigrant students' academic performance, because research in this area benefits countries whose ministries of education must serve immigrant populations and adapt their policies to improve teaching and learning for these students. This research begins by analyzing various educational data mining and machine learning techniques for assessing data obtained from PISA. The analysis shows that XGBoost is the most suitable machine learning technique for achieving the desired results. Its parameters are then optimized using hyperparameter tuning techniques and applied to the XGBoost Regressor algorithm. The result is a low error rate and higher predictive ability, which supports better predictions from PISA data. The final results are discussed along with directions for future research.
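The following is a minimal, self-contained sketch of XGBoost-style boosting with hyperparameter tuning, included only to make the method named in the abstract concrete. It is not the authors' implementation and does not use the xgboost library; the function names (`fit_stump`, `boost`, `grid_search`), the synthetic data and the search grid are hypothetical. It keeps the parts of XGBoost's formulation that matter for regression with squared-error loss: each tree is fit to the gradient and Hessian of the loss, a split is scored by the regularized structure gain, and each leaf receives the weight -G/(H + λ), where G and H are the sums of gradients and Hessians in that leaf. Trees are limited to depth one (stumps) for brevity, and the number of rounds, learning rate and λ are chosen by grid search on a validation split, using RMSE as the error metric.

```python
import random
from itertools import product

# Hypothetical sketch: XGBoost-style boosting with depth-1 trees (stumps) and
# squared-error loss, plus a grid search. Not the authors' code or the xgboost library.

def fit_stump(X, grads, hess, reg_lambda, gamma):
    """Pick the split maximising the XGBoost structure gain; return a predictor."""
    G, H = sum(grads), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for f in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        g_left = h_left = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            g_left += grads[i]
            h_left += hess[i]
            if X[i][f] == X[nxt][f]:
                continue  # no threshold separates equal feature values
            g_right, h_right = G - g_left, H - h_left
            gain = 0.5 * (g_left ** 2 / (h_left + reg_lambda)
                          + g_right ** 2 / (h_right + reg_lambda)
                          - G ** 2 / (H + reg_lambda)) - gamma
            if best is None or gain > best[0]:
                threshold = (X[i][f] + X[nxt][f]) / 2
                best = (gain, f, threshold,
                        -g_left / (h_left + reg_lambda),    # optimal leaf weight -G/(H+lambda)
                        -g_right / (h_right + reg_lambda))
    if best is None or best[0] <= 0:
        weight = -G / (H + reg_lambda)  # no useful split: use a single leaf
        return lambda x: weight
    _, feat, threshold, w_left, w_right = best
    return lambda x: w_left if x[feat] < threshold else w_right

def boost(X, y, n_rounds, learning_rate, reg_lambda=1.0, gamma=0.0):
    """Fit an additive model: base score + learning_rate * sum of stumps."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        grads = [p - t for p, t in zip(preds, y)]  # first derivative of 0.5*(p - t)**2
        hess = [1.0] * len(y)                      # second derivative is constant 1
        tree = fit_stump(X, grads, hess, reg_lambda, gamma)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, X)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)

def rmse(model, X, y):
    return (sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)) ** 0.5

def make_data(n, seed):
    """Synthetic stand-in for student features/scores: non-linear in x0, a step in x1."""
    rng = random.Random(seed)
    X = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(n)]
    y = [x[0] ** 2 + (2.0 if x[1] > 0 else 0.0) + rng.gauss(0, 0.3) for x in X]
    return X, y

def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Return (validation RMSE, params) for the best combination in the grid."""
    results = []
    for n_rounds, lr, lam in product(grid["n_rounds"], grid["learning_rate"], grid["reg_lambda"]):
        model = boost(X_tr, y_tr, n_rounds, lr, lam)
        params = {"n_rounds": n_rounds, "learning_rate": lr, "reg_lambda": lam}
        results.append((rmse(model, X_val, y_val), params))
    return min(results, key=lambda r: r[0])

X_train, y_train = make_data(200, seed=0)
X_val, y_val = make_data(100, seed=1)
grid = {"n_rounds": [20, 60], "learning_rate": [0.1, 0.3], "reg_lambda": [1.0, 10.0]}
best_rmse, best_params = grid_search(X_train, y_train, X_val, y_val, grid)
mean_y = sum(y_train) / len(y_train)
baseline_rmse = rmse(lambda x: mean_y, X_val, y_val)  # predict the training mean
print(f"baseline RMSE {baseline_rmse:.2f}, tuned RMSE {best_rmse:.2f}, params {best_params}")
```

In practice, the xgboost package's scikit-learn wrapper `XGBRegressor` (with parameters such as `n_estimators`, `learning_rate`, `max_depth` and `reg_lambda`) combined with a search utility such as scikit-learn's `GridSearchCV` would replace this hand-written loop; the sketch only illustrates what such tuning optimizes.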
Article Preview


For some time, the education sector has been working on predicting the academic performance of immigrant students. Many studies have used traditional statistical approaches, which suffer from over-fitted models, difficulty handling large numbers of participants and predictors, and an inability to capture non-linearities. The Programme for International Student Assessment (PISA), run by the Organisation for Economic Co-operation and Development (OECD), focuses on studying students' learning outcomes. Since 2000, the OECD has collected large volumes of data worldwide from stratified samples of 15-year-old students, and these data form the basis of PISA. PISA is described as the “world's premier yardstick that effectively assesses the quality, equity and efficiency of school systems” (OECD, PISA 2012). The PISA survey collects detailed information on the reading, mathematics and science literacy of 15-year-old students. It also covers students' motivation, learning patterns and personal outlook, along with the overall school and family environment (Akyüz & Pala, 2010; Kamaliyah et al., 2013). PISA goes a step further by evaluating the basic knowledge and skills that students have acquired in their prior schooling and the extent to which they can apply them in real life (Bautier & Rayou, 2007). The sampled students take a cognitive test covering mathematics, reading and science.

In addition, students complete a questionnaire about their social and economic status and their attitudes, motivation and general outlook toward education. These tests and questionnaires are published by the OECD and have been discussed extensively in the literature. A prime focus of the program has been comparing immigration policies and how successfully countries integrate immigrants, given the accompanying cultural, social, religious and historical exchange (Kunz et al., 2016). Immigrants from varied socio-economic backgrounds are attracted by other countries' lifestyles and flexible immigration policies (Entorf & Minoiu, 2005; Hochschild & Cropper P, 2010). Socio-economic endowment, which most countries offer, tends to be a significant factor in students' educational advancement and progress (Alyssa & Verena, 2015). The OECD states that the entire data collection is highly reliable, validated and publicly available (OECD, 2009; 2012). Because the data set covers many countries, it provides a rich database for educational data mining and machine learning applications. Interested countries are willing to pay substantial sums to access PISA data and evaluated results for research purposes (Musik et al., 2016). However, according to Rutkowski et al. (2010), many researchers avoid these high-quality, freely accessible datasets because of the unavoidable technical complexities involved. Extracting value from this abundance of data supports decision making in the education domain.
Educational data mining (EDM) is an area of growing attention in education; it applies data mining, machine learning and statistics to understand students' outlook and improve their learning environment (Romero & Ventura, 2010). However, EDM remains underexplored because of modest data processing approaches, a lack of coherent educational big datasets and restrictive big data tools (Buckingham et al., 2013; Koprinska et al., 2015). Prior studies have employed learning analytics and traditional statistical modeling techniques such as linear and logistic regression, which have a linear decision surface and therefore perform well only when the variables are linearly related. In addition, there is no proven paradigm for optimizing performance prediction (Xing et al., 2015). Failing to identify the appropriate independent variables and data distribution functions can lead to unsatisfactory results.
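The limitation of a linear decision surface, and the importance of choosing the right independent variables, can be illustrated with a small hypothetical example: simple least-squares regression fit to a U-shaped relationship. The data are synthetic, not PISA data, and the helper names (`fit_ols`, `r_squared`) are illustrative. Regressing on the raw predictor x explains almost none of the variance, while the same linear model fit on the transformed predictor x² explains most of it. A tree-based model such as XGBoost can capture this kind of non-linearity without the analyst specifying the transformation in advance.

```python
import random

def fit_ols(xs, ys):
    """Closed-form simple linear regression y = a + b*x (least squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the line a + b*x."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(500)]
ys = [x ** 2 + rng.gauss(0, 0.5) for x in xs]  # hypothetical U-shaped relationship

# Linear model on the raw predictor: the straight line cannot follow the curve.
a_raw, b_raw = fit_ols(xs, ys)
r2_raw = r_squared(xs, ys, a_raw, b_raw)

# Same linear model after choosing the right predictor (x squared).
sq = [x ** 2 for x in xs]
a_sq, b_sq = fit_ols(sq, ys)
r2_sq = r_squared(sq, ys, a_sq, b_sq)

print(f"R^2 using x: {r2_raw:.3f}, R^2 using x^2: {r2_sq:.3f}")
```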