1. Introduction
With growing interest in open educational resources, web-based learning has become commonplace in higher education institutions and organisations. The literature uses a plethora of terms to describe online learning delivery platforms, such as Virtual Learning Environment (VLE), Learning Management System (LMS), and Massive Open Online Course (MOOC). In the remainder of this paper, we use the term VLE to designate all e-learning environments. The popularity of these platforms is credited to their ability to provide open, online, high-quality and low-cost educational content efficiently and on a large scale (Almatrafi & Johri, 2018). VLEs attract not only the educational and pedagogical communities, but also scientists from various disciplines such as Philosophy, Educational Science, Machine Learning, Statistics, and Computer Science. Despite the potential and high expectations associated with VLEs, retention rates are typically very low overall. Studies on distance learning report that only 22% of learners complete their course (Reich, 2014), or even as few as 7% (Jiang & Kotzias, 2016). Such percentages cast serious doubt on the reliability and efficacy of VLEs (Kloft, et al., 2014).
This in turn motivates researchers to analyse and investigate the reasons for the high withdrawal rates; hence, dropout prediction is an important aspect of such environments. Early prediction can help the different stakeholders in several ways. Teachers can anticipate possible issues with learners and adapt their courses or learning strategies to improve engagement. In addition, instructors or course designers can use the predictive model to make decisions about curricular design and to personalize interventions. Furthermore, learners can receive information about their learning progress, which allows them to reflect on how they are doing and improve their performance.
Considerable effort has gone into building advanced tools for monitoring learners’ progress. These tools typically take the characteristics of VLE learners as input and predict the learners’ course withdrawal as output. In this context, various Machine Learning techniques have been successfully applied to obtain high dropout prediction accuracy. These works mainly focused on gathering the learners’ characteristics and on applying several techniques to process this form of data.
In the VLE context, many datasets have served to build models aimed at predicting the aforementioned outcomes, such as the KDD Cup (Kuzilek, et al., 2017), OULAD (Kuzilek, et al., 2017), and the Student Academic Performance dataset (Bharara, et al., 2018). The existing data preprocessing methods mainly exploit the intrinsic statistical characteristics of the data features (the learners’ descriptors) in order to prepare effective data before the training step. Nevertheless, these features also carry semantic characteristics that have an important impact on the quality of the extracted knowledge. The abstraction of the learners’ descriptors is performed by establishing indicators. In the work of Popescu (2009), the behavioral indicators refer to the relative frequency of learner actions, the amount of time spent on a specific action type, and the order in which these actions are performed. According to Bousbia, et al. (2013), an indicator describes the learners’ navigation behaviors based on their low-level navigation traces (links followed, clicks, etc.). Based on these definitions, we infer that the model of behavioral indicators can be seen as meta-knowledge about the observed traces. Many behavioral indicators have been proposed in the literature, such as navigation type (Bousbia, et al., 2013), disorientation (Adda, et al., 2016), concentration rate (Ammor, et al., 2013), collaborative level (Bouzayane & Saad, 2017), contribution rate (Wong, et al., 2015) and effort level (Papanikolaou, 2015). These indicators offer considerable representative power: they provide a new, semantically coherent feature set, which is useful not only for improving the predictability of learners’ dropout but also for understanding the impact of the indicators on the learners’ commitment to course completion.
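To make the notion of a behavioral indicator concrete, the following is a minimal sketch of how the three quantities attributed above to Popescu (2009) (relative frequency of actions, time spent per action type, and order of actions) might be derived from a raw trace. It is an illustration only, not the method of any cited work. The trace format (a list of `(timestamp_seconds, action_type)` pairs), the function name `behavioral_indicators`, and the rule that the time spent on an action is the gap until the next logged event are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def behavioral_indicators(trace):
    """Derive simple indicators from one learner's trace.

    `trace` is a hypothetical format: a list of
    (timestamp_seconds, action_type) pairs.
    """
    # Order events chronologically (stable sort on the timestamp only).
    events = sorted(trace, key=lambda event: event[0])
    if not events:
        return {"relative_frequency": {}, "time_spent": {}, "action_order": []}

    # Relative frequency: share of each action type among all actions.
    counts = Counter(action for _, action in events)
    total = len(events)
    relative_frequency = {action: n / total for action, n in counts.items()}

    # Time spent: assumed here to be the gap until the next logged event.
    # The last event has no successor, so it contributes no duration.
    time_spent = defaultdict(float)
    for (start, action), (end, _) in zip(events, events[1:]):
        time_spent[action] += end - start

    # Order: the chronological sequence of action types.
    action_order = [action for _, action in events]

    return {
        "relative_frequency": relative_frequency,
        "time_spent": dict(time_spent),
        "action_order": action_order,
    }


# Hypothetical trace for a single learner (timestamps in seconds).
example_trace = [(0, "view_page"), (30, "quiz"), (150, "forum"), (180, "view_page")]
indicators = behavioral_indicators(example_trace)
```

Each resulting value could then serve as one feature of a learner's descriptor. In practice, the choice of indicators and the way durations are estimated depend on how the VLE logs its traces.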