A New Learning Path Model for E-Learning Systems

This work presents a new approach to the learning path model in e-learning systems. The model uses data from the database records of an e-learning system and represents them as graphs. The authors show how the model can be used to represent learning paths visually, support behavior analysis, and suggest group formation for collaborative activities, thus assisting the teacher in making decisions. To validate the practical utility of the model, the authors created two tools, one to visualize the learning paths and another to suggest groups of students for collaborative activities. Both tools were tested in a real environment and produced useful results. The authors carried out experiments with students from three programs: physics, electrical engineering, and computer science. The experiments show that it is possible to use the proposed learning path model to analyze student behavior patterns and recommend group formation with positive results.


INTRODUCTION
Analytical tools and models that help in understanding students' behavior can infer individual or collective patterns and improve students' experiences (Saito & Watanobe, 2020). These tools and models can also help teachers monitor learners' actions in a Learning Management System (LMS) (Baneres et al., 2019; Weiand et al., 2019).
In this context, new information generated from LMS data can facilitate the teaching and learning process. An LMS collects data about users that can help define the learner's profile and behavior and identify their difficulties and needs. One way of accompanying learners is to observe the actions they perform on the system; these actions form paths known as Learning Paths (LP).

A, B, C, and D have been accessed in that order, then the edges of the path are AB, BC, and CD. The following are more details about how this work uses the graph representation.
For each resource or activity created, the LMS saves a new record, called an instance, in the database. The LMS also stores the order in which the instances must appear. This order corresponds to the path suggested by the teacher: the first instance is the starting point, and the last is the end of the proposed learning sequence. The model also uses data from the sections in which teachers group resources and activities. If the teacher later reorders the resources and activities provided, the model automatically adapts to the new configuration. These data were selected because they allow the visual representation of learning paths and can also be used with Machine Learning and Data Mining tools, which is part of another study being conducted.
The model has assigned vertices that store the following information:
• pos: The position of the vertex in the sequence defined for resources and activities.
• cmid: A unique identifier of each instance. This information maintains the relation between the vertex and the instance of the LMS.
• module: The resource/activity type, such as 'forum' or 'assign'.
• module_name: The module name in the translated language, preceded by the value of the pos attribute.
• name: The title of the resource/activity established by the teacher for the instance.
• section: The number of the topic to which the vertex belongs.
• section_name: The title of the topic. The teacher can define it. Otherwise, the name "Topic" plus the section attribute will be assigned.
• value: Number of interactions (V). It indicates the total interactions (visualization, file sending, posting, among other actions). The visual representation uses the value as the vertex weight.
The edges contain the following information:
• source: The identifier of the starting instance of the student's interaction. This attribute references the pos attribute of a vertex.
• target: The identifier of the arrival instance of the student's interaction; it also holds the pos value of the accessed vertex. Thus, it is possible to identify the origin and destination of the edge, which makes the graph directed.
• value: The number of times the learner goes from the source vertex to the target vertex; it indicates the edge weight.
• type: The edge classification. The model defines three possible types: return, standard, and advance; these types are detailed later.
• is_max_value: Boolean attribute that indicates the most used edge among all those that originate from the source vertex.
• is_min_value: Boolean attribute that indicates the least used edge among all those that originate from the source vertex.
• is_max_value_advance: Boolean attribute. If the edge is of type 'advance', it indicates whether it has the highest value among all 'advance' edges starting from the source vertex.
• is_max_value_return: Boolean attribute. If the edge is of type 'return', it indicates whether it has the highest value among all 'return' edges starting from the source vertex.
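As an illustration only, the vertex and edge attributes listed above could be held in two small record types. The sketch below uses Python dataclasses; the class names `Vertex` and `Edge`, the field types, and the sample values are assumptions made for this example and are not taken from the plugins.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    pos: int            # position in the teacher-defined sequence
    cmid: int           # identifier linking the vertex to the LMS instance
    module: str         # resource/activity type, e.g. 'forum' or 'assign'
    module_name: str    # translated module name preceded by pos
    name: str           # title given by the teacher
    section: int        # topic number
    section_name: str   # topic title
    value: int = 0      # number of interactions (V), used as vertex weight


@dataclass
class Edge:
    source: int         # pos of the starting vertex
    target: int         # pos of the arrival vertex
    type: str           # 'standard', 'advance', or 'return'
    value: int = 0      # number of traversals, used as edge weight
    is_max_value: bool = False
    is_min_value: bool = False
    is_max_value_advance: bool = False
    is_max_value_return: bool = False


# Hypothetical sample data, for illustration only.
quiz = Vertex(pos=3, cmid=101, module="quiz", module_name="3 Quiz",
              name="Quiz 1", section=1, section_name="Topic 1")
edge = Edge(source=2, target=3, type="standard", value=4)
```

Keeping `source` and `target` as `pos` values, as the attribute list describes, means an edge can be classified from its two endpoints alone.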
As stated in the model, the edges are classified into three types: standard, advance, and return, described below. Given the edge $(v_i, v_j)$, where i and j are the positions of the vertices (pos value), the model has:
• standard edge: It represents navigation from a resource/activity to its immediate successor. That is, given a vertex $v_i$ and a vertex $v_j$, the edge that starts from $v_i$ and points to $v_j$ is of type 'standard' if j = i + 1.
• advance edge: It indicates navigation from one resource/activity to another that is later than the immediate successor. That is, given a vertex $v_i$ and a vertex $v_j$, the edge that starts from $v_i$ and points to $v_j$ is of type 'advance' if j > i + 1.
• return edge: It indicates navigation from a resource/activity to a previous one. In this case, given a vertex $v_i$ and a vertex $v_j$, the edge that starts from $v_i$ and points to $v_j$ is of type 'return' if j < i.
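The three rules above translate directly into a small function. This is a minimal sketch; the function name `classify_edge` is hypothetical, and because the text does not define a type for j = i, the sketch rejects that case as an assumption.

```python
def classify_edge(i: int, j: int) -> str:
    """Classify the edge (v_i, v_j) from the pos values of its endpoints."""
    if j == i + 1:
        return "standard"   # immediate successor
    if j > i + 1:
        return "advance"    # skips ahead of the immediate successor
    if j < i:
        return "return"     # goes back to an earlier resource/activity
    # Assumption: the case j == i is not covered by the three types.
    raise ValueError("edge from a vertex to itself has no defined type")
```

For example, `classify_edge(2, 3)` gives 'standard', `classify_edge(2, 5)` gives 'advance', and `classify_edge(4, 1)` gives 'return'.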
When students follow the trajectory defined by the teacher, they always walk from a vertex $v_i$ to a vertex $v_j$ with j = i + 1. Given this, there can be at most one standard edge leaving each vertex $v_i$. The vertex that represents the last instance has no standard edge and no advance edge, and the vertex that represents the first instance has no return edge. The proposed model is dynamic: whenever a student goes from vertex $v_i$ to $v_j$, the algorithm adds 1 to the value attribute of the edge $(v_i, v_j)$.
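The increment rule can be sketched as a pass over the sequence of positions a student accessed, adding 1 to the edge between each pair of consecutive accesses. The function name `count_edges` and the input format (a plain list of pos values) are assumptions for this example; skipping repeated accesses to the same vertex is also an assumption, since the text does not say how they are handled.

```python
from collections import Counter


def count_edges(accessed_positions):
    """Return a Counter mapping (source, target) to the number of traversals."""
    edges = Counter()
    for source, target in zip(accessed_positions, accessed_positions[1:]):
        # Assumption: consecutive accesses to the same vertex add no edge.
        if source != target:
            edges[(source, target)] += 1
    return edges


# Hypothetical access sequence: the student revisits vertex 2 once.
edges = count_edges([1, 2, 3, 2, 3, 4])
```

In this sequence the edge (2, 3) is traversed twice, so its value is 2, while (1, 2), (3, 2), and (3, 4) each have value 1.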
With information about the edges, the model can calculate the proportion of the amount of each type of edge to the total of edges, as well as the dispersion measure, which indicates how dispersed the student's navigation is.
The edge proportion (prop) is the ratio of the number of edges of a given type x to the total number of edges n, according to (1), where x ∈ {advance, standard, return} and n > 0:

$$\mathrm{prop}_x = \frac{\mathrm{edges}_x}{n} \tag{1}$$

The dispersion measure (disp) is the average of the edges' lengths. The length of an edge e is the distance between the positions of its origin and destination vertices and is given by $\mathrm{length}_e = |i - j|$, where i and j are the positions of the vertices bounded by the edge $(v_i, v_j)$, and n is the total number of edges of the graph. The dispersion measure can be calculated over all edges of the graph or by edge type. Equation (2) represents the dispersion calculation considering all the edges of the graph, whereas (3) represents the dispersion by edge type, where x ∈ {advance, standard, return}, $\mathrm{edges}_x > 0$, and the sum in (3) runs over the edges of type x only:

$$\mathrm{disp} = \frac{1}{n} \sum_{e=1}^{n} \mathrm{length}_e \tag{2}$$

$$\mathrm{disp}_x = \frac{1}{\mathrm{edges}_x} \sum_{e=1}^{\mathrm{edges}_x} \mathrm{length}_e \tag{3}$$

From the model information, some metrics were created that can be used in analyses with data mining and machine learning tools, as well as in statistical analysis:
• total dispersion: the dispersion measure given by (2); that is, the calculation considers all edges of the LP.
• advance dispersion: the dispersion measure given by (3), considering only the advance edges.
• return dispersion: the dispersion measure given by (3), considering only the return edges.
• standard dispersion: it always has a value of 1 if the number of edges is greater than 0, or 0 if there are no edges in the LP.
• advance standard deviation: the standard deviation of the advance edges dispersion.
• advance variance: the variance of the advance edges dispersion.
• average access to vertices: the average access/interaction with all resources and activities available in the learning environment.
• return proportion: the ratio between the number of return edges and the total number of edges of the graph. This value was defined as a percentage. Therefore, proportion metrics always vary in value from 0 to 100.
• return standard deviation: the standard deviation of the return edges dispersion.
• return variance: the variance of the return edges dispersion.
• advance proportion: the ratio between the number of advance edges and the total number of graph edges.
• standard proportion: the ratio between the number of standard edges and the total number of graph edges.
Figure 1 shows the visual representation of the LP model proposed in this work. Each vertex has a color that represents a type of resource/activity, and its diameter is proportional to the number of students' interactions (V), so the number of accesses to each instance can be observed more clearly. The values on the edges come from the value attribute and determine the edge thickness, and the arrow indicates the direction of the path. The green edges are the 'standard' edges and indicate the teacher's LP. The blue edge is the 'advance' edge. The red edges represent the 'return' edges. The graph suggests that the student may be having difficulty with the assignment (third vertex).
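The proportion and dispersion metrics defined earlier in this section can be sketched in a few lines. The function names and the input format (a list of (i, j) position pairs, one per distinct edge) are assumptions for this example. Whether the model counts each distinct edge once or weights it by its value attribute is not stated in the text; the sketch counts each distinct edge once. Proportions are returned as percentages, as the metric description states.

```python
from statistics import mean


def edge_type(i, j):
    """Edge type from endpoint positions; assumes i != j."""
    if j == i + 1:
        return "standard"
    return "advance" if j > i + 1 else "return"


def proportion(edges, kind):
    """Percentage of edges of the given type; requires at least one edge."""
    matching = sum(1 for i, j in edges if edge_type(i, j) == kind)
    return 100.0 * matching / len(edges)


def dispersion(edges, kind=None):
    """Mean edge length |i - j| over all edges, or over one type only."""
    lengths = [abs(i - j) for i, j in edges
               if kind is None or edge_type(i, j) == kind]
    # Convention from the metric list: 0 when there are no edges to average.
    return mean(lengths) if lengths else 0.0


# Hypothetical LP: two standard edges, one return edge, one advance edge.
lp_edges = [(1, 2), (2, 3), (3, 1), (1, 4)]
total_disp = dispersion(lp_edges)
return_prop = proportion(lp_edges, "return")
```

With these four edges the lengths are 1, 1, 2, and 3, so the total dispersion is 1.75, and one edge in four is a return edge, giving a return proportion of 25.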
As the model uses the data stored by the LMS, it is possible to follow the evolution of the student's LP over time. The model can be used to analyze data from a finished class or during classes because it is flexible and allows the data to be delimited by a time interval.
For the feasibility study of this research, Moodle was selected since it is one of the most widely used LMSs in educational institutions (Cerezo et al., 2014). The model is also being applied to an online judge, which will be presented in a future work.
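Delimiting the data by a time interval can be sketched as a filter applied before the path is built. The record shape used here, a list of (timestamp, pos) tuples, is a hypothetical simplification and not the Moodle log schema; the function name and the dates are likewise assumptions.

```python
from datetime import datetime


def positions_in_interval(log, start, end):
    """Return the accessed positions within [start, end], in time order."""
    selected = [(when, pos) for when, pos in log if start <= when <= end]
    return [pos for when, pos in sorted(selected)]


# Hypothetical log records, deliberately out of chronological order.
log = [
    (datetime(2015, 9, 1, 10, 0), 1),
    (datetime(2015, 9, 1, 10, 5), 2),
    (datetime(2015, 10, 1, 9, 0), 4),
    (datetime(2015, 9, 2, 8, 0), 3),
]
september = positions_in_interval(
    log, datetime(2015, 9, 1), datetime(2015, 9, 30))
```

The resulting sequence of positions can then be turned into edges for that interval only, which is one way the same data could serve both a class in progress and a finished one.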

Figure 1. A representation of the Learning Path Model
The tools were implemented as Moodle plugins so that the teacher can benefit from them directly in the LMS. Figure 2 represents the architecture used in this work. Students interact with the LMS, which records their actions. Next, a plugin extracts the LPs from the database. Finally, plugins use the LPs and present the result to the teacher. In this work, two plugins are presented: the first shows the LPs visually, and the second recommends groups for collaborative activities.
Given this, this paper has the following research question: How can Learning Paths help to improve the teaching-learning process in LMS?
This work deals with the application of the model in two activities: behavior analysis and group formation. Thus, this study has two questions derived from the main one:
1. How can Learning Paths help in student behavior analysis?
2. How can Learning Paths help in group formation for collaborative activities?

LITERATURE REVIEW
A Systematic Review of Literature (SRL) was conducted to investigate how the LMSs organize and use the LP. The findings are summarized as follows.

Figure 2. Work architecture
Comparative Table 1 shows the relation between the proposal of this work and the strongly related works. Cells with a dash indicate that the information could not be identified in the publication.
Two works represent the teacher and student paths in the same graph. However, Adesina and Molloy (2011) can show only one student LP at a time, whereas Schröck et al. (2010) created the map manually. In the approach of this study, the teacher can also observe the entire class or groups, and the tool creates the visual representation automatically. Structuring the LP data as graphs opens possibilities for the creation of new tools, such as eGraph (Cerezo et al., 2014), LPGraph, M-Cluster (Ramos et al., 2017), the application of CbKST in Moodle (Sitthisak et al., 2013), or the selection of the shortest LP (Nurjanah & Fiqri, 2017). As this work shows group formation as one of the applications of the proposed LP model, some works about groups in LMS are presented next.

Groups in Learning Management Systems
The following works present an overview of the different forms of student grouping in LMS. Jagadish (2014) used KNN (K-Nearest Neighbor) to group students in Moodle. Abnar et al. (2012) applied genetic algorithms and a Likert scale; in their approach, the teacher can choose a set of different attributes and rank them based on their impact on group formation. Yathongchai et al. (2013) used decision tree, data mining, and Hartigan indexing techniques to form student groups in Moodle based on their grades and behaviors. Montazer and Rezaei (2012) created a grouping method called Hybrid Clustering Method (HCM), which combines the K-Means and Fuzzy C-Means methods (Table 2).
The approach of Jagadish (2014) performs group formation explicitly: it collects data through forms/questionnaires. The works of Abnar et al. (2012), Montazer and Rezaei (2012), and Yathongchai et al. (2013) form the groups implicitly: the data sources used to characterize the profiles of the students were the logs extracted from the Moodle platform.

MATERIALS AND METHODS
An SRL of LP in LMS and of group formation, and two field studies, were conducted. Data were collected from an already closed class of an LMS, a distance course in Applied Informatics from a program of Degree in Physical Education, with five classes and 124 students, during the second academic semester of 2013, and the interactions recorded in the LMS database were studied. From the analyses performed, the representation of the model in the directed graph format was constructed. The vertices are weighted by the number of interactions, and the edges are weighted by the number of times the student has covered them. Next, two tools for Moodle were developed in the plugin format, to validate the model and demonstrate its possible applications.
The first tool, called Learning Path Graph (LPGraph), generates the LP model and creates a visual representation of the LP. The second tool is called Moodle Cluster or M-Cluster (Ramos et al., 2017). It uses the model data in conjunction with the K-Means algorithm to suggest groupings for collaborative activities. The tools are intended in particular to help teachers in their activities of behavior analysis and grouping of students. Data from 82 students were used to test the tool during development (training data).
Subsequently, experiments with three programs were conducted: Physics, Electrical Engineering (EE), and Computer Science (CS). Each experiment lasted a semester. In 2015/2, LPGraph was tested with 113 students, and in 2016/2, M-Cluster was tested with one class of 40 students. The 153 students signed a consent form agreeing to participate in the experiments. In all, 206 students took part in the tool development tests and 153 in the experiments.

LPGRAPH
This section presents a Moodle plugin called LPGraph, developed using the LP model. The experiments illustrate the analysis and show how the teacher can select students and the time interval to generate a visual representation. Figure 3 shows the LPGraph plugin. LPGraph has data selection, selected data identification, graph options, the graph representing the LP, the percentage of edges, and, finally, a list of resources and activities.
Teachers can select which data they want to use to generate the graph. Below the graph are the edge proportions, which are calculated dynamically. When the teacher modifies the "Paths" option of the graph, the bar graph adjusts to the number of edges displayed. The bars show the percentage of each type of edge, and the axis presents the absolute value. Below the bar chart is the list of resources and activities.

Implementation of Visual Representation
The representation has two levels. Level 1 represents course topics or units (Figure 4), and Level 2 displays the graph representing the LP (Figure 5). Colored circles represent the vertices. Three pieces of information appear inside the circles:
• Order: the position of the vertex within the sequence (pos attribute).
• Module name: the type of resource or activity, such as 'Forum' or 'Assignment'.
• V: the number of instance views. It indicates how many times students accessed the vertex.
The edges are colored according to the proposed model, and the thickness of each edge is proportional to its value attribute.
Regarding the options for viewing the paths, the teacher can show or hide resources and activities that are not accessed. Figure 6 shows all resources and activities, including vertices not accessed by students. Figure 7 shows only vertices accessed from the same course.
As many edges were generated, it was necessary to create some options to limit the number of edges displayed and thus make the graph easier to understand. The options for displaying the LP are:
• Show all: displays all edges of the graph.
• Most used: for each vertex, only the most used edge is displayed. This option presents the main LP.
• Most used by Type: for each vertex, the graph shows only the highest-value edge of each type that leaves it. In this case, there are at most three edges per vertex.
• Least used: for each vertex, only the lowest-weighted edge of the vertex is displayed.
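The display options above amount to choosing, for each source vertex, one edge (or one edge per type) by its value. The following is a sketch under stated assumptions: the function names are hypothetical, the edges are given as a dictionary from (source, target) to value, and ties are resolved by keeping the first edge encountered, which the text does not specify.

```python
def edge_type(source, target):
    """Edge type from endpoint positions; assumes source != target."""
    if target == source + 1:
        return "standard"
    return "advance" if target > source + 1 else "return"


def pick_per_vertex(edges, by_type=False, least=False):
    """Keep one edge per source vertex (or per source vertex and type)."""
    chosen = {}
    for (source, target), value in edges.items():
        group = (source, edge_type(source, target)) if by_type else source
        current = chosen.get(group)
        if current is None:
            better = True
        elif least:
            better = value < current[1]
        else:
            better = value > current[1]
        if better:
            chosen[group] = ((source, target), value)
    return dict(chosen.values())


# Hypothetical edge weights for two source vertices.
edges = {(1, 2): 5, (1, 3): 2, (1, 4): 3, (2, 1): 1, (2, 3): 4}
most_used = pick_per_vertex(edges)
most_used_by_type = pick_per_vertex(edges, by_type=True)
least_used = pick_per_vertex(edges, least=True)
```

Here "Most used" keeps (1, 2) and (2, 3), "Most used by Type" additionally keeps the strongest advance edge (1, 4) and the return edge (2, 1), and "Least used" keeps (1, 3) and (2, 1).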

Experiment with LPGraph
The LPGraph plugin was used in two Moodle courses: Introduction to Computer Science (CS1), with one class of Physics (28 students) and one of EE (42 students), and Discrete Mathematics (DM), with one class of CS (43 students). The courses are in the topic format, which divides their content into modules according to the topic, ordered by the teachers. The objective of the first module, in both CS1 and DM, was to address general guidelines such as the teaching plan, the presentation of the course, and the use of a mobile application that was made available to students. Also, by default, the News forum was presented in the first module, where the teacher can communicate information about the course, such as the disclosure of notes, a change of place/time of classes, or a new evaluation date.
Regarding resources and activities, the Moodle courses used Forum, File, Page, Quiz, Assignment, and URL. Each module contained a question forum exclusive to the topic material. The File resource provides lesson content, exercise lists, and guidelines for practical activities. The Page and URL resources served to deliver links to external features. The Quiz and the Assignment are activities used to evaluate students. Figure 3 presents the options available to the teacher to specify the source of the data to be analyzed (Data selection). The teacher can choose by group (class), users, and the range of data.
Next, the results of the experiment are presented. Figures show the "Most used" LP option to facilitate the visualization of the graphs.

Results Observed During the Experiment
This section presents an analysis of the visual representation of LPs. The teacher can visualize the entire LP, but this analysis was based on students' actions taken only on 'Topic 1' because it is not feasible to show all topics here. Figures 8, 9, and 10 partially present the LPs of the courses CS1 and DM, according to the available settings, with the selection of the class in the Group option for CS1 courses and showing only the most used paths. The figures present only the piece related to Topic 1 of each course. It is interesting to note that Figures 8 and 9 are from the same course (CS1), analyzed during the same time interval, but each class presents a different behavior. The Physics class has more concentrated navigation, which can be observed by the wider edges. It is also possible to conclude that vertices 10 (Lesson01 - Variables, and Sequential Structure - Python) and 12 (LabCod01 - Python) are essential for learners who access the questionnaire, indicated by vertex 14 (Quiz 1 - Python). In this case, it became clear that the studies of the Physics class (Figure 9) focus on the Python programming language.
In both classes, the learners interact much more with the questionnaires. Although the Physics program has fewer students than the EE program, the number of interactions with vertices of type File is similar. In the end, for the questionnaire activity, the class grade average is slightly higher for the Physics class, which in general interacted more relative to its number of students (28 students against 42 in the EE class).
However, comparing the grades of Topic 1 between the two classes, the lowest grades belonged to Physics class students. They returned to previous resources more frequently, as indicated by the return edges compared with the other edge types. This scenario seems to suggest that the students had difficulties in understanding the topic concept and, therefore, had to go back to the previous content more often. The tendency to return to earlier resources remains higher throughout the course for the Physics class. At the end of the course, analyzing the final grades, the highest grades are in the EE program.
Thus, for the analyzed classes, a higher proportion of red edges indicates a high chance that the students are experiencing learning difficulties. Figure 10 presents a partial view of the LP of the Discrete Mathematics class. It shows that the class usually accesses resources directly but rarely accesses external links (URL). In Topic 1, there were 27 accesses (vertex 11) and 16 accesses (vertex 12), although the class had 43 students enrolled. With this information, the teacher could assess why students do not use these resources effectively. Because a student can view the same resource multiple times, at most 27 students accessed vertex 11 (URL), which would be the case if each of them accessed it only once. Therefore, the teacher can also identify whether students are using an available resource/activity.
LPGraph was also used to analyze student LPs individually. In this case, the LPs of some students with high and low grades in 'Topic 1' were observed. For the students with high grades, the paths contained few edges and were therefore visually cleaner. Also, the most used edges were predominantly 'advance' edges. The LPs of students with lower grades had more edges than those of students with higher grades. They were also more dispersed, and when selecting the option to visualize only the most used paths, the number of return edges was higher. In addition, a small number of edges may also indicate a tendency to drop out. Figure 11 shows the LP of a student from the CS1 course with a grade close to ten (the maximum grade) at the end of Topic 1, together with the proportion of edges per type. Figure 12 shows the LP of a student with a low grade for Topic 1. It shows a larger number of dispersed edges, and when viewing only the most used edges, the proportion of 'return' edges tends to increase.
Students with the lowest grades have more 'return' edges and have limited access to activities, for example, the instance '15 Quiz'. Students with high grades have more 'advance' edges and interact more with activities. Both kinds of students accessed instances of file or forum occasionally, but their dispersion is opposite: students with higher grades often access instances close to each other, whereas students with the lowest grades navigate between instances that are far from each other.
After the end of the course, a correlational analysis was performed between the attributes created from the model and the students' grades in Topic 1 and their final grades. Table 3 shows only the significant correlations (ρ > 0.3) found. As in the visual analysis, the analysis considered only the students with the 10 highest and 10 lowest grades in each class, which makes the differences between students with opposite performances easier to analyze. Thus, the relationship between the attributes and the grades can be better perceived.
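The text does not state which correlation coefficient ρ denotes. As one possible reading, the sketch below computes Spearman's rank correlation between an LP attribute and the grades; the choice of Spearman, the function names, and the sample numbers are all assumptions made for illustration and do not reproduce the study's data.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            result[k] = shared
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for illustration only (not data from the study).
advance_proportion = [10.0, 25.0, 30.0, 45.0, 50.0]
grades = [4.0, 6.5, 6.0, 8.0, 9.5]
rho = spearman(advance_proportion, grades)
```

Under this reading, an attribute would be reported in the table only when the absolute value of `rho` passes the chosen threshold.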
Students with better grades tended to use the advance edges more. The metrics related to the advance edges confirm this observation, although they are more evident in the DM course. The fact that the EE and Physics programs share the same environment may have generated lower correlation values. For example, while the EE class answered the C language quiz, the Physics class answered the Python language quiz, but both activities were visible to both programs. The average access to resources and activities is positively related to the grades, corroborating results found in other works in the literature. The negative correlations of the total and return dispersions indicate that a more dispersed student may not have good grades, depending on the type of dispersion. From an educational point of view, the advance dispersion can be understood as a student's quest to satisfy his/her curiosity about what subjects will come next, and this dispersion is positively related to the grades. The metrics related to the return edges are positively correlated to the grades. Although access to previous content is related to possible difficulties of the student, such difficulties can be overcome by means of a content review before carrying out an evaluative activity.

Figure 11. Student LP of the Physics class, whose grade is high. This student uses more resources/activities.
Figure 12. Student LP of the EE class, whose grade is low. This student accessed the quiz only once.

M-CLUSTER
Groups are sets of individuals that form and change for multiple purposes. The model was used to develop the Moodle Cluster (M-Cluster) (Ramos et al., 2017) for group recommendation. The learning path model is the user model of M-Cluster. It uses the K-Means algorithm and three similarity metrics, namely the Euclidean, Manhattan, and Cosine distances, along with attributes obtained from the proposed LP model (vertex access average and the amount, dispersion, and variance of standard, advance, and return edges). The Euclidean distance was used because it is one of the most used metrics for calculating the center of the clusters, whereas the Manhattan distance is a simpler alternative to the Euclidean distance. The cosine distance is a popular metric in recommendation systems in general. The tool uses these similarity metrics because the attributes are represented as vectors and because these metrics are used in some works related to this research.
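For reference, the three distances named above can be written for two attribute vectors of equal length as follows. This is a generic sketch of the standard definitions, not code from M-Cluster; the function names are chosen for this example, and the cosine distance is taken as one minus the cosine similarity, which assumes neither vector is all zeros.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Unlike the other two, the cosine distance ignores vector magnitude: two students whose attribute vectors point in the same direction are treated as similar even if one interacted far more than the other.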
M-Cluster generates homogeneous groupings (groups whose members have similar LPs). The tool suggests three group formation results to the teacher, and he/she can choose which one is more suitable for his/her students in a specific activity. The tool represents each grouping in two ways: the descriptive list (Figure 13) and the bubble chart (Figure 14).
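How K-Means produces homogeneous groups from LP attributes can be sketched with a minimal pure-Python version of the algorithm. This is not the M-Cluster implementation: the function name, the use of the first k students as initial centers (chosen so the example is deterministic), the fixed iteration count, the Euclidean distance, and the two-attribute vectors are all assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Return one cluster label per point using plain K-Means."""
    # Assumption: seed the centers with the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each student to the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


# Hypothetical vectors: (average vertex access, return dispersion).
students = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = kmeans(students, k=2)
```

Students with nearby vectors end up with the same label, which is what makes the suggested groups homogeneous with respect to their LPs.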
The descriptive list is an unordered list in which a group identification is followed by a list of the names of the students belonging to the group. The bubble chart displays the groups as bubbles, and each bubble shows the names of the students belonging to the same group.

Experiment with M-Cluster
Field research was conducted to collect training data from an LMS over two semesters, from 82 students of two already closed classes, one from the CS program and the other from the Computer Engineering program, in which the interaction in group activities was analyzed. From the analyses performed, students were grouped by their degree of similarity, calculated using the attributes extracted from the LP.
Thus, the formation of student groups was analyzed using several clustering algorithms from data mining and machine learning to generate the group suggestions. Then the best LP attributes for grouping (mentioned before) were selected. Subsequently, validation was performed with new data in a Discrete Mathematics course, with a class of 40 students in progress in blended mode.

Results Observed During the Experiment
Two activities were performed in the course to corroborate the indicated groups. In the first activity, students chose their group partners, and in the second, the groups were formed according to the tool's suggestions. The teacher visualized and validated the groups suggested by the tool. Figure 15 shows the individual improvement between activities of the analyzed students (n=40). Dropouts do not appear in the chart. The experiments showed that 75% of the students matched or exceeded in the second activity the grades they achieved in the first activity. When analyzing groups, from the total number of formed groups, 30% were identical pairs to those of the first activity (these same students obtained good results in both activities). These results are not yet conclusive, but they are a starting point for deeper future analyses.

RESULTS AND DISCUSSION
The studies were empirical, and the results show that it is possible to use the proposed model to perform actions such as group formation and behavior analysis. The visual tool developed from the proposed model can show the behavior of a student and of a class. The model was applied in group formation for collaborative activities, and the teacher of the course analyzed the suggested groupings and evaluated the results positively.
With LPGraph, the teacher can visually observe which resources and activities the student, group, or entire class is using more. Teachers can also track students' paths over time. The LPs are dynamic, and LPGraph can generate visualizations for different configurations. In this way, the approach also supports real-time analysis. In the experiments, it was possible to make some observations on student behavior using the proposed model, which answers the first question. Topic 1 of two courses, CS1 and DM, was analyzed, and different characteristics of the LPs were compared, such as access, the proportion of each edge type, and dispersions. The tool can analyze a class in progress or an already closed class.
M-Cluster was able to suggest groups of students who performed as well as or better than in the activity where students chose their partners. This observation answers the second question. Nevertheless, it is important to conduct other experiments in different situations to verify the completeness of the approach. The study formed groups of students based on the similarity of their LPs, so the groups formed are homogeneous. In some cases, the tool grouped, for the second activity, the same students who had chosen each other for the first activity.

CONCLUSION
This work presented a new representation model of LP and its application in group formation and behavior analysis. A tool called LPGraph that identifies and represents the LPs of students who use Moodle was created. Another developed tool from the model was the M-Cluster, which makes grouping suggestions by applying the K-Means algorithm with attributes generated from the proposed model.

Figure 15. Discrete Mathematics students' grades improvement
LPGraph deals with the visual representation of the proposed LP model to assist teachers in monitoring the learning process of their students. M-Cluster was implemented considering future improvements to allow more types of groupings, such as homogeneous, heterogeneous, and hybrid. Thus, the teacher will be able to choose which type of grouping to use for a given collaborative activity. The contributions of this work include:
• Grouping of the students according to their LP.
• Identification of LP-based attributes in graph format, since until then they were not observed in the literature on the group recommendation of students in LMS.
Future research can use the presented model for many applications besides those explored in this work. Among the most relevant future works, it is possible to mention:
• Formation of both homogeneous and heterogeneous groups, giving the teacher the option to group individuals with complementary LPs.
• Integration with the techniques of collaborative learning.
• Checking which types of paths are most likely to improve student performance.
• Contribution to the creation of adaptive LMSs.
The tools have some limitations. LPGraph does not store LPs; it only generates them in real time. The tool was tested only in courses with the topic format. M-Cluster does not yet recommend heterogeneous groups. The study analyzed only the data of students who agreed to participate in the research, which limited the number of participants. Despite these limitations, this work was concerned with developing an LP model that could serve as a basis for other research. Based on what was observed, future work is intended to automate the analysis of LPs, reducing the effort required of teachers to understand the behavior of their students. With this, it will be possible to provide diagnoses and suggestions for decision making. Currently, a study on the relationship between LPs and achievement goal orientation theory is being conducted.