Visualizing Historical Patterns in Large Educational Datasets

Tiago Martins (Instituto Superior Técnico / University of Lisbon, Lisbon, Portugal), Daniel Gonçalves (INESC-ID / Instituto Superior Técnico / University of Lisbon, Lisbon, Portugal) and Sandra Gama (INESC-ID / Instituto Superior Técnico / University of Lisbon, Lisbon, Portugal)
DOI: 10.4018/IJCICG.2018010103


With the increase in the number of students worldwide, it has become difficult for teachers to track their students, and even for institutions themselves to identify anomalies in degrees and courses. The sheer amount of data makes such an analysis a daunting task. One possible way to overcome this problem is interactive information visualization. In this article, the authors present a visualization that lets users explore and analyze large datasets of academic performance data, supporting the analysis and discovery of temporal evolution patterns for courses, degrees, and professors. The authors applied the techniques to fourteen years of data for all students, courses, and degrees of a Portuguese engineering college. The system's usability and usefulness were tested, confirming that it allows analysts to understand patterns in the data efficiently and effectively.


Education levels have been increasing across the world. This is true for traditional learning in brick-and-mortar schools and colleges, and even more so given the rise of new technologies in distance education such as Massive Open Online Courses (MOOCs), supported by learning management systems (LMS) and course management systems (CMS). The number of students has thus increased over the years, and with it the need for new tools to understand how effective the learning experience is.

Platforms such as CMS and LMS make it easy to store records of student grades, class attendance, and whether students passed each course. Analyzing this data may allow the detection of students with problems, or even of issues with the content and structure of the courses themselves (Baepler, 2010). Such analysis can not only provide a deeper understanding of potential problems, but also help alleviate the problems of a reality where students and professors know less about one another, whether due to the massification of in-person teaching or to online, distance learning settings, both of which make it difficult to perceive students' problems and track down their causes (May, 2011). This can lead to deeper reflection on the contents being taught and to the development of strategies to minimize failure and lower dropout rates.

Given the amount of data that these systems collect over multiple years, analyzing it may be very difficult and require high cognitive effort. One possible way to understand and identify relevant patterns in such data is Information Visualization (Desai, 2014). With interactive Information Visualization (InfoVis), one can explore different facets of the data and cope with the high complexity of very large datasets through filtering and selection mechanisms. Derived measures that embody higher-level trends can also provide valuable perspectives that differ from those offered by the original information (Mazza, 2009).

We propose a solution in which InfoVis is used to represent large datasets of academic performance data. A major challenge in this regard is scalability. While most previous approaches, as described in the Related Work section, deal with data from a single course or degree, we aim to address the problem at the school level. As such, we devised a set of minimalistic views which, while sacrificing some more advanced analyses, make up for that shortcoming by enabling long-term analysis of all degrees and courses of an entire university-level school. An additional hurdle, the heterogeneity of curricula between degrees and even within the same degree over time (as curricular restructuring takes place), also makes this a challenging domain.

As a proof of concept, we used data from Instituto Superior Técnico (IST), the engineering school of the University of Lisbon, encompassing fourteen years and all undergraduate and MSc degrees. To that end, we used official information sources made available through the school's information system, which is based on the FenixEdu platform; FenixEdu provides an API through which certain data can be extracted.

Given the size of the dataset, studying the evolution of a degree or course over the years, or discovering patterns and trends in the data, could still be a time-consuming process without prior data processing (Ali, 2012). Thus, to make this data amenable to visualization, the entire dataset was pre-processed, cleaned, and restructured, as described below. The resulting dataset, with several thousand entries, was then successfully visualized by our solution, allowing interesting patterns to be found, as shown by usability and usefulness tests performed with the help of IST's Statistics and Prospective Unit.
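The pre-processing pipeline is not detailed at this point. As a minimal sketch only, assuming a hypothetical record schema (course, year, grade) and the Portuguese 0 to 20 grading scale with a passing mark of 10, the following Python shows how raw grade records might be cleaned (incomplete entries dropped) and restructured into per-course, per-year pass rates, a derived measure that a temporal view can plot directly. The names `GradeRecord`, `PASSING_GRADE`, and `pass_rates` are illustrative, not part of the authors' system or of the FenixEdu API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical record layout: the actual schema used by the authors is not given here.
@dataclass
class GradeRecord:
    course: str
    year: int
    grade: Optional[int]  # None when no grade was recorded

# Assumption: Portuguese 0-20 scale, where a grade of 10 or more is a pass.
PASSING_GRADE = 10


def pass_rates(records: Iterable[GradeRecord]) -> Dict[str, Dict[int, float]]:
    """Aggregate raw grade records into {course: {year: pass_rate}}, years in ascending order."""
    # (course, year) -> [number passed, number evaluated]
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        if r.grade is None:  # cleaning step: skip entries without a grade
            continue
        entry = counts[(r.course, r.year)]
        entry[1] += 1
        if r.grade >= PASSING_GRADE:
            entry[0] += 1

    rates: Dict[str, Dict[int, float]] = defaultdict(dict)
    for (course, year), (passed, evaluated) in counts.items():
        rates[course][year] = passed / evaluated  # evaluated >= 1 by construction

    # Sort each course's years so the series is ready for a time axis.
    return {course: dict(sorted(years.items())) for course, years in rates.items()}
```

For example, for a hypothetical course "Calculus I" with 2010 grades of 12 and 8 and 2011 entries of 15 and a missing grade, `pass_rates` returns `{"Calculus I": {2010: 0.5, 2011: 1.0}}`, since the missing grade is discarded rather than counted as a failure. Whether incomplete records should instead count as failures is a policy choice that would change the resulting trends.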

We thus developed a system that supports the analysis of students' academic paths and of the evolution of degrees and courses over the years. This system may help professors, course coordinators, and the School itself assess relevant issues so that they can be solved in a timely manner, improving the teaching-learning process.
