Sequence Clustering Techniques in Educational Data Mining

Qi Guo, Ying Cui, Jacqueline P. Leighton, Man-Wai Chu
DOI: 10.4018/978-1-7998-3476-2.ch005

Digital technology has a profound impact on modern education. It not only greatly improves access to quality education but can also automatically record every interaction between students and computers in log files. Clustering log files can help researchers better understand students and improve learning programs. One challenge is that log files are sequential in nature, whereas traditional cluster analysis techniques are designed for cross-sectional data. To overcome this problem, several sequence clustering techniques have recently been proposed. These fall into three major categories: Markov chain clustering, sequence distance clustering, and sequence feature clustering. The purpose of this chapter is to introduce these techniques and discuss their potential advantages and disadvantages.
Chapter Preview


Throughout the modern history of education, educators have attempted to make education more realistic, more performance-based, and more like one-on-one coaching (Mayrath, Clarke-Midura, & Robinson, 2011). Instead of having students memorize knowledge, educators want them to apply it to solve real-world problems. Instead of simply giving students a single test score, educators want to give each student individualized feedback and recommend what that student needs to learn next. In reality, however, educators rarely have enough time and resources to achieve these goals. Allowing students to solve real-world problems can be costly and risky. Observing students as they solve problems requires many instructors, and the observation process may interfere with students' work. One-on-one coaching is not affordable when an instructor must teach a large class of students from diverse backgrounds.

Key Terms in this Chapter

Long Short-Term Memory Network: An artificial neural network architecture that can process sequential information as input and/or output, and has the capacity to retain information from earlier parts of a long sequence.

Sequential Pattern Mining: A data mining technique that aims to identify commonly occurring subsequences from sequential data.

Discrete Markov Chain: A probabilistic model that assumes a categorical variable’s future state depends only on its current state. After controlling for the current state, the variable’s future state is independent of its past states.
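To make this definition concrete, the sketch below estimates first-order transition probabilities from a handful of categorical sequences by counting how often each state follows another. The function name `estimate_transition_probs` and the action labels ("read", "quiz", "hint") are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from collections import defaultdict


def estimate_transition_probs(sequences):
    """Estimate P(next state | current state) by counting adjacent pairs.

    Under the Markov assumption only the current state matters, so each
    adjacent (current, next) pair is counted independently of earlier history.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize each row of counts so the probabilities sum to 1.
        probs[current] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


# Hypothetical student action logs (illustrative data only).
logs = [
    ["read", "quiz", "hint", "quiz"],
    ["read", "read", "quiz"],
]
transitions = estimate_transition_probs(logs)
```

Here `transitions["read"]` would hold the estimated probabilities of each action that follows "read". States that never appear as a current state (for example, a state that only ends a sequence) simply have no row in this sketch.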

Markov Chain Clustering: A sequence clustering approach that attempts to cluster sequences by assigning them to K unique Markov chains.

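Dynamic Time Warping: An algorithm that measures the distance between two numerical sequences, possibly of different lengths, by finding the alignment between them that minimizes the cumulative difference of the aligned elements.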
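As an illustration, a minimal sketch of the textbook dynamic-programming formulation of dynamic time warping follows. It assumes the absolute difference as the local cost and places no constraint on the warping window; the function name `dtw_distance` is hypothetical, and the chapter may use a different variant.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the smallest cumulative cost of aligning the first i
    elements of a with the first j elements of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b element
                cost[i][j - 1],      # b[j-1] aligned with an already-used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Illustrative sequences of different lengths.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Because one element may be aligned with several elements of the other sequence, a sequence and a "stretched" copy of it can have a distance of zero, which is what distinguishes this measure from a pointwise comparison.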

Sequence Distance Clustering: A sequence clustering approach that first computes a sequence distance matrix, and then applies hierarchical clustering to the distance matrix.

Sequence Feature Clustering: A sequence clustering approach that first extracts sequential features from each sequence, and then applies traditional cluster analysis algorithms to the sequential features.
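The feature extraction step of sequence feature clustering can be sketched as follows. This example assumes, purely for illustration, that the sequential features are counts of adjacent action pairs (bigrams); the chapter may rely on other features, such as subsequences found by sequential pattern mining. The function name `bigram_features` and the action labels are hypothetical.

```python
from collections import Counter


def bigram_features(sequences):
    """Turn each sequence into a fixed-length vector of bigram counts.

    Returns (vocab, matrix), where vocab is the sorted list of observed
    bigrams and matrix[i][k] is how often vocab[k] occurs in sequences[i].
    """
    per_sequence = [Counter(zip(seq, seq[1:])) for seq in sequences]
    vocab = sorted(set().union(*per_sequence))
    # A Counter returns 0 for bigrams that a sequence never contains.
    matrix = [[counts[bigram] for bigram in vocab] for counts in per_sequence]
    return vocab, matrix


# Hypothetical student action logs (illustrative data only).
logs = [
    ["read", "quiz", "hint", "quiz"],
    ["read", "read", "quiz"],
]
vocab, matrix = bigram_features(logs)
```

Each row of `matrix` is an ordinary cross-sectional feature vector, so a traditional cluster analysis algorithm (for example, k-means or hierarchical clustering) could then be applied to the rows.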
