Grouping of Questions From a Question Bank Using Partition-Based Clustering

DOI: 10.4018/978-1-7998-3772-5.ch002

During automatic test paper generation, it is necessary to detect the percentage of similarity among questions and thereby avoid repeating questions. To detect repeated questions, the authors have designed and implemented a similarity matrix-based grouping algorithm. Grouping algorithms are widely used in multidisciplinary fields such as data mining, image analysis, and bioinformatics. This chapter proposes the use of a grouping strategy-based partition algorithm for clustering the questions in a question bank. It includes a new approach for computing the question similarity matrix and for using that matrix to cluster the questions. The grouping algorithm extracts n module-wise questions, computes an n × n similarity matrix by performing n × (n-1)/2 pair-wise question vector comparisons, and uses the matrix to formulate question clusters. The grouping algorithm has been found efficient in reducing the best-case time complexity of the hierarchical approach, O(n × (n-1)/2 log n), to O(n × (n-1)/2).
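The pipeline described in the abstract (pair-wise comparison of question vectors, an n × n similarity matrix, then grouping) can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or grouping rule the chapter uses, so cosine similarity and the greedy single-pass grouping below are assumptions made for illustration only; the function names `cosine`, `similarity_matrix`, and `group_questions` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts). Assumed measure."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Fill a symmetric n x n matrix using n * (n - 1) / 2 comparisons."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    comparisons = 0
    for i in range(n):
        sim[i][i] = 1.0  # a question is treated as identical to itself
        for j in range(i + 1, n):
            # One comparison fills both sim[i][j] and sim[j][i].
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
            comparisons += 1
    return sim, comparisons


def group_questions(sim, delta):
    """Greedy single-pass grouping (an assumption, not the chapter's rule):
    each unassigned question seeds a group and pulls in every later
    unassigned question whose similarity to the seed is >= delta."""
    n = len(sim)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [i]
        assigned[i] = True
        for j in range(i + 1, n):
            if not assigned[j] and sim[i][j] >= delta:
                group.append(j)
                assigned[j] = True
        groups.append(group)
    return groups


# Toy term-weight vectors for four questions (made-up values).
vectors = [
    {"stack": 1.0, "push": 1.0},
    {"stack": 1.0, "push": 1.0},
    {"tree": 1.0, "travers": 1.0},
    {"tree": 1.0},
]
sim, comparisons = similarity_matrix(vectors)
groups = group_questions(sim, delta=0.5)
```

With four questions the matrix is filled with 4 × 3 / 2 = 6 comparisons, and the grouping step only reads the matrix, which is consistent with the O(n × (n-1)/2) cost quoted above.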
Chapter Preview

Terminology Used

The terminology used is presented in Table 1.

Table 1. Terminology used for question clustering

Term | Description
Subject (S) | S is a subject/paper offered in different semesters of a course.
Modules/Units | For each subject, there is a university-prescribed syllabus which consists of different modules/units.
Question Bank (QB) | QB is a database which stores module-wise questions with their details, such as question-no, question-content, question-type, question-marks and question-answer-time.
Q | Q is the total number of questions stored under a module.
$t_i$ | $t_i$ refers to the total number of questions in which term i appears.
$freq_{ij}$ | $freq_{ij}$ is the frequency of term i in question j.
frequency ($\max freq_{ij}$) | $\max freq_{ij}$ is the maximum frequency of a term in question j.
term frequency ($tf_{ij}$) | $tf_{ij}$ refers to the importance of a term i in question j. It is calculated using the formula: $tf_{ij} = freq_{ij} / \max freq_{ij}$
Inverse Document Frequency ($idf_i$) | $idf_i$ refers to the discriminating power of term i and is calculated as: $idf_i = \log_2 (Q / t_i)$
tf-idf weighting ($W_{ij}$) | It is a weighting scheme to determine the weight of a term in a question. It is calculated using the formula: $W_{ij} = tf_{ij} \times idf_i$
$T_i$ (question $q_i$) | A set of terms extracted from each question by performing its tokenization, stop word removal, taxonomy verb removal and stemming.
Threshold, δ | User input threshold value to find the similarity.
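
The weighting formulas in Table 1 can be made concrete with a short Python sketch. It assumes each question has already been reduced to its term list (tokenization, stop word removal, taxonomy verb removal and stemming are not shown); the function name `tfidf_weights` and the sample terms are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_weights(questions):
    """Return one {term: W_ij} dict per question, following Table 1.

    questions: list of term lists, one list per question in a module.
    """
    Q = len(questions)  # total number of questions under the module
    # t_i: number of questions in which term i appears (count once per question)
    t = Counter(term for terms in questions for term in set(terms))
    weights = []
    for terms in questions:
        freq = Counter(terms)  # freq_ij for this question j
        max_freq = max(freq.values()) if freq else 1  # max freq_ij
        weights.append({
            # W_ij = tf_ij * idf_i = (freq_ij / max freq_ij) * log2(Q / t_i)
            term: (f / max_freq) * math.log2(Q / t[term])
            for term, f in freq.items()
        })
    return weights


# Four toy questions, already reduced to stemmed terms (made-up data).
questions = [
    ["defin", "stack"],
    ["stack", "queue", "stack"],
    ["queue", "tree"],
    ["graph"],
]
w = tfidf_weights(questions)
```

Here Q = 4. The term "stack" appears in two questions, so its idf is log2(4/2) = 1, while "defin" appears in only one, so its idf is log2(4/1) = 2. A term that occurs in every question gets idf 0 and therefore contributes nothing to the comparison, which matches the idea of idf as discriminating power.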
