Amplifying Participant Voices Through Text Mining

Jonathan S. Lewis
Copyright: © 2020 | Pages: 17
DOI: 10.4018/978-1-7998-1173-2.ch011

Abstract

Text mining offers an efficient, scalable way to separate signal from noise in large-scale text data, and therefore to analyze open-ended survey responses as well as the tremendous amount of text that students, faculty, and staff produce through their online interactions. Traditional qualitative methods are impractical for data at this scale, and text mining methods are consonant with the current literature on thematic analysis. This chapter provides a tutorial for researchers new to the method, including a lengthy discussion of preprocessing tasks, knowledge extraction through both supervised and unsupervised approaches, potential data sources, and the range of software (both proprietary and open-source) available. Examples of text mining at work, drawn from two studies involving data collected from college students, are provided throughout the chapter. Limitations of the method and implications for future research and policy are discussed.

Background

Text mining is a form of data mining that can “turn text into numbers,” thus facilitating the efficient processing of large amounts of text data where traditional qualitative methods are impractical or inefficient (Miner et al., 2012, p. 30). In an educational context, text mining can be used to investigate a range of qualitative data provided by students through the teaching, research, and administrative functions of an institution (Zilvinskis & Michalski, 2016).
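
As a minimal sketch of what "turning text into numbers" can look like, the Python snippet below builds a small document-term matrix of word counts. It is an illustration only, not the chapter's own procedure: the function names `tokenize` and `document_term_matrix` and the two sample responses are hypothetical, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, rows); each row holds one document's word counts."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct token seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical open-ended survey responses, invented for illustration.
responses = [
    "Advising was helpful and advising was fast",
    "The library was helpful",
]
vocabulary, matrix = document_term_matrix(responses)
```

Each row of `matrix` is a numeric profile of one response, which is the kind of representation that clustering or classification methods can then operate on.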

Key Terms in this Chapter

Stemming: The automatic process of reducing words to a common root form, which need not be a linguistically valid word.

Unsupervised Learning: A text mining algorithm, such as cluster analysis, that can analyze natural language automatically, without needing to be trained and refined by a researcher.

Disambiguate: To identify the meaning of a word in context.

Lemmatization: The automatic process of reducing natural language to a common and linguistically valid root form.

Corpus: A collection of documents for use in text mining analysis.

Include-Word List: Also known as a dictionary, an include-word list is a collection of words and phrases that can be used for basic indexing, or more complex clustering and classification analyses.

Keyword in Context (KWIC): A visual tool that displays every instance where a selected word or phrase appears, along with a certain number of words that appear before and after.

Cluster Analysis: A process of grouping similar words or documents.

Supervised Learning: A text mining algorithm, such as email spam filtering, that is developed and refined through the use of a training data set.

Inverse Document Frequency: A mathematical transformation (and weight) that discounts words appearing in many documents across the corpus, with the goal of determining the relative importance of each word.

Stop-Word List: A collection of words that is excluded from text mining analysis.

Tokenization: The automatic process of splitting natural-language text into recognizable, distinct words (tokens).

Dendrogram: A tree graph that attempts to reproduce the relative distance between all items (i.e., words) included in a cluster analysis.

Sentiment Analysis: A process of uncovering subjective mental and emotional states in natural language.
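
Several of the terms above (tokenization, stop-word list, inverse document frequency, and keyword in context) can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the stop-word list, the three-document corpus, and the function names are hypothetical, and the weighting shown, log(N / df), is only one common variant of inverse document frequency, not necessarily the one used in the chapter.

```python
import math
import re

# Hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "and", "was", "is", "to", "of"}


def tokenize(text):
    """Tokenization: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def inverse_document_frequency(corpus_tokens):
    """Compute idf(term) = log(N / df), where df counts documents containing the term."""
    n_docs = len(corpus_tokens)
    doc_freq = {}
    for tokens in corpus_tokens:
        for term in set(tokens):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def keyword_in_context(tokens, keyword, window=2):
    """KWIC: return (left context, keyword, right context) for every match."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical corpus of three short documents, invented for illustration.
corpus = [
    "The advising office was helpful",
    "Advising was slow and the library was helpful",
    "The library is open late",
]
cleaned = [remove_stop_words(tokenize(doc)) for doc in corpus]
idf = inverse_document_frequency(cleaned)
kwic = keyword_in_context(tokenize(corpus[1]), "library")
```

In this toy corpus, "office" appears in one document and "advising" in two, so "office" receives the larger weight. The KWIC call runs on the unfiltered tokens so that the surrounding words remain readable.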
