LSA in the Classroom

Walter Kintsch, Eileen Kintsch
DOI: 10.4018/978-1-60960-741-8.ch009

Abstract

LSA is a machine learning method that constructs a map of meaning from which the semantic similarity between words and texts can be calculated. We describe an educational application of LSA that provides immediate, individualized content feedback to middle school students writing summaries.

Introduction

The development of ever more efficient machine learning systems during the past decades has the potential to revolutionize computer applications in education. These systems are capable of learning, without supervision, the meaning of words from a large linguistic corpus, as well as the meaning of sentences and texts composed with these words. As we shall show, certain restrictions apply, but this work has already reached a sufficient level of maturity, and several educational applications are currently in use. Examples of the kind of systems we have in mind are Latent Semantic Analysis (LSA) (Landauer, McNamara, Dennis, & Kintsch, 2007), the topics model (Griffiths, Steyvers, & Tenenbaum, 2007), and the holograph model (Jones & Mewhort, 2007). We shall limit our discussion here to LSA, the method most widely used in education. The following section briefly summarizes the LSA method; the main focus of this chapter, however, is an example of an educational application of LSA, and we conclude with a brief discussion of the limitations of this approach.

LSA was introduced by Landauer and Dumais in a seminal 1997 paper (Landauer & Dumais, 1997). LSA was originally developed in the context of information retrieval, but Landauer and Dumais realized the potential of the method for modeling a wide variety of semantic phenomena. LSA infers word meanings by analyzing a large linguistic corpus. An example of a widely used corpus is the TASA corpus, which consists of about 44,000 documents of the kind a high-school graduate might have been exposed to during his or her lifetime. The total corpus comprises roughly 11 million word tokens and about 90,000 different words. This is a rich corpus, but the only information LSA actually uses is which words co-occur with which other words in each document. Sentence structure, syntax, discourse structure, and so on are all neglected. Nevertheless, a great deal of information remains, and LSA makes good use of it.
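To make this raw input concrete, the following sketch builds a small word-by-document count matrix in Python from three invented one-sentence "documents." The toy corpus and every name in the code are our own illustration, not part of the TASA corpus or of any particular LSA implementation; a real system would process tens of thousands of documents.

from collections import Counter

# Three invented toy "documents"; a real corpus, such as TASA, would
# contain tens of thousands of them.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "students wrote a summary of the text",
]

# Count how often each word occurs in each document.
doc_counts = [Counter(doc.split()) for doc in documents]
vocabulary = sorted(set().union(*doc_counts))

# Rows = words, columns = documents; most cells are zero, i.e. the matrix is sparse.
matrix = [[counts[word] for counts in doc_counts] for word in vocabulary]

for word, row in zip(vocabulary, matrix):
    print(f"{word:10s} {row}")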

The input to LSA consists of a huge matrix, listing the frequency with which each word occurs in each document. This is an extremely sparse matrix, with most cells containing zeros, because most words appear in only a small fraction of the documents and hence co-occur with relatively few other words. The problem with such a matrix is that words whose meanings are quite unrelated nevertheless co-occur in the same documents. Thus, although the raw word vectors have all the right information in them, it is drowned in a sea of irrelevancies. What we want is the latent structure underlying the co-occurrence data, disregarding the noise inherent in the data. This latent structure is what LSA computes. LSA first applies a weighting scheme that de-emphasizes semantically uninformative words. For instance, function words like "the," "of," or "but" play a very important role in comprehension in that they allow us to construct the syntactic structure of a sentence, specifying the role each word plays. But since these high-frequency function words occur with many different words, they carry little weight semantically.

LSA then uses a well-known mathematical technique called singular-value decomposition to reduce the dimensionality of the original matrix to about 300 dimensions. Dimensionality reduction serves a twofold purpose: it removes much of the irrelevant noise in the corpus data, revealing its latent structure, and it fills in the original, sparse matrix, relating the main meaning-bearing words to each other whether or not they co-occurred in the corpus. As a result, in LSA each word in the corpus and each document is represented by a vector of 300 numbers. These numbers have no meaning by themselves, but together they define a semantic space – a high-dimensional map of meanings. Just as in a familiar two-dimensional map we can locate any two points with respect to each other and measure the distance between them, we can locate word meanings and document meanings in this 300-dimensional space and measure their distance. A useful measure of the similarity of two words, or of a word and a document, is the cosine between their vectors. Words that are unrelated have a cosine of 0 (or even a small negative value), and the more similar they are, the higher their cosine; identical words have a cosine of 1. Introductions to how LSA actually works can be found in Landauer and Dumais (1997) and Landauer et al. (2007).
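The sketch below runs this pipeline end to end on a tiny, hand-made count matrix, using NumPy. The weighting shown (log of the local frequency times a global entropy-based weight) is one common choice for LSA-style systems rather than the scheme of any particular implementation, and keeping only two dimensions is purely so the toy example stays readable; an actual LSA space is built from a full corpus and typically retains about 300 dimensions.

import numpy as np

# Toy word-by-document count matrix (rows = words, columns = documents),
# invented for illustration only.
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 1],
    [0, 0, 2, 1],
], dtype=float)

# Log-entropy weighting: words spread evenly over many documents get a low
# global weight, so frequent but uninformative words count for little.
p = counts / counts.sum(axis=1, keepdims=True)
log_p = np.zeros_like(p)
np.log(p, where=p > 0, out=log_p)
global_weight = 1 + (p * log_p).sum(axis=1) / np.log(counts.shape[1])
weighted = np.log(counts + 1) * global_weight[:, None]

# Singular-value decomposition, truncated to k dimensions
# (about 300 in a real LSA space; 2 here for the toy example).
k = 2
U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
word_vectors = U[:, :k] * S[:k]      # one k-dimensional vector per word
doc_vectors = Vt[:k, :].T * S[:k]    # one k-dimensional vector per document

# Cosine between two vectors: near 0 for unrelated items, 1 for identical ones.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vectors[0], word_vectors[1]))  # word-word similarity
print(cosine(word_vectors[0], doc_vectors[2]))   # word-document similarity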
