Speechfind: Advances in Rich Content Based Spoken Document Retrieval

Wooil Kim; John H.L. Hansen

doi:10.4018/978-1-59904-879-6.ch017

Source Title: Handbook of Research on Digital Libraries: Design, Development, and Impact

Abstract

This chapter addresses a number of advances in spoken document retrieval for the National Gallery of the Spoken Word (NGSW) and the U.S.-based Collaborative Digitization Program (CDP). After presenting an overview of the audio stream content of the NGSW and CDP audio corpus, we present an overall system diagram and discuss the critical tasks associated with effective audio information retrieval: advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests, including document and query expansion. We then present our experimental online system, SpeechFind, which supports audio retrieval from the NGSW and CDP corpus. Finally, we discuss a number of research challenges and new directions for the overall task of robust phrase searching in unrestricted audio corpora.

Introduction

The focus of this chapter is to provide an overview of the SpeechFind online spoken document retrieval system, including its subtasks, corpus enrollment, and online search and retrieval engines, developed for the National Gallery of the Spoken Word (NGSW; Hansen, Huang, Zhou, Seadle, Deller, Gurijala, et al., 2005; http://www.ngsw.org) and the Collaborative Digitization Program (CDP, http://cdpheritage.org). The field of spoken document retrieval requires an interdisciplinary effort involving researchers in electrical engineering (speech recognition) and computer science (natural language processing), as well as historians, library archivists, and others. As such, we provide a summary of acronyms and definitions of terms at the end of this chapter to assist those interested in spoken document retrieval for audio archives.

Reliable speech recognition for spoken document/information retrieval is a challenging problem when data are recorded across different media, equipment, and time periods. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of significant historical content. The U.S. National Science Foundation recently established an initiative to provide a better transition of library services to digital format. As part of this Phase-II Digital Libraries Initiative, researchers from Michigan State University (MSU) and the University of Texas at Dallas (UTD; formerly at the University of Colorado at Boulder) have teamed up to establish a fully searchable, online WWW database of spoken word collections that span the 20th century. The database draws primarily from holdings of MSU’s Vincent Voice Library (VVL), which includes more than 60,000 hours of recordings.

In the field of robust speech recognition, a variety of challenging problems persist, such as reliable speech recognition across wireless communications channels, recognition of speech across changing speaker conditions (e.g., emotion and stress [Bou-Ghazale & Hansen, 2000; Hansen, 1996; Sarikaya & Hansen, 2000] and accent [Angkititrakul & Hansen, 2006; Arslan & Hansen, 1997]), and recognition of speech from unknown or changing acoustic environments. Achieving effective performance under changing speaker conditions for large vocabulary continuous speech recognition (LVCSR) remains a challenge, as demonstrated by recent DARPA evaluations on broadcast news (BN) compared with earlier results on the Wall Street Journal (WSJ) corpus.

One natural solution to audio stream search is to perform forced transcription of the entire dataset and simply search the synchronized text stream. While this may be a manageable task for BN (about 100 hours), the initial offering for NGSW will be 5,000 hours (with a potential of more than 60,000 hours in total), and accurate forced transcription will simply not be possible, since text transcripts will generally not be available. Other studies have also considered Web-based spoken document retrieval (SDR) (Fujii & Itou, 2003; Hansen, Zhou, Akbacak, Sarikaya, & Pellom, 2000; Zhou & Hansen, 2002). Transcripts of broadcast news can also be generated to obtain near real-time closed captioning (Saraclar, Riley, Bocchieri, & Goffin, 2002). Instead of generating exact transcripts, some studies have considered summarization and topic indexing (Hori & Furui, 2000; Maskey & Hirschberg, 2003; Neukirchen, Willett, & Rigoll, 1999), or more specifically, topic detection and tracking (Walls, Jin, Sista, & Schwartz, 1999), while others have considered lattice-based search (Saraclar & Sproat, 2004). Some of these ideas are related to speaker clustering (Moh, Nguyen, & Junqua, 2003; Mori & Nakagawa, 2001), which is needed to improve acoustic model adaptation for BN transcription. Language model adaptation (Langzhou, Gauvain, Lamel, & Adda, 2003) and multiple/alternative language modeling (Kurimo, Zhou, Huang, & Hansen, 2004) have also been considered for SDR. Finally, cross-lingual and multilingual SDR studies have also been performed (Akbacak & Hansen, 2006; Navratil, 2001; Wang, Meng, Schone, Chen, & Lo, 2001).
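To make the idea of searching a synchronized text stream concrete: once every transcribed word carries a time stamp, a phrase query can be answered by locating the word sequence and returning the start time of its first word, so playback can jump to that point in the audio. The Python sketch below assumes a hypothetical layout in which each transcript is a list of (word, start time in seconds) pairs; the function names and sample data are illustrative only and do not describe SpeechFind's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: each word carries its start time (seconds) in the audio file.
Word = Tuple[str, float]


def build_index(transcripts: Dict[str, List[Word]]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each lowercased word to its (audio_id, position) postings."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for audio_id, words in transcripts.items():
        for pos, (token, _start) in enumerate(words):
            index[token.lower()].append((audio_id, pos))
    return index


def phrase_search(query: str, index: Dict[str, List[Tuple[str, int]]],
                  transcripts: Dict[str, List[Word]]) -> List[Tuple[str, float]]:
    """Return (audio_id, start_time) for every exact occurrence of the phrase."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for audio_id, pos in index.get(terms[0], []):
        words = transcripts[audio_id]
        # The remaining query words must follow the first one, in order.
        if all(pos + k < len(words) and words[pos + k][0].lower() == term
               for k, term in enumerate(terms[1:], start=1)):
            hits.append((audio_id, words[pos][1]))
    return hits


# Hypothetical time-aligned transcripts.
transcripts = {
    "speech_001": [("ask", 12.0), ("not", 12.3), ("what", 12.6), ("your", 12.9), ("country", 13.1)],
    "news_042": [("the", 3.0), ("country", 3.2), ("votes", 3.7)],
}
index = build_index(transcripts)
print(phrase_search("what your country", index, transcripts))  # [('speech_001', 12.6)]
```

With ASR output rather than forced transcription, the text stream itself contains recognition errors, which is one motivation for the lattice-based search approaches cited in the paragraph above.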

Key Terms in this Chapter

LVCSR: Large Vocabulary Continuous Speech Recognition

Word Error Rate (WER): A performance measure for speech recognition that counts substitution errors (i.e., misrecognition of one word as another), deletion errors (i.e., words missed by the recognition system), and insertion errors (i.e., words introduced into the text output by the recognition system).
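Formally, WER = (S + D + I) / N, where S, D, and I are the numbers of substitutions, deletions, and insertions and N is the number of words in the reference transcript. The counts come from a minimum edit-distance alignment between the reference and the recognizer output. The Python sketch below performs that word-level dynamic-programming alignment; when several alignments share the same minimum cost, it picks the one with the fewest substitutions (then fewest deletions), a tie-breaking rule assumed here that scoring tools may handle differently. The function names are hypothetical.

```python
from typing import List, Tuple


def wer_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] = (total_errors, S, D, I) for aligning ref[:i] with hyp[:j].
    d = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)          # every reference word deleted
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)          # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, dl, ins = d[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, dl, ins)            # match: no error
            else:
                diag = (t + 1, s + 1, dl, ins)    # substitution
            t, s, dl, ins = d[i - 1][j]
            dele = (t + 1, s, dl + 1, ins)        # reference word deleted
            t, s, dl, ins = d[i][j - 1]
            insert = (t + 1, s, dl, ins + 1)      # extra hypothesis word
            # Tuples compare by total errors first, then by fewer S, then fewer D.
            d[i][j] = min(diag, dele, insert)
    _, s, dl, ins = d[n][m]
    return s, dl, ins


def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    s, dl, ins = wer_counts(ref, hyp)
    return (s + dl + ins) / len(ref)


ref = "the national gallery of the spoken word"
hyp = "the natural gallery of spoken word today"
print(wer_counts(ref.split(), hyp.split()))  # (1, 1, 1)
print(round(wer(ref, hyp), 3))               # 0.429
```

Because insertions are counted, WER can exceed 100%: wer("a", "b c") returns 2.0 (one substitution plus one insertion against a one-word reference).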

Mel Frequency Cepstral Coefficients (MFCC): A standard set of features used to parameterize speech for acoustic models in speech recognition.
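The usual MFCC pipeline windows a short frame of speech, takes its power spectrum, pools the spectrum with triangular filters spaced evenly on the mel scale, takes logarithms, and applies a discrete cosine transform. The pure-Python sketch below computes MFCCs for one frame under common textbook assumptions (Hamming window, mel(f) = 2595 log10(1 + f/700), 26 filters, 13 coefficients, unnormalized DCT-II); these settings are assumptions, not the front-end configuration used in SpeechFind, and pre-emphasis, liftering, and delta features are omitted. A direct DFT keeps the code dependency-free; a practical front end would use an FFT.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """Convert frequency in Hz to mels (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame: List[float], sample_rate: int,
               n_filters: int = 26, n_ceps: int = 13) -> List[float]:
    """Compute MFCCs for a single frame of samples (assumed parameters)."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    # 1. Hamming window to reduce spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # 2. Power spectrum of the non-negative frequency bins via a direct DFT.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        power.append((re * re + im * im) / n)
    bin_hz = [k * sample_rate / n for k in range(n_bins)]
    # 3. Triangular filter edges, equally spaced in mels from 0 Hz to Nyquist.
    top_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top_mel * j / (n_filters + 1)) for j in range(n_filters + 2)]
    log_energies = []
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        energy = 0.0
        for f, p in zip(bin_hz, power):
            if lo < f <= mid:
                energy += p * (f - lo) / (mid - lo)   # rising edge
            elif mid < f < hi:
                energy += p * (hi - f) / (hi - mid)   # falling edge
        # 4. Log compression; the floor avoids log(0) for silent input.
        log_energies.append(math.log(max(energy, 1e-10)))
    # 5. Unnormalized DCT-II of the log energies; keep the first n_ceps terms.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_ceps)]


# Example: a 25 ms frame (400 samples at 16 kHz) of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(400)]
coeffs = mfcc_frame(frame, 16000)
print(len(coeffs))  # 13
```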

NGSW: The National Gallery of the Spoken Word, a consortium of universities supported by the National Science Foundation (NSF, USA) Digital Libraries Initiative to establish the first nationally recognized, fully searchable online audio archive.

Broadcast News (BN): An audio corpus consisting of recordings from TV and radio broadcasts, used for developing speech recognition systems and assessing their performance.

Out-of-Vocabulary (OOV): In speech recognition, the available vocabulary must first be defined. OOV refers to words in the input audio signal that are not part of the recognizer's vocabulary lexicon and therefore will always be misrecognized by automatic speech recognition.
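Because an OOV word can never appear in the recognizer output, every OOV token in a reference transcript must be aligned as a substitution or a deletion, so (assuming words are matched exactly, here after lowercasing) the token-level OOV rate is a lower bound on the WER of that transcript. A minimal Python sketch for measuring it against a lexicon follows; the lexicon, sample sentence, and function names are hypothetical.

```python
from typing import Iterable, List, Set


def oov_tokens(words: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Return the running words (lowercased) missing from a lowercase lexicon."""
    return [w for w in (x.lower() for x in words) if w not in lexicon]


def oov_rate(words: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of running words (tokens) not covered by the lexicon."""
    tokens = [w.lower() for w in words]
    if not tokens:
        return 0.0
    return len(oov_tokens(tokens, lexicon)) / len(tokens)


# Hypothetical recognizer lexicon and reference sentence.
lexicon = {"the", "president", "spoke", "to", "nation"}
sentence = "The president spoke to the nation about Sputnik".split()
print(oov_tokens(sentence, lexicon))  # ['about', 'sputnik']
print(oov_rate(sentence, lexicon))    # 0.25
```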

Managing Gigabytes (MG): One of two general-purpose systems available for text search and indexing. See the textbook by Witten, Moffat, and Bell (1999) for an extended discussion.
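General-purpose text search systems such as MG answer queries from an inverted index that maps each term to the documents containing it. The Python sketch below illustrates ranked retrieval over such an index using TF-IDF weights and cosine normalization, applied to hypothetical ASR transcripts; it is a generic textbook scheme assumed for illustration and does not reproduce MG's index compression or its exact ranking formula.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, Dict[str, int]]  # term -> {doc_id: term frequency}


def build_inverted_index(docs: Dict[str, str]) -> Tuple[Postings, Dict[str, float], int]:
    """Build an inverted index plus each document's TF-IDF vector length."""
    postings: Postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            postings[term][doc_id] = tf
    n_docs = len(docs)
    sq_norms: Dict[str, float] = defaultdict(float)
    for plist in postings.values():
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist.items():
            sq_norms[doc_id] += (tf * idf) ** 2
    return dict(postings), {d: math.sqrt(v) for d, v in sq_norms.items()}, n_docs


def ranked_search(query: str, postings: Postings, norms: Dict[str, float],
                  n_docs: int) -> List[Tuple[str, float]]:
    """Rank documents by TF-IDF cosine similarity (query norm omitted)."""
    scores: Dict[str, float] = defaultdict(float)
    for term in set(query.lower().split()):
        plist = postings.get(term)
        if not plist:
            continue  # term does not occur in any transcript
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist.items():
            scores[doc_id] += (tf * idf) * idf  # document weight x query weight
    # A positive score implies a positive document norm, so division is safe.
    ranked = [(d, s / norms[d]) for d, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical ASR transcripts of three audio clips.
docs = {
    "clip1": "kennedy inaugural address ask not what your country can do",
    "clip2": "apollo moon landing broadcast one small step",
    "clip3": "kennedy moon speech we choose to go to the moon",
}
postings, norms, n_docs = build_inverted_index(docs)
ranked = ranked_search("kennedy moon", postings, norms, n_docs)
print([d for d, _ in ranked])  # ['clip3', 'clip2', 'clip1']
```

Omitting the query-vector norm does not change the ranking, since it scales every document score by the same factor.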

SDR: Spoken Document Retrieval

Collaborative Digitization Program (CDP): A consortium of libraries, universities, and archives working together to establish best practices for transitioning materials (e.g., audio, image, etc.) to digital format.

ASR: Automatic Speech Recognition
