Unstructured Environmental Audio: Representation, Classification and Modeling


Selina Chu (University of Southern California, USA), Shrikanth Narayanan (University of Southern California, USA) and C.-C. Jay Kuo (University of Southern California, USA)
Copyright: © 2011 |Pages: 21
DOI: 10.4018/978-1-61520-919-4.ch001


Recognizing environmental sounds is a basic audio signal processing problem. The authors’ work focuses on characterizing unstructured environmental sounds in order to understand and predict the context surrounding an agent or device. Most research on audio recognition has focused primarily on speech and music, and far less attention has been paid to the challenges and opportunities of unstructured audio. The authors investigate key issues in characterizing unstructured environmental sounds, such as the development of appropriate feature extraction algorithms and learning techniques for modeling the background of an environment.
Chapter Preview


Unstructured audio is an important aspect of building systems capable of understanding their surrounding environment through audio and other modalities of information, e.g., visual, sonar, and global positioning. Consider, for example, applications in robotic navigation, assistive robotics, and other mobile device-based services, where context-aware processing is often desired. Human beings use both vision and hearing to navigate and respond to their surroundings, a capability that is still quite limited in machine processing. The first step toward achieving multi-modal recognition is the ability to process unstructured audio and recognize audio scenes (or environments).

By audio scenes, we refer to locations with distinct acoustic characteristics, such as a coffee shop, park, or quiet hallway. Differences in acoustic characteristics can be caused by the physical environment or by the activities of humans and nature. To enhance a system's context awareness, we need to incorporate and adequately utilize such audio information. A stream of audio data carries a significant wealth of information, enabling a system to capture a semantically richer description of its environment. Moreover, to obtain a more complete description of a scene, fusing audio with other sensory information can be advantageous, for example, in disambiguating environment and object types. To use any of these capabilities, we must first determine the current ambient context.

Most research on environmental sounds has centered on the recognition of specific events or sounds. To date, only a few systems have been proposed that model raw environmental audio without first extracting specific events or sounds. In this work, our focus is not on the analysis and recognition of discrete sound events, but rather on characterizing the general unstructured acoustic environment as a whole. Unstructured environment characterization is still in its infancy: current algorithms have difficulty handling such conditions, and a number of issues and challenges remain. We briefly describe some of the issues that make learning from unstructured audio particularly challenging:

  • One of the main issues is the lack of suitable audio features for environmental sounds. Audio signals have traditionally been characterized by Mel-frequency cepstral coefficients (MFCCs) or other time-frequency representations, such as the short-time Fourier transform and the wavelet transform. Our study found that these traditional features do not perform well on environmental sounds. MFCCs have been shown to work relatively well for structured sounds, such as speech and music, but their performance degrades in the presence of noise. Environmental sounds encompass a large variety of signals, including components with strong temporal-domain signatures, such as the chirping of insects and the sound of rain. Such sounds are in fact noise-like, with broad spectra, and are not effectively modeled by MFCCs.
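To make the MFCC pipeline discussed above concrete, the sketch below implements a simplified version of the standard steps (framing, windowing, power spectrum, mel filterbank, log compression, DCT) in plain NumPy. The frame sizes, filter counts, and sample rate are illustrative assumptions, not settings from the chapter.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Simplified MFCC extraction (illustrative, not production-grade)."""
    # 1. Frame and window the signal
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Triangular mel filterbank (mel scale warps frequency perceptually)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then DCT-II to decorrelate; keep n_ceps coefficients
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T
```

Note that step 4 discards phase and fine temporal structure within each frame, which is one intuition for why noise-like environmental sounds with strong time-domain signatures are poorly captured by this representation.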

  • Modeling the background audio of complex environments is challenging because the audio, in most cases, is constantly changing. The question is therefore what constitutes the background and how to model it. We can define the background of an ambient auditory scene as something recurring and noise-like, made up of various sound sources that change over time, e.g., traffic and passers-by on a street. In contrast, the foreground can be viewed as something unanticipated, a deviation from the background model, e.g., a passing ambulance with its siren. The difficulty lies in identifying foreground events in the presence of background noise, given that the background itself changes at a rate that varies across environments. If we create fixed models with too much prior knowledge, those models may become too specific and generalize poorly to new sounds.
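One simple way to realize the background-as-recurring, foreground-as-deviation view sketched above is a running Gaussian model per feature dimension that adapts only on frames it labels as background, so slow changes (traffic density, crowd level) are absorbed while sudden deviations are flagged. This is a hedged illustration of the idea, not the authors' semi-supervised framework; the adaptation rate and threshold are assumed values.

```python
import numpy as np

class AdaptiveBackground:
    """Running Gaussian background model per feature dimension.

    Frames far from the current background estimate are flagged as
    foreground; the model keeps adapting on background-labeled frames,
    so a slowly drifting background is tracked rather than flagged.
    """
    def __init__(self, dim, alpha=0.05, k=3.0):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.alpha = alpha   # adaptation rate (assumed value)
        self.k = k           # deviation threshold in std units (assumed value)

    def update(self, frame):
        # Normalized distance of the frame from the background model
        dist = np.sqrt(np.mean((frame - self.mean) ** 2 / self.var))
        is_foreground = bool(dist > self.k)
        if not is_foreground:
            # Adapt statistics only on frames labeled as background
            self.mean = (1 - self.alpha) * self.mean + self.alpha * frame
            self.var = (1 - self.alpha) * self.var + self.alpha * (frame - self.mean) ** 2
            self.var = np.maximum(self.var, 1e-6)  # numerical floor
        return is_foreground
```

Freezing adaptation on foreground frames is the design choice that keeps a long siren from being absorbed into the background model, at the cost of never adapting if the threshold is set too low.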

In this chapter, we address these problems. The remainder of the chapter is organized as follows: We first review related and previous work. In the next section, the matching pursuit (MP) algorithm is described and MP-based features are presented. The following section reports on a listening test studying human ability to recognize acoustic environments. In the background modeling section, we present a framework that utilizes semi-supervised learning to model the background and detect foreground events. Concluding remarks are drawn in the last section.
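As a minimal reference for the MP algorithm mentioned in the outline: matching pursuit greedily decomposes a signal by repeatedly selecting the dictionary atom most correlated with the current residual and subtracting its projection. The sketch below assumes an orthonormal dictionary stored as rows of a matrix; the chapter's actual dictionary and feature derivation are not reproduced here.

```python
import numpy as np

def matching_pursuit(x, dictionary, n_iter=10):
    """Greedy MP decomposition of x over unit-norm dictionary atoms (rows)."""
    residual = x.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(n_iter):
        corr = dictionary @ residual          # correlation with each atom
        i = int(np.argmax(np.abs(corr)))      # best-matching atom
        atoms.append(i)
        coeffs.append(corr[i])
        residual = residual - corr[i] * dictionary[i]  # remove its projection
    return atoms, coeffs, residual
```

The selected atom indices (e.g., their frequency and scale parameters in a time-frequency dictionary) can then serve as sparse descriptors of the signal, which is the general idea behind MP-based features.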
