Modeling Grouping Cues for Auditory Scene Analysis Using a Spectral Clustering Formulation

Luís Gustavo Martins, Mathieu Lagrange, George Tzanetakis

Source Title: Machine Audition: Principles, Algorithms and Systems

DOI: 10.4018/978-1-61520-919-4.ch002

Abstract

Computational Auditory Scene Analysis (CASA) is a challenging problem for which many different approaches have been proposed. These approaches can be based on statistical and signal processing methods such as Independent Component Analysis, or they can be based on our current knowledge about human auditory perception. Learning happens at the boundary interactions between prior knowledge and incoming data. Separating complex mixtures of sound sources such as music requires a complex interplay between prior knowledge and analysis of incoming data. Many approaches to CASA can also be broadly categorized as either model-based or grouping-based. Although it is known that our perceptual system utilizes both of these types of processing, building such systems computationally has been challenging. As a result, most existing systems either rely on prior source models or are based solely on grouping cues. In this chapter, the authors argue that formulating this integration problem as clustering based on similarities between time-frequency atoms provides an expressive yet disciplined approach to building sound source characterization and separation systems and evaluating their performance. After describing the main components of such an architecture, the authors describe a concrete realization that is based on spectral clustering of a sinusoidal representation. They show how this approach can be used to model traditional grouping cues such as frequency and amplitude continuity, as well as other types of information and prior knowledge such as onsets, harmonicity, and timbre models for specific instruments. Experiments supporting their approach to integration are also described. The description also covers issues of software architecture, implementation, and efficiency, which are frequently not analyzed in depth for many existing algorithms. The resulting system exhibits practical performance (approximately real-time) with consistent results, without requiring example-specific parameter optimization, and is available as part of the Marsyas open source audio processing framework.

Introduction

Inspired by the classic book by Bregman on Auditory Scene Analysis (ASA) (Bregman, 1990), a variety of systems for Computational Auditory Scene Analysis (CASA) have been proposed (Wang and Brown, 2006). They can be broadly classified as bottom-up (data-driven) systems, where information flows from the incoming audio signal to higher-level representations, or top-down (model-based) systems, where prior knowledge about the characteristics of a particular type of sound source, in the form of a model, is used to assist the analysis. The human auditory system utilizes both of these types of processing. Although it has been argued that CASA systems should also utilize both types (Slaney, 1998), most existing systems fall into only one of the two categories. Another related challenge is the integration of several grouping cues that operate simultaneously into a single system. We believe that this integration becomes particularly challenging when the CASA system has a multi-stage architecture in which each stage corresponds to a particular grouping cue or type of processing. In such architectures, any errors in one stage propagate to the following stages, and it is hard to decide what the ordering of stages should be. An alternative, which we advocate in this chapter, is to formulate the entire problem of forming sound sources from a complex sound mixture as clustering based on similarities of time-frequency atoms across both time and frequency. That way, all cues are taken into account simultaneously, and new sources of information such as source models or other types of prior knowledge can easily be taken into consideration using one unifying formulation.
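To make the idea of taking all cues into account simultaneously more concrete, the following is a minimal sketch of how several cue similarities might be merged into one pairwise affinity matrix. It is an illustration only, not the chapter's implementation: the Gaussian kernel, the element-wise product used to combine cues, the function names `gaussian_similarity` and `combined_affinity`, and the `sigma_f` and `sigma_a` values are all assumptions made for this example.

```python
import numpy as np


def gaussian_similarity(values, sigma):
    """Pairwise similarity exp(-(d / sigma)^2) between scalar cue values."""
    values = np.asarray(values, dtype=float)
    diff = values[:, None] - values[None, :]
    return np.exp(-(diff / sigma) ** 2)


def combined_affinity(freqs_hz, amps_db, sigma_f=100.0, sigma_a=6.0):
    """Combine frequency and amplitude proximity into one affinity matrix.

    The kernel widths are arbitrary illustrative values (assumptions).
    Multiplying the per-cue matrices means two atoms are considered
    similar only if they are close under every cue.
    """
    w_freq = gaussian_similarity(freqs_hz, sigma_f)
    w_amp = gaussian_similarity(amps_db, sigma_a)
    return w_freq * w_amp


# Four hypothetical spectral peaks (frequency in Hz, amplitude in dB).
freqs = [440.0, 445.0, 880.0, 1200.0]
amps = [-10.0, -11.0, -12.0, -30.0]
W = combined_affinity(freqs, amps)
```

Because every cue contributes to the same matrix, adding a further cue (for example an onset or source-model similarity) would amount to multiplying in one more matrix, with no change to the ordering of processing stages.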

Humans, even without any kind of formal music training, are typically able to extract, almost unconsciously, a great amount of relevant information from a musical signal. Features such as the beat of a musical piece, the main melody of a complex musical arrangement, the sound sources and events occurring in a complex musical mixture, the song structure (e.g. verse, chorus, bridge), and the musical genre of a piece are just some examples of the level of knowledge that a naive listener is commonly able to extract just from listening to a musical piece. In order to do so, the human auditory system uses a variety of cues for perceptual grouping, such as similarity, proximity, harmonicity, and common fate, among others.

In the past few years, interest in the emerging research area of Music Information Retrieval (MIR) has been steadily growing. It encompasses a wide variety of ideas, algorithms, tools, and systems that have been proposed to handle the increasingly large and varied amounts of musical data available digitally. Typical MIR systems for music signals in audio format statistically represent the entire polyphonic sound mixture (Tzanetakis and Cook, 2002). There is some evidence that this approach has reached a “glass ceiling” (Aucouturier and Pachet, 2004) in terms of retrieval performance. One obvious direction for further progress is to attempt to individually characterize the different sound sources comprising the polyphonic mixture. The predominant melodic voice (typically the singer in Western popular music) is arguably the most important sound source, and its separation has a large number of applications in Music Information Retrieval.

The proposed system is based on sinusoidal modeling, from which spectral components are segregated into sound events using perceptually inspired grouping cues. An important characteristic of clustering based on similarities rather than points is that it can utilize more generalized context-dependent similarities that cannot easily be expressed as distances between points. We propose such a context-dependent similarity cue based on harmonicity (termed “Harmonically-Wrapped Peak Similarity” or HWPS). The segregation process is based on spectral clustering methods, a technique originally proposed to model perceptual grouping tasks in the computer vision field (Shi and Malik, 2000). One of the main advantages of this approach is the ability to incorporate various perceptually inspired grouping criteria into a single framework without requiring multiple processing stages. Another important property, especially for MIR applications that require analysis of large music collections, is the running time of the algorithm, which is approximately real-time, as well as the independence of the algorithm from recording-specific parameter tuning.
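As a rough illustration of clustering from similarities rather than points, the sketch below splits a small, hand-made affinity matrix into two groups using the second eigenvector of the normalized graph Laplacian, in the spirit of the normalized-cut formulation of Shi and Malik (2000). It is a generic textbook-style sketch under stated assumptions, not the chapter's algorithm or the Marsyas implementation: the function name `spectral_bipartition`, the example matrix, and the choice of splitting at zero are assumptions for this example, and HWPS is not implemented here.

```python
import numpy as np


def spectral_bipartition(W):
    """Split items into two groups from a symmetric affinity matrix W.

    Uses the eigenvector of the second-smallest eigenvalue of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    """
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Rescale to the generalized eigenvector of (D - W) y = lambda D y.
    y = eigvecs[:, 1] * d_inv_sqrt
    # Splitting at zero is one simple thresholding choice (an assumption).
    return (y > 0).astype(int)


# Hypothetical affinities: items 0-2 are mutually similar, as are items 3-4.
W = np.array([
    [1.00, 0.90, 0.90, 0.05, 0.05],
    [0.90, 1.00, 0.90, 0.05, 0.05],
    [0.90, 0.90, 1.00, 0.05, 0.05],
    [0.05, 0.05, 0.05, 1.00, 0.90],
    [0.05, 0.05, 0.05, 0.90, 1.00],
])
labels = spectral_bipartition(W)
```

Only the affinity matrix enters the computation, so any similarity that can be evaluated for a pair of peaks, including context-dependent ones, could be used without defining coordinates for the peaks. Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector.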
