Introduction
Over the years, broadcasting stations have accumulated large amounts of unlabeled audio content for their programs. With the help of information technology, these valuable resources can be saved, indexed, and retrieved for later use. Efficient information retrieval calls for labels that attach meaning to the data. According to an insider from China Radio International (CRI), as of July 2018, the total audio content held by CRI exceeded 55 terabytes, corresponding to about 530,000 hours of audio playback. It is practically impossible for humans to accomplish such a tedious, time-consuming annotation task without assistance from automatic or semi-automatic labeling techniques; as a result, the potential of this audio content cannot be fully realized.
Audio segmentation and classification are key techniques for the successful completion of audio (data) labeling, or audio annotation, in that the classification results provide a starting point for efficient annotation. This topic has attracted researchers mainly from the AI (artificial intelligence) and signal processing communities (Castán et al., 2015).
The primary goal of automatic audio segmentation is to provide boundaries that delimit portions of audio with homogeneous acoustic content (Shin, Chang, & Kim, 2010). Audio classification, in turn, aims to identify the semantic meaning of each portion derived from segmentation. This is by no means an easy task for audio streams such as broadcast news, which contain both single-type classes and mixed-type classes (e.g., speech with music and speech with noise) (Xie, Fu, Feng, & Luo, 2011; Cheong, Oh, & Lee, 2004).
This article presents an audio classification solution for broadcast news that couples audio segmentation and classification, a strategy known as segmentation-by-classification (see Background). A Dual-CNN (Dual-Convolutional Neural Network) is introduced to classify clips of fixed length. Unlike existing approaches, it can exploit both a small amount of labeled data and a large amount of unlabeled data for training the CNNs. A novel smoothing method, SEG-smoothing, is then applied to the classification result, yielding portions of audio with homogeneous acoustic content. To evaluate the proposed approach, a series of experiments involving the Dual-CNN and alternative methods was conducted on datasets from Beijing People's Broadcasting Station and GTZAN. The results verify that, in terms of classification accuracy and segmentation error rate, the Dual-CNN outperforms the alternatives.
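To make the segmentation-by-classification idea concrete, the sketch below shows, under simplifying assumptions, how per-clip labels can be turned into homogeneous segments: fixed-length clips are first classified, the label sequence is then smoothed, and runs of identical labels are merged into segments. Note that the majority-vote smoothing here is a generic illustrative stand-in, not the article's SEG-smoothing method, and the function names are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, window=3):
    """Sliding-window majority vote over a sequence of clip labels.
    (Illustrative stand-in; the article's SEG-smoothing is more elaborate.)"""
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - window // 2)
        hi = min(len(labels), i + window // 2 + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def labels_to_segments(labels, clip_len=1.0):
    """Merge consecutive identical clip labels into (start, end, label)
    segments, with times in seconds for clips of length clip_len."""
    segments = []
    for i, lab in enumerate(labels):
        if segments and segments[-1][2] == lab:
            # Extend the current segment to cover this clip.
            segments[-1] = (segments[-1][0], (i + 1) * clip_len, lab)
        else:
            segments.append((i * clip_len, (i + 1) * clip_len, lab))
    return segments

# A noisy label sequence with one spurious "music" clip inside a speech run:
clips = ["speech"] * 3 + ["music"] + ["speech"] * 2 + ["music"] * 4
print(labels_to_segments(smooth_labels(clips, window=3)))
# → [(0.0, 6.0, 'speech'), (6.0, 10.0, 'music')]
```

Smoothing removes the isolated misclassified clip, so the final segmentation contains only two homogeneous segments instead of four; this is precisely why a post-classification smoothing stage is needed before segment boundaries are emitted.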
The remainder of the article is organized as follows. Related work is presented in Section Background. This is followed by a detailed introduction to the Dual-CNN in Section The Dual-CNN Approach. The next section describes a smoothing method for audio segmentation. An array of experiments and related analysis evaluating the Dual-CNN is given in Section Evaluation. We conclude our work and identify future research directions in Section Conclusions and Future Work.
Background
In this section we present the related work that most inspired our research. We first introduce the two predominant categories of segmentation systems, segmentation-and-classification and segmentation-by-classification, and review the state of the art in related fields. This is followed by a discussion of deep learning techniques, namely CNNs and autoencoders, applied to audio classification; our work combines both techniques to facilitate audio segmentation and classification in the broadcasting domain. For a comprehensive and fair comparison, we investigate five approaches that are either classical in the field or share the most features with our proposed work.
Segmentation and Classification
Audio segmentation/classification systems can be divided into two classes depending on how segmentation is performed (Castán, Ortega, Miguel, & Lleida, 2014).