Audio Tampering Forensics Based on Representation Learning of ENF Phase Sequence

Chunyan Zeng, Yao Yang, Zhifeng Wang, Shuai Kong, Shixiong Feng
Copyright: © 2022 | Pages: 19
DOI: 10.4018/IJDCF.302894

Abstract

This paper proposes an audio tampering detection method based on the ENF phase and a Bi-LSTM network, approached from the perspective of temporal feature representation learning. First, the ENF phase is obtained by applying the discrete Fourier transform to the ENF component of the audio. Second, the ENF phase is divided into frames to obtain an ENF phase sequence representation, in which each frame captures the change in the ENF phase over a period of time. Then, a Bi-LSTM neural network is trained on these sequences and outputs its state at each time step, capturing the differences between real and tampered audio. Finally, these difference features are fitted and reduced in dimensionality by a fully connected network and classified by a Softmax classifier. Experimental results show that the method outperforms state-of-the-art approaches.
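
The first two steps of the abstract (estimating the ENF phase with a DFT, then framing the phase sequence) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ENF component has already been isolated, and it uses a synthetic 50 Hz tone in place of band-pass filtered audio. The function names, the block length of 10 cycles, and the frame and hop sizes are hypothetical choices made only for illustration. The Bi-LSTM and classifier stages are not shown.

```python
import cmath
import math


def enf_phase_sequence(signal, fs, f0=50.0, cycles_per_block=10):
    """Estimate one phase value per block with a single-bin DFT at f0.

    Assumption: `signal` already contains only the ENF component.
    """
    block = int(round(cycles_per_block * fs / f0))  # samples per block
    phases = []
    for start in range(0, len(signal) - block + 1, block):
        acc = 0j
        for n in range(block):
            # Absolute time keeps the phase reference fixed across blocks.
            t = (start + n) / fs
            acc += signal[start + n] * cmath.exp(-2j * math.pi * f0 * t)
        phases.append(cmath.phase(acc))
    return phases


def wrapped_diff(phases):
    """Block-to-block phase change, wrapped into [-pi, pi)."""
    return [(b - a + math.pi) % (2 * math.pi) - math.pi
            for a, b in zip(phases, phases[1:])]


def frame_sequence(values, frame_len, hop):
    """Split a sequence into overlapping frames (one frame per time step)."""
    return [values[i:i + frame_len]
            for i in range(0, len(values) - frame_len + 1, hop)]


# Synthetic example: a 50 Hz tone whose phase jumps by 1 rad after 1 s,
# imitating the discontinuity that a splice can leave in the ENF phase.
fs = 1000
tone = [math.cos(2 * math.pi * 50 * n / fs + (0.3 if n < 1000 else 1.3))
        for n in range(2000)]
phases = enf_phase_sequence(tone, fs)
jumps = wrapped_diff(phases)
largest = max(range(len(jumps)), key=lambda i: abs(jumps[i]))
print("largest phase change between blocks", largest, "and", largest + 1)
frames = frame_sequence(phases, frame_len=4, hop=2)
print(len(frames), "frames of length", len(frames[0]))
```

In this toy signal the phase is flat except at the splice point, so a simple threshold would be enough. With real recordings the ENF drifts and is noisy, which is the case the learned sequence model described in the abstract is meant to handle.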

1 Introduction

With the rapid development of Internet communication technology, people receive a large amount of multimedia information on the Internet every day. Digital audio, as an essential information carrier, accounts for a large part of the multimedia content shared and transmitted online. The many audio editing tools now available make editing digital audio very convenient, and malicious editing and use of audio by criminals can have serious consequences (Qamhan et al., 2021). Therefore, there is a growing need for effective methods of detecting audio editing, especially when audio is used as evidence in courtroom trials, in political campaigns, or in commercial applications.

There are two kinds of tamper detection methods for digital audio (Zakariah et al., 2018). One is active detection, which requires a watermark or signature to be embedded in the audio in advance in order to protect it and detect tampering. The other is passive detection, which needs no information embedded in advance and performs tamper detection directly on features inherent in the digital audio.

In recent years, there have been many studies on passive detection of audio tampering. The audio features used by these passive detection methods include audio statistical features such as voice pitch and formant (Chen et al., 2016; Xie et al., 2018; Yan et al., 2019a), recording device information (Zeng et al., 2020; Zeng et al., 2021), speaker information (Wang et al., 2020; Wang et al., 2021; Zeng et al., 2018), background noise (Pan et al., 2012), and the Electric Network Frequency (ENF) (Grigoras, 2005; Hua et al., 2016; Rodríguez et al., 2010). The ENF is the supply frequency of the power grid (nominally 50 or 60 Hz), and it is embedded in audio as a hum at recording time (Hajj-Ahmad et al., 2018). Because the ENF fluctuates randomly around its nominal frequency (Cooper, 2009), it can support several audio forensic tasks, including timestamp verification (Hua, 2018; Hua et al., 2014), content tampering detection (Esquef et al., 2014; Rodríguez et al., 2010), and recording location identification (Yao et al., 2017; Zheng et al., 2017).
