1 Introduction
With the rapid development of Internet communication technology, people encounter a large amount of multimedia information online every day. Digital audio, as an essential information carrier, accounts for a large share of the multimedia content shared and transmitted on the Internet. The wide availability of audio editing software makes editing digital audio very convenient, and malicious editing and misuse of audio by criminals can lead to serious consequences (Qamhan et al., 2021). Therefore, there is a growing need for effective tamper detection methods, especially when audio is used as evidence in courtroom trials, political campaigns, or commercial applications.
Tamper detection methods for digital audio fall into two categories (Zakariah et al., 2018). The first is active detection, which requires embedding watermarks and signatures in the audio in advance to enable protection and detection. The second is passive detection, which does not need any additional information embedded in advance and instead performs tamper detection directly on features already contained in the digital audio.
In recent years, there have been many studies on passive detection of audio tampering. The audio features used by these passive detection methods include statistical features such as voice pitch and formants (Chen et al., 2016; Xie et al., 2018; Yan et al., 2019a), recording device information (Zeng et al., 2020; Zeng et al., 2021), speaker information (Wang et al., 2020; Wang et al., 2021; Zeng et al., 2018), background noise (Pan et al., 2012), and the Electric Network Frequency (ENF) (Grigoras, 2005; Hua et al., 2016; Rodríguez et al., 2010). ENF is the transmission frequency of the power grid (nominally 50 or 60 Hz), and it is embedded in audio as a hum at recording time (Hajj-Ahmad et al., 2018). Because ENF fluctuates randomly around the nominal frequency (Cooper, 2009), it can support audio forensics tasks, including timestamp verification (Hua, 2018; Hua et al., 2014), content tampering detection (Esquef et al., 2014; Rodríguez et al., 2010), and recording location identification (Yao et al., 2017; Zheng et al., 2017).
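To make the idea of ENF fluctuation concrete, the following is a minimal sketch of how an ENF trace might be read from a recording: the signal is split into frames, and in each frame the strongest spectral peak within a narrow band around the nominal frequency is taken as the ENF estimate. This is an illustrative assumption, not the method of any of the works cited above; the function name `estimate_enf` and all of its parameters (`frame_sec`, `search_hz`, `nfft_factor`) are hypothetical choices made for this example.

```python
import numpy as np

def estimate_enf(signal, fs, nominal=50.0, frame_sec=2.0,
                 search_hz=1.0, nfft_factor=16):
    """Return one ENF estimate (Hz) per non-overlapping frame.

    Hypothetical helper for illustration only: it picks the largest
    zero-padded FFT magnitude within nominal +/- search_hz.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(frame_sec * fs)
    nfft = nfft_factor * frame_len  # zero-padding gives a finer frequency grid
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= nominal - search_hz) & (freqs <= nominal + search_hz)
    band_freqs = freqs[band]
    window = np.hanning(frame_len)  # taper each frame to reduce leakage

    estimates = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame, n=nfft))
        peak = np.argmax(spectrum[band])  # strongest bin inside the search band
        estimates.append(band_freqs[peak])
    return np.array(estimates)
```

In a sketch like this, the sequence of per-frame estimates forms the ENF trace. Forensic applications of the kind listed above would then examine that trace, for example by comparing it with a reference grid recording or by looking for abrupt discontinuities; those later steps are outside the scope of this example.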