Speech Content Authentication Scheme based on High-Capacity Watermark Embedding

Fang Sun (Xinyang Normal University, College of Computer and Information Technology, Xinyang Henan, China), Zhenghui Liu (Xinyang Normal University, College of Computer and Information Technology, Xinyang Henan, China) and Chuanda Qi (Xinyang Normal University, College of Mathematics and Information Science, Xinyang Henan, China)
DOI: 10.4018/978-1-7998-2454-1.ch020


Existing content authentication schemes based on digital watermarking have some shortcomings. To address them, a speech content authentication scheme based on high-capacity watermark embedding is proposed, and the high-capacity embedding method is discussed. First, the speech signal is framed and segmented, and the samples of each segment are scrambled. Second, the DCT is applied to the scrambled signal, and the low-frequency coefficients are selected as the watermark embedding domain. Finally, the frame number is mapped to a sequence of integers and embedded into that domain using the embedding method. Theoretical analysis and experimental results show that the proposed algorithm is inaudible and robust to desynchronization attacks, increases the embedding capacity, and improves the security of the watermarking system.
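The pipeline summarized above (segment, scramble, DCT, embed the frame number in a low-frequency coefficient) can be illustrated with a minimal sketch. The preview does not specify the chapter's embedding rule, so the quantization step used here is a stand-in rather than the authors' method. All names and parameters (`embed_frame_number`, `key`, `step`, `coeff_index`, `n_digits`) are hypothetical, and the frame number is assumed to map to its decimal digits, one digit per segment.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def permutation(n, key):
    """Key-dependent scrambling order (assumption: a seeded shuffle)."""
    idx = list(range(n))
    random.Random(key).shuffle(idx)
    return idx

def embed_digit(segment, digit, key, coeff_index=1, step=0.01):
    """Scramble, transform, and force one low-frequency coefficient
    onto a lattice point that encodes `digit` (0..9)."""
    perm = permutation(len(segment), key)
    coeffs = dct([segment[p] for p in perm])
    q = round((coeffs[coeff_index] - digit * step) / (10 * step))
    coeffs[coeff_index] = q * 10 * step + digit * step
    scrambled = idct(coeffs)
    restored = [0.0] * len(segment)
    for j, p in enumerate(perm):  # undo the scrambling
        restored[p] = scrambled[j]
    return restored

def extract_digit(segment, key, coeff_index=1, step=0.01):
    """Recover the digit from the same scrambled low-frequency coefficient."""
    perm = permutation(len(segment), key)
    coeffs = dct([segment[p] for p in perm])
    return round(coeffs[coeff_index] / step) % 10

def embed_frame_number(frame, number, key, n_digits=3):
    """Split a frame into n_digits segments and embed one decimal digit in each."""
    seg_len = len(frame) // n_digits
    digits = [int(d) for d in str(number).zfill(n_digits)]
    marked = list(frame)
    for i, d in enumerate(digits):
        lo, hi = i * seg_len, (i + 1) * seg_len
        marked[lo:hi] = embed_digit(frame[lo:hi], d, key + i)
    return marked

def extract_frame_number(frame, key, n_digits=3):
    """Read the digits back and rebuild the frame number."""
    seg_len = len(frame) // n_digits
    digits = [extract_digit(frame[i * seg_len:(i + 1) * seg_len], key + i)
              for i in range(n_digits)]
    return int("".join(str(d) for d in digits))
```

In this sketch a receiver who knows `key` extracts the number from each frame and compares it with the frame's expected position. A mismatch would suggest that the frame was modified, inserted, or deleted. Scrambling before the DCT means the coefficient that carries the digit depends on the key, which is one plausible reading of the security claim in the abstract.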
Chapter Preview


Digital signals have replaced traditional analog signals as the most popular information carrier in communication. However, with the growing availability of multimedia editing tools, digital information is easy to edit, attack, and forge.

In real life, digital speech signals are likely to attract attackers' interest and be maliciously attacked. An attacked signal expresses a meaning different from that of the original. If a recipient regards the attacked signal as authentic and acts on it, serious consequences may follow (Zhang et al., 2015). Fortunately, forensic technology based on digital watermarking (Akhaee et al., 2010; Pun & Yuan, 2013; Peng et al., 2013; Lei et al., 2013) provides a way to verify the authenticity of a speech signal.

Digital audio watermarking schemes are usually used to protect audio copyright (Xiang et al., 2006; Wang et al., 2011; Yamamoto & Iwakini, 2009; Salma et al., 2010; Wang, Healy & Timoney, 2011) and have made outstanding progress in recent years. Wang, Shi, Wang and Yang (2016) proposed a robust audio watermarking method based on invariant exponent moments and a synchronization code technique, in which a watermark generated from a binary image is embedded into the host audio. Experimental results demonstrated that the scheme resists most attacks: the binary image extracted from an attacked watermarked signal remains similar to the original. Bai et al. (2011) presented an audio watermarking scheme based on SVD–DCT with a synchronization code technique, in which a binary image is blindly embedded as the watermark into the high-frequency band of the SVD–DCT block. That scheme is also robust against various common signal processing attacks. Because the watermark survives such attacks, most attacks would go undetected if either scheme (Wang, Shi, Wang & Yang, 2016; Bai et al., 2011) were used for authentication.

As a carrier of information, a digital speech signal should convey its meaning intact and authentic. If listeners and users take an attacked signal for the original and act on its instructions, serious consequences may follow. Speech forensics is therefore indispensable for digital speech signals, and it can be achieved by using digital watermarks (Liu et al., 2016). Although outstanding progress has been made recently, the resulting schemes are unsuitable for speech authentication (Liu, Huang, Sun, & Qi, 2016).

Using detection of multiple compression and encoder identification, Korycki (2014) proposed an authentication scheme for compressed audio recordings. The recordings are authenticated by evaluating statistical features extracted from MDCT coefficients and other parameters obtained from the compressed audio files; these features are used to train selected machine learning algorithms. Although the scheme improves robustness and effectiveness, it requires a large amount of training data.
