Introduction
Digital audio watermarking, a technical means of protecting digital audio signals, has attracted increasing attention and made significant progress in recent years. Digital audio watermarking schemes can be divided into two types according to their application. The first type is robust audio watermarking, used to protect audio copyright (Xiang, et al., 2006; Yamamoto & Iwakini, 2009; Salma, et al., 2010; Vivekananda, et al., 2011; Wang, Healy, et al., 2011; & Wang, Ma, et al., 2011). The second type is used to authenticate the veracity and integrity of audio content (Wang & Fan, 2010; Chen & Zhu, 2008; Jiang, 2010).
Many results have been published on both robust audio watermarking and audio content authentication. For speech signals, much research has addressed speaker recognition and identification (Khan, et al., 2010; Herbig, et al., 2012; Sahidullah & Saha, 2012; Navarathna, et al., 2013). However, speech content authentication schemes are rare (Park et al., 2007). Speech signals differ from general audio signals in several respects (such as sampling rate), so audio content authentication schemes generally cannot be applied directly to speech. Compared with audio signals, speech signals are also more likely to attract attackers' interest and to be attacked. If an attacked signal goes undetected, the authentication client will accept it as genuine, which can have serious consequences. Research on speech content authentication therefore has considerable real-world significance and practical value.
Considering the security requirements of a watermarking system and the practical purpose of speech (audio) content authentication, some existing watermarking schemes have the following shortcomings:
1. In some watermarking algorithms, watermark bits are embedded at public, fixed frequency points, which allows attackers to tamper with the watermark in the frequency domain easily. For example, in watermarking based on the discrete Fourier transform (DFT), the frequency points of the DFT are public and fixed, so the scheme is vulnerable to malicious attacks and insecure in practical applications. This issue is discussed in detail in Xie et al. (2006).
2. In the scheme proposed by Chen et al. (2007), the features used to generate the watermark bits are extracted during compression. The scheme therefore cannot handle speech signals that are uncompressed or compressed with codecs not based on code-excited linear prediction (CELP). Moreover, the watermark bits are embedded in the least significant bits (LSBs), which are very fragile to common signal processing operations, so the scheme treats such operations as hostile attacks. In practice, for convenience of storage, special format requirements, and many other reasons, speech signals will inevitably undergo some degree of common signal processing. In this situation, the schemes of Chen et al. (2007) and Chen et al. (2010) are unsuitable.
3. In the content-based audio content authentication algorithm of Wang & Fan (2010), the features used to generate the watermark are public, so attackers can obtain them easily. In addition, the watermark embedding and extraction methods are known to attackers, who can therefore extract the embedded watermark. Then, for a watermarked audio frame, an attacker can search for other audio content that has the same features and the same extracted watermark and substitute it for the frame; the substitution will not be detected by the certification authority. We call this the feature-analysed substitution attack.
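The fragility of LSB embedding noted in item 2 can be illustrated with a toy example. The sketch below is hypothetical and is not the scheme of Chen et al. (2007): it uses a synthetic 440 Hz tone as a stand-in for 8 kHz speech, writes one watermark bit into the LSB of each 16-bit sample, and then applies 16-to-8-to-16-bit requantization, a common processing step. All function names and parameters are illustrative assumptions.

```python
import math


def make_test_signal(n, fs=8000, freq=440.0, amplitude=8000):
    """Synthetic 16-bit PCM tone used as a stand-in for a speech recording."""
    return [int(amplitude * math.sin(2 * math.pi * freq * t / fs)) for t in range(n)]


def embed_lsb(samples, bits):
    """Overwrite the least significant bit of each sample with one watermark bit."""
    return [(s & ~1) | b for s, b in zip(samples, bits)]


def extract_lsb(samples):
    """Read back the least significant bit of each sample."""
    return [s & 1 for s in samples]


def requantize(samples, keep_bits=8, total_bits=16):
    """Simulate 16 -> 8 -> 16-bit requantization by zeroing the discarded low-order bits."""
    shift = total_bits - keep_bits
    return [(s >> shift) << shift for s in samples]


def bit_error_rate(a, b):
    """Fraction of positions where two equal-length bit lists disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    signal = make_test_signal(1000)
    watermark = [i % 2 for i in range(len(signal))]  # hypothetical alternating watermark bits
    marked = embed_lsb(signal, watermark)

    print(bit_error_rate(extract_lsb(marked), watermark))              # -> 0.0
    print(bit_error_rate(extract_lsb(requantize(marked)), watermark))  # -> 0.5
```

Embedding changes each sample by at most one quantization step, yet requantization clears every LSB, so half of the alternating watermark bits are read back wrongly. A fragile LSB verifier would report this benign operation as tampering.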
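The feature-analysed substitution attack described in item 3 can be sketched with a toy model. Every detail here is a hypothetical assumption chosen for illustration, not the algorithm of Wang & Fan (2010): the public feature is the up/down pattern of sub-block means (computed with LSBs ignored), the watermark is that feature written into sample LSBs, and the verifier accepts a frame when the extracted bits match the recomputed feature. Because both the feature and the embedding rule are public, a different frame with a matching feature can carry the original watermark and still pass verification.

```python
import random

FRAME_LEN = 64  # hypothetical frame length in samples
N_BLOCKS = 8    # hypothetical number of sub-blocks; yields N_BLOCKS - 1 feature bits


def public_feature(frame):
    """Hypothetical public feature: 1 where the next sub-block mean is larger.

    LSBs are dropped first so that embedding the watermark cannot change the feature.
    """
    coarse = [s >> 1 for s in frame]
    size = len(frame) // N_BLOCKS
    means = [sum(coarse[i * size:(i + 1) * size]) / size for i in range(N_BLOCKS)]
    return [1 if means[i + 1] > means[i] else 0 for i in range(N_BLOCKS - 1)]


def embed(frame, bits):
    """Public embedding rule: write the bits into the LSBs of the first samples."""
    out = list(frame)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out


def extract(frame):
    """Public extraction rule: read the LSBs of the first N_BLOCKS - 1 samples."""
    return [s & 1 for s in frame[:N_BLOCKS - 1]]


def verify(frame):
    """Accept the frame if its embedded bits match its recomputed public feature."""
    return extract(frame) == public_feature(frame)


def substitution_attack(target, candidates):
    """Find a different frame with the same public feature and copy the target's watermark into it."""
    wanted = public_feature(target)
    for cand in candidates:
        if cand != target and public_feature(cand) == wanted:
            return embed(cand, extract(target))
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    original = [rng.randint(-2000, 2000) for _ in range(FRAME_LEN)]
    marked = embed(original, public_feature(original))
    # Contrived candidate pool; a real attacker would search a large speech corpus.
    pool = [[0] * FRAME_LEN, [s + 100 for s in marked]]
    forged = substitution_attack(marked, pool)
    print(verify(marked), forged is not None and verify(forged))
```

The candidate pool is deliberately trivial (a constant offset preserves the ordering of the sub-block means); the point is only that, when the feature and the embedding rule are both public, the verifier has no secret with which to distinguish the substituted frame from the original.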