Deep Learning Approach for Protecting Voice-Controllable Devices From Laser Attacks

Vijay Srinivas Tida, Raghabendra Shah, Xiali Hei
Copyright: © 2022 | Pages: 18
DOI: 10.4018/978-1-7998-7323-5.ch008


Laser-based audio signal injection can be used to attack voice-controllable systems. An attacker aims an amplitude-modulated light beam at a microphone's aperture, and the injected signal acts as a remote voice command to the voice-controllable system. Attackers exploit this vulnerability to steal physical goods or virtual assets, for example by placing orders or withdrawing money. Detecting these signals is therefore important, because almost any device with a microphone can be attacked with amplitude-modulated laser signals. In this work, the authors use deep learning to classify incoming signals as either normal voice commands or laser-based audio signals. Mel frequency cepstral coefficients (MFCC) are derived from the audio signals and used as features for classification. If an audio signal is identified as a laser signal, the voice command can be disabled and an alert displayed to the victim. The machine learning model achieved a maximum accuracy of 100% in evaluation and around 95% in real-world conditions.
Chapter Preview


Human interaction with devices such as Amazon Echo, Google Home, Apple HomePod, and Xiaomi AI has become more prominent; these devices help users control smart home appliances, adjust the temperature, manage home security systems, shop online, make phone calls, and perform many other tasks. Most recent smartphones are equipped with assistants such as Siri, Google Now, or Cortana, which give users a more flexible interface for controlling many IoT systems (Yuan Gong 2018) (Abdullah et al. 2019). Further advancements have led not only able-bodied individuals to interact with these devices but also disabled and elderly people to rely heavily on them (H. Stephenson n.d.) (C. Martin n.d.). The security of these devices has become increasingly important because they handle sensitive user information such as payment details and car controls. Meanwhile, developments in machine learning have played a crucial role in handling large amounts of data with less effort and have helped provide a better user experience. Despite this rapid development, these devices have a major security problem: they capture audio samples from the nearby environment and process them, whether the audio was produced intentionally or not (Maheshwari n.d.) (Ramirez, M., and C. 2007). Speech recognition devices consist of hardware components that exhibit non-ideal characteristics, enabling several broad classes of attacks; attackers exploit these non-ideal characteristics to steal sensitive information or to control the device. Voice-controllable systems can be attacked through various media, including laser light (Sugawara et al. 2020), long-range acoustic attacks (Roy et al. 2018), ultrasonic waves (G. Zhang et al. 2017), solid materials (Q. Yan et al. 2020), and electromagnetic interference signals (Kune et al. 2013) (Tu, Yazhou and Tida, Vijay Srinivas and Pan, Zhongqi and Hei 2021). In (Sugawara et al. 2020), the authors showed how to attack various voice assistant systems from a distance using laser light as the transmission medium. (Giechaskiel and Rasmussen 2020) clearly explained how various sensors can be manipulated through out-of-band signal injections; to prevent unsolicited access to voice assistant systems, they proposed various hardware and software solutions, briefly discussing the attacks and recommending defenses that take all related research areas into account. Further, (C. Yan et al. 2020) examined analog sensor security by analyzing security properties in a systematic way; this work provides a process for analyzing the security properties of sensors, offering a better understanding of devices that helps prevent future attacks. In (Tu et al. 2018), the authors explained how inertial sensors can be made to malfunction using out-of-band acoustic signals, and proposed two defenses based on digital amplitude adjustment and phase pacing. In (Tu et al. 2019), the authors demonstrated adversarial manipulation of temperature sensor measurements through hardware components used in devices, such as operational amplifiers and instrumentation amplifiers, and presented a defense based on a prototype low-cost anomaly detector. Modern smartphones are also susceptible to such attacks, as shown in (Kasmi and Lopes Esteves 2015), where the authors exploited properties of electronic devices to demonstrate a notable silent remote voice-command injection method using electromagnetic signals. In medical applications these attacks pose a serious threat, as shown in (Rasmussen et al. 2009), where the authors mitigated them using a proximity-based access control technique built on ultrasonic distance-bounding protocols.
Electromagnetic interference attacks on sensor devices constitute a major threat in the physical world, and detecting these signals is correspondingly important, as observed in (Y. Zhang and Rasmussen 2020). The authors proposed a simple technique, requiring only a small amount of extra hardware, that checks that the value measured by a sensor at rest is zero volts. Deep learning approaches have shown high efficiency in voice-based assistant devices; however, attackers exploit vulnerabilities in the models to perform unsolicited activities. (Abdullah H, Warren K, Bindschaedler V, Papernot N n.d.) showed how such attacks can be performed and proposed future research directions for avoiding them. (Chen Y, Zhang J, Yuan X, Zhang S, Chen K, Wang X n.d.) addressed these problems by classifying them into categories such as out-of-band signal attacks and adversarial attacks, along with various solutions; they also provided insights on aligning this security research with work on Image Recognition Systems (IRS). (Z. Xu et al. 2021) proposed an inaudible attack method that reaches distances up to 2.5 m, using electromagnetic interference against smart speakers and exploiting the nonlinear property of microphones. Deep learning algorithms can help protect these devices to some extent without additional hardware costs. Since voice assistant devices contain microphones and hold sensitive information, adding more hardware components might introduce new problems instead of solving existing ones. Moreover, because many devices have already been purchased, it is difficult to change their physical characteristics, so more efficient deep learning algorithms are necessary.
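The MFCC features that the abstract describes as the input to the classifier can be computed directly from raw audio. The sketch below is a minimal, illustrative implementation of the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT); the function name and all parameter defaults are assumptions for illustration, not the chapter's actual configuration.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Compute MFCC frames from a mono audio signal (illustrative sketch)."""
    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + max(0, (len(sig) - n_fft) // hop)
    frames = np.stack([sig[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II to get cepstral coefficients
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return energies @ dct.T  # shape: (n_frames, n_ceps)

# Example: one second of synthetic 440 Hz audio at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
feat = mfcc(np.sin(2 * np.pi * 440 * t))
print(feat.shape)  # one 13-coefficient vector per frame
```

In a deployed detector, these per-frame coefficient vectors (or statistics pooled over them) would form the input features for the laser-versus-voice classifier.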
There are also cases where attackers used white-box knowledge of deep learning models to create audio samples that voice assistant devices can understand but that are difficult for humans to interpret (Carlini et al. 2016; Yuan et al. 2018). A typical block diagram of a Voice Controlled System (VCS) is shown in Figure 1, illustrating how the machine learning model is made to process a malicious signal. In the first step, the input signal may be a recorded human voice or a manipulated signal from some other source; this constitutes the spoofing process. In the second step, the voice system is made to accept the malicious command by hacking the operating system. In the third step, the malicious analog signal is converted to a digital signal, and in the final step, the machine learning model executes the adversarial command through a deception process (Yuan Gong 2018).
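The defense the abstract describes (classify the incoming signal, then disable the command and alert the victim if it is laser-injected) can be sketched as a gate in front of the command pipeline. Everything here is a hypothetical stand-in: the placeholder logistic classifier, its random weights, and the function names `is_laser` and `handle_command` are assumptions for illustration; the chapter's actual detector is a deep learning model trained on MFCC features.

```python
import numpy as np

# Placeholder classifier parameters; a real deployment would load trained
# weights for a deep model rather than using random values.
rng = np.random.default_rng(0)
W = rng.normal(size=13)
b = 0.0

def is_laser(mfcc_frames, threshold=0.5):
    """Score pooled MFCC frames with a logistic stand-in for the deep model."""
    x = mfcc_frames.mean(axis=0)                # average coefficients over time
    score = 1 / (1 + np.exp(-(x @ W + b)))      # sigmoid probability of "laser"
    return score >= threshold

def handle_command(mfcc_frames, execute, alert):
    """Gate the voice pipeline: run the command only for genuine voice input."""
    if is_laser(mfcc_frames):
        alert("Possible laser-injected command blocked")
        return False
    execute()
    return True

# Usage: with all-zero features the sigmoid outputs exactly 0.5, so the
# placeholder gate blocks the command and raises an alert.
log = []
ok = handle_command(np.zeros((5, 13)),
                    execute=lambda: log.append("executed"),
                    alert=log.append)
```

The design point is that detection sits purely in software, matching the chapter's argument that adding hardware to already-sold devices is impractical.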
