Who would have thought that simply typing on a laptop keyboard could leak one’s password, with an accuracy rate as high as 95%!

Recently, a paper published by scholars from three universities, including Durham University in the UK, showed that a state-of-the-art deep learning model can reconstruct passwords and other sensitive information based solely on the sound of keystrokes on a laptop. When trained on keystrokes recorded during online meetings through tools like Zoom, the model achieves an accuracy of 93%. When the recordings come from a nearby smartphone, accuracy rises to 95%. It’s truly alarming!

In everyday life and work, when people enter passwords on their devices, some cautious individuals shield the screen or keyboard with their hands. However, very few people think about concealing the sound of their keyboard. That may be about to change, and the reason is a paper published by scholars from Durham University and other universities in the UK titled “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards.”

With the latest advancements in deep learning and the ubiquity of microphones in smartphones and other smart devices, researchers from several universities in the UK have found that the threat of acoustic side-channel attacks on keyboards is greater than ever before.

First, although several research papers have examined this issue and built mathematical models capable of inferring keystrokes from keyboard sound data, most of these studies used desktop keyboards, which are significantly louder than modern keyboards, especially those found on laptops. In addition, laptops of the same model ship with identical keyboards and therefore sound alike. This implies that if a popular laptop model is proven to be susceptible to acoustic side-channel attacks, a large portion of its users may be at risk of privacy breaches.

Furthermore, microphones capable of capturing keyboard sound have become far more accessible. In the past, an attack largely relied on an external microphone, but now smartphones, smartwatches, and online meeting software like Zoom can all be used to collect keyboard sounds. This also means that, even without compromising the laptop itself, a weakly protected smart wearable or smart home device could become a channel for privacy breaches.

Most importantly, with the rapid advancement of artificial intelligence technologies such as deep learning, the computational models used for processing and analyzing data have seen significant improvements.

In the past, research on acoustic side-channel attacks relied heavily on classical machine learning methods. One common approach was to employ a Hidden Markov Model (HMM), which is trained on a text corpus and predicts the most likely word or character at each position in a sequence.

For example, when the classifier outputs “Hwllo,” an HMM can infer that the “w” in the word is actually a misclassified “e.” While this method is quite effective in many text processing scenarios, it has a major drawback: it makes strong independence assumptions and does not consider broader contextual features. This weakens its ability to model real-world input, especially random passwords, whose characters do not follow the patterns of natural language. This may be one of the reasons why HMMs have become less popular recently.
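To make the “Hwllo” example concrete, here is a minimal sketch of HMM-style correction using the Viterbi algorithm. Everything in it is an assumption made for illustration, not something taken from the paper: the five-word toy corpus, the partial keyboard-neighbour map, and the emission weights (80 for the correct key, 15 for a neighbouring key, 1 for anything else).

```python
import math
from collections import Counter

CORPUS = ["hello", "hell", "hole", "low", "well"]    # toy corpus (assumption)
STATES = sorted(set("".join(CORPUS)))                # the true keys we consider
ADJACENT = {"e": "w", "w": "e", "l": "o", "o": "l"}  # partial QWERTY neighbours

def train_transitions(corpus, states):
    """Add-one smoothed start and bigram log-probabilities from the corpus."""
    starts = Counter(word[0] for word in corpus)
    bigrams = Counter(pair for word in corpus for pair in zip(word, word[1:]))
    start = {s: math.log((starts[s] + 1) / (len(corpus) + len(states)))
             for s in states}
    trans = {}
    for a in states:
        total = sum(bigrams[(a, b)] for b in states)
        for b in states:
            trans[(a, b)] = math.log((bigrams[(a, b)] + 1) / (total + len(states)))
    return start, trans

def emission(observed, true_key, states):
    """Log-probability that the classifier outputs `observed` for `true_key`."""
    def weight(o):
        if o == true_key:
            return 80   # classifier is usually right
        if o in ADJACENT.get(true_key, ""):
            return 15   # errors tend to land on a neighbouring key
        return 1
    return math.log(weight(observed) / sum(weight(o) for o in states))

def viterbi(observed, states, start, trans):
    """Most likely sequence of true keys given the classifier's output."""
    best = {s: (start[s] + emission(observed[0], s, states), s) for s in states}
    for obs in observed[1:]:
        step = {}
        for s in states:
            score, path = max((best[p][0] + trans[(p, s)], best[p][1])
                              for p in states)
            step[s] = (score + emission(obs, s, states), path + s)
        best = step
    return max(best.values())[1]

start, trans = train_transitions(CORPUS, STATES)
print(viterbi("hwllo", STATES, start, trans))
```

With this toy model the corrected output is “hello,” because the corpus makes “he” far more likely than “hw.” The same mechanism is what fails on random passwords: when every character pair is equally plausible, the corpus statistics have nothing to correct with.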

In this paper, the researchers introduce a technique for applying deep learning models to keyboard acoustic side-channel attacks, and it is the first time self-attention transformer layers have been used in such an attack. They also conducted multiple targeted experiments and evaluations in real-world attack scenarios. The results indicate that the risk of data leakage from current laptop keyboards is higher than ever before.

01

Let’s Recreate the “Crime Scene of AI Security Threats.”

In this experiment, researchers selected a MacBook Pro 16-inch (2021) laptop, equipped with 16GB of memory and an Apple M1 Pro processor, as the target for the attack. Its keyboard design is identical to that of models from the past two years and potentially of future releases. Because only a few models are on sale at any one time and their keyboards are essentially the same, an attack that works on one laptop could work on a large number of devices.

In terms of collecting sound data, the researchers opted for two commonly used modes: first, recording using a smartphone located in close proximity to the laptop, and second, remote recording through online conferencing tools like Zoom.

With preparations in place, let’s now briefly recreate the “crime scene.”

Step one: Data Collection – In two separate experimental modes (using a smartphone and Zoom), researchers pressed the 36 keys on the laptop (0-9, a-z). After pressing each key 25 times with varying angles and pressure, a data file recording the sound was generated for each key.

Step two: Keystroke Isolation – After recording all the keystroke data, researchers used a fundamental signal analysis method, Fast Fourier Transform (FFT), to extract the keystroke sounds. They summed the coefficients of different frequencies to obtain energy. Then, they defined an energy threshold and marked a signal as a keystroke when it exceeded this threshold.
It’s worth noting that due to noise suppression during Zoom recordings, setting an energy threshold was challenging. Researchers employed an iterative approach to adjust the threshold continuously until the correct number of keystrokes was identified.

Figure: keystroke isolation procedure
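The isolation step can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the window size, the synthetic “recording” (faint noise with three tone bursts standing in for keystrokes), and the helper names are all invented here, and a naive DFT replaces a real FFT library to keep the example self-contained.

```python
import cmath
import math
import random

WIN = 32  # samples per analysis window (assumption for this toy example)

def window_energies(signal, win=WIN):
    """Energy per non-overlapping window: sum of DFT coefficient magnitudes."""
    energies = []
    for start in range(0, len(signal) - win + 1, win):
        frame = signal[start:start + win]
        coeffs = [sum(x * cmath.exp(-2j * math.pi * k * n / win)
                      for n, x in enumerate(frame))
                  for k in range(win // 2 + 1)]
        energies.append(sum(abs(c) for c in coeffs))
    return energies

def find_keystrokes(energies, threshold):
    """Indices of windows where the energy first rises above the threshold."""
    onsets, above = [], False
    for i, energy in enumerate(energies):
        if energy > threshold and not above:
            onsets.append(i)
        above = energy > threshold
    return onsets

def tune_threshold(energies, expected, steps=50):
    """Lower the threshold step by step until `expected` keystrokes are found."""
    peak = max(energies)
    for step in range(1, steps + 1):
        threshold = peak * (1 - step / steps)
        onsets = find_keystrokes(energies, threshold)
        if len(onsets) == expected:
            return threshold, onsets
    return None, []

def make_demo_recording(burst_windows=(3, 9, 15), n_windows=20, win=WIN):
    """Synthetic recording: faint noise plus one loud tone burst per keystroke."""
    rng = random.Random(0)
    signal = [rng.uniform(-0.01, 0.01) for _ in range(n_windows * win)]
    for w in burst_windows:
        for n in range(win):
            signal[w * win + n] += math.sin(2 * math.pi * 4 * n / win)
    return signal

energies = window_energies(make_demo_recording())
threshold, onsets = tune_threshold(energies, expected=3)
print(onsets)  # windows 3, 9 and 15, where the bursts were placed
```

The `tune_threshold` loop mirrors the iterative adjustment described for the Zoom recordings: since the number of key presses is known in advance, the threshold can be moved until exactly that many keystrokes are detected.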

Step three: Feature Extraction – In this step, researchers used the Mel spectrogram method to extract sound features, making the differences between each keystroke recognizable.

Figure: waveform and corresponding mel spectrogram; smartphone recording (left), Zoom recording (right)
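For readers who have not met mel spectrograms before, the following is a rough standard-library sketch of how one is computed: a power spectrum per frame, pooled by triangular filters spaced evenly on the mel scale. The frame size, hop, number of mel bands, and the 8 kHz sample rate are assumptions chosen to keep the example small; they are not the paper's settings, and real code would normally rely on an audio library instead of a naive DFT.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power of each non-negative frequency bin of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= centre:
                row.append((f - lo) / (centre - lo))   # rising slope
            elif centre < f < hi:
                row.append((hi - f) / (hi - centre))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return frames[t][m]: energy in mel band m at time frame t."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        frames.append([sum(w * p for w, p in zip(row, spec)) for row in bank])
    return frames

def tone(freq, sample_rate=8000, n=256):
    """A pure sine tone, used here as a stand-in for recorded audio."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

def peak_band(frames):
    """Index of the mel band holding the most energy over all frames."""
    return max(range(len(frames[0])), key=lambda m: sum(f[m] for f in frames))

print(peak_band(mel_spectrogram(tone(500), 8000)),
      peak_band(mel_spectrogram(tone(3000), 8000)))
```

A low tone lands in a low mel band and a high tone in a high one, which is the property that lets small differences between keys show up as visibly different images for the classifier.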

Step four: Data Augmentation – To enhance the model’s generalization, which is the ability of a machine learning model to adapt to new, unseen data and avoid overfitting the training data, researchers employed data augmentation using a masking technique. This involved randomly selecting portions of data on the time and frequency axes and setting all values within these ranges to the average value of the spectrogram, effectively “masking” parts of the image.
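A minimal sketch of this masking idea, assuming the spectrogram is stored as a list of time frames (`spec[t][f]`). The function name, the maximum mask widths, and the choice of one time band plus one frequency band per call are illustrative assumptions, not the paper's exact settings.

```python
import random

def mask_spectrogram(spec, max_time=2, max_freq=2, rng=None):
    """Copy spec[t][f] with one time band and one frequency band set to the mean."""
    rng = rng or random.Random()
    n_time, n_freq = len(spec), len(spec[0])
    mean = sum(sum(row) for row in spec) / (n_time * n_freq)
    out = [row[:] for row in spec]          # leave the original untouched

    # Mask a random run of consecutive time frames.
    t_width = rng.randint(1, max_time)
    t0 = rng.randint(0, n_time - t_width)
    for t in range(t0, t0 + t_width):
        out[t] = [mean] * n_freq

    # Mask a random run of consecutive frequency bins.
    f_width = rng.randint(1, max_freq)
    f0 = rng.randint(0, n_freq - f_width)
    for row in out:
        for f in range(f0, f0 + f_width):
            row[f] = mean
    return out

spec = [[10 * t + f for f in range(6)] for t in range(8)]   # toy 8x6 "image"
masked = mask_spectrogram(spec, rng=random.Random(0))
for row in masked:
    print(row)
```

Calling the function again with a different random state produces a differently masked copy of the same keystroke, so the model cannot rely on any single time slice or frequency band.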

Step five: Model Building – This is the crucial step in the experiment. Researchers chose CoAtNet, a state-of-the-art model from the field of image recognition that combines the strengths of Convolutional Neural Networks (CNNs) and Transformers (deep learning models based on self-attention mechanisms). Its convolutional layers first pick up local patterns in the data quickly while downsampling the input. Its self-attention layers then compute attention scores that capture how those patterns relate to one another. This combination allows even a relatively small model to achieve excellent classification results.

In this process, researchers added an average pooling layer on top of CoAtNet. This layer averages the values within regions of the image, which makes the result less sensitive to the exact position of a feature and reduces the number of parameters. The average pooling layer is followed by a fully connected linear layer, one of the fundamental components of a neural network, which computes a weighted sum of its inputs for each output. This way, the output from CoAtNet is reduced to a percentage for each key.
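The classification head described above can be illustrated with a tiny standard-library sketch. Two simplifications are assumptions of this example rather than details from the paper: the pooling averages each whole feature map (global average pooling), and a softmax turns the linear layer's scores into percentages. The weights and feature maps are made-up numbers for a three-key toy case.

```python
import math

def global_average_pool(feature_maps):
    """Average each channel's 2-D feature map down to a single number."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def linear(x, weights, bias):
    """Fully connected layer: one weighted sum of the inputs per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two feature-map channels standing in for the backbone's output.
feature_maps = [[[1, 3], [5, 7]],
                [[0, 0], [0, 4]]]
weights = [[1, 0], [0, 1], [-1, -1]]   # one row per key (hypothetical values)
bias = [0, 0, 0]

pooled = global_average_pool(feature_maps)            # [4.0, 1.0]
probabilities = softmax(linear(pooled, weights, bias))
print([round(p, 3) for p in probabilities])
```

The key with the highest probability is the model's guess, and the remaining probabilities indicate which other keys it considered plausible.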

Ultimately, the researchers’ experimental results showed that the keystroke classification accuracy reached 95% when recorded using a smartphone and 93% when recorded through Zoom. In simpler terms, for an 8-character password, it’s possible that 7 of them could be correctly recognized, and the one that is misclassified is often in close proximity to the correct key!
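A quick back-of-the-envelope calculation shows what these accuracy figures mean for a whole password. It rests on a simplifying assumption that the article does not state: every keystroke is classified independently with the same accuracy.

```python
def expected_correct(accuracy, length):
    """Average number of characters recognised correctly."""
    return accuracy * length

def prob_all_correct(accuracy, length):
    """Chance that every character is right, assuming independent errors."""
    return accuracy ** length

for source, accuracy in [("smartphone", 0.95), ("Zoom", 0.93)]:
    print(source,
          round(expected_correct(accuracy, 8), 2),
          round(prob_all_correct(accuracy, 8), 3))
```

Under that assumption, an 8-character password recorded by smartphone has about 7.6 characters recognised on average and roughly a 66% chance of being recovered in full. For Zoom the figures are about 7.4 characters and roughly 56%.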

02

Is there any secrecy left for humans?

In the face of the latest AI models, it seems like humans may have very few secrets left. Besides obtaining passwords through keyboard sound, what other attack methods are there that we might not be aware of?

First, let’s clarify the concept of side-channel attacks mentioned in this paper. Side-channel attacks, also known as side-channel cryptanalysis, recover secret information such as keys or passwords indirectly, by exploiting the information that encryption software or hardware leaks while it runs. In simpler terms, any method that doesn’t involve a direct confrontation but rather takes alternative routes falls into the category of side-channel attacks. As the saying goes, “If the front door is closed, go through the side door; if the side door is locked, climb through the window.” There are myriad ways for hackers to achieve their goals; it’s only a matter of imagination and creativity.

Based on the types of side-channel information, aside from sound, common attack methods include timing attacks, power consumption attacks, and electromagnetic attacks, among others.

For example, timing attacks exploit the fact that every operation in a computer program takes time to execute. By precisely measuring the time taken for each operation, attackers can not only infer the program being executed but also speed up the process of cracking keys. Let’s illustrate this with a simple example: consider a 6-digit password, such as 654321, that is verified digit by digit. When hackers attempt to crack this password using a timing attack, they start by iterating through potential values for the first digit and measure the time the verification process takes to respond. If the first digit is incorrect, the verification returns quickly, but when the correct digit “6” is input, it takes longer because it proceeds to the second digit. The same process is repeated for each remaining digit. Because each digit can be confirmed on its own, the attacker needs at most 10 guesses per digit, or 60 in total, instead of up to 1,000,000 for a brute-force search.
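The password example can be simulated in a few lines. To keep the result deterministic, this sketch counts comparison steps instead of measuring wall-clock time, which is an assumption of the example: a real attacker would have to average many noisy timing measurements. The function names are made up for illustration.

```python
SECRET = "654321"

def insecure_check(guess, secret=SECRET):
    """Compare digit by digit, stopping at the first mismatch.

    Returns (is_correct, steps); `steps` stands in for the response time.
    """
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return len(guess) == len(secret), steps

def timing_attack(check, length=6, alphabet="0123456789"):
    """Recover the secret one digit at a time from the leaked step count."""
    known, attempts = "", 0
    for _ in range(length):
        best_digit, best_steps = None, -1
        for digit in alphabet:
            guess = (known + digit).ljust(length, "0")   # pad the unknown tail
            ok, steps = check(guess)
            attempts += 1
            if ok:
                return guess, attempts
            if steps > best_steps:        # slowest response = correct digit
                best_digit, best_steps = digit, steps
        known += best_digit
    return known, attempts

recovered, attempts = timing_attack(insecure_check)
print(recovered, attempts)
```

The usual defence is a comparison whose running time does not depend on where the first mismatch occurs. In Python, `hmac.compare_digest` from the standard library is designed for that purpose.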

Then, there are power consumption attacks. When humans use computers, the characters or information they input are encoded into combinations of 0s and 1s, which are then processed through countless transistor switches. Different instructions trigger varying numbers of transistor switches to open or close, resulting in different power consumption levels. By analyzing precise changes in power consumption, hackers can uncover critical information hidden within the system.

Finally, electromagnetic attacks are another common type. According to Ampère’s law, an electrical current generates a magnetic field, and different programs generate different electromagnetic emissions during their operation. By capturing the electromagnetic signals emitted by a device during its operation and using appropriate analysis methods and leakage models, many critical pieces of information can be exposed.

Additionally, attack methods based on signals such as optical, temperature, and vibration are on the rise. In 2018, researchers at the University of California, Irvine, discovered a thermal imaging attack method that involves using a thermal camera to measure the residual heat left by a user on a keyboard, allowing them to reconstruct typed text information. In 2020, researchers in Israel found that even the fan speed data leaked by a computer can be used to steal critical information.

03

Final words

In summary, as long as there is signal leakage from a device, determined individuals can achieve the goal of side-channel attacks through a comprehensive combination of data collection, processing, analysis, and modeling.

There are, of course, measures people can take to counter these side-channel attacks. However, it’s a constant cat-and-mouse game in which defenses evolve in response to new attack methods. Here are some strategies:

  1. Hardware Solutions: On the hardware level, components that minimize signal leakage or introduce interference to prevent information leakage can be used.
  2. Software Solutions: Upgrading security defense software and regularly updating operating systems and applications can help mitigate vulnerabilities.
  3. User Practices: Users can contribute by using complex combinations of passwords and regularly changing them, practicing good cybersecurity hygiene, and being cautious about the devices and software they use.

However, in a world where signals are pervasive, preventing information leakage remains a challenging and ongoing endeavor. It’s a constant battle to stay ahead of ever-evolving attack techniques.

Disclaimer: This article is created by the original author. The content of the article represents their personal opinions. Our reposting is for sharing and discussion purposes only and does not imply our endorsement or agreement. If you have any objections, please contact us through the provided channels.