From a nearby bridge, the inventors of Lamphone¹ filmed the interior of an office suite through a telescope.
Using only this visual recording, they reconstructed the sound of the room’s interior with enough clarity that a song playing in the room was recognized by Shazam.
Compare these two spectrograms (visual representations of audio over time):
“Let it be - Recovered” is a spectrogram of the reconstructed audio signature of the room. It includes both ambient noise and the song “Let it be”, which was playing on a speaker in the room.
“Let it be - Original” is a spectrogram of The Beatles’ song “Let it be”. It was created by directly analyzing an audio file of the song.
Despite the messiness introduced by ambient noise (the yellow/green ‘haze’ visible in “Let it be - Recovered”), the key sonic features of the song are clearly visible in both spectrograms. This visibility of features is what enables applications like Shazam to recognize the song. In other words, Lamphone’s reconstructions of room audio from visual recordings are quite accurate, even though Lamphone has no microphone and therefore collects no audio data.
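To make the comparison above concrete, here is a minimal sketch of how a spectrogram is computed: slice the audio into short overlapping windows and take the Fourier transform of each one. The tone frequency, window size, and hop length here are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy "recording": a 440 Hz tone plus light noise, sampled at 8 kHz.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
audio = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)

# Slice into overlapping windows and take the FFT of each slice.
win, hop = 256, 128
frames = [audio[i:i + win] * np.hanning(win) for i in range(0, audio.size - win, hop)]
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power per (time slice, frequency bin)
freqs = np.fft.rfftfreq(win, 1 / fs)

# Averaged over time, the strongest bin sits within one bin width of 440 Hz.
peak = freqs[spec.mean(axis=0).argmax()]
print(peak)
```

In a plotted spectrogram, `spec` is rendered as color intensity over a time/frequency grid; the 440 Hz tone would appear as a bright horizontal band, and the noise as the faint haze described above.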
How does it work?
A quick primer is warranted: the background below makes Lamphone easy to understand.
In 1948, mathematician Claude E. Shannon published "A Mathematical Theory of Communication".
This paper established the field of Information Theory and coined the term ‘bit’ to refer to the most basic unit of information. Every form of digital communication can trace its existence to Shannon’s paper.
Shannon outlined the basic elements of communication with the following diagram:
Communication involves sending information to a receiver. However, information is an idea and has no form of its own. For information to be understood, it must be physically represented in a way that can be interpreted by a recipient.
For example, here are the steps of the process in verbal communication:
1. Information is encoded in a message.
A person represents what they mean with words (meaning = information, words = message).
2. A transmitter then sends that message in the form of a signal.
A person uses their vocal tract to create vibrations in the air (vocal tract = transmitter, air vibrations = signal).
3. Between sender and receiver, noise is introduced, creating signal interference.
The vibrations travel through air that carries other vibrations as well (other vibrations = noise).
4. The signal is received and presented as a message to the receiver.
The vibrations hit the cochlea and trigger a neural response in the auditory cortex that the brain’s language network recognizes as words (vibrations = signal, words = message).
5. The message is decoded back into information.
The words are mapped to what the receiver believes them to mean (words = message, meaning = information).
All forms of communication follow these steps. Just the same, any process that follows these steps is communication.
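The five steps above can be sketched as a small pipeline. The function names (`encode`, `transmit`, and so on) are illustrative labels for Shannon's stages, not terminology from his paper, and the channel here is noiseless by default.

```python
import random

def encode(information: str) -> bytes:          # information -> message
    return information.encode("utf-8")

def transmit(message: bytes) -> list[int]:      # message -> signal (a bit stream)
    return [int(b) for byte in message for b in f"{byte:08b}"]

def add_noise(signal: list[int], flip_prob: float = 0.0) -> list[int]:
    # The channel: each bit may be flipped with probability flip_prob.
    rng = random.Random(42)
    return [b ^ (rng.random() < flip_prob) for b in signal]

def receive(signal: list[int]) -> bytes:        # signal -> message
    chunks = [signal[i:i + 8] for i in range(0, len(signal), 8)]
    return bytes(int("".join(map(str, c)), 2) for c in chunks)

def decode(message: bytes) -> str:              # message -> information
    return message.decode("utf-8")

recovered = decode(receive(add_noise(transmit(encode("hello")))))
print(recovered)  # with a noiseless channel, the information survives intact
```

Raising `flip_prob` above zero corrupts the bit stream, which is exactly the interference problem an eavesdropper (or any receiver) must work around.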
Eavesdropping
Eavesdropping is the act of covertly hijacking the signal. A classic technique for eavesdropping on verbal communications is to hide a microphone within earshot and retrieve the communicated information from the recording. Audio recording is possible because sound exerts a physical effect on a microphone’s diaphragm, allowing it to represent the vibration pattern as data. Audio speakers do the opposite, turning data back into vibration patterns using the same rules of transformation in reverse.
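Those "rules of transformation" amount to sampling and quantization. The sketch below mimics the microphone/speaker round trip with 16-bit PCM, the common encoding for digital audio; the sample rate and amplitude are arbitrary choices.

```python
import numpy as np

# A vibration pattern: air-pressure variation over 0.1 s at 8 kHz.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
pressure = 0.5 * np.sin(2 * np.pi * 440 * t)

# "Microphone": signal -> data (quantize to 16-bit integers).
samples = np.round(pressure * 32767).astype(np.int16)

# "Speaker": data -> signal (apply the inverse mapping).
playback = samples.astype(np.float64) / 32767

# The round trip loses only tiny quantization error.
error = np.max(np.abs(playback - pressure))
print(error)
```

The point is that the mapping is fixed and invertible: anyone who captures the data and knows the rules can reproduce the vibrations, which is what makes a hidden recording useful to an eavesdropper.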
Imagine a camera filming a room. A sound in the room disturbs the air, which disturbs the light bulb, which disturbs the distribution of light in the room. If the camera is sufficiently powerful, the changes in light distribution will be captured in the video. In theory, these changes can be translated back into the original sound if we are able to mathematically account for the signal’s journey (sound→air→bulb→light) and use the same formulas to reverse through the steps (sound←air←bulb←light).
This is essentially what Lamphone does.
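The reversal described above can be caricatured with a toy model in which each step of the chain is a known, invertible gain. This is a drastic simplification (the real coupling between sound, bulb, and light is far messier, and the gain values here are made up), but it shows why knowing the forward transformation lets you run it backwards.

```python
import numpy as np

# Made-up coupling factors for each step of the chain.
AIR, BULB, LIGHT = 0.8, 0.05, 2.5

rng = np.random.default_rng(1)
sound = rng.standard_normal(1000)          # original pressure wave

# Forward: sound -> air -> bulb -> light (what the camera captures).
light_variation = sound * AIR * BULB * LIGHT

# Backward: undo each step to recover the sound.
recovered = light_variation / (AIR * BULB * LIGHT)

print(np.allclose(recovered, sound))
```

In practice the forward transformation is not a single known constant; it has to be measured experimentally, which is exactly what the Lamphone researchers did.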
Lamphone’s Strategy
Lamphone’s optical recordings capture the interference in a room’s light caused by sound waves hitting a lightbulb. Frame by frame, a specialized algorithm visually examines these minuscule changes in light distribution and reverse-engineers the audio signature of the sound causing those changes.

The physics behind this technique is relatively straightforward: the recording is done with an electro-optical sensor (essentially a special type of camera), which detects changes in light and maps them numerically. Four numbers are assigned to each point of this numerical map: X, Y, and Z coordinates representing a location in 3D space, and a voltage measurement representing the intensity of light at that location. Sound waves hitting a lightbulb create disturbances that these sensors can pick up and encode into the resulting numerical data maps.
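A minimal sketch of one such data map, under the four-numbers-per-point description above. The field names and sample values are assumptions for illustration, not the sensor's actual output format.

```python
from dataclasses import dataclass

@dataclass
class SensorPoint:
    x: float        # 3D location of the sampled point
    y: float
    z: float
    voltage: float  # light intensity measured at that location

# One frame of the map: a handful of points near the bulb.
frame = [
    SensorPoint(0.10, 0.20, 1.50, 0.731),
    SensorPoint(0.10, 0.30, 1.50, 0.728),
    SensorPoint(0.20, 0.20, 1.50, 0.733),
]

# A sound-induced flicker shows up as tiny frame-to-frame voltage changes.
mean_voltage = sum(p.voltage for p in frame) / len(frame)
print(mean_voltage)
```

The eavesdropper's raw material is a time series of such frames; the audio is hidden in how the voltages wobble from one frame to the next.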

By themselves, these maps are not enough. The eavesdropper knows that the audio information is encoded within them, but does not have a way of getting it out. This is because the interference pattern created by a given sound does not map directly to any of the numbers outputted by the sensor. Rather, a sound exerts a ripple effect upon the entire dataset, subtly adding itself into the sum of all vibrations affecting the distribution of light, including those that were not caused by the sound the eavesdropper is interested in. Playing the role of eavesdropper, the researchers needed a tool that could (roughly speaking) subtract this background noise from the data map, leaving only the pattern created by the sound of interest. This pattern, unlike the original data map, can be converted to digital audio.
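The "subtract the background" idea can be sketched with a simple frequency-domain example: estimate the room's steady interference from a quiet stretch of the recording, then remove its spectrum from a later window. This is a simplification for intuition, not the paper's actual algorithm, and all frequencies and amplitudes are invented.

```python
import numpy as np

fs = 2000
t = np.arange(0, 1.0, 1 / fs)

background = 0.3 * np.sin(2 * np.pi * 50 * t)          # steady flicker (e.g. mains hum)
signal_of_interest = 0.1 * np.sin(2 * np.pi * 440 * t)  # the sound being eavesdropped

quiet = background                       # calibration window: room is silent
observed = background + signal_of_interest  # window recorded while sound plays

# Subtract the background spectrum estimated from the quiet window.
cleaned_spec = np.fft.rfft(observed) - np.fft.rfft(quiet)
cleaned = np.fft.irfft(cleaned_spec, n=t.size)

# The dominant frequency of the cleaned signal is the tone of interest.
peak = np.fft.rfftfreq(t.size, 1 / fs)[np.abs(np.fft.rfft(cleaned)).argmax()]
print(peak)
```

Real rooms are harder: the background is not perfectly steady, so the actual algorithm must learn how sounds imprint themselves on the light field rather than rely on a single calibration window.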
The key contribution of Lamphone is the development of this tool. After conducting a series of experiments to determine how different sounds affect a bulb’s field of light, the researchers were able to build an algorithm that could effectively ‘speak the language’ of sonic light interference, translating lightbulb vibrations into high-fidelity audio in real time.
Lamphone vs. other eavesdropping methods
The quintessential eavesdropping strategy of “bugging a room” faces two central obstacles: placing a listener within earshot of the communication source, and keeping the listener’s location secret. Lamphone circumvents the need to do either of these things. However, while it carries a substantial advantage in this respect, it is not without limitations of its own.
Lamphone can be beaten by turning off the light or using a sufficiently low-watt bulb (no light = no interference pattern for an algorithm to decode), or by obstructing any windows that a remote observer could aim a telescope at. These methods also thwart a similar tool that rebuilds audio from footage of potato chip bags, which are also vibrated by sound waves.
Dark rooms are safe from Lamphone, but remain vulnerable to an earlier technology called the laser microphone: a device that projects a laser, captures its reflection, and analyzes the interference added by sound along the way.

The trade-off is that, with the right tools, the laser emitted by a laser microphone can be detected inside the room, whereas Lamphone emits nothing to detect. In fact, the only way to know you are being recorded by Lamphone is to look out the window and spot the person on the street aiming a telescope at it.
Lip-reading is comparable to Lamphone in that it deduces speech from purely visual information. Lip-reading requires no technology, but does require an uninterrupted view of a speaker’s face, which in many cases is not obtainable. Lip-reading is also limited to speech, whereas Lamphone and the laser microphone can reconstruct any kind of sound, including speech.
This is the current state of declassified eavesdropping technology in a nutshell. The purpose of this piece is to illustrate how information can serve as both a liability and a prophylactic: a crucial first step in protecting yourself from spies is to learn their tradecraft. If you are ever worried about being spied on, keep in mind the sensitivity of modern machine learning algorithms. Information embeds itself into the environment, and a clever engineer can extract that information from the most unexpected sources.
¹ Nassi, Ben; Pirutin, Yaron; Shamir, Adi; Elovici, Yuval; Zadov, Boris. “Lamphone: Real-Time Passive Sound Recovery from Light Bulb Vibrations.”