High-Resolution Efficient Image Generation From Wifi Mapping
Posted 3 months ago · Active 3 months ago
arxiv.org · Research story
Key topics: WiFi Imaging · Diffusion Models · Surveillance
Researchers have developed a method to generate high-resolution images from WiFi mapping data using diffusion models, sparking debate about the technology's capabilities, limitations, and potential surveillance implications.
Snapshot generated from the HN discussion
Discussion activity: 35 comments. First comment 1h after posting; peak of 15 comments in the 2-4h window (avg 4.4 per period).
Key moments
- 01Story posted
Oct 1, 2025 at 2:33 AM EDT
3 months ago
Step 01 - 02First comment
Oct 1, 2025 at 3:42 AM EDT
1h after posting
Step 02 - 03Peak activity
15 comments in 2-4h
Hottest window of the conversation
Step 03 - 04Latest activity
Oct 2, 2025 at 2:35 AM EDT
3 months ago
Step 04
ID: 45434941 · Type: story · Last synced: 11/20/2025, 4:44:33 PM
Is this just extremely overfitted?
Is there a way for us to test this? Even if the model isn't open source, I'd pay $1 to upload a capture from my WiFi card on my Linux box to the researchers, have them generate a picture, and see if it's accurate.
The more space you take up in the frequency domain, the higher your resolution in the time domain is. Wifi sensing results that detect heart rate or breathing, for example, use even larger bandwidth, to the point where it'd be more accurate to call them radars than wifi access points.
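As a back-of-the-envelope illustration of that bandwidth/resolution tradeoff (my numbers, not from the paper): time resolution scales as roughly 1/B, and the equivalent radar range resolution as c/(2B).

```python
# Rough illustration of why wider bandwidth means finer sensing resolution.
# delta_t ~ 1 / B; two-way range resolution ~ c / (2 * B) in a radar-style view.

C = 3e8  # speed of light, m/s

def range_resolution_m(bandwidth_hz: float) -> float:
    """Approximate radar range resolution for a given sounding bandwidth."""
    return C / (2.0 * bandwidth_hz)

for label, bw in [("802.11n, 20 MHz", 20e6),
                  ("802.11ac, 160 MHz", 160e6),
                  ("UWB radar, 1.5 GHz", 1.5e9)]:
    print(f"{label}: ~{range_resolution_m(bw):.2f} m")  # 7.50, 0.94, 0.10
```

So a 20 MHz WiFi channel resolves on the order of meters, while the gigahertz bandwidths used in vital-sign sensing get down to centimeters, which is why those systems behave more like radars.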
If you uploaded a random room to the model without retraining it, you wouldn't get anything as accurate as the images in the paper.
It seems like they might be giving it more information besides the WiFi data, or else maybe training it on photos of the actual person in the actual room, in which case it's not obvious how well it would generalise.
The model was trained on the room.
It would produce images of the room even without any WiFi data input at all.
The WiFi signal is used as a modulator on the input to the pre-trained model.
It’s not actually generating an image of the room from only WiFi signals.
The encoder itself is trained on latent embeddings of images in the same environment with the same subject, so it learns visual details (that are preserved through the original autoencoder; this is why the model can't overfit on, say, text or faces).
The interesting part of the whole setup is that the wifi signal seems to contain the information required to predict the posture of the individual to a reasonably high degree of accuracy, which is actually pretty cool.
"We consider a WiFi sensing system designed to monitor indoor environments by capturing human activity through wireless signals. The system consists of a WiFi access point, a WiFi terminal, and an RGB camera that is available only during the training phase. This setup enables the collection of paired channel state information (CSI) and image data, which are used to train an image generation model"
I know that is a subjective metric, but by anyone's measure a 4x4 grid of postage-stamp-sized images is not high resolution.
2. “Postage stamp sized” is not a resolution. Zoom in on them and you’ll see that they’re quite crisp.
"When a brilliant, driven industrialist harnesses the cutting edge of quantum physics to enable people everywhere, at trivial cost, to see one another at all times: around every corner, through every wall, into everyone's most private, hidden, and even intimate moments. It amounts to the sudden and complete abolition of human privacy--forever."
I think the results here are much less important and surprising than what some people seem to be thinking. To summarize the core of the paper, we took stable diffusion (which is a 3-part system of an encoder, u-net, decoder), and replaced the encoder to use WiFi data instead of images. This gives you two advantages: you get text-based guidance for free, and the encoder model can be smaller. The smaller model combined with the semantic compression from the autoencoder gives you better (SOTA resolution) results, much faster.
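A toy numpy sketch of that encoder swap, with everything invented for illustration (the 4x64x64 latent loosely matches SD's latent space; the CSI dimensions and the single linear map are placeholders for the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only, not the paper's actual dimensions.
N_SUBCARRIERS, N_RX = 64, 3          # assumed CSI dimensions
LATENT_SHAPE = (4, 64, 64)           # SD-style latent tensor
latent_dim = int(np.prod(LATENT_SHAPE))

# Hypothetical linear "CSI encoder" standing in for the trained network:
# it maps flattened CSI features straight into the diffusion latent space,
# replacing the image-side VAE encoder.
feat_dim = N_SUBCARRIERS * N_RX * 2  # amplitude + phase per subcarrier/antenna
W = rng.normal(0, 0.01, size=(latent_dim, feat_dim))

def csi_to_latent(csi: np.ndarray) -> np.ndarray:
    """Encode complex CSI (subcarriers x rx antennas) into a latent tensor."""
    feats = np.concatenate([np.abs(csi).ravel(), np.angle(csi).ravel()])
    return (W @ feats).reshape(LATENT_SHAPE)

csi = rng.normal(size=(N_SUBCARRIERS, N_RX)) + 1j * rng.normal(size=(N_SUBCARRIERS, N_RX))
z = csi_to_latent(csi)
print(z.shape)  # (4, 64, 64) -- this latent then feeds the frozen U-Net/decoder
```

The point of the swap is that the U-Net and decoder (and hence text guidance) are reused unchanged; only the small CSI-to-latent encoder has to be trained.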
I noticed a lot of discussion about how the model can possibly be so accurate. It wouldn't be wrong to consider the model overfit, in the sense that the visual details of the scene are moved from the training data to the model weights. These kinds of models are meant to be trained & deployed in a single environment. What's interesting about this work is that learning the environment well has become really fast because the output dimension is smaller than image space. In fact, it's so fast that you can basically do it in real time... you turn on a data collection node and can train a model from scratch online, in a new environment that gets decent results with at least a little bit of interesting generalization in ~10min. I'm presenting a demonstration of this at Mobicom 2025 next month in Hong Kong.
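The online, single-environment training described above can be caricatured as streaming regression from CSI features to camera-derived latents; every number here (sizes, the linear model, the learning rate) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
feat_dim, latent_dim = 32, 8   # toy sizes; real latents are much larger

# Hypothetical room-specific mapping the encoder must learn.
W_true = rng.normal(size=(latent_dim, feat_dim))

W = np.zeros((latent_dim, feat_dim))  # encoder weights, trained online
lr = 0.01

# Stream of (CSI feature, image latent) pairs collected live in the room.
for step in range(2000):
    f = rng.normal(size=feat_dim)
    z_target = W_true @ f                     # latent from the camera-side autoencoder
    z_pred = W @ f
    W += lr * np.outer(z_target - z_pred, f)  # SGD on squared error

print(np.max(np.abs(W - W_true)))  # shrinks toward the room-specific mapping
```

Because the regression target is a low-dimensional latent rather than raw pixels, even this naive streaming setup converges quickly, which is the "learn the environment fast" point the comment makes.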
What people call "WiFi sensing" is now mostly CSI (channel state information) sensing. When you transmit a packet on many subcarriers (frequencies), the CSI represents how the data on each frequency changed during transmission. So, CSI is inherently quite sensitive to environmental changes.
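In its simplest form, the per-subcarrier CSI estimate is just the received symbol divided by the known transmitted symbol; a minimal numpy sketch (sizes loosely 802.11-like, noise level arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sc = 52  # data subcarriers in a 20 MHz 802.11 OFDM symbol

# Known training symbols X (e.g. from a preamble), received symbols Y.
X = np.exp(1j * rng.choice([0, np.pi / 2, np.pi, 3 * np.pi / 2], size=n_sc))
H_true = rng.normal(size=n_sc) + 1j * rng.normal(size=n_sc)  # unknown channel
noise = 0.01 * (rng.normal(size=n_sc) + 1j * rng.normal(size=n_sc))
Y = H_true * X + noise

# Least-squares CSI estimate: one complex gain per subcarrier.
H_est = Y / X

print(np.max(np.abs(H_est - H_true)))  # small: the estimate tracks the channel
```

Each element of `H_est` captures how that subcarrier's amplitude and phase were altered in flight, which is exactly the environmental sensitivity the comment describes.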
I want to point out something that most everybody working in the CSI sensing/general ISAC space seems to know: generalization is hard and most definitely unsolved for any reasonably high-dimensional sensing problem (like image generation and to some extent pose estimation). I see a lot of fearmongering online about wifi sensing killing privacy for good, but in my opinion we're still quite far off.
I've made the project's code and some formatted data public since this paper is starting to pick up some attention: https://github.com/nishio-laboratory/latentcsi
What is available at the low level? Are researchers using SDRs, or are there common WiFi chips that properly report CSI? Do most people feed in the CSI of literally every packet, or is it sampled?
As for the low level:
The most common early hardware was afaik esp32s & https://stevenmhernandez.github.io/ESP32-CSI-Tool/, and also old intel NICs & https://dhalperi.github.io/linux-80211n-csitool/.
Now many people use https://ps.zpj.io/ which supports some hardware including SDRs, but I must discourage using it, especially for research, as it's not free software and has a restrictive license. I used https://feitcsi.kuskosoft.com/ which uses a slightly modified iwlwifi driver, since iwlwifi needs to compute CSI anyway. There are free software alternatives for SDR CSI extraction as well; it's not hard to build an OFDM chain with GNUradio and extract CSI, although this might require a slightly more in-depth understanding of how wifi works.
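A stripped-down version of such an OFDM chain (no GNU Radio, just numpy, with 802.11-like FFT and cyclic-prefix sizes) shows where the CSI falls out:

```python
import numpy as np

rng = np.random.default_rng(2)
N, CP = 64, 16  # FFT size and cyclic prefix length (802.11-like)

# Frequency-domain training symbol known to the receiver.
X = rng.choice([1 + 0j, -1 + 0j], size=N)

# Transmit: IFFT, then prepend the cyclic prefix.
x = np.fft.ifft(X)
tx = np.concatenate([x[-CP:], x])

# Multipath channel: a direct path plus one delayed echo.
h = np.zeros(CP, dtype=complex)
h[0], h[3] = 1.0, 0.4j
rx = np.convolve(tx, h)[:len(tx)]

# Receive: drop the CP, FFT, divide by the known symbol -> CSI.
Y = np.fft.fft(rx[CP:CP + N])
H_est = Y / X

# With a CP longer than the channel, this equals the channel's frequency response.
H_true = np.fft.fft(h, N)
print(np.max(np.abs(H_est - H_true)))  # ~0 up to numerical error
```

The SDR tools mentioned above are doing essentially this, plus synchronization, packet detection, and the 802.11-specific framing around it.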
Basically, researchers figured out how to use the invisible radio waves from your Wi-Fi router to create surprisingly clear pictures of whatever is around it, even if there are walls in the way.
Your router is constantly firing out radio signals, right? When those signals hit a person, a dog, or a chair, they bounce off and create a unique echo pattern. This echo pattern is called CSI (Channel State Information). It's a precise digital "shadow" of everything in the room. Turning that messy echo pattern into an actual picture used to be super difficult and slow. But now, they use a fancy type of AI—the same kind that generates images when you type a prompt—to do the heavy lifting.
The AI is super smart and knows how to instantly translate that invisible echo pattern into a high-resolution image.
So the big picture is: it's like they've figured out how to use your average home Wi-Fi to "see" without light or a camera, and they can do it so efficiently (quickly and cheaply) that it might become a normal thing.
It’s pretty wild, and the applications are huge—especially for things like monitoring the health of older people without putting cameras in their rooms. Of course, it also means walls don't stop surveillance anymore, which is kind of unsettling!
Any "unknown" state of the scene is bound to confuse it.