EyesOff: How I built a screen contact detection model
Mood: thoughtful
Sentiment: positive
Category: tech
Key topics: machine learning, computer vision, screen contact detection
The author shares their experience building a screen contact detection model, detailing their approach and techniques used.
Snapshot generated from the HN discussion
Discussion Activity
- Engagement: moderate
- First comment: 2d after posting
- Peak period: Day 3 (8 comments)
- Avg / period: 5
- Based on 10 loaded comments
Key moments
- Story posted: 11/15/2025, 8:42:33 AM (4d ago)
- First comment: 11/17/2025, 2:40:01 PM (2d after posting)
- Peak activity: 8 comments in Day 3, the hottest window of the conversation
- Latest activity: 11/18/2025, 10:04:12 PM (11h ago)
They create this CNN for exactly this task, autism diagnosis in children. I suppose this model would work for babies too.
Put a tablet in front of a baby. Left half has images of gears and stuff, right half has images of people and faces. Does the baby look at the left or right half of the screen? This is actually pretty indicative of autism and easy to put into a foolproof app.
The linked GitHub project records a video of an older child's face while they look at a person who is wearing a camera or something, and judges whether or not they make proper eye contact. This is thematically similar but actually really different. It requires an older kid, both for the model and the method, and is hard to actually use. Not that useful.
Intervening when still a baby is absolutely critical.
P.S., deciding which half of a tablet a baby is looking at is MUCH MUCH easier than gaze tracking. Make the tablet screen bright white around the edges. Turn brightness up. Use off the shelf iris tracking software. Locate the reflection of the iPad in the baby's iris. Is it on the right half or left half of the iris? Adjust for their position in FOV and their face pose a bit and bam, that's very accurate. Full, robust gaze tracking is a million times harder, believe me.
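A minimal sketch of that left-or-right rule, assuming an off-the-shelf tracker already gives you the iris circle and the centroid of the bright screen reflection (the yaw correction and its gain here are made-up placeholders, not anything from the comment):

```python
import math

def screen_half(iris_center_x: float, iris_radius: float,
                reflection_x: float, yaw_deg: float = 0.0) -> str:
    """Decide whether the tablet's reflection sits on the left or right half
    of the iris, with a crude (hypothetical) correction for head yaw.
    All coordinates are in image pixels from the front camera.
    """
    # Offset of the reflection from the iris center, normalized to [-1, 1].
    offset = (reflection_x - iris_center_x) / iris_radius
    # Hypothetical pose adjustment: a turned head shifts where a centered
    # screen would reflect on the cornea; 0.5 is an arbitrary gain.
    offset -= math.sin(math.radians(yaw_deg)) * 0.5
    # Note: whether image-right maps to the screen's left or right half
    # depends on camera mirroring, so calibrate the sign once per device.
    return "right" if offset > 0 else "left"
```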
Are there any research papers on this type of autism diagnosis tool for babies?
To your last point, yes I agree. Even the task I set up the model for is relatively easy compared to proper gaze tracking; I just rely on large datasets.
I suppose you could do it in the way you say and then from that gather data to eventually build out another model.
I'll for sure look into this, appreciate you sharing the idea!
FYI: I got funding and gathered really big mobile phone gaze datasets and trained CNN models on them that got pretty accurate. Avg err below 1cm.
The whole thing worked like this:
Mech Turk workers got to play a mobile memory game. A number flashed on the screen at a random point for a second while the phone took a photo. Then they had to enter it. If they entered it correctly, I assumed they were looking at that point on the screen when the photo was taken and added it to the gaze dataset. Collecting clean data like this was very important for model accuracy. The data is long gone, unfortunately. Oh and the screen for the memory game was pure white, which was essential for a reason described below.
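A rough sketch of that labeling loop as described; the two callables are hypothetical stand-ins for the real game UI and camera capture, which aren't specified in the comment:

```python
import random
from typing import Callable, List, Tuple

def collect_gaze_sample(
    flash_and_photograph: Callable[[int, float, float], bytes],
    ask_player: Callable[[], int],
    screen_wh: Tuple[int, int],
    dataset: List[dict],
) -> None:
    """One round of the memory game described above.

    flash_and_photograph(digit, x, y): shows the digit at (x, y) on a
    pure-white screen for ~1 s and returns a front-camera photo taken
    during the flash.
    ask_player(): returns the digit the player typed afterwards.
    """
    w, h = screen_wh
    digit = random.randint(0, 9)
    x, y = random.uniform(0, w), random.uniform(0, h)
    photo = flash_and_photograph(digit, x, y)
    if ask_player() == digit:
        # A correct answer is treated as proof the player was looking at
        # (x, y) when the photo was taken, so it becomes a labeled sample.
        dataset.append({"image": photo, "gaze_xy": (x, y)})
```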
The CNN was a cascade of several models:
First, off the shelf stuff located the face in the image.
A crop of the face was fed to an iris location model we trained. This estimated eye location and size.
Crops of the eyes were fed into 2 more cycles of iris detection, taking a smaller crop and making a more accurate estimation until the irises were located and sized to within about 1 pixel. Imagine the enhance... enhance... trope.
Then crops of the super well centered and size-normalized irises, as well as a crop of the face, were fed together into a CNN, along with metadata about the phone type.
This CNN estimated gaze location using the labels from the memory game derived data.
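A loose PyTorch sketch of what the final stage of such a cascade might look like; the layer sizes, crop handling, and phone-type embedding are assumptions, not the original architecture, and the earlier stages (face detection and the repeated iris re-cropping) are assumed to have already produced the tight, size-normalized crops:

```python
import torch
import torch.nn as nn

class GazeHead(nn.Module):
    """Final cascade stage (shapes are assumptions): takes tight iris crops,
    a face crop, and phone-model metadata, and regresses a gaze point (x, y)."""

    def __init__(self, n_phone_types: int = 50):
        super().__init__()
        def small_cnn(out_dim: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, out_dim), nn.ReLU(),
            )
        self.eye_net = small_cnn(64)    # one network applied to both iris crops
        self.face_net = small_cnn(64)
        self.phone_emb = nn.Embedding(n_phone_types, 16)  # phone-type metadata
        self.head = nn.Linear(64 * 3 + 16, 2)             # -> gaze (x, y)

    def forward(self, left_iris, right_iris, face, phone_id):
        feats = torch.cat([
            self.eye_net(left_iris),
            self.eye_net(right_iris),
            self.face_net(face),
            self.phone_emb(phone_id),
        ], dim=1)
        return self.head(feats)
```

Training would presumably regress that (x, y) output against the memory-game labels with a plain L1 or L2 loss.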
The cascade worked really well, usually, in lighting the model liked. It failed unpredictably in other lighting. We tried all kinds of pre-processing to address lighting but it was always the Achilles' heel.
To my shock, I eventually realized (too late) that the CNN was learning to find the phone's reflection in the irises, and estimating the phone position relative to the gaze direction using that. So localizing the irises extremely well was crucial. Letting it know what kind of phone it was looking at, and ergo how large the reflection should appear at a certain distance, was too.
Making a model that segments out the phone or tablet's reflection in the iris is just a very efficient shortcut to do what any actually good model will learn to do anyway, and it will remove all of the lighting variation. It's the way to make gaze tracking on mobile actually work reliably without infrared. We never had time to backtrack and do this because we ran out of funding.
The MOST IMPORTANT thing here is to control what is on the phone screen. If the screen is half black, or dim, or has random darkly colored blobs, it will screw with where the model thinks the phone screen reflection begins and ends. HOWEVER if your use case allows you to control what is on the screen so it always has, for instance, a bright white border, your problem is 10x easier. The baby autism screener would let you control that, for instance.
But anyway, like I said, to make something that just determines if the baby is looking on one side of the screen or the other, you could do the following:
1. Take maybe 1000 photos of a sampling of people watching a white tablet screen moving around in front of their face.
2. Annotate the photos by labeling the visible corners of the reflection of the tablet screen in their irises.
3. Make a simple CNN to place these corners.
If you can also make a model that locates the irises extremely well, like to within 1px, then making the gaze estimate becomes sort of trivial with that plus the tablet-reflection-in-iris finder. And I promise the iris location model is easy. We trained on about 3000-4000 images of very well labeled irises (with circle drawn over them for the label) with a simple CNN and got really great sub-pixel accuracy in 2018. That plus some smoothing across camera frames was more than enough.
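If you had both of those pieces, one way the "sort of trivial" last step could work (an assumption about the geometry, not something spelled out in the comment) is to map the iris center into the frame of the reflected screen via the homography defined by the four labeled corners:

```python
import numpy as np

def gaze_on_tablet(iris_center: np.ndarray, reflection_corners: np.ndarray) -> np.ndarray:
    """Rough gaze estimate from corner annotations: map the iris center into
    the coordinate frame of the tablet's reflection using the homography that
    sends the four reflection corners (TL, TR, BR, BL, in image pixels) to the
    unit square. Returns (u, v) in [0, 1]^2; u < 0.5 ~ left half of the screen,
    subject to the same camera-mirroring caveat as before."""
    src = reflection_corners.astype(float)                 # 4x2 image points
    dst = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)

    # Solve for the homography H (8 unknowns) with the standard DLT equations.
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    h = np.linalg.solve(np.array(A, dtype=float), dst.reshape(-1))
    H = np.append(h, 1.0).reshape(3, 3)

    # Apply H to the iris center in homogeneous coordinates.
    p = H @ np.array([iris_center[0], iris_center[1], 1.0])
    return p[:2] / p[2]
```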
Anyway, hope some of this helps. I know you aren't doing fine-grained gaze tracking like this but maybe something in there is useful.
Do you remember the cost of Mech Turk? It was something I wanted to use for EyesOff but never could get around the cost aspect.
I need some time to process everything you said, but the EyesOff model has pretty low accuracy at the moment. I'm sure some of these tidbits of info could help to improve the model, although my data is pretty messy in comparison. I had thought of doing more gaze tracking work for my model, but at long ranges it just breaks down completely (in my experience, happy to stand corrected if you've worked on that too).
Regarding the baby screener, I see how this approach could be very useful. If I get the time, I'll look into it a bit more and see what I can come up with. I'll let you know once I get round to it.
We paid something like $10 per hour and people loved our tasks. We paid a bit more to make sure our tasks were completed well. The main thing was just making the data collection app as efficient as possible. If you pay twice as much but collect 4x the data in the task, you doubled your efficiency.
Yeah I think it's impossible to get good gaze accuracy without observing the device reflection in the eyes. And you will never, ever be able to deal well with edge cases like lighting, hair, glasses, asymmetrical faces, etc. There's just a fundamental information limit you can't overcome. Maybe you could get within 6 inches of accuracy? But mostly it would be learning face pose, I assume. Trying to do gaze tracking with a webcam of someone 4 feet away and half offscreen just seems Sisyphean.
Is EyesOff really an important application? I'm not sure many people would want to drain their battery running it. Just a rhetorical question, I don't know.
With the baby autism screener, the difficult part is the regulatory aspect. I might have some contacts at Mayo Clinic who would be interested in productizing something like this though, and they could be asked about it.
Ha, that's probably why I noticed the EyesOff accuracy drops so much at longer ranges. I suppose two models would do better, but atm battery drain is a big issue.
I'm not sure if it's important or not, but the app comes from my own problems working in public so I'm happy to continue working on it. I do want to train and deploy an optimised model, something much smaller.
Sounds great, once a POC gets built I'll let you know and we can see about the clinical side.
Thanks for the tips! I'll be sure to post something and reach out if I get round to implementing such a model.
I remember a guy watching a video, then looking up, and it paused, etc.
4 more comments available on Hacker News