Alterego: Thought to Text
Source: alterego.io
Key topics: Brain-Computer Interface, Silent Speech Recognition, Assistive Technology
Alterego is a device that claims to translate silent speech into text, sparking both excitement and skepticism among HN commenters about its potential applications and feasibility.
Snapshot generated from the HN discussion
Discussion activity: very active. 128 comments loaded; peak of 31 comments in the first 0-3 hours; average of 10.7 comments per period.
Key moments
- Story posted: Sep 8, 2025 at 5:17 PM EDT
- First comment: Sep 8, 2025 at 5:17 PM EDT (immediately after posting)
- Peak activity: 31 comments in the 0-3h window
- Latest activity: Sep 10, 2025 at 8:27 AM EDT
HN story ID: 45174125
I suspect it's EMG through muscles in the ear and jawbone, but that seems too rudimentary.
The TED talk describes a system that includes sensors on the chin across the jawbone, but the demo has obviously removed that sensor.
What I want to know is what they are connected to. A laptop? An AS/400? An old Cray they have lying around? I'd think doing the demo while walking would have been de rigueur.
Anyway, très cool!
[1] https://www.media.mit.edu/publications/alterego-IUI/
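If the EMG guess is right, the front end of such a device is fairly well understood. Here's a minimal sketch, assuming a surface-EMG stream; the channel count, sample rate, and window size are illustrative guesses, not specs from the paper:

```python
import numpy as np

# All parameters are illustrative assumptions; the device's real
# electrode count, sample rate, and windowing are not public.
N_CHANNELS = 8      # hypothetical sEMG electrodes along jaw/ear
SAMPLE_RATE = 1000  # Hz
WINDOW = 250        # samples per frame (250 ms at 1 kHz)

def emg_features(frame: np.ndarray) -> np.ndarray:
    """Classic per-channel sEMG features for one window:
    RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(frame ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frame), axis=1)) > 0, axis=1)
    return np.concatenate([rms, zcr])

# Stand-in for a real sensor buffer of shape (channels, samples).
frame = np.random.randn(N_CHANNELS, WINDOW)
feats = emg_features(frame)  # 16-dim vector -> sequence model -> words
```

The hard part, of course, is the sequence model that turns these feature frames into words, not the feature extraction.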
I have to wonder, if they have enough signal to produce what essentially looks like speech-to-text (without the speech), wouldn't it be possible to use the exact same signal to directly produce the synthesized speech? It could also lower latency further by not needing extra surrounding context for the text to be pronounced correctly.
(I think it was https://en.wikipedia.org/wiki/Oath_of_Fealty_%28novel%29 but can't find enough details to confirm.)
This is an LLM thing. Plenty of open-source (or at least MIT-licensed) LLMs and TTS models exist that translate and can be zero-shot trained on a user's speech. Direct audio-to-audio models tend to be less researched and less advanced than the corresponding (but higher-latency) audio-to-text-to-audio pipelines.
That said, you can get audio->text->audio down to 400 ms or so of latency if you are really damn good at it.
I'm sure that's not the last word though!
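For a rough sense of where that ~400 ms goes, here is a back-of-envelope budget for a streaming audio->text->audio pipeline. Every number is an illustrative assumption about a well-tuned stack, not a measurement from the thread:

```python
# Illustrative latency budget; all figures are assumptions.
budget_ms = {
    "audio capture + VAD endpointing": 100,
    "streaming STT final hypothesis": 150,
    "first LLM token (short prompt)": 80,
    "TTS time-to-first-audio": 70,
}
for stage, ms in budget_ms.items():
    print(f"{stage:<35} {ms:4d} ms")
print(f"{'total':<35} {sum(budget_ms.values()):4d} ms")  # ~400 ms
```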
I don't really buy that typing speed is a bottleneck for most people. We can't actually think all that fast. And I suspect AI is doing a lot of filling in the gaps here.
It might have some niche use cases, like being able to use your phone while cycling.
I can break 100 wpm, especially if I accept typos. It's still much, much slower to type than to think.
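Some rough numbers behind that claim; these are commonly cited ballpark rates, and only the 100 wpm figure comes from the comment itself:

```python
# Ballpark rates; only the 100 wpm figure is from the comment.
rates_wpm = {
    "average typist": 40,
    "fast typist (100 wpm)": 100,
    "conversational speech": 150,
    "silent reading": 250,
}
for label, wpm in rates_wpm.items():
    print(f"{label:<24} {wpm:3d} wpm ({wpm / 60:.1f} words/s)")
```

Even a fast typist is well under conversational speed, which is presumably the gap a silent-speech interface targets.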
That’s already solved by AI, if you let AI listen to your meetings.
> Just provide some context on the company etc
The necessary “context” includes at least the names and pronunciations of all workers at the company with non-English first names, so it's far from trivial.
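To make that concrete, the "context" would amount to something like a name-bias list built from the company directory. This is a hypothetical sketch; `transcribe` and its `phrase_hints` parameter are invented names, though many real STT APIs offer similar vocabulary-boosting options:

```python
# Hypothetical sketch: build a name-bias list from a directory.
# `transcribe` / `phrase_hints` below are invented names.
directory = ["Søren Jakobsen", "Nguyễn Thị Minh", "Aoife Ní Bhriain"]

def build_phrase_hints(names: list[str]) -> list[str]:
    """Include full names plus given names, since meeting speech
    usually addresses people by first name."""
    hints = set(names)
    hints.update(name.split()[0] for name in names)
    return sorted(hints)

print(build_phrase_hints(directory))
# transcript = transcribe(audio, phrase_hints=build_phrase_hints(directory))
```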
Was that deliberate, or a typo? I am genuinely wondering!
https://www.openstenoproject.org/plover/
Also, keybr.com helps speed up typing if you were thinking about it.
So this definitely wouldn't help me here. Realistically though, there ought to be better solutions like something that just listens to the meeting and automatically takes notes.
https://xkcd.com/341/
Pulling out my phone, unlocking it, and remembering what today's hotkey is for starting Google/Gemini is a bottleneck. Damned if I can remember what random gesture lets me ask Gemini to take a note today (presumably Gemini has notes support now; IIRC the original release didn't).
Finding where Google stashes todo items is also a bottleneck. Of course that entails getting my phone out and navigating to whatever notes app they are shoved into (for a while, todos/notes lived inside a separate Google search app!).
My Palm Pilot from 2000 had more usability than a modern smartphone.
This device can solve all of those issues.
If you're wearing it at the right moment.
That is, if this exists. But if it does, it's not what you think it does for you.
As to whether typing speed is a bottleneck for most people, maybe not most people, but definitely some people, and it's a massive bottleneck for me personally.
I think better when I'm talking, and since I started using speech-to-text, it has increased my writing and coding speed by at least an order of magnitude, maybe two.
But you are right, the AI filling in gaps can really cause trouble using speech, goodness knows what it's doing using sub-speech.
Honestly I have no idea if it's fake. I wouldn't be surprised if it's both fake and real: the actual video is entirely fake, but a reasonably accurate demonstration of actual capabilities (like a lot of tech demos at live events...)
But more like having a conversation with a really fast coding agent. That should feel like you're micro-managing an intern as they code really fast: you could start describing the problem, it could start coding, and you could interject and tell it to do things differently. There the bottleneck would be typing, especially if you have fast inference. But with voice, now you're coding at the speed of your thoughts.
I think doing that would be super cool but awkward if you’re talking out loud in an office, that’s where this device would come in.
Depends on what they are connected to in the back there.
also adding their press release here:
https://docsend.com/view/dmda8mqzhcvqrkrk/d/fjr4nnmzf9jnjzgw
One of the major ways you can speed up reading is to stop 'vocalizing' each word in your head. It does seem that thinking is much faster than 'thinking aloud' (in your head).
https://www.media.mit.edu/projects/alterego/overview/
Check also the publications tab, and this press release:
https://docsend.com/view/dmda8mqzhcvqrkrk/d/fjr4nnmzf9jnjzgw
https://www.media.mit.edu/projects/alterego/frequently-asked...
I wonder how far they've gotten past it.
I think it's cool; I've been brainstorming how a good MCI would work for a while and didn't think of this. It's a great novel approach that will probably be expanded on soon.
I guess I also kind of enjoy the physical sensations of putting a key in a lock, opening the door etc. Definitely don't want a digital-only existence.
You wouldn't use a regular WIMP[1] paradigm with this, that completely defeats the advantages you have. You don't need to have a giant window full of icons and other clickable/tappable UI elements, that becomes pointless now.
[1] https://en.wikipedia.org/wiki/WIMP_(computing)
But for me speed isn't even the issue. I can dictate to Siri at near-regular-speech speeds -- and then spend another 200% of the time that took to fix what it got wrong. I have reasonable diction and enunciation, and speech to text is just that bad while walking down the street. If this is as accurate as they're showing, it would be worth it just for the accuracy.
https://x.com/keleftheriou/status/1963399069646426341
Going from voice input to silent voice input is a huge step forward for UX.
But I'm sceptical about this specific company, given the lack of technical details.
Literacy rates in the US are already garbage; this device may just make them worse. If people never have to read or write, why would they bother learning how?
I agree it isn't suitable for productivity use cases, but that only matters for information workers, who are not the largest segment of society. The fact is, Gen Z and Alpha are already having issues using the high-information-density desktop paradigm, and technology like this will only further erode the needed capabilities of the average citizen. Doesn't bode well for democracy and all that.
- There is an ML model that was trained on 31 hours of silently spoken text. That's the training data. You still need to know that the red fruit in front of you is called an apple, because that's what the model is trained on. So you must be literate to get this working.
- The accuracy reported in the paper is on a very narrow text type: numerals. As far as I could understand, they asked users to do mathematical operations and checked the accuracy on that. Someone with a deeper understanding, please correct me.
- Most of the video demo (honestly) is meh; once you have the text input for an LLM, you are limited to what the LLM can do. The real deal is the ML model that translates the neuromuscular signals into actual words. Those signals must be super noisy, so training a model with only 31 hours of data is a bit surprising and impressive. But the model would probably require calibration for each user's silent voice, like "say this sentence silently: 'a quick brown fox jumped over the rope'" (see the sketch after this list). I think this will be cool.
- I really hope this tech works. I really really hope they don’t sell to big tech jerks like Meta. I really really really hope this tech removes screens from our lives(or at least a step in the right direction).
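If per-user calibration is needed as speculated above, the flow might look roughly like this. Everything here is hypothetical: `record_silent_speech`, the model's `finetune` method, and the hyperparameters are invented for illustration:

```python
# Hypothetical per-user calibration loop; all names are invented.
CALIBRATION_PROMPTS = [
    "a quick brown fox jumped over the rope",
    # ...more phonetically varied prompts would go here
]

def calibrate(model, record_silent_speech):
    """Adapt a pretrained silent-speech model to one user by
    collecting (signal, text) pairs from prompted sentences."""
    pairs = []
    for sentence in CALIBRATION_PROMPTS:
        signal = record_silent_speech(prompt=sentence)  # sEMG frames
        pairs.append((signal, sentence))
    # Small learning rate, few epochs: adapt, don't retrain.
    model.finetune(pairs, epochs=3, lr=1e-4)
    return model
```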
Literacy is about written text, not spoken words. I think you've confused it with fluency.
> Alterego only responds to intentional, silent speech.
What exactly do they mean by this? Some kind of equivalent to subvocalization [1]?
[1] https://en.wikipedia.org/wiki/Subvocalization
If that is what is happening, to me it feels like harder work than just speaking (similar to how singing softly but accurately can be very hard work). It would still be pretty cool, but only practical in use cases where you have to be silent and only for short periods of usage.
As a disability speech aid though maybe it would be amazing?
(My current solution is to tear the fingertip off my offhand glove so I can unlock and use my device....)
As for the privacy thing, I would say that I absolutely hate talking out loud to my devices. Just the idea of talking my ideas into a recorder in my own office where nobody can hear me feels very strange to me. But I love thinking through ideas and writing scripts for speeches or presentations in my mind, or planning out some code or an overall project. A device like this would let me do the internal-monologue thing, then turn around and "silent-speak" my notes into it, which sounds great. And the form factor doesn't look that dissimilar to a set of bone-conduction headphones, which would be perfect for privacy-aware feedback while allowing you to take in your surroundings.
With this tech demo, though, the transmission rate seems veeery slow: he sits still in his chair staring into the room, and a short sentence is all that appears. Not exactly the speed of thought...
And of course there is the cable running off to who knows what kind of computational resources.
The AI parts of this are less exciting to me, but as an input device I'm really on-board with the idea.
Anyhow, Alterego just seems like another vaporware product that will never enter, or even begin to penetrate, the overall market. But let's see!
So they came up with this groundbreaking idea but couldn't come up with a better use case than typing on a train.
Look, I can't help but appreciate that at least they are doing something interesting, as opposed to the vibe-coded one-shot VS Code forks that we see.
Seems like vaporware.
https://www.media.mit.edu/projects/alterego/overview/
adding also their press release here:
https://docsend.com/view/dmda8mqzhcvqrkrk/d/fjr4nnmzf9jnjzgw
There's endless comedy about the confusion on a bus when someone's talking into a Bluetooth headset and their neighbor thinks they're being addressed. Silent Sense + AR gets your eyes up and around you, fixes posture, frees your hands, and keeps the guy next to you out of the conversation.