I can put on a thick glove (losing touch and pressure sensitivity altogether) and grab a fragile glass without breaking it.
An example of feed-forward manipulation is lifting a medium-sized object; the classic example is lifting a coffee cup. If you misjudge a full cup for an empty one, you may spill the contents before your brain manages to replan the action based on sensory input. It takes around 300 ms for that feedback loop to happen, and we do many things faster than that would allow.
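A rough back-of-the-envelope sketch of why that matters (all numbers are my own illustrative assumptions, not from the comment): if you apply the feed-forward force planned for an empty cup to a full one, the motion is already centimetres off before the ~300 ms feedback loop can correct anything.

```python
# Back-of-envelope: how far the plan drifts in the ~300 ms before feedback
# can correct a misjudged cup mass. All numbers are illustrative assumptions.

g = 9.81            # m/s^2
m_expected = 0.15   # kg, empty cup (assumed)
m_actual = 0.45     # kg, full cup (assumed)
a_planned = 2.0     # m/s^2, intended upward acceleration
t_loop = 0.30       # s, approximate sensory feedback latency

force = m_expected * (g + a_planned)      # feed-forward lift force you apply
a_actual = force / m_actual - g           # what that force actually produces

planned_rise = 0.5 * a_planned * t_loop**2
# Negative acceleration means the cup never leaves the table, so clamp at 0.
actual_rise = max(0.0, 0.5 * a_actual * t_loop**2)

print(f"applied force: {force:.2f} N")
print(f"actual acceleration: {a_actual:.2f} m/s^2 (negative = cup doesn't lift)")
print(f"position error after {t_loop*1000:.0f} ms: {planned_rise - actual_rise:.3f} m")
```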
The linked article has a great example of a task where a human needs feedback control: picking up and lighting a match.
Sibling comments also make a good point that touch may well be necessary to learn the task. Babies do a lot of trial-and-error manipulation, and even adults do new tasks more slowly at first.
Robots can also react much faster than 300ms. Sure, that massive transformer you put in charge of high level planning and reasoning probably isn't going to run at 200 tokens a second. But a dozen smaller control-oriented networks that are directly in charge of executing the planned motions can clock at 200 Hz or more. They can adjust fast if motor controllers, which know the position and current draw of any given motor at any given time, report data that indicates the grip is slipping.
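As a sketch of that split (the interfaces and thresholds here are invented for illustration, not from any real robot stack): a slow planner sets a grip target, while a ~200 Hz inner loop watches motor current for the signature of a slipping grip and reacts long before the planner could.

```python
import time

CONTROL_HZ = 200           # inner-loop rate from the comment
SLIP_CURRENT_DROP = 0.15   # assumed threshold: sudden current drop ~ slipping

class FakeMotor:
    """Stand-in for a motor controller that reports position and current draw."""
    def __init__(self):
        self.current = 0.8
        self.cmd = 0.0
    def command(self, force):
        self.cmd = force

class GripController:
    def __init__(self, motor):
        self.motor = motor
        self.target_force = 1.0
        self.last_current = None
    def step(self):
        current = self.motor.current
        if self.last_current is not None and self.last_current - current > SLIP_CURRENT_DROP:
            # Load on the finger vanished while the command stayed constant:
            # treat it as incipient slip and tighten the grip a little.
            self.target_force *= 1.2
        self.last_current = current
        self.motor.command(self.target_force)

motor = FakeMotor()
ctrl = GripController(motor)
for i in range(CONTROL_HZ):          # one simulated second at 200 Hz
    if i == 100:
        motor.current = 0.5          # inject a "slip" halfway through
    ctrl.step()
    time.sleep(1.0 / CONTROL_HZ)     # real firmware would use a hard real-time timer
print(f"grip command after slip event: {motor.cmd:.2f}")
```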
Consider whether you could pick up that same fragile glass with your eyes closed. I'd wager you could, as you'd still receive (diminished) tactile feedback despite the thick gloves.
Edit: and you probably are not gonna be as fast doing it
For 10,000 different problems, a great many of which have been solved in recent years.
Robotics is improving at a very fast clip, relative to most tech. I am unaware of any barrier, or any reason to infer there is one, for dextrous robots.
I think the primary difference between AI software models and services, and robotic AI, is economics.
The cost per task for AI software is .... very small. And the cost per task for a robot with AI is ... many orders of magnitude over that.
The marginal costs of serving one more customer are completely incomparable.
It's just a push of a button to replace the "fleet" of chatbots a million customers are using. Something unthinkable in the hardware world.
The seemingly lower level of effort and progress is because hardware that could operate in our real world with the same dexterity that ChatGPT/Claude converse with online will be extremely expensive at first.
Robotics companies are not just focused on dexterity. They are focused on improvements to dexterity that stay within a very tight economic envelope. Inexpensive dexterity is going to take a while.
Pretraining data?
When we (ZenRobotics) tried this 15 years ago a big problem was the creation of sufficiently high-fidelity simulated worlds. Gathering statistics and modelling the geometry, brittleness, flexibility, surface texture, friction, variable density etc of a sufficiently large variety of objects was harder than gathering data from the real world.
I believe this is the most popular tool now: https://github.com/google-deepmind/mujoco
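For anyone who hasn't used it, here is a minimal sketch of the MuJoCo Python bindings: load a toy scene from MJCF XML, step the physics, and read back state. The scene itself is a made-up example, not anything from the article.

```python
import mujoco

# A trivial scene: a small box dropped onto a ground plane.
xml = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 0.5">
      <joint type="free"/>
      <geom type="box" size="0.05 0.05 0.05" mass="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

# Step the physics for one simulated second and watch the box settle.
while data.time < 1.0:
    mujoco.mj_step(model, data)

print("box height after 1 s:", data.qpos[2])
```

The hard part the grandparent comment describes is not this mechanical loop, it's authoring models whose friction, brittleness, and deformation statistics actually match the real objects you care about.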
AFAICT these have not resulted in any shipping products.
Example of modern physics simulation: https://www.youtube.com/watch?v=7NF3CdXkm68
You can train in stages.
First stage: either digitally generate (synthetic) basic movements, or record basic human movements performed on a model. The former is probably better and can generate endless variation.
But the model is only trying to control joint angles, positions, etc., with no worries about controlling power. The simulated system has no complications like friction.
Then you train with friction, joint viscosity, power deviating from demand based on ramp-up and ramp-down times, fade, etc.
Then train in a complex simulated environment.
Then train for control.
Etc.
The point being, robotic control is easy to break down into small steps of capability.
That massively improves training speed and efficiency, even potentially smaller models.
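A sketch of what such a staged curriculum might look like in code (stage names and parameters are illustrative, not from any particular framework): the same policy is carried from one stage to the next while the simulated physics gets progressively harder.

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    name: str
    friction: float          # 0.0 = idealized frictionless world
    joint_viscosity: float
    actuator_lag_ms: float   # power deviating from demand
    env_complexity: int      # clutter level / number of objects

CURRICULUM = [
    StageConfig("kinematics only",   friction=0.0, joint_viscosity=0.0, actuator_lag_ms=0.0, env_complexity=0),
    StageConfig("full dynamics",     friction=0.8, joint_viscosity=0.1, actuator_lag_ms=5.0, env_complexity=0),
    StageConfig("cluttered scenes",  friction=0.8, joint_viscosity=0.1, actuator_lag_ms=5.0, env_complexity=20),
    StageConfig("full control task", friction=0.8, joint_viscosity=0.1, actuator_lag_ms=5.0, env_complexity=20),
]

def train(policy, stage: StageConfig):
    # Placeholder for whatever RL / imitation loop is used; the point is only
    # that the same policy object is refined from one stage to the next.
    print(f"training '{stage.name}' with friction={stage.friction}")
    return policy

policy = {}   # stand-in for real network weights
for stage in CURRICULUM:
    policy = train(policy, stage)
```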
It is also a far simpler task, by many orders of magnitude, than learning the corpus of the written internet.
Comparable to that would be training an AI to operate any land, sea, or air device, which nobody today is trying (AFAIK).
Eg: https://hub.jhu.edu/2025/07/09/robot-performs-first-realisti...
How hard can it be to consistently pick up boxes and set them down again in a different location? Pretty hard, apparently.
The problem with things like cardboard boxes, especially at any size, is internal weight distribution and deformation of the box. If you take someone who is pretty new to stacking boxes at a warehouse and give them sloppy boxes (ones that bend or otherwise shift), they are going to be pretty slow at it for the first hour or so; then they'll internalize the play in the materials and start speeding up considerably while getting a nice result.
It's pretty amazing how evolution has optimized us for feedback sensing like this.
I don't think there's a fundamental barrier to building a humanoid robot but the cost will be an extremely high barrier to adoption.
A human is nature's ultimate robot: hundreds of servos, millions of sensors, self-assembling from a bag of rice, self-repairing for minor damage. You just can't beat that, not for a very long time.
The problem with touch is making sensors that are cheap and durable and light and thin and repairable and sensitive and shape-conforming. Representation is trivial in comparison.
I've done that and similar things many times. Touch is important. It may not be essential for all tasks but it is for some. Maybe even many.
I would think the way to do it is build the touch sensors first (and it seems they're getting pretty close) then just tele-operate some robots and collect a ton of data. Either that, or put gloves on humans that can record. Pay people to live their normal lives but with the gloves on.
Totally agree. Wheels are cheaper, more durable and more effective than legs.
Humans would have wheels if there were an evolutionary pathway to wheels.
Hmm, no, it sounds like it's externally powered:
> The style consists of a transparent glycoprotein rod which is continuously formed in a cilia-lined sac and extends into the stomach. The cilia rotate the rod, so that it becomes wrapped in strands of mucus.
https://en.wikipedia.org/wiki/Rotating_locomotion_in_living_...
Or maybe the cilia ( = wiggly hairs) could be seen as a kind of motor. Depends how you count it and exactly what the set-up is, I can't tell from this.
That's quite interesting.
It's hard to see one. Even granting a nice flat world, ample incentive, and good "bearings", how can you evolve a wheel-organ that maintains a biological connection while also being able to rotate an indefinite number of times?
A few difficult and grotesque endpoints:
* The wheel only rotates a fixed number of times before the creature must pivot and "unwind" in the opposite direction. This one seems most plausible, but it's not a real wheel.
* The main body builds replacement wheels internally (like tooth enamel) and periodically ejects a "dead" wheel which can be placed onto a spoke. This option would make it easier to generate very tough rim materials though.
* A biological quick-release/quick-connect system, where the wheel-organ disconnects to move, but then reconnects to flush waste and get more nutrients.
* A communal organism, where wheel-creatures are alive and semi-autonomous, with their own way to acquire nutrients. Perhaps they would, er... suckle. Eeugh.
Maybe cartwheeling humans could lead to some adaptation where the whole body becomes the wheel.
Maybe in your living room. But step into a dense forest (which is what we are made for) and that statement will be far from reality.
On the other hand, there is not much reason to constrain ourselves to the unstable and tricky bipedal platform or insist on having a really top-heavy human-like torso. You could have 3-4 little legs on a dog-scale body with several extra-long, upwards-reaching arms, for example.
So embedding such a sensor in every rigid component, wiring a single data line to all of them (using the chassis as electrical ground) and feeding the data back to the model seems a trivial way to work around this problem without any kind of real pressure sensitivity. The model knows the inputs it gives to the actuators/servos, so it will quickly learn to predict the free mechanical behavior of the body, and use any deviation to derive data equivalent to pressure and force feedback.
Another possible source of data is the driving current of the motors/actuators which is proportional to the mechanical resistance the limb encounters. All sorts of garbage sources of data that were almost useless noise in the classical approach become valuable with a model large enough.
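A toy sketch of that idea (the torque constant, gear ratio, and thresholds are made up for illustration): a DC motor's torque is roughly proportional to its current, so current draw plus the gap between commanded and measured joint position already gives a crude contact signal without any dedicated pressure sensor.

```python
K_T = 0.05                        # N*m per A, assumed motor torque constant
GEAR_RATIO = 50.0                 # assumed joint reduction
CONTACT_TORQUE_THRESHOLD = 0.3    # N*m, assumed

def estimate_external_torque(current_a, commanded_pos, measured_pos, stiffness=40.0):
    """Blend two cheap signals into one contact estimate."""
    motor_torque = K_T * current_a * GEAR_RATIO
    # Position error under a position controller also implies an opposing torque.
    spring_torque = stiffness * (commanded_pos - measured_pos)
    return motor_torque, spring_torque

def in_contact(current_a, commanded_pos, measured_pos):
    motor_t, spring_t = estimate_external_torque(current_a, commanded_pos, measured_pos)
    return max(abs(motor_t), abs(spring_t)) > CONTACT_TORQUE_THRESHOLD

print(in_contact(0.02, 0.500, 0.498))   # free motion: False
print(in_contact(0.40, 0.500, 0.350))   # limb pressing on something: True
```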
The "bitter lesson" says to stop trying to find simple rules for how to do things - stop trying to understand - and instead to use massive data and massive search to deal with all the incredibly fussy and intractable details magically.
But the article here is saying that the lesson is false at its root, because in fact lots of understanding is applied at the point of choosing and sanitising the data. So just throwing noise at the model won't do.
This doesn't seem to match experience, where information can be gleaned from noise and "garbage sources of data ... become valuable with a model large enough", but maybe there's something illusory about that experience, IDK.
The problem is getting to that model state. Evolution found these models by creating a ridiculously huge number of experiments with the cut off function being 'can it breed before it dies on a limited number of calories'.
At least at this point it doesn't seem likely we can find a shortcut beyond that necessary computation. Evolution did it with time and parallelism. We do it differently with scale and rapid energy usage.
I don't know of anyone spending tens of billions on this problem like Microsoft did for OpenAI. First you'd have to build up a dataset of trillions of token equivalents for motion. What that looks like alone is largely guess work. Then you'll need to build a super computer to scale up the current sota motion model to 100 times the size of the biggest model today. Then you'll have to pretrain and finetune the models.
If after all that dexterity still isn't solved all we can say is that we need more data and bigger models.
People seriously don't understand how big big data for AI is and what a moonshot GPT3 and 4 were.
Meanwhile, the companies that have cars moving around safely all used a very diverse mix of human-designed and ML-created models. Which flies completely in the face of the Bitter Lesson.
Where can you buy the artificial equivalent?
The article makes a compelling case that a certain kind of sensory input and learning is necessary to crack robotic movement in general, but it remains to be seen whether an array of sensors as fine as the human hand's is useful outside very specific use-cases. A robot that can stock shelves reliably would still be immensely useful and very generalizable, even if it can't thread a needle due to limited fine sensory abilities.
Title of the article you're commenting on: Why Today's Humanoids Won't Learn Dexterity
Thesis the article is contradicting: the idea that humanoid robots will share the same body plan as humans and will work like humans in our built-for-humans environment. This belief requires that instead of building different special-purpose robots, we will have humanoid robots that do everything humans can do.
You are now arguing that a specialized robot lacking dexterity would still be immensely useful. Nobody is disputing that. It's just not what the article is about.
The problem is precisely the actuators. A lot of a human's muscles actually come in pairs - agonist and antagonist muscles [1], and it's hard to match the way human muscles work and their relatively tiny size in a non-biological actuator.
Just take your elbow and angle it to 90 degrees, then rapidly close it so your upper and lower arm are (almost) in parallel. An absolutely easy, trivial task to do for your pair of muscles controlling the tendons. But now, try to replicate even this small feat in a motor based actuator. You either use some worm gear to prevent the limb from going in the wrong direction but lose speed, or you use some sort of stepper motor that's very hard to control and takes up a lot of space.
[1] https://en.wikipedia.org/wiki/Anatomical_terms_of_muscle
That's trivial with modern flat motors and position feedback. In fact, motors can do it faster and with more precision than we can.
The only reason it was ever hard was because motors didn't have a lot of torque/volume.
The reason our muscles come in pairs is because they can only really apply force in one direction. Motors don't have this limitation, and don't need to be paired.
Anyway, motors still don't have enough torque density for making fine manipulators, and the lack of sensorial data will still stop you from interacting well with the outside world.
For some reason (judging by Fritz Lang, Gundam, etc.) humanity has some deep desire or curiosity for robots to look like humans. I wonder if cats want robot cats?
I don't think you can draw that conclusion. Most people find humanoid robots creepy. I think we have a desire for "Universal Robotics". As awesome as my dishwasher is, it's disappointing that it's nearly useless for any other task. Yeah, it washes my dishes, but it doesn't wash my clothes, or put away the dishes. Our desire for a humanoid robot, I think, largely grows out of our desire for having a single machine capable of doing anything.
Of course, there is also the thing where authors and artists tend to draw anything with human intelligence as humans, from robots to aliens. Maybe it's the social reason you mention, or they just unconsciously have assumed humans to be the greatest design to ever exist. But even despite this, in a human world, I expect the first true general-purpose robots to be "standing" upright, with one or several arm-like limbs.
Also it would just be compatible with our current world.
One robot that rules them all is preferable from many perspectives, but we're simply not there yet.
We have lots of those.
I'm actually surprised or interested that this isn't more of a thing, it doesn't take any high tech either. I suppose people like having their own stuff, or people can't be trusted, or it's prohibitively expensive to outsource food / laundry (even if especially in the US ordering food or eating out is very common).
Side-rant: As cool as some cyberpunk/sci-fi ideas are, I can't imagine a widespread elective mechanical limb replacement within the lifetime of anyone here. We dramatically under-estimate how amazing our normal limbs are. I mean, they're literally swarms of nanobots beyond human comprehension. To recycle an old comment against mechanical limbs:
________
[...] just remember that you're sacrificing raw force/speed for a system with a great deal of other trade-offs which would be difficult for modern science to replicate.
1. Supports a very large number of individual movements and articulations
2. Meets certain weight-restrictions (overall system must be near-buoyant in water)
3. Supports a wide variety of automatic self-repair techniques, many of which can occur without ceasing operation
4. Is entirely produced and usually maintained by unskilled (unconscious?) labor from common raw materials
5. Contains a comprehensive suite of sensors
6. Not too brittle, flexes to store and release mechanical energy from certain impacts
7. Selectively reinforces itself when strain is detected
8. Has areas for the storage of long-term energy reserves, which double as an impact cushion
9. Houses small fabricators to replenish some of its own operating fluids
10. Subsystems for thermal management (evaporative cooling, automatic micro-activation)
_______________
I predict the closest thing we might see instead will be just growing replacement biological limbs, followed by waldoes where you remotely control an arm without losing your own.
Then another quote, "No one has managed to get articulated fingers (i.e., fingers with joints in them) that are robust enough, have enough force, nor enough lifetime, for real industrial applications."
So (3) and (7) are relevant to lifetime, but another point, related to sensors, is that humans will stop hurting themselves if finger strain occurs, such as by changing their grip or crying off the task entirely. Hands are robust because they can operate at the edge of safe parameters by sensing strain and strategizing around risk. Humans know to come in out of the rain, so to speak.
We severely underestimate how complex natural systems are. Autonomous agents seem like something we should be able to build. The idea is as old as digital computers. Turing famously wrote about that.
But an autonomous complex system is complex to an astronomical degree. Self-driving vehicles, let alone autonomous androids, are several orders of magnitude more complex than we can even model.
I have read Wiener and Ashby to reach this conclusion. I've used this argument before: a piece of software capable of creating any possible software would be infinitely complex. It's also the reason I don't buy the claim that "20 W general intelligence exists". The wattage for generally intelligent humans would be the entire energy input to the biosphere up to the evolution of humans.
Planetary biospheres show general intelligence, not individual chunks of head meat.
Then you can connect it up to some input and output, and ... it exhibits intelligence somehow. Initially by screaming like a baby. Then it adapts to the knowledge implicit in its input and output systems ... and that's down to the designer. If it has suction cup end effectors and a CCD image sensor array doobrie ... I guess it's going to be clumsy and bewildered. But would it be noticeably intelligent? Could it even scream like a baby, actually? I suppose our brains are pre-evolved to learn to talk. Maybe this unfortunate person would only be able to emit a static hiss. I can't decide if I think it would ever get anywhere and develop appreciable smarts or not.
Similarly reading this article I agree with the author and I feel like what they're saying seems obvious. Of course making robots that can match humans' abilities is an absolutely insurmountable task. Yes, insurmountable as in I don't think we will ever do it.
Automating specific tasks in a factory is one thing; making a robot that can just figure out how to do things and learn like a human does is many orders of magnitude beyond. Even LLMs aren't there, as we can see from how they fail at basic tasks like counting the Rs in Raspberry. It's not intelligence, it's just the illusion of intelligence. Actual intelligence requires learning, not training. Actual intelligence won't run a command, fail to read its output, make up the output, and continue as if everything is fine while in fact nothing is fine. But LLMs will, because they're stupid stochastic parrots, basically fancy search engines. It's really strange to me how everyone else seems blind to this.
Maybe if we some day figure out real artificial intelligence we will have a chance to make humanoids that can match our own abilities.
https://plasticity-lab.com/body-augmentation
https://www.carlosterminel.com/wearable-compass
https://www.madsci.org/posts/archives/mar97/858984531.Ns.r.h...
https://www.sciencedirect.com/science/article/pii/S096098220...
Bolting on extra senses, tools, limbs is no big deal.
Humans are also some of the most physically adaptable animals on the planet, in terms of being able to remodel our bodies to serve new tasks. "specific adaptation to imposed demand" is one of the things that really sets us (and a few other animals) apart in a remarkable way. Few animals can practice and train their bodies like we can.
In addition, I understand research shows that people with amputations very quickly adapt both practically and psychologically, as a general principle (some unfortunate folks are stuck with phantom pain and other adaptive issues).
The old discussion about "adding 20 minutes to your commute is worse than losing a leg below the knee" takes into account the fact that most people underestimate how large a negative effect commuting has, but also overestimate how large a negative effect losing a portion of a limb has.
Which seems to reuse the same brain wiring as what's used for controlling the body. To a professional backhoe operator, the arm of the backhoe is, in a very real way, his arm.
Curiously enough, most current neural interfaces don't seem to expose much of this flexibility. It's likely that you'd have to wire into premotor cortex for that - but for now, we're mostly using the primary motor cortex instead, because it's much better understood. The signals found there are more human-comprehensible and more prior work was done on translating them into useful motions.
Take the elbow joint and the muscles it's connected to. It supports very fine precision at slow speeds as well as the same operation at high speeds. Say you're lifting yourself up on a horizontal bar: assuming adequate strength, you can do either a slow or a fast lift, and both with enough precision and torque to keep your body mass from slamming into the bar, which is another feat in itself.
Now try to replicate that with a classic mechanical mechanism, you'll always lose either precision, speed or torque.
But then he goes on to vision, where the form that goes into vision processing today is an array of pixels. That's not much preprocessing. That's pretty much what existed at the image sensor. Older approaches to vision processing had feature extractors, with various human-defined feature sets. That was a dead end. Today's neural nets find their own features to extract.
Touch sensing suffers from sensor problems. A few high-detail skin-like sensors have been built. Ruggedness and wear are a big problem.
Consider, though, a rigid tool such as an end wrench. Humans can feel out the position of a bolt with an end wrench, get the wrench around the bolt, and apply pressure to tighten or loosen a nut. Yet the total information available is position plus six degrees of freedom of force. If the business end of your tool is rigid, the amount of info you can get from it is quite limited. That doesn't mean you can't get a lot done. (I fooled around with this idea pre-LLM era, but didn't get very far.) That's at least a way to get warmed up on the problem.
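A toy sketch of how much (and how little) you can infer through a rigid tool, with thresholds and classification invented purely for illustration: the only inputs are the hand pose and the six-axis wrench (fx, fy, fz, tx, ty, tz) at the wrist.

```python
from typing import NamedTuple

class Wrench(NamedTuple):
    fx: float   # forces in N
    fy: float
    fz: float
    tx: float   # torques in N*m
    ty: float
    tz: float

def contact_state(w: Wrench, twisting: bool) -> str:
    """Classify what the end wrench is touching from the wrist wrench alone."""
    if abs(w.fz) < 0.5:
        return "free space"
    if twisting and abs(w.tz) > 0.2:
        return "seated on bolt (flats engaged, torque resists the twist)"
    if abs(w.fx) > 1.0 or abs(w.fy) > 1.0:
        return "edge contact (sliding against the bolt head)"
    return "pressed on a flat surface"

print(contact_state(Wrench(0.0, 0.0, 0.1, 0.0, 0.0, 0.0), twisting=False))
print(contact_state(Wrench(0.2, 0.1, 4.0, 0.0, 0.0, 0.4), twisting=True))
```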
Here's a video of a surgeon practicing by folding paper cranes with small surgical tools.[1] These are rigid tools, so the amount of touch information available is limited. That's a good problem to work on.
Further robots generally have more than a single rigid manipulator.
Not sure which lab (I think google?) it was, but there was a recent demo of a ML-model driven robot that folded paper in that style as one of the tasks.
>10000 sensors with specific spatial layout and various specialised purposes (pressure / vibration / stretching / temperature) that require different mechanical connections between the sensor and the skin.
Mechanical connection wouldn't be an issue if we lithograph the sensors right onto the "skin", similarly to chips.
The “more than 10000” also has a large impact in size (sensors need to be very small) and cost (you are not paying for one sensor but 10000).
Of course some applications can do with much less. IIUC the article is all about a _universal_ humanoid robot, able to do _all_ tasks.
But I think most of those can be replaced by existing robotics as well anyway. I mean take car manufacturing, over time more and more humans were replaced by robots, and nowadays the newest car factories are mostly automated (see lights-out manufacturing: https://en.wikipedia.org/wiki/Lights_out_(manufacturing)). Interestingly a Japanese robot factory has been lights-out since 2001, where they can run for 30 days on end without any lights, humans, heating or cooling.
And I could see it. With prevalence of screens kids already don't learn a lot of dexterity that previous generations have learned. Their grip strength is weak and capacity for fine 3d motions is probably underdeveloped as well.
Last week I saw an intelligent and normally developing 7-year-old kid asking his mum to operate a small screwdriver to get to the battery compartment of a toy, because that was apparently beyond his competence.
Now, with recent developments in robotics, fully neural controllers, and training in simulated environments, it could be that modern babies will have very few tasks requiring dexterity left when they grow up.
This has almost nothing to do with nature (barring a development issue).
This has to do with nurture. Every time they went to do something with a tool a helicopter gunship of a parent showed up to tell them no. Now they have a learned helplessness when it comes to these things.
But that's not really any different than when I was a kid so very long ago. At 4 or 5 I was given a stack of old broken radios and took them to the garage for a rip-and-tear session. I got to look at all their pretty electronic guts that fascinated me. There were plenty of other parents of that time that would have been horrified to see their kids do something similar.
Take size, strength, precision, longevity, and speed. It's not hard to match or beat organic muscle fibers on one or two of these dimensions with an electrically driven system, but any system that does is going to neglect the other dimensions to such a degree as to put building a humanoid robot that achieves parity with a human completely out of reach.
You can slather as much AI as you want on top of inadequate hardware - it's not going to help.
Sure it takes a bigger motor to produce the same torque, but speed and precision are actually the strengths of electric motors. The fundamental problem with them is that reducers are not impact resistant and they have internal inertia, which is something muscles do not have. Another problem is building actuators with multiple degrees of freedom. The ideal configuration for legs is a ball joint, not two consecutive rotary joints.
> The center piece of my argument is that the brute force learning approaches that everyone rightfully touts as great achievements relied on case-specific very carefully engineered front-ends to extract the right data from the cacophony of raw signals that the real-world presents.
In nearly each of the preceding examples, isn't the argument really about the boundaries that define the learning machine? Just because data preparation / formatting / sampling / serialization is more cost-effective to do externally from the learning machine, doesn't mean that boundary is necessary. One could build all of this directly inside the boundary of the learning machine and feed it the raw, messy, real world signals.
Also, humans have plentiful learning aids doing "tokenization", as anyone who has helped a child learn to count has experienced first-hand.
But without it being done, it's an unproven hypothesis at best.
Also, the comment was not related to LLMs only.
Note that the goal is to get comparable performance, in other words to compare like for like.
I spent a few minutes excitedly trying to figure out how one of my favourite declarative programming languages was used to solve modern robotic sensing problems, only to realise it was probably just a misspelling ... :(
A: False.
Next, insert a Standard screwdriver into a screw head, set the screw in place, and screw it in. In order to make it work, you have to push and torque it at the same time, and not let the blade slip out of the hole or damage the screw head.
If you think this is easy, try to teach a kid to do it. Watch them struggle to control the nut and the screwdriver.
Our hands are really, really good at both major motor control and very fine motor control.
But what the article misses: We can just rearrange our environment to make it easy to interact with by robots. There might be only standardized nuts and bolts, with IDs imprinted so the robots know exactly how to apply them. Dishes might come in certified robot-known dimensions, or with invisible marks on where to best grip them. Matchsticks might be replaced by standardized gas lighters. Maybe robot companies will even sell those themselves.
Standards are good, but then how about you take up pottery and make a plate yourself and now your robot won’t handle them…
An example that came to my mind: squeezing fruit juice requires a lot of dexterity. But if we sold pre-chopped fruit bits in a standardized satchet, then robots could easily squeeze the delicious and healthy fruit juice from those! And health-conscious people drink fruit juice every day, so this could easily be made into a subscription-based service! A perfect business model right there. You could call it iJuice or juice.ai or even Juicero.
That's not an important goal. The important goal is to optimize the life of the people that use the lines, not artificial measures taken from just looking at the machines running in them.
It's the same issue as self-driving cars: universal worker robots have to either learn to use the same things humans do, or never leave the labs.
They even ran on things like firewood, coal, or, for the first ICEs, relatively common liquid fuels that could be sourced in large cities.
Cars rely on gas stations today - but gas stations only became a thing after cars already became a thing.
Nowadays, Tesla had to make Superchargers happen all by themselves before EVs could really happen - despite EVs already having the crushing advantage of being able to charge overnight in any garage that has power.
Can you see a robot company branching out to be a competitor to McDonalds to prove that their kitchen robot is viable if you design the entire kitchen for it? Well, it's not entirely impossible, but I think it unlikely.
From that to every manufacturer adopting the standard on every product, independently of the client, you just need some competition in their market. I dunno if there is any, but it doesn't take much.
But to me that's the end state of this conversation. Let's take shipping as an example: we came up with pallets and containers not because they're useful for a person to move but because they're helpful for robots (and analogs) to move. People aren't born with pallet jacks for hands. So it seems to me that as you add more robotics into the kitchen, you're going to slowly change your supplies to arrive in more robot-friendly form.
Which at that point is really just the Japanese train system and surrounding infrastructure, which many places (at least in the US) don't seem capable or willing to make happen.
(They could give the robot instructions on how to set up their furniture as well, the business plan really writes itself)
(I'm not disagreeing with the author, just sharing an article that is interesting/relevant.)