What Makes 5% of AI Agents Work in Production?
Key topics
The article recaps a panel discussion on the challenges of deploying AI agents in production, with experts suggesting that 95% of deployments fail for lack of scaffolding around the models. The HN discussion centers on the validity of that claim and on the complexities of building reliable AI systems.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4d after posting
- Peak period: 59 comments (108-120h window)
- Avg / period: 24.2
Based on 121 loaded comments
Key moments
- Story posted: Oct 2, 2025 at 6:30 PM EDT (3 months ago)
- First comment: Oct 6, 2025 at 10:01 PM EDT (4d after posting)
- Peak activity: 59 comments in the 108-120h window, the hottest stretch of the conversation
- Latest activity: Oct 8, 2025 at 11:36 PM EDT (3 months ago)
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
https://ai.meta.com/research/publications/cwm-an-open-weight...
I use LLMs every day of my life to make myself highly productive. But I do not use LLM tools to replace my decision trees.
If we had tech support for a toaster, you might see:
Without context, even the brightest people will not be able to fill in the gaps in your requirements. Context is not just nice-to-have, it's a necessity when dealing with both humans and machines.
I suspect that people who are good engineering managers will also be good at 'vibe coding'.
I have observed that those who have both technical and management experience seem to be more adept (or perhaps more willing?) at using LLMs in their daily life to good effect.
Of course what really helps, like in all things, is conscientiousness and an obsession for working through problems (if people don't like obsession then tenacity and diligence).
>We weren’t there to rehash prompt engineering tips.
>We talked about context engineering, inference stack design, and what it takes to scale agentic systems inside enterprise environments. If “prompting” is the tip of the iceberg, this panel dove into the cold, complex mass underneath: context selection, semantic layers, memory orchestration, governance, and multi-model routing.
I bet those four people love that the moderator took a couple notes and then asked ChatGPT to write a blog post.
As always, the number one tell of LLM output, besides the tone, is that by default it will never include links in the body of the post.
Why can’t anyone be bothered anymore to write actual content, especially when writing about AI, where your whole audience is probably already exposed to these patterns in content day in, day out?
It comes off as so cheap.
The real insight: have some fucking pride in what you make, be it a blog post, or a piece of software.
The businessmen's job will be complete when they've totally eliminated all pride from work.
Also, automation and pride can go hand in hand. Pride doesn't mean "make it by hand," that would be silly.
But it's a fallacy to apply it elsewhere and there are millions of examples where the free market failed to optimize a product.
No. Have you worked with businessmen? 90% of the time they're telling you to cut corners and leave things broken, to the point you have a janky mess that can be barely held together. And, right now, we're talking about a technology (LLMs) that is well known to introduce stupid but often hard to spot errors.
They don't want a pencil that's perfect. They want one that's just barely good enough to write with and that they can get maximum profit margin on.
And then, you know, there's the whole thing about life being more than output.
You're not getting it. It'd probably help if you stopped focusing on your pencil story, it's frankly off-topic.
To try one more time: You probably spend half your waking hours at work. The quality of that time is important to your well-being. Even if the businessmen sell you cheap, perfect pencils (which I do not grant), swimming in them in your off hours won't help with the other half of your time.
I've no idea what this italicisation is meant to do; nor why this is off-topic. Stating things isn't explaining them.
> Even if the businessmen sell you cheap, perfect pencils (which I do not grant), swimming in them in your off hours won't help with the other half of your time.
It helps in that I don't have to spend as much of my time working to buy pencils. It's the same with everything. There's no reason why a laptop doesn't cost $1m except that the incredible, detailed, cross-continent cooperative work is done by experts and coordinated by a market for that work driving costs down and quality up.
The brands that do have a claim to "perfection" necessarily had the pride to not participate in that race to the bottom.
"The real insight?"
0: https://en.wikipedia.org/wiki/Hypophora
The way I see it is that the majority of people never bothered to write actual content. Now there’s a tool the non-writers can use to write dubious content.
I would wager this tool is being used much differently by actual writers focused on producing quality. There are just far fewer of them, the same way there are fewer of any specialization.
The real question with AI to me is whether it will remain consistently better when wielded by a specialist who has invested their time into whatever the thing is they are producing. If that ever changes then we are doomed. When it’s no longer slop…
The tone of AI-written stuff sounds to me just like the soul-less SEO-optimized content marketing blog crap we saw the years before AI became a thing. Very prevalent on Linkedin too. It just sounds/reads so hopelessly artificial.
If I were to begin using AI to write stuff for me (comments or articles or whatever), I'd at least begin with having it train on the collection of everything I've written so far.
Perhaps they can be called vibe bloggers?
What bothers me compared to code is that for software, the code is just a means to an end. But for articles, it's much more than that.
I wonder how this will end up affecting our lives. Last week I saw a video that highlighted how AI is already affecting our vocabulary. It introduces words not typically used in American English (but common in Nigeria, where a lot of content writing is outsourced) into mainstream media.
I can totally see how this will slowly start affecting language itself.
> One panelist shared a personal story that crystallized the challenge: his wife refuses to let him use Tesla’s autopilot. Why? Not because it doesn’t work, but because she doesn’t trust it.
> Trust isn’t about raw capability, it’s about consistent, explainable, auditable behavior.
> One panelist described asking ChatGPT for family movie recommendations, only to have it respond with suggestions tailored to his children by name, Claire and Brandon. His reaction? “I don’t like this answer. Why do you know my son and my girl so much? Don’t touch my privacy.”
What I wonder is whether the author of the article recognized these patterns and didn’t care, didn’t even recognize them, or didn’t proofread the article?
> thanks, I used AI but aren't we all? I thought the point of AI is to get us to be more productive.
You've also repeatedly dismissed any criticism of the writing as "hate."
If you want readers to do you the favor of reading your work, please do them the favor of writing it.
Beyond the em dashes and overuse of "delve" etc., there is this distinctive style of composition I want to understand and recognize better.
"There’s a missing primitive here: a secure, portable memory layer that works across apps, usable by the user, not locked inside the provider. No one’s nailed it yet. One panelist said if he weren’t building his current startup, this would be his next one."
I find it annoying that, when prompting ChatGPT, Claude, Gemini, etc. on personal tasks through their chat interfaces, I have to provide the same context about myself and my job again and again to the different providers.
The memory functions of the individual providers now reduce some of that repetition, but it would be nice to have a portable personal-memory context (under my control, of course) that is shared with and updated semiautomatically by any AI provider I interact with.
As isoprophlex suggests in a sister comment, though, that would be hard to monetize.
Edit: Aaaand it’s gone.
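For illustration, a minimal sketch of what such a user-owned memory layer could look like, assuming a hypothetical local file format and helper functions (nothing here is an existing provider API):

```python
import json
from pathlib import Path

# Hypothetical user-owned memory file; the path and schema are illustrative.
MEMORY_PATH = Path.home() / ".ai_memory.json"

def load_memory() -> dict:
    """Load the portable memory, or start empty."""
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"facts": [], "preferences": []}

def as_context_block(memory: dict) -> str:
    """Render memory as plain text any provider's chat API can take as context."""
    lines = ["Known facts about the user:"]
    lines += [f"- {fact}" for fact in memory["facts"]]
    lines.append("Preferences:")
    lines += [f"- {pref}" for pref in memory["preferences"]]
    return "\n".join(lines)

def remember(memory: dict, fact: str) -> None:
    """Semiautomatic update: an app proposes a fact, the user approves it."""
    if fact not in memory["facts"]:
        memory["facts"].append(fact)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))
```

The hard part, as the thread notes, is not the file format but getting every provider to read from and write to it instead of their own silo.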
Will someone please think of the MRR!
This isn't true. I've been using Gemini 2.5 a lot recently and I can't get it to stop adding links!
I added custom instructions: Do not include links in your output. At the start of every reply say "I have not added any links as requested".
It works for the first couple of responses but then it's back to loads of links again.
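Instruction-following tends to drift over a long chat; a deterministic post-processor is the more reliable way to enforce a rule like this. A minimal sketch (the regex is an assumption, not a complete URL grammar):

```python
import re

# Rough pattern for markdown links and bare URLs; illustrative, not exhaustive.
LINK_PATTERN = re.compile(r"\[([^\]]*)\]\([^)]*\)|https?://\S+")

def strip_links(model_output: str) -> str:
    """Remove links after generation, instead of hoping the model obeys."""
    # Keep the anchor text of markdown links, drop bare URLs entirely.
    return LINK_PATTERN.sub(lambda m: m.group(1) or "", model_output)

print(strip_links("See [the docs](https://example.com) or https://example.org"))
# -> "See the docs or "
```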
In other words, everything identified as what "the scaffolding" needs is what qualified people provide when delivering solutions to problems people want solved.
If I implement a strict parser and an output post-processor myself to guard against hallucinations, I have done 100% of the business-related logic. I can skip the LLM in the middle altogether.
You might even be able to put a UI on it that is a lot more effective than asking the user to type text into a box.
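A minimal sketch of that scaffolding, assuming the LLM is asked to return JSON in a fixed shape (the field names and action vocabulary are illustrative):

```python
import json

ALLOWED_ACTIONS = {"refund", "escalate", "close"}  # the whole business vocabulary

def parse_llm_reply(raw: str) -> dict:
    """Strict parser: reject anything that isn't exactly the expected shape."""
    reply = json.loads(raw)  # raises on malformed JSON
    if set(reply) != {"action", "ticket_id"}:
        raise ValueError(f"unexpected fields: {sorted(reply)}")
    if reply["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"hallucinated action: {reply['action']!r}")
    if not isinstance(reply["ticket_id"], int):
        raise ValueError("ticket_id must be an integer")
    return reply
```

Note that the whitelist and shape checks encode all of the business rules, which is exactly the commenter's point.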
Well said and I could not agree more.
might as well just write the ai agent part of the software yourself as well.
So...
The bot, to its credit, returns some decent results. But my guess is that it will be quite a while before we see it in prod, since a lot of these projects go from 0 to 80% in a week and from 80% to deployable in several years.
Text-to-SQL is the funniest example. It seems to be the "hello world" of agentic use in enterprise environments. It looks so easy, so clear, so straightforward. But just because the concept is easy to grasp (LLMs are great at generating markup or code, so let's have them translate natural language to SQL) doesn't mean it is easy to get right.
I have spent the past 3 months building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries. And boy oh boy is that rabbit hole deep.
https://bird-bench.github.io/
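One common shape for that bridge is a deterministic gate in front of the warehouse: parse the generated SQL and reject anything outside a whitelist before it ever executes. A minimal sketch, assuming the sqlglot parsing library and illustrative table names:

```python
import sqlglot
from sqlglot import exp

ALLOWED_TABLES = {"orders", "customers"}  # illustrative schema whitelist

def validate_generated_sql(sql: str) -> str:
    """Gate stochastic output behind deterministic checks."""
    tree = sqlglot.parse_one(sql, read="postgres")  # raises on unparseable SQL
    if not isinstance(tree, exp.Select):
        raise ValueError("only SELECT statements are allowed")
    tables = {t.name for t in tree.find_all(exp.Table)}
    if not tables <= ALLOWED_TABLES:
        raise ValueError(f"query touches unknown tables: {tables - ALLOWED_TABLES}")
    return sql
```

That only covers the easy, structural half of the rabbit hole; the semantic half (does the query mean what the user asked?) is where the months go.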
A user having to come up with novel queries all the time to warrant text-to-SQL is a failure of product design.
But this is precisely why we're seeing startups build insane things fast while well established companies are still questioning if it's even worth it or not.
People got good results on the test datasets, but the test datasets had errors so the high performance was actually just the models being overfitted.
I don't remember where this was identified, but it's really recent, though before GPT-5.
Wait but this just sounds unhinged, why oh why
People don't know exactly what they want from the data warehouse, just a fuzzy approximation of it. You need stochastic software (AI) to map the imprecise instructions from your users to precise instructions the warehouse can handle.
60% of the time I spend writing SQL is probably validation. A single hallucinated assumption can blow the whole query. And there are questions that don't have clear modelling approaches that you have to deal with.
Plus, a lot of the SQL training data in LLMs is pretty bad, so I've not been impressed yet. Certainly not enough to let business users run an AI query agent unchecked.
I’m sure AI will get good at this, so I’m building up my warehouse knowledge base and putting together documentation as best I can. It’s just pretty awful today.
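One cheap layer of that validation is a dry run: ask the database to plan the query without executing it, which at least catches hallucinated tables and columns. A minimal sketch, assuming a Postgres connection through psycopg2:

```python
import psycopg2

def dry_run(conn, generated_sql: str) -> bool:
    """EXPLAIN parses and plans the query without running it."""
    try:
        with conn.cursor() as cur:
            cur.execute("EXPLAIN " + generated_sql)
        return True
    except psycopg2.Error:
        conn.rollback()  # a failed statement poisons the open transaction
        return False
```

This catches structural errors only; the hallucinated assumptions described above still need semantic review.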
Okay, how would that work though? Verified by who and calculated by what?
I need deets.
On the other side, you have an SQL query that calculates the revenue.
Compare the two. If the two disagree, get the AI to try again. If the AI is still wrong after 10 tries, just use the SQL output.
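Spelled out, that proposal is a reconciliation loop with the trusted query as fallback. A minimal sketch, where `ai_revenue` and `sql_revenue` are hypothetical callables standing in for the agent and the hand-written SQL:

```python
from typing import Callable

def reconciled_revenue(question: str,
                       ai_revenue: Callable[[str], float],
                       sql_revenue: Callable[[], float],
                       max_tries: int = 10) -> float:
    """Trust the agent's number only when it matches the hand-written SQL."""
    trusted = sql_revenue()                # known-good query, computed once
    for _ in range(max_tries):
        candidate = ai_revenue(question)   # stochastic agent call
        if abs(candidate - trusted) < 0.01:
            return candidate               # agreement: accept the AI's answer
    return trusted                         # still wrong after 10 tries: use the SQL
```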
What I hear is a billion dollar AI startup in the making!
So if you have a "CalculateQuarterRevenue(year, quarter)" function, you'll soon find your users asking for the data per-month. Or just for the last six weeks. Or just for a specific client. And they'll be confused when it doesn't work.
Compare this to basically any website you've ever been to. It's the "GUIs vs. CLIs" discussion all over again, except even CLIs had man pages for discoverability.
edit: I'm serious. I'm just answering the question, not making a value judgement.
Verbal queries are the solution for the world we have, even if they're not optimal.
The main killer app, I think, boils down to really expensive speech-to-text (and vice versa) with a reasonable number of seemingly authoritative querying details in fairly plain language. It's a new, 'better' search engine, just with different pitfalls people need to get up to speed on. And that may be enough, because employing humans to fill the same role as effectively is more expensive still.
If you said it’s something you made for perusal and reading? Then it reads like AI.
I’ve had to read tons of papers and articles, the most testing being conference submissions. I won’t read something with that structure unless I have to.
So you scaffold this up in 30 seconds but want me to read through it carefully? Cool, thanks.
End users (at my company) - Can your AI system look at numbers and find differences and generate a text description?
Pre-sales - (trying to clarify) For our systems to generate text it will be better if you give it some live examples so that it understands what text to generate.
End users - But there is supporting data (metadata) around the numbers. Can't your AI system just generate text?
Pre-sales - It can, but you need to provide context and examples. Otherwise it is going to generate generic text like "there is x difference".
End user - You mean I need to write comments manually first? That is too much work.
Now these users have a call with another product - MS Copilot.
This is really the truth of all things in life.
This is how it is being marketed, and I guess people are silly enough to believe marketing, so it's not too surprising.
Anyone that's been involved in data science roles in corporate environments knows that "the data" is usually forced into an exec's pre-existing understanding of a phenomenon. With AI, execs are really excited about "cutting out the middlemen", when the middlemen in the equation are very often their own paid employees. That's all fine and dandy in an abstract economic view, but it's sure something they won't say publicly (at least most won't).
In terms of potential cost cutting, it probably is the most recent "new magic". You used to have to pay a consultant, now you can "ask AI".
That's because it is marketed as magic. It's marketed as magic so people will adopt the thing before knowing its shortcomings.
https://pbfcomics.com/comics/the-masculator/
Conversational UIs are controversial, but I think there are a good number of websites where better search could be more central. Not generating text, but surfacing the most relevant text.
I’m thinking of a lot of library documentation, government info websites, etc. Basically an improvement over deep hierarchical navigation, where their way of organizing info is a leaky abstraction.
Maybe that will be one of the side effects of this AI boom. Who knows.
For example, the number comes from perceived successes and failures, not actual measurements. The customer conclusions are similarly vague: "it doesn't improve" or "it doesn't remember". That is literally buying into the hype of recursive self-improvement, completely oblivious to the fact that API consumers don't control model weights and so can't do much self-improvement beyond writing more CRUD layers. The other complaints are about integrations, which are totally valid, but some industries still run Windows XYZ without any API platforms, so that's not going away in those cases.
Point being, if the paper itself is not serious discourse, just well-marketed punditry, why should we debate the 5% number? It makes no sense.
> The teams that succeed don't just throw SQL schemas at the model. They build:
> - Business glossaries and term mappings
> - Query templates with constraints
> - Validation layers that catch semantic errors before execution
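For concreteness, a minimal sketch of how the first two of those layers can combine, with an illustrative glossary and template (the validation here is just a term whitelist):

```python
# Illustrative business glossary: user vocabulary -> warehouse vocabulary.
GLOSSARY = {
    "revenue": "SUM(order_total)",
    "last quarter": "date_trunc('quarter', now()) - interval '3 months'",
}

# Query template with constraints: the model fills slots, not free-form SQL.
TEMPLATE = "SELECT {metric} FROM orders WHERE order_date >= {period}"

def build_query(metric_term: str, period_term: str) -> str:
    """Map glossary terms into a constrained template; reject unknown terms."""
    if metric_term not in GLOSSARY or period_term not in GLOSSARY:
        raise ValueError("term not in business glossary")  # semantic check
    return TEMPLATE.format(metric=GLOSSARY[metric_term],
                           period=GLOSSARY[period_term])

print(build_query("revenue", "last quarter"))
```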
Unfortunately, the mixing of fluffy tone and high-level ideas is bound to be detested by hands-on practitioners.
And now we have an entire panel of bullshitters with an article-long theory about how to make LLMs program actually for real this time.
(Oh, and it would be great if journalists actually cited their public sources, instead of pretending they link to the article but actually linking to their review of related content.)
It's a big pet peeve of mine when an author states an opinion, with no evidence, as some kind of axiom. I think there is plenty of evidence that "the models aren't smart enough". Or to put it more accurately: it's an incredibly difficult problem to get a big productivity gain when an automated system is blatantly wrong ~1% of the time and those wrong answers are designed to look as much like right answers as possible.