73% of AI startups are just prompt engineering
Mood: controversial
Sentiment: negative
Category: tech_discussion
Key topics: AI, Startups, Prompt Engineering
Discussion activity: very active
First comment: 28m after posting
Peak period: 59 comments in Hour 2
Avg per period: 12.3 comments
Based on 160 loaded comments
Key moments
- 01 Story posted: Nov 23, 2025 at 11:17 AM EST (15h ago)
- 02 First comment: Nov 23, 2025 at 11:45 AM EST (28m after posting)
- 03 Peak activity: 59 comments in Hour 2, the hottest window of the conversation
- 04 Latest activity: Nov 24, 2025 at 1:26 AM EST (1h ago)
Google even said they have no moat, when clearly the moat is people that trust them and not any particular piece of technology.
Burning VC money isn't a long-term business model, and unless your business is somehow both profitable on Llama 8B (or some such low-power model) _and_ your secret sauce can't be easily duplicated, you're in for a rough ride.
The only barrier between AI startups at this point is access to the best models and that's dependent on being able to run unprofitable models that spend someone else's money.
Investing in a startup that's basically just a clever prompt is gambling on first-mover advantage, because that's the only advantage they can have.
"In tech, often an expert is someone that know one or two things more than everyone else. When things are new, sometimes that's all it takes."
It's no surprise it's just prompt engineering. Every new tech goes that way - mainly because innovation is often adding one or two things more the the existing stack.
And then you need to see what variations work best with different models.
My POCs for personal AI projects take time to get this right. It’s not like the API calls are the hard portion of the software.
It's up to the domain experts and me to understand where giving it data will tone down the hallucinative nonsense an LLM puts out, and where we should not give data because we need the problem solving skills of the LLM itself. A similar process is for tool-use, which in our case are pre-selected Python scripts that it is allowed to run.
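A minimal sketch of what that kind of pre-selected tool-use can look like, assuming a dispatcher that only runs scripts from an explicit allowlist; the tool names, the `tool_call` shape, and the timeout are illustrative assumptions, not the commenter's actual setup:

```python
import subprocess
from pathlib import Path

# Hypothetical allowlist: the model may only request these pre-vetted scripts.
ALLOWED_TOOLS = {
    "fetch_report": Path("tools/fetch_report.py"),
    "summarize_table": Path("tools/summarize_table.py"),
}

def run_tool(tool_call: dict) -> str:
    """Run a script the LLM asked for, but only if it is on the allowlist."""
    name = tool_call.get("name")
    args = [str(a) for a in tool_call.get("arguments", [])]
    if name not in ALLOWED_TOOLS:
        return f"error: tool '{name}' is not permitted"
    result = subprocess.run(
        ["python", str(ALLOWED_TOOLS[name]), *args],
        capture_output=True, text=True, timeout=30,
    )
    # The script's stdout is what gets fed back to the model as the tool result.
    return result.stdout if result.returncode == 0 else f"error: {result.stderr}"
```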
- write an evaluation pipeline to automate quality testing
- add a query rewriting step to explore more options during search
- add hybrid BM-25+vector search with proper rank fusion (sketched below)
- tune all the hyperparameters for best results (like weight bias for bm25 vs. vector, how many documents to retrieve for analysis, how to chunk documents based on semantics)
- parallelize the search pipeline to decrease wait times
- add moderation
- add a reranker to find best candidates
- add background embedding calculation of user documents
- lots of failure cases to iron out so that the prompt worked for most cases
There's no "just give LLM all the data", it's more complex than that, especially if you want best results and also full control of data (we run all of that using open source models because user data is under NDA)
I've debugged single difficult bugs before for two weeks, a whole feature that takes two weeks is an easy feature to build.
P.S. No vibe coding was used. I only used LLM-as-a-judge to automate quality testing when tuning the parameters, before passing it to human QA
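For the hybrid-search item in the list above, a minimal sketch of reciprocal rank fusion over a BM25 ranking and a vector ranking; the document ids and both rankings are made up for illustration:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids (e.g. one from BM25, one from
    vector search) into a single ranking using reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up rankings, best-first, from the two retrievers.
bm25_ranking = ["doc3", "doc1", "doc7", "doc2"]
vector_ranking = ["doc1", "doc3", "doc5", "doc7"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking])[:3])
```

The `k` constant and the relative weighting of the two retrievers are exactly the kind of hyperparameters the commenter describes tuning.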
Not sure if we called it engineering ten years ago.
Painting is "engineering data flow"
Directing a movie is "engineering data flow"
Playing the guitar is "engineering data flow"
This statement merely reveals a bias to apply high value to the word "engineering" and to the identity "engineer".
Ironic, in that Silicon Valley lifted that identity and it's not even legally recognized as a licensed profession.
My manager gives me specifications, which I turn into code.
My manager is not coding.
This is how to look at it.
The machine model for natural language doesn't exist; it is too ambiguous to be useful for many applications.
Hence, we limited natural language to create programming languages whose machine model is well defined.
In math, we created formalism to again limit language to a subset that can be reasoned with.
It's like how we've seen basically all gadgets meld into the smartphone. People don't have Garmins and beepers and clock radios anymore (or dedicated phones!). It's all on the screen that fits in your pocket. Any would-be gadget is now just an app.
It seems right now like there is a tradeoff between creativity and factuality, with creative models being good at writing and chatting, and factuality models being good at engineering and math.
It's why we are getting these specific -code models.
But cloud services run in... the cloud. It's as big as you need it to be. My cloud service can have as many backing services as I want. I can switch them whenever I want. Consumers don't care.
"One model that can do everything for you" is a nice story for the hyper scalers because only companies of their size can pull that off. But I don't think the smartphone analogy holds. The convenience in that world is for the the developers of user-facing apps. Maybe some will want to use an everything model. But plenty will try something specialized. I expect the winner to be determined by which performs better. Developers aren't constrained by size or number of pockets.
I don't think that's the expectation set by "everyone else" in the AI space, even if it arguably is for OpenAI (which has always, at least publicly, had something of a focus on eventual omnicapable superintelligence). I think Google Antigravity is evidence of this: there's a main, user-selected coding model, but regardless of which coding model is used, there are specialized models used for browser interaction and image generation. While more and more capabilities are at least tolerably supported by the big general-purpose models, the range of specialized models seems to be increasing rather than decreasing, and it seems likely that, for complex efforts, combining a general-purpose model with a set of focused, task-specific models will be a useful approach for the foreseeable future.
Foundational models are not great for many specific tasks. Assuming that one architecture will eventually work for everything is like saying that x86/amd64/ARM will be all we ever need for processors.
Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).
A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).
So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and they look for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.
To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.
I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.
The author seems to not exist and it's unclear where the data underlying the claims is even coming from since you can't just go and capture network traffic wherever you like.
A little due diligence please.
Either you have a smash-and-grab strategy or you are awful at risk analysis.
But I'd like to know an actual answer to this, too, especially since large parts of this post read as if they were written by an LLM.
Which would be a major security hole. And sure, lots of startups have major security holes, but not enough that he could come up with these BS statistics.
I'm a little dismayed at how high up this has been voted given the data is guaranteed to be made up.
> Which would be a major security hole.
An officially supported security hole
https://platform.openai.com/docs/api-reference/realtime-sess...
They claim to have found that.
Is Teja Kusireddy a real person? Or is this maybe just an experiment from some AI company (or other actor) to see how far they can push it? A Google search by that name doesn't find anything not related to the article.
The article should be flagged. Otoh, this should get discussed.
To be able to call the OpenAI API directly from the front end, you'd need to include the OpenAI key, which would be a huge security hole. I don't doubt that many of these companies are just wrappers around the big LLM providers, but they'd be calling the APIs from their backend, where nothing should be interceptable. And sure, I believe a few of them are dumb enough to call OpenAI from the frontend, but that would be a minority.
This whole thing smells fishy, and I call BS unless the author provides more details about how he intercepted the calls.
You mean, except for explaining what he's doing 4-5 times? He literally repeats himself restating it. Half the article is about the various indicators he used. THERE ARE EXAMPLES OF THEM.
There's this bit:
> Monitored their network traffic for 60-second sessions
> Decompiled and analyzed their JavaScript bundles
Also there's this whole explanation:
> The giveaways when I monitored outbound traffic:
> Requests to api.openai.com every time a user interacted with their "AI"
> Request headers containing OpenAI-Organization identifiers
> Response times matching OpenAI’s API latency patterns (150–400ms for most queries)
> Token usage patterns identical to GPT-4’s pricing tiers
> Characteristic exponential backoff on rate limits (OpenAI’s signature pattern)
Also there's these bits:
> The Methodology (Free on GitHub next week):
> - The complete scraping infrastructure
> - API fingerprinting techniques
> - Response time patterns for every major AI API
One time he even repeats himself by stating what he's doing as Playwright pseudocode, in case plain English isn't enough.
This was also really funny:
> One company’s “revolutionary natural language understanding engine” was literally this: [clientside code with prompt + direct openai API call].
And there's also this bit at the end of the article:
> The truth is just an F12 away.
There's more because LITERALLY HALF THE ARTICLE IS HIM DOING THE THING YOU COMPLAIN HE DIDN'T DO.
In case it's still not clear, he was capturing local traffic while automating with playwright as well as analyzing clientside JS.
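A minimal sketch of that kind of capture, assuming Playwright's sync Python API; the target URL and the list of provider hosts are hypothetical, and note that this only observes requests the browser itself makes, not the startup's backend traffic:

```python
from playwright.sync_api import sync_playwright

# Hypothetical provider hosts to flag in client-side traffic.
PROVIDER_HOSTS = ("api.openai.com", "api.anthropic.com")

def flag_request(request):
    """Log any browser-originated request that goes straight to an LLM provider."""
    if any(host in request.url for host in PROVIDER_HOSTS):
        org = request.headers.get("openai-organization", "<none>")
        print(f"direct provider call: {request.url} (OpenAI-Organization: {org})")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("request", flag_request)
    page.goto("https://example-ai-startup.test")  # placeholder target
    page.wait_for_timeout(60_000)  # observe a 60-second session, as the article describes
    browser.close()
```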
How can he monitor what's going on between a startup's backend and OpenAI's server?
> The truth is just an F12 away
That's just not how this works. You can see the network traffic between your browser and some service. In 12 cases that was OpenAI or similar. Fine. But that's not 73%. What about the rest? He literally has a diagram claiming that the startups contact an LLM service behind the scenes. That's what's not described: how does he measure that?
You are not bothered that the only sign the author even exists is this one article and the previous one? Together with the claim to be a startup founder? Anybody can claim that. It doesn't automatically provide credibility.
Presumably OpenAI didn't add that for fun, either, so there must be non-zero demand for it.
But I still believe the vast majority of startups do wrapping in their own backend. Yes, I read what he's doing, and he's still only able to analyze client-side traffic, which means his overall claims of "73%" are complete and total bullshit. It is simply impossible to conclude what he's concluding without having access to backend network traces.
He is not claiming to be doing that. He says what and how he's capturing multiple times. He says he's capturing what's happening in browser sessions.
> That's just not how this works. You can see the network traffic between your browser and some service.
Yes, the author is well aware of that as are presumably most readers. If your client makes POST requests to the startup's backend like startup.com/api/make-request-to-chatgpt and the payload is {systemPrompt: "...", userPrompt: "..."}, not much guessing as to what is going on is necessary.
> You are not bothered by the only sign that the author even exist is this one article and the previous one?
Moving goalposts. He may or may not be full of shit. Guess we'll see if/when we see the receipts he promised to put on GitHub.
What bothers me most is the lack of general reading comprehension being displayed in this thread.
> Together with the claim to be a startup founder? Anybody can claim that.
What? Anybody can be a startup founder today. Crazy claim. Also... what?
This also matches the latency of a large number of DB queries and non-OpenAI LLM inference requests.
>Token usage patterns identical to GPT-4’s pricing tiers
What? Yes this totally smells real.
He also mentions backoff patterns, and I'm not sure how he'd disambiguate those from the extremely standard backoff of any normal API.
Given the ridiculousness of these claims, I believe there's a reason he didn't include the fingerprinting methodology in this article.
https://medium.com/@teja.kusireddy23/i-reverse-engineered-20...
The article is basically a description of where to look for clues. Perhaps they've contracted with some of these companies and don't want to break some NDA by naming them, but still know a lot about how they work.
This makes literally no sense. Why would any companies (let alone most of them) contract with this guy, who seems hell-bent on exposing them all?
The article is simply made up, most likely by an LLM.
If we overlook that non-determinism isn't really compatible with a lot of business processes and assume you can make the model spit out exactly what you need, you can't get around the fact that an LLM is going to be a slower and more expensive way of getting the data you need in most cases.
LLMs are fantastic for building things. Use them to build quickly and pivot where needed and then deploy traditional architecture for actually running the workloads. If your production pipeline includes an LLM somewhere in the flow, you need to really, seriously slow down and consider whether that's actually the move that makes sense.
[1] - There are exceptions. There are always exceptions. It's a general rule not a law of physics.
I don't believe any of this. Why aren't we questioning the source of how the author is apparently able to figure out that some sites are using Redis, etc.?
I might allow them more credit if the article wasn't in such an obviously LLM-written style. I've seen a few cases like this, now, where it seems like someone did some very modest technical investigation or even none at all and then prompted an LLM to write a whole article based on it. It comes out like this... a whole lot of bullet points and numbered lists, breathless language about the implications, but on repeated close readings you can't tell what they actually did.
It's unfortunate that, if this author really did collect this data, their choice to have an LLM write the article and in the process obscure the details has completely undermined their credibility.
The bigger question is, this is the same story as with apps on mobile phones. Apple and Google could easily replicate your app if they wanted to, and they did, too. That danger is much higher with these AI startups. The LLMs are already there in terms of functionality; all the creators have figured out the value is in vertical integration, and all of them are doing it. In that sense, all these startups are just showing them what to build. Even Perplexity and Cursor are in danger.
The author never explains how he is able to intercept these API calls to OpenAI, etc. I definitely believe tons of these companies are just wrappers, but they'd be doing the "wrapping" in their backend, with only a couple (dumb) companies doing the calls directly to OpenAI from the front end where they could be traced.
This article is BS. My guess is it was probably AI generated because it doesn't make any sense.
The message might not even be wrong. But why is everybody's BS detection on ice in the AI topic space? Come on, people, you can all do better than this!
Thanks for flagging. Though whenever such a made-up thing is flagged, we lose the chance to discuss this (meta) topic. People need to be aware of how prevalent this is. By just hiding it every time we notice, we're preventing everybody from reading the kind of comment you wrote and recalibrating their BS-meters.
For example, in the 90's, a startup that offered a nice UI for a legacy console based system, would have been a great idea. What's wrong with that?
Being LLM users would be fine but they pretend they do AI.
At what point can you claim that you did "it"?
Do you have to use an open source model instead of an API? Do you have to fine tune it? How much do you need to? Do you have to create synthetic data for training? Do you have to gather your own data? Do you need to train from scratch? Do you need to come up with a novel architecture?
10 years ago, if you gathered some data and trained a linear model to determine the likelihood your client would default on their loan, and used that to decide how much, if any, to loan them - you're absolutely doing "actual AI".
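For reference, a minimal sketch of that kind of linear model, using scikit-learn's logistic regression with made-up features and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up features: [income_k, debt_to_income, years_at_job]; label 1 = defaulted.
X = np.array([[45, 0.62, 1], [90, 0.18, 7], [30, 0.75, 0.5], [120, 0.10, 12]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Estimated probability of default for a hypothetical new applicant.
print(model.predict_proba([[60, 0.40, 3]])[0, 1])
```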
---
Any other software you could ask all the same questions but with using a high level language, frameworks, dependencies, hiring consultants / firm, using an LLM, no-code, etc.
At what point does outsourcing some portion of the end product become no longer doing the thing?
When the core of your business is something that's shamelessly farmed out to <LLM provider of choice>.
It would be like calling yourself a restaurant and then getting Uber Eats deliveries of whatever customers ordered and handing that to them.
But the customers don't see where the food is coming from and are still coming to eat. If you can make the economics work...
Btw, the so-called AI devs or model developers are "users" of the databases and all the underlying layers of the stack.
I assume it works because the ecosystem is, as you say, so new. Non-technical observers have trouble distinguishing between LLM companies and CRUD companies
The thing that annoys me is when clearly non-AI companies try to brand themselves as AI: like how Long Island Iced Tea tried to brand themselves as a blockchain company or WeWork tried to brand themselves as a tech company.
If we’re complaining about AI startups not building their own in house LLMs, that really just seems like people who are not in the arena criticizing those who are.
A lot of these startups have little to no moat, but they're raking in money like no one's business. That's exactly what happened in the dotcom bubble.
AI has created a new interface with a higher level abstraction that is easier to use. Of course everyone is going to use it (how many people still code assembler?).
The point is what people are doing with it is still clever (or at least has potential to be).
AI software is not long-lasting; its results are not deterministic.
Hell man, I attended a session at an AWS event last year that was entirely the presenter opening Claude and writing random prompts to help with AWS stuff... Like thanks dude... That was a great use of an hour. I left 15 minutes in.
We have a team that's been working on an "Agent" for about 6 months now. It started as prompt engineering; then they were like "no, we need to add more value" and developed a ton of tools and integrations and "connectors" and evals etc. The last couple of weeks were a "repivot" going back full circle to "Let's simplify all that by prompt engineering and give it a sandbox environment to run publicly documented CLIs. You know, like Claude Code."
The funny thing is I know where it's going next...
When you press them on this, they have all sorts of ideas like a judge LLM that takes the outputs, comes up with modified SOPs and feeds those into the prompts of the mixture-of-experts LLMs. But I don't think that works, I've tried closing that loop and all I got was LLMs flailing around.
All of these are things that will need to be solved long-term in the model itself though, at least if the AI bubble is to be kept alive. And solving those things would in fact materially improve all sorts of benchmarks, so there's an incentive for frontier labs to do it.
I suspect the model that doesn’t need scaffolding is simply ASI, as in, the AI can build its own scaffolding (aka recursive self-improvement), and build it better than a human can. Until that point, the job is going to remain figuring out how to eval your frontier task, scaffold the models’ weaknesses, and codify/absorb more domain knowledge that’s not in the training set.
You are talking about context management stuff here, the solution will be something like a proper memory subsystem, maybe some architectural tweaks to integrate it. There are more obvious gaps beyond that which we will have to scaffold and then solve in turn.
Another way of thinking about this is just that scaffolding is a much faster way of iterating on solutions than pre-training, or even post-training, and so it will continue to be a valuable way of advancing capabilities.
I think the reason of the recent pivot is to “keep the human in the loop” more. The current thinking is they tried to remove the human too much and were getting bad results. So now they just want to make the interaction faster and let the human be more involved like how we (developers) use Claude code or copilot by checking every interaction and nudging it towards the right/desired answer.
I got the sense that management isn’t taking it well though. Just this Friday they gave a demo of the new POC where the LLM is just suggesting things and frequently asking for permissions and where to go next and expecting the user to interact with it a lot more than the one-shot approach before (which I do think is likely to yield better results tbh) but the main reaction was “this seems like a massive step backward”
You all get offshored?
This dismisses a lot of actual hard work. The scaffolding required to get SOTA performance is non-trivial!
Eg how do you build representative evals and measure forward progress?
Also, tool calling, caching, etc is beyond what folks normally call “prompt engineering”.
If you think it’s trivial though - go build a startup and raise a seed round, the money is easy to come by if you can show results.
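To make the eval point concrete, a minimal sketch of an LLM-as-a-judge harness; `call_agent` and `call_judge` are placeholders for whatever model interface is in use, and the cases are invented:

```python
# Invented eval cases: each pairs an input with the criterion a judge checks.
EVAL_CASES = [
    {"input": "Summarize this contract clause...", "criterion": "mentions the termination date"},
    {"input": "Extract the invoice total...", "criterion": "returns a single numeric amount"},
]

def run_evals(call_agent, call_judge) -> float:
    """Score the agent on every case; the judge answers PASS or FAIL per criterion."""
    passed = 0
    for case in EVAL_CASES:
        output = call_agent(case["input"])
        verdict = call_judge(
            f"Output:\n{output}\n\nDoes this satisfy: {case['criterion']}? Answer PASS or FAIL."
        )
        passed += verdict.strip().upper().startswith("PASS")
    return passed / len(EVAL_CASES)
```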
And more so than even most VC markets, raising for an "AI" company is more about who you know than what results you can show.
If anyone is actually showing significant results, where's the actual output of the AI-driven software boom (beyond just LLMs making coders more efficient by being a better Google)? I don't see any real signs of it. All I see is people doing aftermarket modifications on the shovels; I've yet to see any of the end users of these shovels coming down from the hills with sacks of real gold.
https://www.forbes.com/sites/iainmartin/2025/10/29/legal-ai-...
Law is slow and conservative; they were likely just the first to get an enterprise sales team.
This assumes that those companies do evaluations. In my experience, seeing a huge amount of internal AI projects at my company (FAANG), there's not even 5% that have any sort of eval in place.
This is a big chasm that I could well believe a lot of founders fail to cross.
It’s really easy to build an impressive-looking tech demo, much harder to get and retain paying customers and continuously improve.
But! Plenty of companies are actually doing this hard work.
See for example this post: https://news.ycombinator.com/item?id=46025683
And many companies are "just CRUD".
It's still early in the paradigm and most startups will fail but those that succeed will embed themselves in workflows.
Thus has it always been. Thus will it always be.
I have to wonder, are people voting this up after reading the article fully, and I'm just wrong and this sort of info dump with LLM dressing is desirable? Or are people skimming it and upvoting? Or is it more of an excuse to talk about the topic in the title? What level of cynicism should I be on here, if any?
https://github.com/zou-group/textgrad
and bonus, my rant about this circa 2023 in the context of Stable Diffusion models: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...