CommerceTXT
commercetxt.org
All these files should be registered with IANA and put under the .well-known namespace.
We follow the precedent of robots.txt, ads.txt, and llms.txt.
The reason is friction. Platforms like Shopify and Wix make .well-known folders difficult or impossible for merchants to configure. Root files work everywhere.
Adoption matters more than namespace hygiene.
https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...
robots.txt was created three decades ago, when we didn’t know any better.
Moving llms.txt to /.well-known/ is literally issue #2 for llms.txt
https://github.com/AnswerDotAI/llms-txt/issues/2
Please stop polluting the web.
That said, I am open to supporting .well-known as a secondary location in v1.1 if the community wants it.
The same paragraph takes you to RFC 8615, which is the .well-known you are being told to use. That is not your "secondary location" for v1.1. That is the only path you are permitted to standardize.
You are technically correct regarding IETF norms.
But you say: "Wix and Shopify have zero bearing on the standardization of the Web."
I fundamentally disagree. The Web is not just a namespace for engineers; it is an economy for millions of small businesses. If a standard is technically "pure" but unusable by 80% of merchants on hosted platforms, it fails the Web.
However, to respect the namespace, we will mandate checking /.well-known/commerce.txt first.
But we will keep the root location as a fallback. We prioritize accessibility for the "aspiring" shop owner over strict purity for the standards writer.
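A minimal sketch of that resolution order from the agent side, assuming plain HTTP fetches; the helper name and timeout are illustrative, not part of the spec:

```python
import urllib.request
from urllib.error import HTTPError, URLError

# Proposed order: /.well-known/ first, root file as the fallback.
CANDIDATE_PATHS = ["/.well-known/commerce.txt", "/commerce.txt"]

def resolve_commerce_txt(origin: str) -> str | None:
    """Return the body of the first commerce.txt found, or None if neither exists."""
    for path in CANDIDATE_PATHS:
        url = origin.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return resp.read().decode("utf-8", errors="replace")
        except (HTTPError, URLError):
            continue  # try the next candidate location
    return None

# Example: resolve_commerce_txt("https://example-shop.com")
```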
Thankfully, you've licensed your work CC0, so someone who wants to see this standardized could simply fork your work, fix the offending parts, and move for successful standardization without you.
Consider C#. Yeah, yeah, we all know the provenance of the language, that what ECMA has standardized is basically a Microsoft specification, but once it's an ECMA standard it's Something Else. Competitors can work on it together, and we're all fine with that. Carrying on C# development in the open is harder for Microsoft in some ways and easier in others. This opinion is about ten years old, mind you, and speaks more to the origin of C# (I'm not a practitioner), so I'm sure the Core stuff has changed all of this and made me look silly for saying it, but that speaks to my point: work evolves in public.
Say I work at Apple. I tell my boss I had lunch with a Samsung guy, I might get a side eye. I tell my boss I had lunch with a Samsung guy because we're collaborating on some revision to SSD TRIM or something, it's oh, OK, cool, no side eye. That's the orthodoxy. It's extremely important to even _attain_ public standards that we all suspend the rules of commerce and competition and conflict and all that. You're arguing the opposite.
There's a collaboration on the common good that should be inherent to the production of humanity's shared standards. It's kind of like science and its centuries of wrestling with this very point. The Internet is one of humanity's most important inventions, and getting trillion-dollar market caps to agree on how to operate it is incredibly fragile.
If you try to argue with me that because Wix and Shopify both have stupid designs that remove control over a URI from a Web author, I should relax my belief that standardization efforts are fundamentally an activity agnostic of commerce itself, I'd rather gnaw off my left leg than collaborate with a group you lead. We're just going to fight too much. I don't mean this to be disrespectful, for the record, I'm only trying to vividly illustrate how far apart philosophically that seemingly minor opinion places us.
The raw data shows 42. We used @SEMANTIC_LOGIC to force a limit of 3. The AI obeys the developer's rules, not just the CSV.
We failed to mention this context, and it causes confusion, so we are changing the example to 42.
Physical stock rarely equals sellable stock. Items sit in abandoned carts. Or are held as safety buffers. If you have 42 items and 39 are reserved, telling the user "42 available" is the lie. It causes overselling.
The protocol allows the developer to define the sellable reality.
Crucially, we anticipated abuse. See Section 9: Cross-Verification.
If an agent detects systematic manipulation (fake urgency that contradicts checkout data), the merchant suffers a Trust Score penalty. The protocol is designed to penalize dark patterns, not enable them.
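To make the 42-vs-3 arithmetic concrete, here is a hedged sketch of the kind of calculation a merchant-side generator and an agent-side check might do; the field names and threshold are illustrative, not spec syntax:

```python
from dataclasses import dataclass

@dataclass
class StockSnapshot:
    physical: int    # units on the shelf (e.g. 42)
    reserved: int    # held in carts / safety buffer (e.g. 39)
    advertised: int  # what the commerce.txt file claims is available

    @property
    def sellable(self) -> int:
        # Physical stock rarely equals sellable stock.
        return max(self.physical - self.reserved, 0)

def looks_like_fake_urgency(snap: StockSnapshot) -> bool:
    # Illustrative cross-check: systematically advertising far less than is
    # actually sellable (manufactured scarcity) is the pattern an agent could flag.
    return snap.advertised < snap.sellable // 2  # arbitrary threshold for the sketch

snap = StockSnapshot(physical=42, reserved=39, advertised=3)
print(snap.sellable)                  # 3 -- advertising 42 here would cause overselling
print(looks_like_fake_urgency(snap))  # False -- 3 matches the sellable reality
```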
Even if it were, JSON is verbose. Every bracket and quote costs tokens.
In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.
We fetch a standalone text file. It cuts the syntax tax. It is pure signal.
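A rough illustration of that syntax tax, using character counts as a crude stand-in for tokens; the flat format below is illustrative, not the actual spec:

```python
import json

product = {
    "name": "Trail Runner 2",
    "price": "89.00",
    "currency": "USD",
    "availability": "InStock",
    "updated": "2025-01-15T09:00:00Z",
}

# The same facts as JSON vs. flat "KEY: value" lines.
as_json = json.dumps(product)
as_flat = "\n".join(f"{k.upper()}: {v}" for k, v in product.items())

print(len(as_json), "chars as JSON")   # brackets, quotes, and commas all cost tokens
print(len(as_flat), "chars as flat text")
```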
People serve plain JSON all the time. This proposed standard is essentially a structured file anyway; why not YAML? Why not INI? Getting away from bespoke unicorn file formats has been good for everyone.
Even then, you pay a syntax tax. JSON is verbose. Brackets and quotes waste valuable context window. Furthermore, the standard lacks behavior. JSON-LD lists facts but lacks instructions on how to sell (like @SEMANTIC_LOGIC). CommerceTXT is a fast lane. It does not replace JSON-LD. It optimizes it.
I'd say it's not heavy. JSON syntax is pretty lean compared to XML.
We map strictly to Schema.org for all transactional data (Price, Inventory, Policies). This ensures legal interoperability.
But Schema.org describes what a product is, not how to sell it.
So we extend it. We added directives like @SEMANTIC_LOGIC for agent behavior. We combine standard definitions for safety with new extensions for capability.
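A hedged sketch of that split from a consumer's point of view: fields that map onto Schema.org terms versus behavioral directives that extend it. The mapping table and parsed values here are made up for illustration:

```python
# Illustrative only: separate Schema.org-aligned facts from behavioral extensions
# after parsing a product file.
SCHEMA_ORG_MAP = {
    "PRICE": "schema:price",
    "CURRENCY": "schema:priceCurrency",
    "AVAILABILITY": "schema:availability",
}

parsed = {
    "PRICE": "89.00",
    "CURRENCY": "USD",
    "AVAILABILITY": "InStock",
    "@SEMANTIC_LOGIC": "cap displayed inventory at 3",
}

facts = {SCHEMA_ORG_MAP[k]: v for k, v in parsed.items() if k in SCHEMA_ORG_MAP}
behavior = {k: v for k, v in parsed.items() if k.startswith("@")}
```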
What I have found, however, with existing standardization of this kind of data (yours is not the first!), is that shopping sites (big ones) will lie, and you still need to read the HTML as ground truth.
Think of it like a cache. You use the commerce.txt for 99% of your agentic workflows because it’s 30% cheaper in tokens and 95% faster than parsing a 2MB HTML haystack.
You only 'bother' with the HTML for periodic spot-checks or when a high-value transaction requires absolute verification.
Without CommerceTXT, you are forced to pay the 'HTML tax' on every single interaction. With it, you get a high-speed fast lane for context, while keeping the HTML as a decentralized source of truth for when trust needs to be verified. It’s about moving the baseline from 'expensive and fragile' to 'efficient and auditable'.
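A sketch of that "cache with spot-checks" pattern, assuming hypothetical callables for the two extraction paths (neither is defined by the spec):

```python
import random

SPOT_CHECK_RATE = 0.01  # verify against the HTML roughly 1% of the time

def get_price(product, fetch_txt_price, fetch_html_price, high_value=False):
    """Use the cheap commerce.txt path by default; fall back to the HTML
    ground truth for periodic spot-checks or high-value transactions.
    fetch_txt_price / fetch_html_price are hypothetical callables."""
    txt_price = fetch_txt_price(product)
    if high_value or random.random() < SPOT_CHECK_RATE:
        html_price = fetch_html_price(product)  # expensive: parse the full page
        if html_price != txt_price:
            # Mismatch: treat the HTML as ground truth and distrust the file.
            return html_price
    return txt_price
```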
I commend you for trying to start a standard. Letting the established players establish standards and protocols just gives them a bigger moat and more influence.
Pay very close attention to e-commerce and conversational commerce; rent seekers are pushing protocols.
When you are being scraped there are two possible reactions: (1) good, because someone scraping your data is going to help you make a sale (discoverability); (2) bad, so you work to obfuscate/block/prevent access.
In the first case, introducing a complex new standard that few if any will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers".
In the second case, you actively don't want your data scraped, so why would you ever adopt this?
If you are reading all the inventory data into context then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.
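A minimal sketch of that approach: a per-site selector map (proposed once by an LLM or a human after inspecting a sample page), then plain bs4 parsing from there on. The selectors and site key are made up:

```python
from bs4 import BeautifulSoup

# One-time mapping per store layout; everything after that is traditional parsing.
SELECTOR_MAP = {
    "example-shop.com": {
        "name": "h1.product-title",
        "price": "span.price",
        "availability": "div.stock-status",
    }
}

def extract_product(site: str, html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    selectors = SELECTOR_MAP[site]
    return {
        field: el.get_text(strip=True) if (el := soup.select_one(css)) else None
        for field, css in selectors.items()
    }
```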
Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.
Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.
You don't want the AI to 'guess' your data. You want it to 'know' your data.
You are also solving a business problem with a technical solution. Shopify recently announced that they will open up their entire catalog via an easy-to-use API to a select few enterprise partners. Amazon is doing a similar thing. This is because they do not want you and I to have the ability to programmatically query their catalog. They want to extract money out of specific partners who are trying to enshittify AI chat apps by throwing tons of ads in there. The big movers in the industry could have already easily adopted a similar standard, but they are not going to on purpose. On top of the technical issues other commenters are pointing out, I don't see why this should be in use at all.
I support platforms like Shopify and Wix because they empower 80% of independent merchants to exist online. But I oppose their move toward 'enterprise-only' data silos. When Shopify gates their catalog API for a few select partners, they aren't protecting the merchant. They are protecting their own rent-seeking position.
CommerceTXT is a way for a merchant on any platform to say: 'My data is mine, and I want it to be discoverable by any agent, not just the ones who paid the platform's entry fee'.
Regarding 'design smell': Every major shift in computing has required specialized protocols. We didn't use Gopher for the web, and we shouldn't use 2010-era REST APIs for 2025-era LLMs. Models have unique constraints (token costs and hallucination risks) that traditional APIs simply weren't built to handle.
We aren't building for the gatekeepers. We are building for the open commons.
AI is excellent at mapping from one format to another.
I use this method to great effect.
CommerceTXT isn't just about extraction; it's about contract-based delivery. We are moving from 'Guessing through Scraping' to 'Knowing through Protocol'. You're optimizing the process of scraping; we are eliminating the need for it.
I built CommerceTXT because I got tired of the fragility of extracting pricing and inventory data from HTML. AI agents currently waste ~8k tokens just to parse a product page, only to hallucinate the price or miss the fact that it's "Out of Stock".
CommerceTXT is a strict, read-only text protocol (CC0 Public Domain) designed to give agents deterministic ground truth. Think of it as `robots.txt` + `llms.txt` but structured specifically for transactions.
Key technical decisions in v1.0:
1. *Fractal Architecture:* Root -> Category -> Product files. Agents only fetch what they need (saves bandwidth/tokens); see the sketch below.
2. *Strictly Read-Only:* v1.0 intentionally excludes transactions/actions to avoid security nightmares. It's purely context.
3. *Token Efficiency:* A typical product definition is ~380 tokens vs ~8,500 for the HTML equivalent.
4. *Anti-Hallucination:* Includes directives like @INVENTORY with timestamps and @REVIEWS with verification sources.
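A hedged sketch of what that fractal fetch could look like from the agent side; the URL layout under /commerce/ is an assumption for illustration, not something fixed by the spec:

```python
import urllib.request

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Assumed layout: a root index that points at category files, which in turn
# point at per-product files. The agent drills down only as far as it needs.
root = fetch("https://example-shop.com/commerce.txt")
category = fetch("https://example-shop.com/commerce/shoes.txt")                 # only if relevant
product = fetch("https://example-shop.com/commerce/shoes/trail-runner-2.txt")   # only if relevant
```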
The spec is live and open. I'd love your feedback on the directive structure and especially on the "Trust & Verification" concepts we're exploring.
Spec: https://github.com/commercetxt/commercetxt Website: https://commercetxt.org