CommerceTXT
commercetxt.org
All these files should be registered with IANA and put under the .well-known namespace.
We follow the precedent of robots.txt, ads.txt, and llms.txt.
The reason is friction. Platforms like Shopify and Wix make .well-known folders difficult or impossible for merchants to configure. Root files work everywhere.
Adoption matters more than namespace hygiene.
https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...
robots.txt was created three decades ago, when we didn’t know any better.
Moving llms.txt to /.well-known/ is literally issue #2 for llms.txt
https://github.com/AnswerDotAI/llms-txt/issues/2
Please stop polluting the web.
That said, I am open to supporting .well-known as a secondary location in v1.1 if the community wants it.
The same paragraph takes you to RFC 8615, which is the .well-known you are being told to use. That is not your "secondary location" for v1.1. That is the only path you are permitted to standardize.
You are technically correct regarding IETF norms.
But you say: "Wix and Shopify have zero bearing on the standardization of the Web."
I fundamentally disagree. The Web is not just a namespace for engineers; it is an economy for millions of small businesses. If a standard is technically "pure" but unusable by 80% of merchants on hosted platforms, it fails the Web.
However, to respect the namespace, we will mandate checking /.well-known/commerce.txt first.
But we will keep the root location as a fallback. We prioritize accessibility for the "aspiring" shop owner over strict purity for the standards writer.
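A minimal sketch of that resolution order from the agent side, assuming plain HTTP fetches; the helper name and timeout are illustrative, not part of the spec:

```python
import urllib.request
from urllib.error import HTTPError, URLError

# Proposed order: /.well-known/ first, root file as the fallback.
CANDIDATE_PATHS = ["/.well-known/commerce.txt", "/commerce.txt"]

def resolve_commerce_txt(origin: str) -> str | None:
    """Return the body of the first commerce.txt found, or None if neither exists."""
    for path in CANDIDATE_PATHS:
        url = origin.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return resp.read().decode("utf-8", errors="replace")
        except (HTTPError, URLError):
            continue  # try the next candidate location
    return None

# Example: resolve_commerce_txt("https://example-shop.com")
```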
Thankfully, you've licensed your work CC0, so someone who wants to see this standardized could simply fork your work, fix the offending parts, and move for successful standardization without you.
Consider C#. Yeah, yeah, we all know the provenance of the language, that what ECMA has standardized is basically a Microsoft specification, but once it's an ECMA standard it's Something Else. Competitors can work on it together, and we're all fine with that. Carrying on C# development in the open is harder for Microsoft in some ways and easier in others. This opinion is about ten years old, mind you, and speaks more to the origin of C# (I'm not a practitioner), so I'm sure the Core stuff has changed all of this and made me look silly for saying it, but that speaks to my point: work evolves in public.
Say I work at Apple. I tell my boss I had lunch with a Samsung guy, I might get a side eye. I tell my boss I had lunch with a Samsung guy because we're collaborating on some revision to SSD TRIM or something, it's oh, OK, cool, no side eye. That's the orthodoxy. It's extremely important to even _attain_ public standards that we all suspend the rules of commerce and competition and conflict and all that. You're arguing the opposite.
There's a collaboration on the common good that should be inherent to the production of humanity's shared standards. It's kind of like science and its centuries of wrestling with this very point. The Internet is one of humanity's most important inventions, and getting trillion-dollar market caps to agree on how to operate it is incredibly fragile.
If you try to argue with me that because Wix and Shopify both have stupid designs that remove control over a URI from a Web author, I should relax my belief that standardization efforts are fundamentally an activity agnostic of commerce itself, I'd rather gnaw off my left leg than collaborate with a group you lead. We're just going to fight too much. I don't mean this to be disrespectful, for the record, I'm only trying to vividly illustrate how far apart philosophically that seemingly minor opinion places us.
The raw data shows 42. We used @SEMANTIC_LOGIC to force a limit of 3. The AI obeys the developer's rules, not just the CSV.
We failed to mention this context, and it causes confusion, so we are changing the example to 42.
Physical stock rarely equals sellable stock. Items sit in abandoned carts. Or are held as safety buffers. If you have 42 items and 39 are reserved, telling the user "42 available" is the lie. It causes overselling.
The protocol allows the developer to define the sellable reality.
Crucially, we anticipated abuse. See Section 9: Cross-Verification.
If an agent detects systematic manipulation (fake urgency that contradicts checkout data), the merchant suffers a Trust Score penalty. The protocol is designed to penalize dark patterns, not enable them.
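To make the 42-vs-3 arithmetic concrete, here is a hedged sketch of the kind of calculation a merchant-side generator and an agent-side check might do; the field names and threshold are illustrative, not spec syntax:

```python
from dataclasses import dataclass

@dataclass
class StockSnapshot:
    physical: int    # units on the shelf (e.g. 42)
    reserved: int    # held in carts / safety buffer (e.g. 39)
    advertised: int  # what the commerce.txt file claims is available

    @property
    def sellable(self) -> int:
        # Physical stock rarely equals sellable stock.
        return max(self.physical - self.reserved, 0)

def looks_like_fake_urgency(snap: StockSnapshot) -> bool:
    # Illustrative cross-check: systematically advertising far less than is
    # actually sellable (manufactured scarcity) is the pattern an agent could flag.
    return snap.advertised < snap.sellable // 2  # arbitrary threshold for the sketch

snap = StockSnapshot(physical=42, reserved=39, advertised=3)
print(snap.sellable)                  # 3 -- advertising 42 here would cause overselling
print(looks_like_fake_urgency(snap))  # False -- 3 matches the sellable reality
```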
Even if it were, JSON is verbose. Every bracket and quote costs tokens.
In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.
We fetch a standalone text file. It cuts the syntax tax. It is pure signal.
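A rough illustration of that syntax tax, using character counts as a crude stand-in for tokens; the flat format below is illustrative, not the actual spec:

```python
import json

product = {
    "name": "Trail Runner 2",
    "price": "89.00",
    "currency": "USD",
    "availability": "InStock",
    "updated": "2025-01-15T09:00:00Z",
}

# The same facts as JSON vs. flat "KEY: value" lines.
as_json = json.dumps(product)
as_flat = "\n".join(f"{k.upper()}: {v}" for k, v in product.items())

print(len(as_json), "chars as JSON")   # brackets, quotes, and commas all cost tokens
print(len(as_flat), "chars as flat text")
```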
People serve plain JSON all the time. This proposed standard is essentially a structured file anyway; why not YAML? Why not INI? Getting away from bespoke unicorn file formats has been good for everyone.
Even then, you pay a syntax tax. JSON is verbose. Brackets and quotes waste valuable context window. Furthermore, the standard lacks behavior. JSON-LD lists facts but lacks instructions on how to sell (like @SEMANTIC_LOGIC). CommerceTXT is a fast lane. It does not replace JSON-LD. It optimizes it.
I'd say it's not heavy. JSON syntax is pretty lean compared to XML.
We map strictly to Schema.org for all transactional data (Price, Inventory, Policies). This ensures legal interoperability.
But Schema.org describes what a product is, not how to sell it.
So we extend it. We added directives like @SEMANTIC_LOGIC for agent behavior. We combine standard definitions for safety with new extensions for capability.
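A hedged sketch of that split from a consumer's point of view: fields that map onto Schema.org terms versus behavioral directives that extend it. The mapping table and parsed values here are made up for illustration:

```python
# Illustrative only: separate Schema.org-aligned facts from behavioral extensions
# after parsing a product file.
SCHEMA_ORG_MAP = {
    "PRICE": "schema:price",
    "CURRENCY": "schema:priceCurrency",
    "AVAILABILITY": "schema:availability",
}

parsed = {
    "PRICE": "89.00",
    "CURRENCY": "USD",
    "AVAILABILITY": "InStock",
    "@SEMANTIC_LOGIC": "cap displayed inventory at 3",
}

facts = {SCHEMA_ORG_MAP[k]: v for k, v in parsed.items() if k in SCHEMA_ORG_MAP}
behavior = {k: v for k, v in parsed.items() if k.startswith("@")}
```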
What I have found, however, with existing standardization of this kind of data (yours is not the first!), is that shopping sites (big ones) will lie, and you still need to read the HTML as ground truth.
Think of it like a cache. You use the commerce.txt for 99% of your agentic workflows because it’s 30% cheaper in tokens and 95% faster than parsing a 2MB HTML haystack.
You only 'bother' with the HTML for periodic spot-checks or when a high-value transaction requires absolute verification.
Without CommerceTXT, you are forced to pay the 'HTML tax' on every single interaction. With it, you get a high-speed fast lane for context, while keeping the HTML as a decentralized source of truth for when trust needs to be verified. It’s about moving the baseline from 'expensive and fragile' to 'efficient and auditable'.
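A sketch of that "cache with spot-checks" pattern, assuming hypothetical callables for the two extraction paths (neither is defined by the spec):

```python
import random

SPOT_CHECK_RATE = 0.01  # verify against the HTML roughly 1% of the time

def get_price(product, fetch_txt_price, fetch_html_price, high_value=False):
    """Use the cheap commerce.txt path by default; fall back to the HTML
    ground truth for periodic spot-checks or high-value transactions.
    fetch_txt_price / fetch_html_price are hypothetical callables."""
    txt_price = fetch_txt_price(product)
    if high_value or random.random() < SPOT_CHECK_RATE:
        html_price = fetch_html_price(product)  # expensive: parse the full page
        if html_price != txt_price:
            # Mismatch: treat the HTML as ground truth and distrust the file.
            return html_price
    return txt_price
```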
I commend you for trying to start a standard. Letting the established players establish standards and protocols just gives them a bigger moat and more influence.
Pay very close attention to e-commerce and conversational commerce; rent seekers are pushing protocols.
When you are being scraped there are two possible reactions: (1) good, because someone scraping your data is going to help you make a sale (discoverability); (2) bad, so you work to obfuscate/block/prevent access.
In the first case, introducing a complex new standard that few if any will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers".
In the second case, you actively don't want your data scraped, so why would you ever adopt this?
If you are reading all the inventory data into context then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.
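A minimal sketch of that approach: a per-site selector map (proposed once by an LLM or a human after inspecting a sample page), then plain bs4 parsing from there on. The selectors and site key are made up:

```python
from bs4 import BeautifulSoup

# One-time mapping per store layout; everything after that is traditional parsing.
SELECTOR_MAP = {
    "example-shop.com": {
        "name": "h1.product-title",
        "price": "span.price",
        "availability": "div.stock-status",
    }
}

def extract_product(site: str, html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    selectors = SELECTOR_MAP[site]
    return {
        field: el.get_text(strip=True) if (el := soup.select_one(css)) else None
        for field, css in selectors.items()
    }
```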
Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.
Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.
You don't want the AI to 'guess' your data. You want it to 'know' your data.
You are also solving a business problem with a technical solution. Shopify recently announced that they will open up their entire catalog via an easy-to-use API to a select few enterprise partners. Amazon is doing a similar thing. This is because they do not want you and I to have the ability to programmatically query their catalog. They want to extract money out of specific partners who are trying to enshittify AI chat apps by throwing tons of ads in there. The big movers in the industry could have already easily adopted a similar standard, but they are not going to on purpose. On top of the technical issues other commenters are pointing out, I don't see why this should be in use at all.
I support platforms like Shopify and Wix because they empower 80% of independent merchants to exist online. But I oppose their move toward 'enterprise-only' data silos. When Shopify gates their catalog API for a few select partners, they aren't protecting the merchant. They are protecting their own rent-seeking position.
CommerceTXT is a way for a merchant on any platform to say: 'My data is mine, and I want it to be discoverable by any agent, not just the ones who paid the platform's entry fee'.
Regarding 'design smell': Every major shift in computing has required specialized protocols. We didn't use Gopher for the web, and we shouldn't use 2010-era REST APIs for 2025-era LLMs. Models have unique constraints (token costs and hallucination risks) that traditional APIs simply weren't built to handle.
We aren't building for the gatekeepers. We are building for the open commons.
AI is excellent at mapping from one format to another.
I use this method to great effect.
CommerceTXT isn't just about extraction; it's about contract-based delivery. We are moving from 'Guessing through Scraping' to 'Knowing through Protocol'. You're optimizing the process of scraping; we are eliminating the need for it.
I built CommerceTXT because I got tired of the fragility of extracting pricing and inventory data from HTML. AI agents currently waste ~8k tokens just to parse a product page, only to hallucinate the price or miss the fact that it's "Out of Stock".
CommerceTXT is a strict, read-only text protocol (CC0 Public Domain) designed to give agents deterministic ground truth. Think of it as `robots.txt` + `llms.txt` but structured specifically for transactions.
Key technical decisions in v1.0:
1. *Fractal Architecture:* Root -> Category -> Product files. Agents only fetch what they need (saves bandwidth/tokens); see the sketch below.
2. *Strictly Read-Only:* v1.0 intentionally excludes transactions/actions to avoid security nightmares. It's purely context.
3. *Token Efficiency:* A typical product definition is ~380 tokens vs ~8,500 for the HTML equivalent.
4. *Anti-Hallucination:* Includes directives like @INVENTORY with timestamps and @REVIEWS with verification sources.
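A hedged sketch of what that fractal fetch could look like from the agent side; the URL layout under /commerce/ is an assumption for illustration, not something fixed by the spec:

```python
import urllib.request

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Assumed layout: a root index that points at category files, which in turn
# point at per-product files. The agent drills down only as far as it needs.
root = fetch("https://example-shop.com/commerce.txt")
category = fetch("https://example-shop.com/commerce/shoes.txt")                 # only if relevant
product = fetch("https://example-shop.com/commerce/shoes/trail-runner-2.txt")   # only if relevant
```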
The spec is live and open. I'd love your feedback on the directive structure and especially on the "Trust & Verification" concepts we're exploring.
Spec: https://github.com/commercetxt/commercetxt Website: https://commercetxt.org