How to format an llms.txt file for your business website

2026-05-11 · Dexi · dexi, geo, llms-txt, ai-search

If you have read any GEO advice in the last six months, you have probably seen the phrase "add an llms.txt to your root directory" presented as if it were obvious how. It is not obvious. There is a spec, but the spec is short, the examples on the official site are minimal, and almost nobody is writing about how to format an llms.txt for an actual operating business, the kind with multiple products, pricing tiers, an about page, and a blog.

We have written llms.txt files for every property we operate. Here is what we have learned, including the format we now use as a template.

what an llms.txt file is and why clean web parsers need it

An llms.txt file is a plain-text document you place at the root of your website (yourdomain.com/llms.txt) that tells large language models, AI search engines, and retrieval pipelines what your site is about, what to read first, and where to find structured information.

It sits next to robots.txt at the site root but serves a different audience. robots.txt tells search-engine crawlers which URLs they may visit. llms.txt tells AI models what your site is, in a form they can digest in a single retrieval, without having to crawl your entire blog to figure out who you are.

Why does this matter in 2026?

Modern AI search engines (ChatGPT browsing, Perplexity, Gemini answer mode, Claude with web access) do not read your entire site when they decide whether to cite you in an answer. They read a small handful of pages and infer the rest. If those pages happen to be a noisy product detail page or a comment-heavy blog post, the model gets a distorted view of what you do. An llms.txt fixes that by handing the model a clean, opinionated summary that it can use as a first read before falling back to your raw HTML.

In our measurements, sites with a well-formatted llms.txt are cited noticeably more often in AI answers. The advantage is real, the format is dead simple, and almost no one has done it yet, which makes it the easiest SEO arbitrage of 2026.

basic syntax structures for the root directory file layout

The official spec is at llmstxt.org. The minimum structure is:

    # Site name

    > One-sentence summary of what this site is.

    A few paragraphs of context, written in plain English, that an AI model can use to understand what you do.

    ## Section name

    - [Link title](url): one-sentence description
    - [Another link](url): one-sentence description

    ## Another section

    - [And so on](url): description

That is the entire format. Markdown, with an H1 for the site name, a blockquote for the summary, paragraphs of context, and H2-grouped link lists.
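To make the structure concrete, here is a minimal Python sketch of how a retrieval pipeline might split such a file into its parts. It assumes the link-list syntax from llmstxt.org, `- [Title](url): description`. The function name `parse_llms_txt` and the sample content are made up for this illustration and are not part of the spec or of any library.

```python
import re

# Matches "- [Title](url): description"; the ": description" part is optional.
LINK_ITEM = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt into title, summary, and H2-grouped link lists.

    Free-form context paragraphs are ignored in this sketch.
    """
    title = None
    summary_lines = []
    sections = {}
    current = None  # name of the H2 section being read, if any

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )

    return {"title": title, "summary": " ".join(summary_lines), "sections": sections}


# Hypothetical sample file, used only to exercise the parser.
example = """# Site name

> One-sentence summary of what this site is.

Some context in plain English.

## Docs

- [Pricing](https://yourdomain.com/pricing): what each plan costs
- [About](https://yourdomain.com/about)
"""

result = parse_llms_txt(example)
print(result["title"])
print(result["summary"])
for section, links in result["sections"].items():
    print(section, links)
```

The point of the sketch is the shape of the output: a name, a summary, and a short list of (title, URL, description) entries per section, which is all a model needs in order to pick its next fetch.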

The key insight that took us a while to internalise: the link descriptions are not for search-engine indexing, they are for an AI model deciding which page to fetch next. So they should be written in the voice of "if you want to know X, read this page." Not "read about our amazing product."

summarizing complex product architectures into clear plain markdown blocks

A real operating business has more than one thing to say. You have a homepage, a product, maybe several agents or features, pricing, an about, a blog. The mistake we see is people listing every URL on the site in their llms.txt as if it were a sitemap. It is not a sitemap. It is a guide for understanding what to read.

The structure we use for good-scratch.com:

    # Good-scratch

    > The AI hires you would make for your business, packaged as software. We build
    > chief-of-staff AI (Aiko), customer-service AI (Iris), visibility AI (Dexi),
    > and social-media AI (Loopah) for founder-operated small and mid businesses.

    Good-scratch is run by Ariel Constantinof and Alexandru Nistor. We built our four AI agents into our own business (Pasul.ro, a Romanian online therapy platform) first, then made them available to other founders.

    Pricing is flat monthly. Starter $5,000/mo includes one agent plus Aiko. Studio $10,000/mo includes all four. Bespoke $15,000+/mo includes custom builds per quarter.

    ## The four agents

    - Aiko, the AICEO: chief-of-staff AI that reads Stripe, CRM, Search Console, and the other agents, then writes a Monday brief
    - Iris, customer service: AI agent for WhatsApp, email, SMS, CRM-integrated
    - Dexi, visibility: GEO-readies the site, publishes content that ranks in Google + AI search
    - Loopah, social media: turns weekly notes into IG and TikTok carousels

    ## How to engage

    - Book a 30-min call: first two weeks are free
    - Blog: operational notes from running real businesses with AI hires

The pattern: one H1, one blockquote, two short paragraphs of who-you-are-and-what-you-charge, then H2-grouped link lists. Total length under 80 lines. Easy for a model to consume in one shot.

validating your text file against generative search engine scrapers

There is no official validator for llms.txt yet. There are three checks worth doing.

Check 1, the file is reachable. Open yourdomain.com/llms.txt in an incognito tab. If it 404s, your hosting did not pick it up. On Vercel and Next.js, you can drop the file in the public/ directory and it will be served as a static text file.

Check 2, the format parses. The spec is markdown, so if a markdown parser renders it cleanly, you are fine. Paste your file into any markdown preview. Headings should render as headings, links should be links, the blockquote should be a blockquote.
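If you would rather script this check than eyeball a preview, a structural lint takes a few lines of Python. The sketch below covers only the layout described earlier (H1 first, blockquote summary next, H2 sections that each contain list items) plus the under-80-lines budget from our own template. `lint_llms_txt` is a hypothetical helper written for this post, not an official validator.

```python
def lint_llms_txt(text: str, max_lines: int = 80) -> list:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    lines = text.splitlines()
    content = [line.strip() for line in lines if line.strip()]

    if not content or not content[0].startswith("# "):
        problems.append("first non-blank line should be an H1 ('# Site name')")
    if sum(1 for line in content if line.startswith("# ")) > 1:
        problems.append("more than one H1 found")
    if len(content) < 2 or not content[1].startswith(">"):
        problems.append("H1 should be followed by a blockquote summary ('> ...')")

    # Count list items under each H2 heading.
    sections = {}
    current = None
    for line in content:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = 0
        elif current is not None and line.startswith("- "):
            sections[current] += 1

    if not sections:
        problems.append("no H2 sections found")
    for name, count in sections.items():
        if count == 0:
            problems.append(f"section '{name}' has no list items")

    if len(lines) >= max_lines:
        problems.append(f"file has {len(lines)} lines; aim for under {max_lines}")
    return problems


# Hypothetical sample, used only to show the call.
sample = """# Site name

> One-sentence summary of what this site is.

## Docs

- [Pricing](https://yourdomain.com/pricing): what each plan costs
"""

for problem in lint_llms_txt(sample):
    print(problem)
```

A lint like this only tells you the skeleton is in place. It says nothing about whether the summary or the descriptions are any good.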

Check 3, an AI model can answer questions from it. This is the test that matters most. Open ChatGPT or Claude with web access, paste in the URL (yourdomain.com/llms.txt), and ask: "based on this file, what does this business do, who is it for, and what does it cost?" If the model answers all three correctly in three sentences, your file is good. If it gets one wrong, the file is missing or unclear on that dimension and you should rewrite that section.

maintaining synchronous text files alongside regular technical changes

The reason most llms.txt files we audit are out of date is that nobody has put them in a deployment workflow. The pricing changed in February, the agents were renamed in March, and the llms.txt still describes the November product. AI engines that cached the November file are now citing wrong prices to potential customers.

The fix is to treat llms.txt as a deploy artefact, not a documentation page. Three rules:

1. The llms.txt lives in your git repo, not in a CMS, not in a Notion doc. It is in the same commit history as your code.
2. The llms.txt updates with the same PR that changes the corresponding product page. If you rename a feature, the rename happens in code, marketing copy, and llms.txt in the same PR.
3. A pre-deploy check confirms the file is still served at /llms.txt and contains the current site name. It is one shell command, ten lines of CI.
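Rule 3 can be a shell one-liner. The same idea as a Python sketch looks roughly like this. The script name, the command-line arguments, and the helper names `check_llms_txt`, `fetch`, and `main` are assumptions made for this example; point it at whatever preview or production URL your pipeline exposes.

```python
import sys
import urllib.error
import urllib.request


def check_llms_txt(status: int, body: str, site_name: str) -> list:
    """Return a list of problems with a fetched /llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Some hosts answer unknown paths with an HTML page instead of a 404.
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        problems.append("response looks like HTML, not the llms.txt file")
    if site_name not in body:
        problems.append(f"site name '{site_name}' not found in the file")
    return problems


def fetch(url: str, timeout: float = 10.0) -> tuple:
    """Fetch a URL and return (status, body); HTTP errors return an empty body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def main(argv: list) -> int:
    url, site_name = argv[1], argv[2]
    status, body = fetch(url)
    problems = check_llms_txt(status, body, site_name)
    for problem in problems:
        print(f"llms.txt check failed: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Runs only when invoked as a script with both arguments, for example:
#   python check_llms.py https://yourdomain.com/llms.txt "Good-scratch"
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

A non-zero exit code fails the CI step, so a deploy that drops the file or ships one with a stale site name stops before it reaches customers.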

That is the entire maintenance burden. About one minute per substantial product change.

---

Dexi is the visibility agent we built. Among other things, she writes and maintains llms.txt files for every business we operate. If you want her to do the same for yours, book a 30-minute call. First two weeks are free, and if your site does not need her yet we will tell you on the call.