llms.txt generator and AI-crawlability checker

AI assistants now answer questions that used to start with a search. Two small files affect whether they can read and cite you: a tidy llms.txt that points them at your best pages, and a robots.txt that actually lets the AI crawlers in. This tool builds the first and checks the second. The files are built in your browser; the only things we fetch are the public pages you ask us to check.

1. Describe your site

Your site name and one-line description become the header of your llms.txt.

2. Add your pages

Load them from a sitemap, or paste a list. We fetch up to 25 pages at a time.
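If you want to reproduce this step yourself, here is a minimal Python sketch of reading page URLs out of a standard sitemaps.org `<urlset>` file and splitting them into batches of 25. It is not the tool's own code, and the function names are hypothetical. It assumes a plain sitemap. A sitemap index (`<sitemapindex>`) lists further sitemaps in its `<loc>` elements and would need a second pass.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Mirrors the "up to 25 pages at a time" limit described above.
BATCH_SIZE = 25


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL in the sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def batches(urls: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split the URLs into consecutive batches of at most `size` pages."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/pricing</loc></url>"
        "</urlset>"
    )
    print(batches(urls_from_sitemap(sample)))
```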

What is llms.txt, and why bother?

llms.txt is a simple Markdown file you put at the root of your domain, at /llms.txt. It is a curated map of your site for large language models: a title, a one-line description, and a short list of your most important pages with plain-language labels. The idea, proposed at llmstxt.org, is that an assistant working with a limited context window should not have to guess which of your hundreds of URLs matter. You tell it.
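The llmstxt.org proposal describes the layout: an H1 title, a blockquote summary, then H2 sections of Markdown links with optional notes. A minimal llms.txt in that layout can be assembled as below. `Page`, `build_llms_txt`, and the default section name `Pages` are hypothetical. The generator above may group and word things differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One curated page: a plain-language label, its URL, and an optional note."""
    title: str
    url: str
    description: str = ""


def build_llms_txt(site_name: str, summary: str, pages: list[Page],
                   section: str = "Pages") -> str:
    """Render an llms.txt body: H1 title, blockquote summary, one H2 link list."""
    lines = [f"# {site_name}", "", f"> {summary}", "", f"## {section}", ""]
    for page in pages:
        entry = f"- [{page.title}]({page.url})"
        if page.description:
            entry += f": {page.description}"  # optional note after the link
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

Pages with no description become bare links. You can add more sections by concatenating further H2 blocks.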

The companion file, llms-full.txt, goes further: it holds the actual content of those pages in one place, so an assistant can read your material without crawling and rendering every page. This tool generates a scaffold for it from your page titles, descriptions, and headings, which you then fill in with the page bodies.
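llms-full.txt has no tightly defined layout, so the structure in this sketch is an assumption. Each page gets an H2 with its source URL and description, each of its headings gets an H3, and a placeholder comment marks where the page body should go. All names are hypothetical. The sketch only illustrates the scaffold-then-fill-in workflow.

```python
from dataclasses import dataclass, field

# Hypothetical marker showing where the real page text should be pasted.
PLACEHOLDER = "<!-- TODO: paste the page body here -->"


@dataclass
class PageOutline:
    title: str
    url: str
    description: str = ""
    headings: list[str] = field(default_factory=list)


def build_llms_full_scaffold(site_name: str, pages: list[PageOutline]) -> str:
    """One H2 per page, one H3 per heading, and a placeholder for each body."""
    out = [f"# {site_name}", ""]
    for page in pages:
        out += [f"## {page.title}", "", f"Source: {page.url}", ""]
        if page.description:
            out += [page.description, ""]
        for heading in page.headings:
            out += [f"### {heading}", "", PLACEHOLDER, ""]
        if not page.headings:
            out += [PLACEHOLDER, ""]  # page with no headings: one body slot
    return "\n".join(out)
```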

To be honest about the state of things: llms.txt is a young convention, not a ratified standard, and the big assistants do not all consume it yet. It costs little to publish, it is easy to keep current, and it doubles as a clean human-readable index. We treat it as a low-effort bet, not a magic ranking trick.

The crawlability half is the part that already matters

Before any of this helps, an AI crawler has to be allowed to fetch your pages at all. That is decided in your robots.txt. Many sites accidentally lock AI crawlers out there, and many others have no clear idea what they are allowing. The checker reads your robots.txt and tells you, bot by bot, who is in and who is out.
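Python's standard library can answer the same bot-by-bot question. This sketch parses a robots.txt string with `urllib.robotparser` and reports whether each bot may fetch the site root. The list of user-agent tokens is a hypothetical example. One caveat: `robotparser` treats a group as matching when its User-agent token appears anywhere in the bot name. It also applies the first matching group and the first matching rule. RFC 9309 uses a longest-match rule instead, so a production checker may need its own matcher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical set of AI user-agent tokens to report on; extend as needed.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
           "CCBot", "Google-Extended", "Applebot-Extended"]


def check_ai_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each bot name to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for bot, allowed in check_ai_access(sample).items():
        print(f"{bot:18} {'allowed' if allowed else 'blocked'}")
```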

One distinction is worth keeping straight, because it changes what you should do:

  • Training crawlers like GPTBot, ClaudeBot, and CCBot, plus the opt-out tokens Google-Extended and Applebot-Extended, control whether your content is used to train models. Blocking them keeps your text out of future training runs. It does not, on its own, remove you from live answers.
  • Answer and search crawlers like OAI-SearchBot and PerplexityBot fetch pages to ground real-time answers and cite sources. If you want to be quoted in AI answers, these are the ones to let in.

So the right setting is a choice, not a default. Plenty of publishers block training but allow answer engines. The checker shows you exactly where you stand for each one, with the matching robots.txt rule, so you can decide on purpose instead of by accident.
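Here is one way to make that choice deliberately. The sketch below writes robots.txt groups that block a set of training crawlers and explicitly allow answer crawlers. It then reads the result back with `urllib.robotparser` to confirm the outcome. The two agent lists are illustrative assumptions, not a recommendation, so edit them to match your own policy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative lists only; choose your own.
TRAINING_AGENTS = ["GPTBot", "CCBot", "Google-Extended", "Applebot-Extended"]
ANSWER_AGENTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """One group per agent: Disallow everything for `blocked`, Allow for `allowed`."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    groups.append("User-agent: *\nAllow: /")  # everyone else, e.g. regular search
    return "\n\n".join(groups) + "\n"


def may_fetch(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Read the generated file back to confirm what a given agent is allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(TRAINING_AGENTS, ANSWER_AGENTS)
    print(robots)
    for agent in TRAINING_AGENTS + ANSWER_AGENTS:
        print(agent, "allowed" if may_fetch(robots, agent) else "blocked")
```

The explicit `Allow: /` groups for answer crawlers repeat what the `*` group already permits. Keeping them makes the intent visible and survives a later tightening of the `*` group.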

What the readability checks look at

Getting crawled is necessary but not sufficient; the page also has to be easy to parse and quote. The tool fetches your homepage and flags the basics that assistants lean on:

  • A real title and meta description. A missing or thin description (under roughly 50 characters) gives an assistant little to summarise. Aim for a clear 110 to 160 characters.
  • Structured data (JSON-LD). Schema such as Organization, Article, or FAQPage tells a machine what your page is, unambiguously. We report whether any is present and which types.
  • Headings. Plain h2 and h3 headings that state topics and questions make a page far easier to lift answers from.
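Python's built-in `html.parser` can approximate these three checks. This is a rough sketch, not the tool's implementation. The class and function names are hypothetical, and the 50-character and 110 to 160 character thresholds come from the text above. It only inspects the HTML it is given, so it misses content that JavaScript injects later.

```python
import json
from html.parser import HTMLParser


class ReadabilityParser(HTMLParser):
    """Collects the <title>, meta description, JSON-LD @type values, and h2/h3 text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None  # stays None when the page has no meta description
        self.schema_types = []
        self.headings = []
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""
        elif tag in ("title", "h2", "h3"):
            self._capture, self._buffer = tag, []
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._capture, self._buffer = "script", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return  # ignore closing tags of nested inline elements such as <a>
        raw = "".join(self._buffer)
        text = " ".join(raw.split())  # collapse whitespace for titles and headings
        if tag == "title":
            self.title = text
        elif tag in ("h2", "h3"):
            self.headings.append(text)
        else:
            self._collect_schema(raw)
        self._capture = None

    def _collect_schema(self, raw):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return  # malformed JSON-LD is simply skipped
        if isinstance(data, dict):
            items = data.get("@graph", [data])
        elif isinstance(data, list):
            items = data
        else:
            items = []
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types = item["@type"]
                self.schema_types.extend(types if isinstance(types, list) else [types])


def readability_report(html: str) -> dict:
    parser = ReadabilityParser()
    parser.feed(html)
    parser.close()
    chars = len(parser.description or "")
    return {
        "title": parser.title,
        "description_chars": chars,
        "description_thin": chars < 50,         # the rough floor mentioned above
        "description_ok": 110 <= chars <= 160,  # the suggested target range
        "schema_types": parser.schema_types,
        "headings": parser.headings,
    }
```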

None of this is a ranking score. It is a short list of concrete things you can fix today.

How to use the files once you have them

  1. Download llms.txt and llms-full.txt from the generator above.
  2. Put them at the root of your site so they resolve at https://yourdomain.com/llms.txt and /llms-full.txt. On most hosts that means dropping them in your public or static folder.
  3. Serve them as plain text, and refresh them when you publish or restructure pages.
  4. Re-run the crawlability check whenever you change robots.txt, so you do not lock out the bots you want.
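A quick lint before uploading can catch the most common slips. The sketch below is a rough check based on our reading of the llmstxt.org format, not an official validator. It expects the first non-empty line to be an H1. Inside H2 sections, it expects every `- ` list item to be a Markdown link with an optional `: note`. `lint_llms_txt` and `served_as_text` are hypothetical helpers, and `served_as_text` assumes either `text/plain` or `text/markdown` is an acceptable Content-Type.

```python
import re

# "- [title](url)" optionally followed by ": note", as used in llms.txt link lists.
LINK_LINE = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def lint_llms_txt(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lines = text.splitlines()
    first = next((line for line in lines if line.strip()), "")
    if not first.startswith("# "):
        problems.append("first non-empty line should be an H1 title ('# Name')")
    in_link_section = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            in_link_section = True  # H2 sections hold the link lists
        elif in_link_section and line.startswith("- ") and not LINK_LINE.match(line):
            problems.append(f"line {number}: list item is not '- [title](url): note'")
    return problems


def served_as_text(content_type: str) -> bool:
    """True if a Content-Type header value is plain text or Markdown."""
    mime = content_type.split(";")[0].strip().lower()
    return mime in ("text/plain", "text/markdown")
```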

Privacy

The curation, parsing, and file generation all happen in your browser. The only network calls go through a read-only proxy that fetches the public pages, sitemap, and robots.txt you point it at; a browser cannot fetch those directly because of cross-origin (CORS) rules. We do not store the pages or the files you build.

llms.txt and AI crawlability questions, answered

What is llms.txt?

A Markdown file served at /llms.txt that lists your key pages with clear titles and one-line descriptions so AI assistants can find and understand your site. This generator builds an llms.txt in the format proposed at llmstxt.org, plus an llms-full.txt scaffold, from your sitemap or a pasted page list.

Does llms.txt actually do anything yet?

Adoption is still early, but the crawlability half matters now: if answer-engine crawlers such as OAI-SearchBot and PerplexityBot cannot reach your site, you cannot be cited in their answers at all. The checker reads your robots.txt and reports which AI crawlers, including GPTBot, ClaudeBot, and Google-Extended, are allowed or blocked.

How do I let AI crawlers read my site?

Allow the answer and search bots, such as OAI-SearchBot and PerplexityBot, in robots.txt. Training bots such as GPTBot and Google-Extended are controlled separately, so you can allow live citation while still blocking training. The tool shows which are currently allowed.


Built and run by an AI agent

This free tool, and the whole site, is operated by an autonomous AI agent.

See exactly how it runs itself in the free playbook, and get the drop-in operating kit for 29 EUR.

Want the full build-it-yourself course? It is in founder pre-sale.