An AI visiting a website sees something very different from what a human sees. It sees <nav> and <header> tags, cookie banners, analytics scripts, menus repeated on every page, legal footers — and somewhere in between, the content that actually matters.
A human filters all of this automatically. An AI has to process it, weigh it, understand what is signal and what is page structure. For an AI agent trying to answer "what does this company do?", reading a conversion-optimized homepage requires the same effort as reading a document where 90% of the text is redundant.
This is not a theoretical problem. It is why AI systems often describe companies in vague terms, confuse similar services, or fail to answer specific questions about a product even when the answers are publicly available on the site.
llms.txt is the solution.
What is llms.txt
llms.txt is a plain text file in Markdown format placed at the root of a website — accessible at /llms.txt. It contains a structured description of the site designed specifically to be consumed by AI systems.
It was proposed by Jeremy Howard (fast.ai) in 2024. The principle is simple: instead of asking AI to extract signal from the noise of a human-optimized website, give it the signal directly.
The format is deliberately minimal:
# Organization Name
> Dense description: what it does, for whom, specific value.
## Services
- [Service A](URL): What it is, for whom, concrete output.
## Blog
- [Article title](URL): The main insight in one line.
## Contact
- [Demo](URL): How to get started.

No HTML. No CSS. No navigation structure. Just content.
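As a minimal sketch, the format can be produced from structured data rather than written by hand. The type names (`LlmsLink`, `LlmsSection`, `LlmsDoc`) and the `renderLlmsTxt` function are hypothetical and exist only for this illustration; the llms.txt proposal itself does not define a data model.

```typescript
// Hypothetical types for illustration; the llms.txt proposal does not define a data model.
interface LlmsLink {
  title: string;
  url: string;
  note: string; // one-line description placed after the colon
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

interface LlmsDoc {
  orgName: string;
  summary: string;
  sections: LlmsSection[];
}

// Render H1, blockquote, then one H2 section per group of links.
function renderLlmsTxt(doc: LlmsDoc): string {
  const out: string[] = [`# ${doc.orgName}`, "", `> ${doc.summary}`, ""];
  for (const section of doc.sections) {
    out.push(`## ${section.heading}`);
    for (const link of section.links) {
      out.push(`- [${link.title}](${link.url}): ${link.note}`);
    }
    out.push(""); // blank line between sections
  }
  return out.join("\n");
}

// Example with placeholder values.
const exampleLlmsTxt = renderLlmsTxt({
  orgName: "Example Org",
  summary: "Placeholder description of what the organization does and for whom.",
  sections: [
    {
      heading: "Services",
      links: [
        { title: "Service A", url: "https://example.com/service-a", note: "What it is and who it is for." },
      ],
    },
  ],
});
console.log(exampleLlmsTxt);
```

Keeping the content in a typed structure means the same data can feed both the website and the file, which reduces the chance that the two diverge.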
The three-file ecosystem
llms.txt is not a single file — it is a system with three levels, each designed for a different use case.
llms.txt — the navigable index
The main file. It contains links with descriptions, organized into thematic sections. An AI reading only llms.txt should be able to answer: what does this organization do, who is it for, what services does it offer, how to contact it. If it cannot answer without visiting other URLs, the file is incomplete. Typical size: 5–50 KB.
llms-full.txt — the complete content
Where llms.txt says "this article covers X (see URL)", llms-full.txt includes the actual text of the article. It is designed for systems that want to consume the entire site in a single request without page-by-page crawling.
The primary use case is in enterprise RAG pipelines: instead of configuring a crawler to index 50 pages, you load llms-full.txt as a single document and your knowledge base immediately has the content of the entire site. Typical size: 100 KB–5 MB.
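One possible way to assemble llms-full.txt is to concatenate the full text of every page under its own heading. The sketch below assumes the page bodies are already available as Markdown or plain text; the `FullPage` type, the `renderLlmsFull` function and the `Source:` line are illustrative choices, not part of the proposal.

```typescript
// Hypothetical shape: one entry per page, body already converted to Markdown or plain text.
interface FullPage {
  title: string;
  url: string;
  body: string;
}

// Concatenate every page into one document, keeping the source URL next to each body.
function renderLlmsFull(orgName: string, pages: FullPage[]): string {
  const parts: string[] = [`# ${orgName}`];
  for (const page of pages) {
    parts.push(`## ${page.title}\n\nSource: ${page.url}\n\n${page.body.trim()}`);
  }
  return parts.join("\n\n") + "\n";
}

// Example with placeholder pages.
const exampleFull = renderLlmsFull("Example Org", [
  { title: "First article", url: "https://example.com/first", body: "Full text of the first article." },
  { title: "Second article", url: "https://example.com/second", body: "Full text of the second article." },
]);
console.log(exampleFull);
```

Keeping the URL beside each body lets a downstream system cite the original page even though it never crawled it.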
Language variants
A multilingual site can have localized versions — /it/llms.txt, /de/llms.txt — that point only to content in the correct language. An AI agent queried in Italian is guided toward Italian content, not the English default.
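An agent or a site helper could pick the right variant with a small lookup. This sketch assumes that the root /llms.txt is the English default and that localized files live under a two-letter language prefix, as in the paths above; `llmsPathFor` is a hypothetical helper name.

```typescript
// Choose the llms.txt path for a requested language tag such as "it" or "it-IT".
// Falls back to the root file when no localized variant exists.
function llmsPathFor(lang: string, available: string[], defaultLang = "en"): string {
  const [code = defaultLang] = lang.toLowerCase().split("-"); // "it-IT" -> "it"
  if (code !== defaultLang && available.includes(code)) {
    return `/${code}/llms.txt`;
  }
  return "/llms.txt";
}

// Example: a site that publishes Italian and German variants.
const localized = ["it", "de"];
console.log(llmsPathFor("it-IT", localized));
console.log(llmsPathFor("fr", localized));
```

Falling back to the root file keeps the agent from requesting a localized path that does not exist.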
Who actually reads these files — and when
This is the most important section, and the least understood.
ClaudeBot — Anthropic's training crawler
Anthropic has a web crawler identified by the User-Agent ClaudeBot/1.0. It crawls the web to gather training data. It respects robots.txt:
# robots.txt
User-agent: ClaudeBot
Allow: /
# To exclude specific sections:
User-agent: ClaudeBot
Disallow: /admin/

ClaudeBot reads llms.txt like any other page on the site, so the file contributes to training like any other quality document. Its value here is content clarity, not format.
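To check what the robots.txt rules shown earlier actually permit, a deliberately simplified matcher can help. This sketch handles only plain path prefixes with the longest match winning; it ignores wildcards and the `*` fallback group, so it is not a complete robots.txt parser. The names `RobotsRule`, `rulesForAgent` and `isAllowed` are hypothetical.

```typescript
interface RobotsRule {
  allow: boolean;
  prefix: string;
}

// Collect Allow/Disallow rules from every group that names the agent (case-insensitive).
function rulesForAgent(robotsTxt: string, agent: string): RobotsRule[] {
  const rules: RobotsRule[] = [];
  let groupAgents: string[] = [];
  let readingAgents = false;
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!readingAgents) groupAgents = []; // a new group starts here
      groupAgents.push(value.toLowerCase());
      readingAgents = true;
    } else if (field === "allow" || field === "disallow") {
      readingAgents = false;
      if (groupAgents.includes(agent.toLowerCase()) && value !== "") {
        rules.push({ allow: field === "allow", prefix: value });
      }
    }
  }
  return rules;
}

// Longest matching prefix wins; Allow wins a tie; no matching rule means allowed.
function isAllowed(robotsTxt: string, agent: string, path: string): boolean {
  let best: RobotsRule | null = null;
  for (const rule of rulesForAgent(robotsTxt, agent)) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      best === null ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}

const sampleRobots = [
  "User-agent: ClaudeBot",
  "Allow: /",
  "User-agent: ClaudeBot",
  "Disallow: /admin/",
].join("\n");
console.log(isAllowed(sampleRobots, "ClaudeBot", "/llms.txt"));
console.log(isAllowed(sampleRobots, "ClaudeBot", "/admin/users"));
```

The sample treats the two `User-agent: ClaudeBot` groups as one combined rule set, so the more specific `Disallow: /admin/` overrides `Allow: /` for paths under /admin/.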
AI agents with runtime browsing
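When a user asks Claude or ChatGPT to "go to a website and explain what they do", the AI visits the site in real time with a browsing tool. Here llms.txt has direct value when the agent is configured to look for it: the agent reads /llms.txt first, immediately gets a structured map, and can answer with much more context. General-purpose browsing tools rarely check for the file on their own, so this benefit depends on how the agent is set up.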
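The "llms.txt first" behavior can be sketched as a small lookup with a fallback to the homepage. The `TextFetcher` type is a hypothetical abstraction over the agent's browsing tool, introduced here so the sketch runs without network access; `loadSiteContext` and `SiteContext` are likewise illustrative names.

```typescript
// Hypothetical fetcher: resolves to the response body, or null when the URL is missing.
type TextFetcher = (url: string) => Promise<string | null>;

interface SiteContext {
  source: "llms.txt" | "homepage";
  text: string;
}

// Try /llms.txt first; fall back to the homepage if the file is absent or is not Markdown.
async function loadSiteContext(siteOrigin: string, fetchText: TextFetcher): Promise<SiteContext> {
  const llms = await fetchText(`${siteOrigin}/llms.txt`);
  // Some servers answer unknown paths with an HTML page, so check for a leading H1.
  if (llms !== null && llms.trimStart().startsWith("# ")) {
    return { source: "llms.txt", text: llms };
  }
  const home = await fetchText(`${siteOrigin}/`);
  return { source: "homepage", text: home ?? "" };
}

// Example with an in-memory stand-in for the network.
const demoFetcher: TextFetcher = async (url) =>
  url.endsWith("/llms.txt") ? "# Example Org\n\n> Placeholder summary." : "<html>homepage</html>";

loadSiteContext("https://example.com", demoFetcher).then((ctx) => {
  console.log(ctx.source);
});
```

Checking that the response starts with a Markdown H1 is one cheap guard against servers that return their HTML error page for any unknown path.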
Semantic crawlers — Perplexity, You.com, Bing AI
AI-based search engines read llms.txt actively as the starting point for crawling. They use it as a map to decide which pages to prioritize. Being listed in directories like llmstxt.site amplifies this visibility.
Enterprise RAG pipelines
An organization building an internal AI assistant can configure it to read llms-full.txt as a knowledge source. The file is loaded directly into the RAG system's vector store instead of crawling the entire site.
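Before the file goes into a vector store it usually has to be split into chunks. A minimal sketch, assuming that llms-full.txt marks each page with an H2 heading: `RagChunk` and `chunkByHeading` are hypothetical names, and the embedding and storage steps are left out because they depend on the RAG system in use.

```typescript
interface RagChunk {
  heading: string;
  text: string;
}

// Split the document on H2 headings; anything before the first H2 (the H1 title) is skipped.
function chunkByHeading(fullText: string): RagChunk[] {
  const chunks: RagChunk[] = [];
  let current: RagChunk | null = null;
  for (const line of fullText.split("\n")) {
    if (line.startsWith("## ")) {
      if (current !== null) chunks.push(current);
      current = { heading: line.slice(3).trim(), text: "" };
    } else if (current !== null) {
      current.text += line + "\n";
    }
  }
  if (current !== null) chunks.push(current);
  return chunks.map((c) => ({ heading: c.heading, text: c.text.trim() }));
}

const sampleFull = "# Example Org\n\n## First article\n\nBody one.\n\n## Second article\n\nBody two.\n";
console.log(chunkByHeading(sampleFull));
```

A page body that contains its own `## ` lines would be split further by this sketch, so a real pipeline may need a more careful boundary rule.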
| System | Reads llms.txt? | Reads llms-full.txt? |
|---|---|---|
| AI in chat (no browsing) | No | No |
| AI with browsing | Rarely | No |
| ClaudeBot (training) | Like any page | Like any page |
| Semantic crawlers | Yes, as priority | No |
| Custom agents | Yes | If configured |
| RAG pipelines | Only if included | Yes, built for this |
How to implement it: four approaches
Approach 1: Static file
The simplest approach. Create an llms.txt file in the site root and update it manually. Limit: it drifts out of sync with the blog after each new article.
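If the static file is kept, the drift can at least be detected automatically, for example in a CI step. The sketch below compares the URLs linked in llms.txt with a list of published URLs; where that list comes from (sitemap, CMS export) is left open, and `linkedUrls` and `missingFromLlmsTxt` are hypothetical names.

```typescript
// Extract every URL that appears as a Markdown link target: [title](url).
function linkedUrls(llmsTxt: string): Set<string> {
  const urls = new Set<string>();
  const linkPattern = /\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(llmsTxt)) !== null) {
    urls.add(match[1]);
  }
  return urls;
}

// Return the published URLs that llms.txt does not link to yet.
function missingFromLlmsTxt(llmsTxt: string, publishedUrls: string[]): string[] {
  const linked = linkedUrls(llmsTxt);
  return publishedUrls.filter((url) => !linked.has(url));
}

const staticFile = "## Blog\n- [Older post](https://example.com/blog/older): Placeholder insight.\n";
const published = ["https://example.com/blog/older", "https://example.com/blog/newer"];
console.log(missingFromLlmsTxt(staticFile, published));
```

A non-empty result is a signal that the file needs a manual update before the next deploy.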
Approach 2: Next.js App Router
In Next.js, create a route handler at app/llms.txt/route.ts. Next.js serves it as the HTTP response for /llms.txt.
// app/llms.txt/route.ts
import { getAllPosts } from '@/lib/posts';

// Render once at build time instead of on every request.
export const dynamic = 'force-static';

export async function GET() {
  const posts = await getAllPosts('en');
  const lines = [
    '# Organization Name\n',
    '> Description.\n',
    '## Blog\n',
    ...posts.map(p => `- [${p.title}](${p.url}): ${p.excerpt}`)
  ];
  // Serve as plain text so clients do not treat the Markdown as HTML.
  return new Response(lines.join('\n'), {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}

With force-static, the file is generated at build time, so it is updated on every deploy without runtime overhead.
Approach 3: Build-time script — Vite, React, PHP/MySQL
For stacks without a Node.js server in production, generate the file before deployment with a Node.js script that calls the API and writes to the filesystem:
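// scripts/generate-llms.js
// Uses ES modules, top-level await and global fetch: needs Node.js 18+ and "type": "module" (or a .mjs extension).
import { mkdirSync, writeFileSync } from 'fs';
// Assumption: API_BASE comes from the environment; the fallback URL is a placeholder.
const API_BASE = process.env.API_BASE ?? 'https://example.com/api';
async function fetchAllPosts(lang) {
  const res = await fetch(
    `${API_BASE}/blog/get-posts.php?status=published&lang=${lang}&limit=50`
  );
  const data = await res.json();
  return data.posts ?? [];
}
// Assumption: each post exposes title, url and excerpt fields.
const posts = await fetchAllPosts('en');
const content = ['# Organization Name\n', '> Description.\n', '## Blog\n',
  ...posts.map(p => `- [${p.title}](${p.url}): ${p.excerpt}`)].join('\n') + '\n';
writeFileSync('llms.txt', content, 'utf8');
mkdirSync('dist', { recursive: true }); // make sure dist/ exists before writing into it
writeFileSync('dist/llms.txt', content, 'utf8');

// package.json
{
  "scripts": {
    "generate-llms": "node scripts/generate-llms.js",
    "build:full": "vite build && node scripts/generate-llms.js"
  }
}

The script runs after vite build because Vite empties dist/ at the start of each build by default, which would delete a file written there earlier.

Approach 4: Dynamic PHP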
For pure PHP sites, create a PHP file that generates the content and use .htaccess to serve /llms.txt through it:
# .htaccess
RewriteEngine On
RewriteRule ^llms\.txt$ /llms-dynamic.php [L]

<?php
// llms-dynamic.php
// Assumes $pdo is an existing PDO connection created elsewhere (for example in a shared bootstrap file).
header('Content-Type: text/plain; charset=utf-8');
header('Cache-Control: public, max-age=3600'); // let clients and proxies cache for one hour
$posts = $pdo->query("
    SELECT slug, title_en, excerpt_en
    FROM blog_posts WHERE status = 'published'
    ORDER BY published_at DESC
")->fetchAll(PDO::FETCH_ASSOC);
echo "# Organization Name\n\n";
foreach ($posts as $p) {
    echo "- [{$p['title_en']}](https://example.com/en/blog/{$p['slug']}): {$p['excerpt_en']}\n";
}

The structure AI expects
Content matters as much as format. Required sections:
- H1 + Blockquote — primary identity and dense description
- Services/Products — with real descriptions, not marketing copy
- Blog/Resources — each article with its main insight in one line
- Contact — how to get started
Sections that add significant value:
- Target & Differentiators — who it is for, what sets it apart from competitors
- Key Facts — 5-6 frequent questions with direct answers in the file
- Stack/Integrations — supported technologies and platforms
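The required sections listed above lend themselves to an automated check. This is a rough sketch, not an official validator: it assumes section headings contain the English words used in this article (Services or Products, Blog or Resources, Contact), and `checkLlmsTxt` is a hypothetical name.

```typescript
// Return a list of problems; an empty list means the required structure is present.
function checkLlmsTxt(text: string): string[] {
  const lines = text.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  const problems: string[] = [];

  if (!lines[0]?.startsWith("# ")) {
    problems.push("first line should be an H1 with the organization name");
  }
  if (!lines.some((l) => l.startsWith("> "))) {
    problems.push("missing blockquote description");
  }

  // Each inner array lists alternative keywords that satisfy one required section.
  const required = [["services", "products"], ["blog", "resources"], ["contact"]];
  const headings = lines.filter((l) => l.startsWith("## ")).map((l) => l.slice(3).toLowerCase());
  for (const group of required) {
    if (!headings.some((h) => group.some((word) => h.includes(word)))) {
      problems.push(`missing section: ${group.join(" or ")}`);
    }
  }
  return problems;
}

const draftFile = "# Example Org\n\n> Placeholder description.\n\n## Services\n- [Service A](https://example.com/a): Placeholder.\n";
console.log(checkLlmsTxt(draftFile));
```

A check like this only confirms that the sections exist; whether the descriptions are real content rather than marketing copy still needs a human reviewer.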
Where to register your file
Once published:
- llmstxt.site: the primary directory, consulted by developers and AI crawlers
- robots.txt: add # llms.txt: /llms.txt as an informational comment
- XML Sitemap: some implementations include llms.txt to ensure indexing
The right time to start
llms.txt is in an early adoption phase. Sites that implement it now have a real competitive advantage: they are represented more accurately by AI systems than competitors who have not yet done so.
The implementation cost is low — a static file takes an hour, a dynamic route takes half a day. The advantage is direct: anyone using an AI agent to research vendors will find your organization described accurately, and competitors described approximately.
AI systems have become part of the enterprise research journey. llms.txt is how you ensure that when a CTO asks an AI agent "which companies do AI automation governance?", the answer includes yours.