An AI visiting a website sees something very different from what a human sees. It sees <nav> and <header> tags, cookie banners, analytics scripts, menus repeated on every page, legal footers — and somewhere in between, the content that actually matters.

A human filters all of this automatically. An AI has to process it, weigh it, and work out what is signal and what is page structure. For an AI agent trying to answer "what does this company do?", reading a conversion-optimized homepage is like reading a document where most of the text is redundant.

This is not a theoretical problem. It is why AI systems often describe companies in vague terms, confuse similar services, or fail to answer specific questions about a product even when the answers are publicly available on the site.

llms.txt is a proposed solution.

What is llms.txt

llms.txt is a plain text file in Markdown format placed at the root of a website — accessible at /llms.txt. It contains a structured description of the site designed specifically to be consumed by AI systems.

It was proposed by Jeremy Howard (co-founder of fast.ai and Answer.AI) in September 2024. The principle is simple: instead of asking AI to extract signal from the noise of a human-optimized website, give it the signal directly.

The format is deliberately minimal:

# Organization Name

> Dense description: what it does, for whom, specific value.

## Services

- [Service A](URL): What it is, for whom, concrete output.

## Blog

- [Article title](URL): The main insight in one line.

## Contact

- [Demo](URL): How to get started.

No HTML. No CSS. No navigation structure. Just content.
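Because the format is this small, a consumer can read it with a few string checks. The following is a minimal sketch, not a full Markdown parser: the names `LlmsLink`, `LlmsDoc` and `parseLlmsTxt` are hypothetical, and it assumes each link sits on one line in the `- [Title](URL): description` form shown above.

```typescript
// Hypothetical types for illustration; they are not part of any official library.
interface LlmsLink { title: string; url: string; description: string; }
interface LlmsDoc {
  name: string;                          // text of the H1
  summary: string;                       // text of the blockquote
  sections: Record<string, LlmsLink[]>;  // H2 title -> links listed under it
}

function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { name: "", summary: "", sections: {} };
  let current: LlmsLink[] | null = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // A new H2 opens a new section.
      current = [];
      doc.sections[line.slice(3).trim()] = current;
    } else if (line.startsWith("# ")) {
      doc.name = line.slice(2).trim();
    } else if (line.startsWith("> ") && current === null) {
      // Blockquote lines before the first H2 form the summary.
      doc.summary = (doc.summary + " " + line.slice(2)).trim();
    } else if (current !== null) {
      // "- [Title](URL): description" (the description is optional).
      const m = /^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/.exec(line);
      if (m) {
        const [, title = "", url = "", description = ""] = m;
        current.push({ title, url, description });
      }
    }
  }
  return doc;
}
```

Anything the pattern does not recognize is skipped rather than rejected, which keeps the sketch tolerant of extra prose inside a section.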

The three-file ecosystem

llms.txt is not a single file — it is a system with three levels, each designed for a different use case.

llms.txt — the navigable index

The main file. It contains links with descriptions, organized into thematic sections. An AI reading only llms.txt should be able to answer: what does this organization do, who is it for, what services does it offer, how to contact it. If it cannot answer without visiting other URLs, the file is incomplete. Typical size: 5–50 KB.

llms-full.txt — the complete content

Where llms.txt says "this article covers X (see URL)", llms-full.txt includes the actual text of the article. It is designed for systems that want to consume the entire site in a single request without page-by-page crawling.

The primary use case is in enterprise RAG pipelines: instead of configuring a crawler to index 50 pages, you load llms-full.txt as a single document and your knowledge base immediately has the content of the entire site. Typical size: 100 KB–5 MB.
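One way to produce such a file is to concatenate the Markdown of every page under its own heading. This is a sketch under assumptions: `FullPage` and `buildLlmsFull` are hypothetical names, and the `Source:` line is an illustrative convention of this example, not something the format prescribes.

```typescript
// Hypothetical shape of a page whose content is already available as Markdown.
interface FullPage { title: string; url: string; markdown: string; }

function buildLlmsFull(siteName: string, summary: string, pages: FullPage[]): string {
  // Same opening as llms.txt: H1 plus blockquote.
  const parts: string[] = [`# ${siteName}`, `> ${summary}`];
  for (const page of pages) {
    // Inline the full text where llms.txt would only link to it.
    parts.push(`## ${page.title}`, `Source: ${page.url}`, page.markdown.trim());
  }
  // Blank line between blocks, single trailing newline.
  return parts.join("\n\n") + "\n";
}
```

Keeping the source URL next to each block lets a downstream system cite the original page even though it never crawled it.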

Language variants

A multilingual site can have localized versions — /it/llms.txt, /de/llms.txt — that point only to content in the correct language. An AI agent queried in Italian is guided toward Italian content, not the English default.
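The routing rule can be sketched as a small function. It assumes the default language is served from /llms.txt and every other supported language from /<lang>/llms.txt, as in the paths above; `llmsTxtPath` is a hypothetical helper name.

```typescript
// Map a requested language tag (for example "it-IT") to the llms.txt path to read.
function llmsTxtPath(requested: string, supported: string[], defaultLang: string): string {
  // Keep only the primary subtag: "it-IT" -> "it".
  const lang = requested.toLowerCase().split("-")[0] ?? "";
  if (lang !== defaultLang && supported.includes(lang)) {
    return `/${lang}/llms.txt`;
  }
  // Unknown or default language: fall back to the root file.
  return "/llms.txt";
}
```

For example, `llmsTxtPath("it-IT", ["en", "it", "de"], "en")` returns `/it/llms.txt`, while an unsupported tag such as `"fr"` falls back to `/llms.txt`.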

Who actually reads these files — and when

This is the most important section, and the least understood.

ClaudeBot — Anthropic's training crawler

Anthropic has a web crawler identified by the User-Agent ClaudeBot/1.0. It crawls the web to gather training data. It respects robots.txt:

# robots.txt
User-agent: ClaudeBot
Allow: /

# To exclude specific sections:
User-agent: ClaudeBot
Disallow: /admin/

ClaudeBot reads the site like any other web page. llms.txt contributes to training like any quality document on the site. Its value here is content clarity, not format.

AI agents with runtime browsing

When a user asks Claude or ChatGPT to "go to a website and explain what they do", the AI visits the site in real time with a browsing tool. Here llms.txt has direct value: a well-configured agent reads /llms.txt first, immediately gets a structured map, and can answer with much more context.
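What "reads /llms.txt first" could look like inside a custom agent is sketched below. Everything here is an assumption made for illustration: `Fetcher`, `SiteContext` and `loadSiteContext` are hypothetical names, and the fetch function is passed in so the logic can be exercised without network access. It does not describe how Claude or ChatGPT browse.

```typescript
// Minimal view of a fetch-like function, so the global fetch or a stub can be passed in.
type Fetcher = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

interface SiteContext { source: "llms.txt" | "homepage"; body: string; }

async function loadSiteContext(origin: string, fetcher: Fetcher): Promise<SiteContext> {
  const base = origin.replace(/\/+$/, "");
  try {
    const res = await fetcher(`${base}/llms.txt`);
    if (res.ok) {
      const body = await res.text();
      // Single-page apps often answer unknown paths with index.html,
      // so accept the response only if it starts like an llms.txt file.
      if (body.trim().startsWith("# ")) return { source: "llms.txt", body };
    }
  } catch {
    // Network error on llms.txt: fall back to the homepage below.
  }
  const home = await fetcher(`${base}/`);
  if (!home.ok) throw new Error(`Could not read ${base}/`);
  return { source: "homepage", body: await home.text() };
}
```

In an environment that provides a global `fetch`, the call would be `loadSiteContext("https://example.com", fetch)`.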

Semantic crawlers — Perplexity, You.com, Bing AI

AI-based search engines can use llms.txt as a starting point for crawling: a map for deciding which pages to prioritize. Vendors do not publish how much weight they give the file, so confirm actual fetches of /llms.txt in your own server logs. Being listed in directories like llmstxt.site can add to this visibility.

Enterprise RAG pipelines

An organization building an internal AI assistant can configure it to read llms-full.txt as a knowledge source. The file is loaded directly into the RAG system's vector store instead of crawling the entire site.
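Before the file reaches a vector store it is usually split into chunks. A minimal sketch, assuming llms-full.txt uses H2 headings to separate pages: `RagChunk` and `chunkByHeading` are hypothetical names, and the embedding and storage steps are left out because they depend on the RAG system in use.

```typescript
// One chunk per H2 section; the heading is kept as metadata for retrieval.
interface RagChunk { heading: string; text: string; }

function chunkByHeading(markdown: string): RagChunk[] {
  const chunks: RagChunk[] = [];
  let heading = "";            // content before the first H2 gets an empty heading
  let buffer: string[] = [];

  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text) chunks.push({ heading, text });
    buffer = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    // Matches "## Title" but not "### Subtitle".
    const m = /^##\s+(.*)$/.exec(line);
    if (m) {
      flush();
      heading = (m[1] ?? "").trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

The split is purely line-based, so a line starting with `## ` inside a code sample would also open a new chunk. Long sections may still need a second split by size before embedding.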

System	Reads llms.txt?	Reads llms-full.txt?
AI in chat (no browsing)	No	No
AI with browsing	Rarely	No
ClaudeBot (training)	Like any page	Like any page
Semantic crawlers	Yes, as priority	No
Custom agents	Yes	If configured
RAG pipelines	Only if included	Yes, built for this

How to implement it: four approaches

Approach 1: Static file

The simplest approach. Create an llms.txt file in the site root and update it manually. Limit: it drifts out of sync with the blog after each new article.
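The drift can at least be detected automatically. The sketch below compares the URLs linked from a hand-maintained llms.txt with a list of published URLs and reports the missing ones. `findUnlistedUrls` is a hypothetical helper, and where the list of published URLs comes from (a CMS export, the sitemap) is left open.

```typescript
// Return every published URL that llms.txt does not link to.
function findUnlistedUrls(llmsTxt: string, publishedUrls: string[]): string[] {
  const listed = new Set<string>();
  // Collect the target of every Markdown link: "](URL)".
  const linkPattern = /\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = linkPattern.exec(llmsTxt)) !== null) {
    if (m[1]) listed.add(m[1]);
  }
  return publishedUrls.filter((url) => !listed.has(url));
}
```

Run as part of a CI check, a non-empty result signals that the static file needs an update. The comparison is an exact string match, so trailing slashes and http/https differences count as different URLs.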

Approach 2: Next.js App Router

In Next.js, create a route handler at app/llms.txt/route.ts. Next.js serves it as the HTTP response for /llms.txt.

// app/llms.txt/route.ts
import { getAllPosts } from '@/lib/posts';

// Render once at build time instead of on every request.
export const dynamic = 'force-static';

export async function GET() {
  const posts = await getAllPosts('en');

  // Heading entries end in '\n' so that join('\n') leaves a blank line after them.
  const lines = [
    '# Organization Name\n',
    '> Description.\n',
    '## Blog\n',
    ...posts.map(p => `- [${p.title}](${p.url}): ${p.excerpt}`),
  ];

  return new Response(lines.join('\n'), {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}

With force-static, the file is generated at build time — updated on every deploy without runtime overhead.

Approach 3: Build-time script — Vite, React, PHP/MySQL

For stacks without a Node.js server in production, generate the file before deployment with a Node.js script that calls the API and writes to the filesystem:

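// scripts/generate-llms.js (ES module: set "type": "module" or rename to .mjs)
import { writeFileSync } from 'node:fs';

// Assumption: the API base URL is supplied through the environment.
const API_BASE = process.env.API_BASE;

async function fetchAllPosts(lang) {
  const res = await fetch(
    `${API_BASE}/blog/get-posts.php?status=published&lang=${lang}&limit=50`
  );
  if (!res.ok) throw new Error(`API responded with ${res.status}`);
  const data = await res.json();
  return data.posts ?? [];
}

// Assumption: each post exposes title, url and excerpt fields.
const posts = await fetchAllPosts('en');
const content = [
  '# Organization Name\n',
  '> Description.\n',
  '## Blog\n',
  ...posts.map(p => `- [${p.title}](${p.url}): ${p.excerpt}`),
].join('\n') + '\n';

// Write to public/: Vite copies it into dist/ on build. A file written
// straight to dist/ is deleted when `vite build` empties the output folder.
writeFileSync('public/llms.txt', content, 'utf8');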

// package.json
{
  "scripts": {
    "generate-llms": "node scripts/generate-llms.js",
    "build:full": "node scripts/generate-llms.js && vite build"
  }
}

Approach 4: Dynamic PHP

For pure PHP sites, create a PHP file that generates the content and use .htaccess to serve /llms.txt through it:

# .htaccess
RewriteEngine On
RewriteRule ^llms\.txt$ /llms-dynamic.php [L]

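<?php
// llms-dynamic.php
header('Content-Type: text/plain; charset=utf-8');
header('Cache-Control: public, max-age=3600');

// Assumption: placeholder connection settings; replace them with your own.
$pdo = new PDO('mysql:host=localhost;dbname=site;charset=utf8mb4', 'db_user', 'db_password', [
  PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$posts = $pdo->query("
  SELECT slug, title_en, excerpt_en
  FROM blog_posts WHERE status = 'published'
  ORDER BY published_at DESC
")->fetchAll(PDO::FETCH_ASSOC);

echo "# Organization Name\n\n";
echo "> Description.\n\n";
echo "## Blog\n\n";
foreach ($posts as $p) {
  echo "- [{$p['title_en']}](https://example.com/en/blog/{$p['slug']}): {$p['excerpt_en']}\n";
}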

The structure AI expects

Content matters as much as format. The specification itself requires only the H1; for a file that is actually useful, treat these sections as required:

H1 + Blockquote — primary identity and dense description
Services/Products — with real descriptions, not marketing copy
Blog/Resources — each article with its main insight in one line
Contact — how to get started

Sections that add significant value:

Target & Differentiators — who it is for, what sets it apart from competitors
Key Facts — 5-6 frequent questions with direct answers in the file
Stack/Integrations — supported technologies and platforms
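The checklist above can be turned into a quick lint step. This is a sketch that encodes this article's recommendations, not the official specification: `lintLlmsTxt` is a hypothetical name, and section titles are matched by keyword, which is an assumption about how the sections are named.

```typescript
// Return a list of human-readable problems; an empty list means the file passes.
function lintLlmsTxt(text: string): string[] {
  const problems: string[] = [];
  const lines = text.split(/\r?\n/).map((l) => l.trim());

  // Exactly one H1 carrying the organization name.
  const h1Count = lines.filter((l) => l.startsWith("# ")).length;
  if (h1Count !== 1) problems.push(`expected exactly one H1, found ${h1Count}`);

  // A blockquote with the dense description.
  if (!lines.some((l) => l.startsWith("> "))) problems.push("missing blockquote summary");

  // Each group lists acceptable keywords for one expected H2 section.
  const h2Titles = lines
    .filter((l) => l.startsWith("## "))
    .map((l) => l.slice(3).toLowerCase());
  const expected = [["services", "products"], ["blog", "resources"], ["contact"]];
  for (const keywords of expected) {
    const found = h2Titles.some((title) => keywords.some((k) => title.includes(k)));
    if (!found) problems.push(`missing section: ${keywords.join(" or ")}`);
  }
  return problems;
}
```

The optional sections (Target & Differentiators, Key Facts, Stack/Integrations) are deliberately not enforced, since their absence does not make the file incomplete.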

Where to register your file

Once published:

llmstxt.site — the primary directory, consulted by developers and AI crawlers
robots.txt — add # llms.txt: /llms.txt as an informational comment
XML Sitemap — some implementations include llms.txt to ensure indexing

The right time to start

llms.txt is in an early adoption phase. Sites that implement it now have a real competitive advantage: they are represented more accurately by AI systems than competitors who have not yet done so.

The implementation cost is low — a static file takes an hour, a dynamic route takes half a day. The advantage is direct: anyone using an AI agent to research vendors will find your organization described accurately, and competitors described approximately.

AI systems have become part of the enterprise research journey. llms.txt is how you ensure that when a CTO asks an AI agent "which companies do AI automation governance?", the answer includes yours.

Talking to AI Through Your Website: A Guide to llms.txt


AGORÀ
