2026 Webmaster Playbook: Automate llms.txt for AI Search with Python
A clear shift is happening in 2026: more content discovery starts from AI search and answer systems, not only classic search results.
The practical problem is that many sites still face three issues:
- AI systems discover pages slowly
- They crawl pages but miss the key content
- New posts go live, but discovery entry files are not updated in time
If you run a content site, adding llms.txt is a low-cost and practical upgrade.
In this tutorial, we will do it in a production-friendly way:
- Create a minimal manual llms.txt
- Move to Python auto-generation for long-term maintenance
- Validate the file after deployment
What is llms.txt
Think of llms.txt as a focused navigation file for AI-oriented content discovery.
It is usually placed at your site root, for example:
https://www.bobobk.com/llms.txt
It does not replace sitemap.xml or robots.txt. Its practical role is:
- Highlight the sections and pages you want AI systems to prioritize
- Provide a concise, human-readable map of high-value URLs
- Shorten the crawl path an AI retriever needs to reach those pages
Method 1: Start with a minimal manual version
Keep it simple first.
Create static/llms.txt with content like this:
# Bobobk
> Practical tutorials on Python, Linux, SEO automation, and data tools.
## Core Sections
- Blog (CN): https://www.bobobk.com/
- Blog (EN): https://www.bobobk.com/en/
- Latest posts: https://www.bobobk.com/index.xml
## High-value guides
- https://www.bobobk.com/how-to-improve-index-speed-by-indexnow.html
- https://www.bobobk.com/python-wordpress-workflow.html
- https://www.bobobk.com/build_own_tron_wallet.html
- https://www.bobobk.com/build_own_solana_wallet.html
Hugo copies everything under static/ to the root of the build output, so the file is published at /llms.txt on the next build.
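The layout above (one `#` title, a `>` summary, then `##` sections made of `- ` bullets) is regular enough to read back mechanically. The following is a minimal sketch, not part of the original workflow; `parse_sections` is a hypothetical helper name, and it assumes the file follows exactly the layout shown above.

```python
from __future__ import annotations


def parse_sections(text: str) -> dict[str, list[str]]:
    """Group the '- ' bullets of an llms.txt body under their '## ' headings."""
    sections: dict[str, list[str]] = {}
    current: str | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A new section starts; remember its name for the bullets below it.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections
```

Running it on static/llms.txt before a build gives a quick view of how many entries each section holds.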
Method 2: Auto-generate llms.txt with Python
Manual updates do not scale. The script below does three things:
- Read public/index.xml
- Select the latest posts by publish time (the RSS pubDate field)
- Write a fresh static/llms.txt
1. Environment
This implementation uses only the Python standard library, so no extra packages are required.
2. Script code
#!/usr/bin/env python3
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from pathlib import Path
import xml.etree.ElementTree as ET

SITE_NAME = "Bobobk"
SITE_DESC = "Practical tutorials on Python, Linux, SEO automation, and data tools."
SITE_CN = "https://www.bobobk.com/"
SITE_EN = "https://www.bobobk.com/en/"
RSS_PATH = Path("public/index.xml")
RSS_URL = "https://www.bobobk.com/index.xml"
OUTPUT = Path("static/llms.txt")
TOP_N = 20

# Timezone-aware fallback for missing or unparsable dates. A naive
# datetime.min cannot be compared with the aware values parsed from
# pubDate, so sorting would raise TypeError.
MIN_DATE = datetime.min.replace(tzinfo=timezone.utc)


def parse_pub_date(value: str | None) -> datetime:
    """Parse an RFC 2822 date string; fall back to MIN_DATE on failure."""
    if not value:
        return MIN_DATE
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return MIN_DATE
    # Dates without a usable zone come back naive; treat them as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def text_of(parent: ET.Element, tag: str, default: str = "") -> str:
    """Return the stripped text of a child element, or the default."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return default
    return node.text.strip()


def read_items_from_rss(path: Path) -> list[dict[str, str | datetime]]:
    """Read RSS items and return them sorted newest first."""
    if not path.exists():
        return []
    root = ET.parse(path).getroot()
    channel = root.find("channel")
    if channel is None:
        return []
    items: list[dict[str, str | datetime]] = []
    for item in channel.findall("item"):
        title = text_of(item, "title", "Untitled")
        link = text_of(item, "link", "")
        pub_date = parse_pub_date(text_of(item, "pubDate", ""))
        if link:
            items.append({"title": title, "link": link, "pub_date": pub_date})
    items.sort(key=lambda x: x["pub_date"], reverse=True)
    return items


def build_llms_text(items: list[dict[str, str | datetime]]) -> str:
    """Render the llms.txt body from the newest TOP_N items."""
    lines = [
        f"# {SITE_NAME}",
        "",
        f"> {SITE_DESC}",
        "",
        "## Core Sections",
        f"- Blog (CN): {SITE_CN}",
        f"- Blog (EN): {SITE_EN}",
        f"- Latest posts: {RSS_URL}",
        "",
        "## Latest High-value Posts",
    ]
    for item in items[:TOP_N]:
        title = str(item["title"]).replace("\n", " ").strip()
        link = str(item["link"]).strip()
        lines.append(f"- {title}: {link}")
    lines.append("")
    return "\n".join(lines)


def main() -> None:
    items = read_items_from_rss(RSS_PATH)
    content = build_llms_text(items)
    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT.write_text(content, encoding="utf-8")
    print(f"Generated {OUTPUT} with {min(len(items), TOP_N)} links")


if __name__ == "__main__":
    main()
3. Run commands
# Build first to refresh public/index.xml
hugo --config hugotest.toml -d public/
# Generate llms.txt
python3 scripts/generate_llms_txt.py
# Build again to publish static/llms.txt at site root
hugo --config hugotest.toml -d public/
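Because the order of these three commands matters, it can help to chain them in one place. The following is a minimal sketch, assuming `hugo` and `python3` are on your PATH and reusing the config file and script path from the commands above; `build_steps` and `run_pipeline` are hypothetical names, not part of the original script.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

HUGO_BUILD = ["hugo", "--config", "hugotest.toml", "-d", "public/"]
GENERATE = ["python3", "scripts/generate_llms_txt.py"]


def build_steps() -> list[list[str]]:
    """Return the commands in the order they must run: build, generate, build."""
    return [HUGO_BUILD, GENERATE, HUGO_BUILD]


def run_pipeline(run: Callable[[Sequence[str]], object] | None = None) -> int:
    """Run every step in order and return the number of steps completed.

    The `run` parameter exists so the ordering can be checked without
    actually invoking Hugo; by default each command goes to subprocess.run.
    """
    if run is None:
        # check=True raises CalledProcessError on a non-zero exit code,
        # which stops the pipeline at the first failing step.
        def run(cmd: Sequence[str]) -> object:
            return subprocess.run(cmd, check=True)

    done = 0
    for cmd in build_steps():
        run(cmd)
        done += 1
    return done
```

Calling `run_pipeline()` from a deploy script runs all three steps and stops at the first failing command.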
Validate after deployment
1. Check accessibility
curl -I https://www.bobobk.com/llms.txt
curl https://www.bobobk.com/llms.txt | head -n 30
Expected:
- HTTP status 200
- Latest article links appear in the file content
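The same two expectations can be checked from Python instead of by eye. This is a minimal sketch using only the standard library; `check_llms_text` and `fetch` are hypothetical helper names, and the check assumes each listed URL is followed by whitespace or a line end, as in the file format used in this tutorial.

```python
from __future__ import annotations

import re
import urllib.request

URL_RE = re.compile(r"https?://\S+")


def check_llms_text(text: str, expected_links: list[str]) -> list[str]:
    """Return a list of problems found in an llms.txt body (empty means OK)."""
    problems: list[str] = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("first line is not a '# Title' heading")
    found = set(URL_RE.findall(text))
    for link in expected_links:
        if link not in found:
            problems.append(f"missing link: {link}")
    return problems


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch a URL and return (HTTP status, decoded body).

    urlopen raises urllib.error.HTTPError for responses such as 404,
    so a normal return already implies a successful status.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

A typical use is to call `fetch("https://www.bobobk.com/llms.txt")`, confirm the status is 200, and pass the body plus the URL of the post you just published to `check_llms_text`.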
2. Optional: submit with IndexNow
If you already use IndexNow, submit llms.txt together with your new URLs.
python indexnow.py "https://www.bobobk.com/llms.txt"
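The `indexnow.py` script itself is not shown in this article. As a rough illustration of what such a submission involves, here is a sketch based on the public IndexNow protocol, which accepts a JSON POST containing `host`, `key`, `urlList`, and an optional `keyLocation`. The function names are hypothetical, and the key is a placeholder you would replace with your own.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_payload(urls: list[str], key: str, key_location: str | None = None) -> dict:
    """Build the JSON body for an IndexNow bulk submission."""
    if not urls:
        raise ValueError("at least one URL is required")
    host = urlsplit(urls[0]).netloc
    # One submission covers a single host, so reject mixed-host lists early.
    if any(urlsplit(u).netloc != host for u in urls):
        raise ValueError("all URLs must belong to the same host")
    payload: dict = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return payload


def submit(payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the IndexNow endpoint and return the HTTP status."""
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Passing the llms.txt URL together with the new post URLs in one `urlList` keeps the submission to a single request.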
Troubleshooting
Issue 1: llms.txt returns 404
Most common cause: wrong file location.
Fix:
- Make sure the file path is static/llms.txt
- Rebuild the Hugo output
- Purge the CDN cache if needed
Issue 2: links in llms.txt are outdated
Most common cause: generation order is wrong.
Fix:
- Build the site first to refresh public/index.xml
- Run the Python generator
- Build the site again to publish the latest llms.txt
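A quick way to confirm that llms.txt is out of date is to compare it with the feed it was generated from. The sketch below is one possible check, not part of the original script; `missing_from_llms` is a hypothetical name, and it assumes the feed lists the newest items first and that each llms.txt bullet ends with its URL.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def rss_links(rss_xml: str | bytes) -> list[str]:
    """Return all non-empty <item><link> values from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links: list[str] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.append(link)
    return links


def missing_from_llms(rss_xml: str | bytes, llms_text: str, top_n: int = 20) -> list[str]:
    """Return feed links among the first top_n items that llms.txt does not list."""
    # Take the last whitespace-separated token of each bullet as its URL,
    # which avoids false matches when one URL is a prefix of another.
    listed = {
        line.split()[-1]
        for line in llms_text.splitlines()
        if line.startswith("- ")
    }
    return [link for link in rss_links(rss_xml)[:top_n] if link not in listed]
```

An empty result means the newest feed entries are all present; any returned link points to a post that was published after llms.txt was last generated.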
Issue 3: too many low-quality links
A longer llms.txt is not a better one.
Best practice:
- Keep core sections and high-value pages only
- Keep structure stable
- Avoid long redirect chains in listed URLs
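One way to apply these rules in the generator is to filter the candidate links before writing them. This is a minimal sketch; `select_links` is a hypothetical helper, and the excluded path prefixes are illustrative assumptions that you would adjust to your own site layout.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical path prefixes for listing pages that rarely deserve a slot.
EXCLUDED_PREFIXES = ("/tags/", "/categories/", "/page/")


def select_links(links: list[str], limit: int = 20) -> list[str]:
    """Deduplicate links, drop excluded paths, and keep at most `limit` entries."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in links:
        if len(kept) >= limit:
            break
        link = raw.strip()
        if not link or link in seen:
            continue
        if urlsplit(link).path.startswith(EXCLUDED_PREFIXES):
            continue
        seen.add(link)
        kept.append(link)
    return kept
```

The input order is preserved, so feeding it links already sorted newest first keeps the most recent posts.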
Summary
For content sites in 2026, llms.txt is a practical upgrade with low implementation cost.
Ship a minimal manual file first, then move to Python auto-generation so your AI discovery entry stays in sync with new content. Used alongside sitemap.xml and IndexNow, it gives your indexing and AI discovery workflow one more entry point that updates on every release.
The most useful next step is to add this script to your publish pipeline so llms.txt updates automatically after each content release.
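- Original author: 春江暮客
- Original link: https://www.bobobk.com/en/llms-txt-for-ai-search.html
- Copyright: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For non-commercial reposts, credit the source (author and original link); for commercial reposts, contact the author for permission.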