春江暮客

Personal learning and sharing site of 春江暮客

2026 Webmaster Playbook: Automate llms.txt for AI Search with Python

2026-05-24 Technology

A clear shift is happening in 2026: more content discovery starts from AI search and answer systems, not only classic search results.

The practical problem is that many sites still face three issues:

  1. AI systems discover pages slowly
  2. They crawl pages but miss the key content
  3. New posts go live, but discovery entry files are not updated in time

If you run a content site, adding llms.txt is a low-cost and practical upgrade.

In this tutorial, we will do it in a production-friendly way:

  1. Create a minimal manual llms.txt
  2. Move to Python auto-generation for long-term maintenance
  3. Validate the file after deployment

What is llms.txt

Think of llms.txt as a focused navigation file for AI-oriented content discovery.

It is usually placed at your site root, for example:

  • https://www.bobobk.com/llms.txt

It does not replace sitemap.xml or robots.txt. Its practical role is:

  • Highlight the sections and pages you want AI systems to prioritize
  • Provide a concise, human-readable map of high-value URLs
  • Shorten the crawl path AI retrievers need to reach those pages

Method 1: Start with a minimal manual version

Keep it simple first.

Create static/llms.txt with content like this:

# Bobobk

> Practical tutorials on Python, Linux, SEO automation, and data tools.

## Core Sections
- Blog (CN): https://www.bobobk.com/
- Blog (EN): https://www.bobobk.com/en/
- Latest posts: https://www.bobobk.com/index.xml

## High-value guides
- https://www.bobobk.com/how-to-improve-index-speed-by-indexnow.html
- https://www.bobobk.com/python-wordpress-workflow.html
- https://www.bobobk.com/build_own_tron_wallet.html
- https://www.bobobk.com/build_own_solana_wallet.html

Hugo copies everything under static/ to the site root at build time, so this file is published as /llms.txt without extra configuration.

Method 2: Auto-generate llms.txt with Python

Manual updates do not scale. The script below does three things:

  1. Read public/index.xml
  2. Select the latest posts by publish time (pubDate)
  3. Write a fresh static/llms.txt

1. Environment

This implementation uses only the Python standard library, so no extra packages are required.

2. Script code

#!/usr/bin/env python3
from __future__ import annotations

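from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from pathlib import Path
import xml.etree.ElementTree as ET

SITE_NAME = "Bobobk"
SITE_DESC = "Practical tutorials on Python, Linux, SEO automation, and data tools."
SITE_CN = "https://www.bobobk.com/"
SITE_EN = "https://www.bobobk.com/en/"
RSS_PATH = Path("public/index.xml")
RSS_URL = "https://www.bobobk.com/index.xml"
OUTPUT = Path("static/llms.txt")
TOP_N = 20

# Timezone-aware fallback for missing or unparseable dates. It must be aware:
# Python cannot compare aware and naive datetimes, so a naive fallback would
# make the sort fail as soon as one item lacks a valid pubDate.
MIN_DATE = datetime.min.replace(tzinfo=timezone.utc)


def parse_pub_date(value: str | None) -> datetime:
    """Parse an RFC 2822 pubDate into an aware datetime, or return MIN_DATE."""
    if not value:
        return MIN_DATE
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return MIN_DATE
    # A "-0000" offset yields a naive datetime; treat it as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)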


def text_of(parent: ET.Element, tag: str, default: str = "") -> str:
    node = parent.find(tag)
    if node is None or node.text is None:
        return default
    return node.text.strip()


def read_items_from_rss(path: Path) -> list[dict[str, str | datetime]]:
    if not path.exists():
        return []

    root = ET.parse(path).getroot()
    channel = root.find("channel")
    if channel is None:
        return []

    items: list[dict[str, str | datetime]] = []
    for item in channel.findall("item"):
        title = text_of(item, "title", "Untitled")
        link = text_of(item, "link", "")
        pub_date = parse_pub_date(text_of(item, "pubDate", ""))
        if link:
            items.append({"title": title, "link": link, "pub_date": pub_date})

    items.sort(key=lambda x: x["pub_date"], reverse=True)
    return items


def build_llms_text(items: list[dict[str, str | datetime]]) -> str:
    lines = [
        f"# {SITE_NAME}",
        "",
        f"> {SITE_DESC}",
        "",
        "## Core Sections",
        f"- Blog (CN): {SITE_CN}",
        f"- Blog (EN): {SITE_EN}",
        f"- Latest posts: {RSS_URL}",
        "",
        "## Latest High-value Posts",
    ]

    for item in items[:TOP_N]:
        title = str(item["title"]).replace("\n", " ").strip()
        link = str(item["link"]).strip()
        lines.append(f"- {title}: {link}")

    lines.append("")
    return "\n".join(lines)


def main() -> None:
    items = read_items_from_rss(RSS_PATH)
    content = build_llms_text(items)

    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT.write_text(content, encoding="utf-8")

    print(f"Generated {OUTPUT} with {min(len(items), TOP_N)} links")


if __name__ == "__main__":
    main()
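
To make the feed handling easier to follow, here is a small self-contained sketch that runs the same idea against an inline sample instead of public/index.xml. The sample titles, URLs, and dates are made up, and the helper name to_aware is hypothetical. Because Python refuses to compare timezone-aware and naive datetimes, the sketch gives undated items an aware UTC fallback before sorting.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Hypothetical sample feed: titles, URLs, and dates are illustrative only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>Older post</title>
      <link>https://example.com/older.html</link>
      <pubDate>Mon, 04 May 2026 08:00:00 +0800</pubDate>
    </item>
    <item>
      <title>Newer post</title>
      <link>https://example.com/newer.html</link>
      <pubDate>Sun, 24 May 2026 08:00:00 +0800</pubDate>
    </item>
    <item>
      <title>No date</title>
      <link>https://example.com/undated.html</link>
    </item>
  </channel>
</rss>"""


def to_aware(value):
    """Parse an RFC 2822 date; fall back to the earliest UTC datetime."""
    floor = datetime.min.replace(tzinfo=timezone.utc)
    if not value:
        return floor
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return floor
    # Naive results are treated as UTC so every sort key is comparable.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


root = ET.fromstring(SAMPLE_RSS)
items = [
    (
        item.findtext("title", "Untitled"),
        item.findtext("link", ""),
        to_aware(item.findtext("pubDate")),
    )
    for item in root.iter("item")
]
# Newest first; undated items sink to the end.
items.sort(key=lambda entry: entry[2], reverse=True)

for title, link, _ in items:
    print(f"- {title}: {link}")
```

The printed lines have the same "- title: link" shape the generator writes under its posts section.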

3. Run commands

# Build first to refresh public/index.xml
hugo --config hugotest.toml -d public/

# Generate llms.txt
python3 scripts/generate_llms_txt.py

# Build again to publish static/llms.txt at site root
hugo --config hugotest.toml -d public/
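
If you would rather trigger the three commands from one place, a thin Python wrapper is one option. This is a minimal sketch, not part of the original setup: the name run_pipeline and the injectable runner argument are assumptions, and the commands are simply copied from the steps above.

```python
import subprocess

# Commands mirror the three steps above; adjust the config name and paths to your site.
STEPS = [
    ["hugo", "--config", "hugotest.toml", "-d", "public/"],
    ["python3", "scripts/generate_llms_txt.py"],
    ["hugo", "--config", "hugotest.toml", "-d", "public/"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run the steps in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    `runner` is injectable so the ordering logic can be tested without Hugo.
    """
    completed = 0
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"Step failed ({result.returncode}): {' '.join(cmd)}")
            break
        completed += 1
    return completed
```

Calling run_pipeline() from a deploy script and treating any return value below 3 as a failed release keeps a broken generator step from publishing a stale file. Note that subprocess.run raises FileNotFoundError if hugo is not on PATH.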

Validate after deployment

1. Check accessibility

curl -I https://www.bobobk.com/llms.txt
curl https://www.bobobk.com/llms.txt | head -n 30

Expected:

  • HTTP status 200
  • Latest article links appear in file content
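
The same two checks can be scripted in Python, which is handy in a post-deploy job. The sketch below is an assumption-laden example rather than an official validator: check_llms_body and fetch_llms are hypothetical names, and the rules only cover the layout used in this tutorial (a "# " title, "## " sections, and "- ...: URL" link lines).

```python
from __future__ import annotations

import urllib.request


def check_llms_body(text: str) -> list[str]:
    """Return a list of problems found in an llms.txt body (empty means OK)."""
    problems = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line should be a '# Site name' heading")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no '## ' section headings found")
    if not any(line.startswith("- ") and "http" in line for line in lines):
        problems.append("no '- ...: https://...' link lines found")
    return problems


def fetch_llms(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch the deployed file and return (HTTP status, decoded body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 404.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

A typical call would be status, body = fetch_llms("https://www.bobobk.com/llms.txt") followed by check_llms_body(body); an empty list together with status 200 matches the expectations listed above.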

2. Optional: submit with IndexNow

If you already use IndexNow, submit llms.txt together with your new URLs.

python indexnow.py "https://www.bobobk.com/llms.txt"
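
The indexnow.py script itself is not shown in this post. For readers without one, the following is a minimal sketch of a JSON submission to the shared IndexNow endpoint, written under assumptions: the function names are hypothetical, the key is a placeholder you must replace with your own, and the key file is assumed to be hosted at the site root unless key_location is given.

```python
import json
import urllib.request
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_payload(urls, key, key_location=None):
    """Build the IndexNow JSON body for a batch of URLs on one host."""
    payload = {
        "host": urlparse(urls[0]).netloc,
        "key": key,
        "urlList": list(urls),
    }
    if key_location:
        # Only needed when the key file is not served from the site root.
        payload["keyLocation"] = key_location
    return payload


def submit(urls, key, key_location=None, timeout=10.0):
    """POST the payload and return the HTTP status code."""
    body = json.dumps(build_payload(urls, key, key_location)).encode("utf-8")
    request = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Passing llms.txt in the same list as your new article URLs keeps the submission to one request per release.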

Troubleshooting

Issue 1: llms.txt returns 404

Most common cause: wrong file location.

Fix:

  1. Make sure the file path is static/llms.txt
  2. Rebuild Hugo output
  3. Purge CDN cache if needed

Issue 2: New posts are missing from llms.txt

Most common cause: generation order is wrong.

Fix:

  1. Build site first to refresh public/index.xml
  2. Run Python generator
  3. Build site again to publish latest llms.txt
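
One way to catch a wrong build order is to compare the feed with the generated file. The sketch below is a hypothetical helper, not part of the generator: it assumes the feed lists the newest items first (the usual order for a Hugo RSS feed) and that each URL in llms.txt is separated from its title by whitespace, as in the "- title: link" lines used here.

```python
import xml.etree.ElementTree as ET


def missing_from_llms(rss_xml, llms_text, top_n=20):
    """Return the first top_n feed links that do not appear in llms_text."""
    root = ET.fromstring(rss_xml)
    links = [(item.findtext("link") or "").strip() for item in root.iter("item")]
    links = [link for link in links if link][:top_n]
    # Compare whole whitespace-separated tokens so that one URL being a
    # prefix of another does not count as a match.
    present = set(llms_text.split())
    return [link for link in links if link not in present]
```

Feeding it the contents of public/index.xml and static/llms.txt (both read with read_text(encoding="utf-8")) should return an empty list after a correct run; any URLs it returns are posts the generator has not picked up yet.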

Issue 3: llms.txt grows too long

A longer llms.txt is not a better one.

Best practice:

  • Keep core sections and high-value pages only
  • Keep structure stable
  • Avoid long redirect chains in listed URLs
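
As a rough illustration of keeping the list short, a small filter can drop duplicates and listing pages before links are written out. This is only a sketch under assumptions: curate is a hypothetical name, and the skipped prefixes are guesses based on Hugo's default taxonomy and pagination paths, so adjust them to your own URL layout.

```python
from urllib.parse import urlparse


def curate(links, limit=20, skip_prefixes=("/tags/", "/categories/", "/page/")):
    """Keep the first `limit` unique links that are not listing pages."""
    seen = set()
    kept = []
    for link in links:
        path = urlparse(link).path or "/"
        key = link.rstrip("/")  # treat /post and /post/ as the same page
        if key in seen or path.startswith(skip_prefixes):
            continue
        seen.add(key)
        kept.append(link)
        if len(kept) == limit:
            break
    return kept
```

Input order is preserved, so passing links that are already sorted newest first yields the newest unique article pages.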

Summary

For content sites in 2026, llms.txt is a practical upgrade with low implementation cost.

Ship a minimal manual file first, then move to Python auto-generation so your AI discovery entry stays in sync with new content. Combined with sitemap.xml and IndexNow, your indexing and AI citation workflow becomes much more predictable.

The most useful next step is to add this script to your publish pipeline so llms.txt updates automatically after each content release.
