llmstxt_architect: automated generation and maintenance of llms.txt files for LLM-aware websites

LLMsTxt Architect tackles a practical problem: how to keep an llms.txt file up to date for a website without losing the manual structure and organization that teams often painstakingly build. The llms.txt format is an emerging standard designed to communicate website content summaries to large language models (LLMs), helping them understand what pages contain at a glance.

What llmstxt_architect does and how it works

This project is a Python CLI tool and library that automates the crawling, summarization, and generation of llms.txt files. At its core, it uses LangChain’s RecursiveUrlLoader to crawl the provided URLs, following links up to a configurable maximum depth (--max-depth) to gather related pages. It then uses an LLM of your choice (Anthropic’s Claude, OpenAI models, local models via Ollama, or any other LangChain-compatible provider) to generate a concise description of each page.
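To make the depth limit concrete, here is a minimal sketch of a depth-limited crawl over a toy in-memory link graph. It is not how the tool or LangChain's loader is implemented: the `SITE` mapping, the `crawl` function, and the page paths are hypothetical, the traversal is breadth-first for readability, and the convention that a maximum depth of 1 means "the start page only" is an assumption of this sketch.

```python
from collections import deque

# Toy site: each page maps to the pages it links to (hypothetical paths).
SITE = {
    "/concepts": ["/concepts/agents", "/concepts/memory"],
    "/concepts/agents": ["/concepts/multi_agent", "/concepts"],
    "/concepts/memory": [],
    "/concepts/multi_agent": [],
}


def crawl(start, max_depth, links=SITE):
    """Return pages reachable from `start`, visiting at most `max_depth` levels.

    Level 1 is the start page itself, level 2 its direct links, and so on.
    """
    order = [start]      # pages in the order they were discovered
    seen = {start}       # guards against cycles such as agents -> concepts
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= max_depth:
            continue     # deepest allowed level: do not follow its links
        for child in links.get(page, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, level + 1))
    return order


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, crawl("/concepts", depth))
```

Raising the depth by one pulls in one more ring of linked pages, which is why the setting has a direct effect on how many pages get summarized and how many LLM calls a run makes.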

The tool supports two main workflows: generating an llms.txt file from scratch starting with a list of URLs, or ingesting an existing llms.txt to update only the descriptions while preserving the file’s structure, headers, and URL ordering. This latter mode, enabled by the --update-descriptions-only flag, stands out as a practical feature addressing the real-world need to maintain hand-curated files without overwriting organizational context.

The tool can be used both as a CLI and as a Python library, which makes it easy to integrate into larger pipelines or automation scripts. It can be run through uvx without a separate install step (uv itself must be installed), which avoids local Python environment setup, and it also supports a traditional pip installation for more conventional development workflows.

What makes llmstxt_architect interesting: multi-provider LLM support and description-only updates

A key technical strength is its multi-provider LLM integration. By supporting Anthropic, OpenAI, Ollama, and any LangChain-compatible LLM, the tool offers flexibility depending on your infrastructure and privacy requirements. For instance, Ollama support means you can run local models without external API calls, which is important for privacy-sensitive projects or those with limited internet connectivity.

The use of LangChain’s RecursiveUrlLoader under the hood is another solid choice. It provides a standard way to crawl web content recursively with control over depth, which makes the tool adaptable to sites of varying complexity.

The --update-descriptions-only flag deserves special mention. Many tools in this space simply regenerate the entire llms.txt file, which can be frustrating if your file contains carefully crafted headers, comments, or a specific URL order that you want to keep. This tool instead surgically updates only the descriptions using the LLM, preserving the original structure. This design choice shows a deep understanding of how teams actually maintain these files over time.
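The idea of a description-only update can be sketched in a few lines. This is an illustration of the technique under stated assumptions, not the tool's actual code: `update_descriptions`, the `ENTRY` pattern, and the `describe` callback (which stands in for the LLM call) are hypothetical names, and the sketch assumes each entry sits on a single line of the form `[Title](url): description`, optionally preceded by a list marker.

```python
import re

# Matches "[Title](url): description", optionally prefixed by "- " or "* ".
# Everything up to and including the closing ")" is captured as `prefix`
# so it can be written back unchanged.
ENTRY = re.compile(
    r"^(?P<prefix>\s*(?:[-*]\s+)?\[[^\]]*\]\((?P<url>[^)\s]+)\)):\s*(?P<desc>.*)$"
)


def update_descriptions(text, describe):
    """Rewrite only the description part of each entry line.

    `describe` is any callable mapping a URL to a new description
    (a placeholder for the LLM summarization step). Headers, blank lines,
    notes, and the order of entries are passed through untouched.
    """
    out = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            out.append(f"{match.group('prefix')}: {describe(match.group('url'))}")
        else:
            out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result


if __name__ == "__main__":
    existing = (
        "# Example Docs\n"
        "\n"
        "## Concepts\n"
        "- [Concepts](https://example.com/concepts): old text\n"
        "\n"
        "A hand-written note that must survive.\n"
        "[Memory](https://example.com/memory): stale text\n"
    )
    print(update_descriptions(existing, lambda url: f"fresh summary of {url}"))
```

Because non-entry lines are copied through verbatim, hand-written headers and notes survive the update, which is the property the flag is meant to guarantee.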

This review does not analyze the codebase in depth, but the packaging is a nice developer experience (DX) touch: uvx runs the CLI without any Python environment setup, and the same package can still be installed with pip and imported programmatically.

Quick start

The repo’s README provides clear, copy-paste commands for getting started with various LLM providers.

For example, to run it with Anthropic’s Claude model (assuming you have ANTHROPIC_API_KEY set):

$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ uvx --from llmstxt-architect llmstxt-architect --urls https://langchain-ai.github.io/langgraph/concepts --max-depth 1 --llm-name claude-3-7-sonnet-latest --llm-provider anthropic --project-dir tmp

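The first command installs uv, which provides uvx. The second crawls the LangGraph concepts page with a maximum depth of 1, generates descriptions using Claude, and writes its output to the tmp project directory.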

To use a local model via Ollama:

$ ollama pull llama3.2:latest
$ uvx --from llmstxt-architect llmstxt-architect --urls https://langchain-ai.github.io/langgraph/concepts --max-depth 1 --llm-name llama3.2:latest --llm-provider ollama --project-dir tmp

This pulls the latest Llama 3.2 model with Ollama and runs the same crawl and description generation locally.

The tool outputs a structured llms.txt file with entries like:

[Concepts](https://langchain-ai.github.io/langgraph/concepts): LLM should read this page when seeking to understand LangGraph framework concepts, exploring agent patterns, or learning about LangGraph Platform deployment options. The page covers key concepts including LangGraph basics, agentic patterns, multi-agent systems, memory, persistence, streaming, and various LangGraph Platform deployment options (Self-Hosted, Cloud SaaS, BYOC).
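For readers who want to consume such a file, an entry in this shape can be split into its parts with a small parser. This is a sketch under the assumption that every entry follows the `[Title](url): description` pattern shown above on a single line. `Entry` and `parse_entry` are hypothetical names, not part of the tool, and the sample description is shortened from the one above.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    title: str
    url: str
    description: str


# "[Title](url): description", optionally preceded by a "- " or "* " marker.
_ENTRY_RE = re.compile(
    r"^\s*(?:[-*]\s+)?\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\):\s*(?P<description>.+)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line is not an entry."""
    match = _ENTRY_RE.match(line)
    if match is None:
        return None
    return Entry(match["title"], match["url"], match["description"].strip())


if __name__ == "__main__":
    sample = (
        "[Concepts](https://langchain-ai.github.io/langgraph/concepts): "
        "LLM should read this page when seeking to understand LangGraph "
        "framework concepts."
    )
    print(parse_entry(sample))
```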

There is also traditional installation with pip for use as a CLI or in Python scripts:

$ python3 -m venv .venv
$ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
$ pip install llmstxt-architect
$ llmstxt-architect --urls https://langchain-ai.github.io/langgraph/concepts --max-depth 1 --llm-name claude-3-7-sonnet-latest --llm-provider anthropic --project-dir test

And you can import and use it programmatically in a notebook or Python app.

Verdict

llmstxt_architect fills a niche need for teams adopting the llms.txt standard to communicate website content to LLMs. Its ability to crawl URLs, summarize page content, and generate a well-structured llms.txt file is valuable, but the real practical win is in maintaining existing files without losing manual structure.

The multi-provider LLM support is a solid engineering choice that broadens applicability from cloud APIs to local models.

That said, this tool assumes some familiarity with LangChain and LLM provider setup, which may present a learning curve for newcomers. The recursive crawling depth and LLM configuration require tuning based on your website complexity and desired output granularity.

In production, the --update-descriptions-only mode makes this a maintenance tool rather than just a one-off generator, a thoughtful design that acknowledges real workflows.

If your team is adopting llms.txt and wants a reliable way to generate and keep it up to date while preserving your file’s organization, llmstxt_architect is worth exploring. Its Python codebase and CLI flexibility also mean you can integrate it into broader automation pipelines for continuous content summarization and LLM communication.

→ GitHub Repo: rlancemartin/llmstxt_architect ⭐ 245 · Python

Noureddine RAMDI / llmstxt_architect: automated generation and maintenance of llms.txt files for LLM-aware websites
