site-md solves a problem many developers face when optimizing web content for AI agents and crawlers: providing clean Markdown versions of pages while still serving rich HTML to human visitors. The clever part is how it uses Next.js middleware to intercept requests, fetch the HTML internally, and convert it to Markdown on the fly, so there are no separate Markdown sources and no duplicated content.
How site-md serves content for AI agents and humans
site-md is a middleware package for Next.js applications that automatically detects requests from AI agents, crawlers, or large language models (LLMs) and serves them clean Markdown versions of your pages. Meanwhile, regular users receive the standard HTML pages.
Under the hood, it hooks into the Next.js middleware layer to intercept requests matching certain patterns — for example, URLs ending with .md, requests with an Accept: text/markdown header, query parameters, or known bot user agents. When such a request is detected, site-md internally performs a self-fetch of the HTML version of the page, bypassing the middleware to avoid recursion.
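To make the detection rules concrete, here is a minimal sketch in plain TypeScript. It has no Next.js imports, so it runs anywhere, and it checks the four signals listed above in order. The query parameter name (`format=md`) and the bot user-agent patterns are assumptions chosen for illustration; site-md's actual parameter name and bot list may differ.

```typescript
// Illustrative request-detection sketch. The query parameter name and the
// bot list are ASSUMPTIONS, not site-md's actual configuration.
const MARKDOWN_QUERY_PARAM = "format"; // hypothetical: ?format=md
const BOT_USER_AGENTS: RegExp[] = [/GPTBot/i, /ClaudeBot/i, /PerplexityBot/i];

interface IncomingRequest {
  url: string; // absolute URL, e.g. "http://localhost:3000/index.md"
  headers: Record<string, string | undefined>; // lower-cased header names
}

// Returns true if the request appears to want Markdown instead of HTML.
function wantsMarkdown(req: IncomingRequest): boolean {
  const url = new URL(req.url);

  // 1. Explicit .md suffix on the path.
  if (url.pathname.endsWith(".md")) return true;

  // 2. Accept header lists text/markdown (parameters such as q=0.9 are ignored).
  const accept = req.headers["accept"] ?? "";
  const acceptsMarkdown = accept
    .split(",")
    .some((part) => (part.split(";")[0] ?? "").trim().toLowerCase() === "text/markdown");
  if (acceptsMarkdown) return true;

  // 3. Opt-in query parameter.
  if (url.searchParams.get(MARKDOWN_QUERY_PARAM) === "md") return true;

  // 4. Known AI crawler user agents.
  const userAgent = req.headers["user-agent"] ?? "";
  return BOT_USER_AGENTS.some((pattern) => pattern.test(userAgent));
}
```

In this sketch, `curl -H "Accept: text/markdown" http://localhost:3000/` would be caught by the second check, while a normal browser request falls through every check and gets HTML.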
After fetching the HTML, it converts the content to Markdown on-the-fly using an internal HTML-to-Markdown converter. This means your site only maintains a single source of truth (the HTML pages), but can expose a Markdown version for agents that prefer or require it.
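The article does not say which converter site-md uses internally. As a stand-in, the toy function below shows the general idea with a handful of regular expressions: headings become `#` lines, links become `[text](href)`, paragraphs become blank-line-separated blocks, and every other tag is dropped. It is deliberately naive (no lists, tables, or nested markup) and is not site-md's converter; a production converter would parse the HTML into a DOM tree first.

```typescript
// Toy HTML-to-Markdown converter for illustration only; not site-md's converter.
// Handles headings, links, and paragraphs; every other tag is stripped.
function htmlToMarkdown(html: string): string {
  const markdown = html
    // Drop <script> and <style> blocks, including their contents.
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
    // <h1>..<h6> become "#".."######" lines surrounded by blank lines.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_match: string, level: string, text: string) =>
      `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    // <a href="...">text</a> becomes [text](href).
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, (_match: string, href: string, text: string) =>
      `[${text}](${href})`)
    // Closing </p> tags become paragraph breaks.
    .replace(/<\/p>/gi, "\n\n")
    // Remove any tags that are left.
    .replace(/<[^>]+>/g, "")
    // Decode a few common entities; &amp; goes last so "&amp;lt;" stays "&lt;".
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
  // Collapse runs of blank lines and trim the ends.
  return markdown.replace(/\n{3,}/g, "\n\n").trim();
}
```

For example, `htmlToMarkdown('<h1>Hello</h1><p>See <a href="/docs">the docs</a>.</p>')` returns `"# Hello\n\nSee [the docs](/docs)."`.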
The package also generates /llms.txt and /llms-full.txt index files to aid LLM discovery, following the emerging /llms.txt proposal for machine-readable site indexes.
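The /llms.txt proposal describes a small Markdown file: an H1 with the site name, a blockquote summary, then H2 sections containing link lists. A minimal renderer for that shape might look like the sketch below. The `PageEntry` type and the single "Pages" section are assumptions for illustration, and how site-md collects its page list is not covered here. A /llms-full.txt file typically inlines each page's full Markdown instead of linking to it.

```typescript
// Hypothetical page record; how site-md discovers pages is not covered here.
interface PageEntry {
  title: string;
  url: string; // e.g. the page's .md URL
  summary?: string;
}

// Render an llms.txt-style index: H1 title, blockquote summary, then an H2 link list.
function renderLlmsTxt(title: string, description: string, pages: PageEntry[]): string {
  const lines = [`# ${title}`, "", `> ${description}`, "", "## Pages", ""];
  for (const page of pages) {
    // Optional ": summary" suffix after each link.
    const note = page.summary ? `: ${page.summary}` : "";
    lines.push(`- [${page.title}](${page.url})${note}`);
  }
  return lines.join("\n") + "\n";
}
```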
The stack is TypeScript-based and built for Next.js 13+ projects that use the App Router. It relies on the Next.js middleware API, App Router route handlers, and a next.config wrapper to integrate cleanly into your app.
The technical strength: middleware interception and zero duplication
What sets site-md apart is its method of serving Markdown without duplicating content or forcing developers to maintain separate Markdown files.
Many tools that offer Markdown output require you to write Markdown separately or maintain a separate content pipeline. site-md avoids this by using Next.js middleware to intercept requests and perform an internal self-fetch of the original HTML. This self-fetch uses special bypass headers to prevent infinite loops and middleware re-execution.
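The loop-prevention idea can be sketched in a few lines: map the Markdown URL back to its HTML URL, tag the internal request with a marker header, and let any request carrying that marker pass straight through the middleware. The header name `x-site-md-internal` is hypothetical (the article only mentions "special bypass headers"), and mapping `/index.md` to `/` mirrors the curl example in the quick start.

```typescript
// Hypothetical header name; site-md's real bypass header may differ.
const BYPASS_HEADER = "x-site-md-internal";

// Map a Markdown URL back to the HTML page it was derived from.
// "/index.md" -> "/", "/docs/intro.md" -> "/docs/intro"; query strings are kept.
function htmlUrlFor(markdownUrl: string): string {
  const url = new URL(markdownUrl);
  if (url.pathname.endsWith(".md")) {
    const stripped = url.pathname.slice(0, -".md".length);
    url.pathname = stripped === "/index" ? "/" : stripped;
  }
  return url.toString();
}

// Headers for the internal self-fetch: the marker stops the middleware from
// handling the request again, and Accept asks for the HTML rendering.
function selfFetchHeaders(): Record<string, string> {
  return { [BYPASS_HEADER]: "1", accept: "text/html" };
}

// Checked at the top of the middleware: marked requests pass through untouched.
function isInternalSelfFetch(headers: Record<string, string | undefined>): boolean {
  return headers[BYPASS_HEADER] === "1";
}
```

In a middleware built this way, the self-fetch would be roughly `fetch(htmlUrlFor(request.url), { headers: selfFetchHeaders() })`, with the returned HTML handed to the converter. Because any client could send the marker header too, a real implementation would likely want a value that outsiders cannot guess.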
Once it has the HTML, the package converts it to Markdown dynamically. This on-the-fly conversion means no extra build steps or content regeneration are needed, which reduces maintenance overhead.
When a middleware.ts file already exists, the installer merges site-md's middleware into it at the AST level rather than overwriting the file, so your custom middleware logic, matchers, and existing routing are preserved.
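At runtime, the effect of such a merge is roughly a chain: site-md's handler runs first and answers Markdown requests, and every other request falls through to your existing logic. The framework-free sketch below models that with simplified `Req` and `Res` types standing in for Next.js's request and response objects. The `chain` and `mergeMatchers` helpers are hypothetical and do not reflect how site-md's AST merge actually rewrites the file.

```typescript
// Simplified stand-ins for Next.js request/response types (hypothetical).
type Req = { url: string };
type Res = { status: number; body: string };
// A middleware returns a response to short-circuit, or undefined to continue.
type Middleware = (req: Req) => Res | undefined;

// Run handlers in order; the first one that returns a response wins.
function chain(...handlers: Middleware[]): Middleware {
  return (req) => {
    for (const handler of handlers) {
      const res = handler(req);
      if (res !== undefined) return res;
    }
    return undefined;
  };
}

// Union of matcher patterns, keeping existing ones first and dropping duplicates.
function mergeMatchers(existing: string[], added: string[]): string[] {
  return Array.from(new Set([...existing, ...added]));
}

// Example: a Markdown handler placed in front of a pre-existing redirect rule.
const siteMdLike: Middleware = (req) =>
  new URL(req.url).pathname.endsWith(".md") ? { status: 200, body: "# markdown" } : undefined;
const existingRedirect: Middleware = (req) =>
  new URL(req.url).pathname === "/old" ? { status: 308, body: "/new" } : undefined;
const combined = chain(siteMdLike, existingRedirect);
```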
The tradeoff here is performance: the internal self-fetch adds latency because the server must handle a second request per Markdown request. However, since requests for Markdown are typically from bots or AI agents, this overhead is manageable and isolated from the human user experience.
The codebase is surprisingly clean and modular. It separates concerns well between middleware proxying, API route handling for the Markdown endpoints, and configuration wrapping for Next.js. The use of TypeScript adds reliability and clarity.
Quick start: one command install and usage
Installing site-md is straightforward and can be done with a single command:
npx site-md
This command detects your package manager (npm, pnpm, yarn, or bun) and your src/ directory layout. It then installs the site-md package and wires up everything automatically:
- Creates, or AST-merges into, middleware.ts or src/middleware.ts to add the proxy middleware.
- Creates the API route handler at app/api/site-md/[...path]/route.ts.
- Wraps your existing next.config.{ts,mjs,js,cjs} with withNextMd to enable /llms.txt and /llms-full.txt generation.
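The detection step could work along the lines of the sketch below, which infers the package manager from the lockfile in the project root and picks the middleware location based on whether a `src/` directory exists. This is a hypothetical heuristic for illustration; the article does not say how the site-md CLI actually decides. The install commands themselves are the standard ones for each package manager.

```typescript
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Hypothetical heuristic: the lockfile present in the project root decides.
function detectPackageManager(rootEntries: string[]): PackageManager {
  const has = (name: string) => rootEntries.includes(name);
  if (has("bun.lock") || has("bun.lockb")) return "bun";
  if (has("pnpm-lock.yaml")) return "pnpm";
  if (has("yarn.lock")) return "yarn";
  return "npm"; // package-lock.json, or no lockfile at all
}

// Standard "add a dependency" command for each package manager.
function installCommand(pm: PackageManager, pkg: string): string {
  return pm === "npm" ? `npm install ${pkg}` : `${pm} add ${pkg}`;
}

// Projects using the src/ layout keep middleware.ts inside src/.
function middlewareFile(rootEntries: string[]): string {
  return rootEntries.includes("src") ? "src/middleware.ts" : "middleware.ts";
}
```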
After installation, restart your Next.js dev server.
You can test the functionality with commands like:
curl http://localhost:3000/ # Returns HTML
curl http://localhost:3000/index.md # Returns Markdown
curl http://localhost:3000/llms.txt # Returns Markdown site index
For CI or automated scripts, a non-interactive mode is available:
npx site-md --title "My Site" --description "Public docs for AI agents" --yes
If you prefer a manual setup, these are the three files the CLI would otherwise create for you:
middleware.ts (or src/middleware.ts):
export { proxy as middleware } from "site-md/proxy";
export const config = {
matcher: [
"/((?!api|_next|static|favicon.ico|.*\\.(?:js|css|json|xml|txt|map|webmanifest|png|jpg|jpeg|gif|svg|ico|woff|woff2|ttf|eot)$).*)",
],
};
API route handler (app/api/site-md/[...path]/route.ts):
export { GET } from "site-md/handler";
next.config.mjs (optional; enables /llms.txt and /llms-full.txt):
import { withNextMd } from "site-md/config";
export default withNextMd(
{
/* your existing config */
},
{
llmsTxt: {
title: "My Site",
description: "Public docs for AI agents",
},
},
);
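To see which paths the middleware.ts matcher above lets through, the pattern can be approximated as an anchored JavaScript regular expression. Next.js compiles matchers itself, so treat this as an approximation rather than the exact runtime behavior. It shows two useful consequences: `.md` paths reach the middleware, while `.txt` paths such as /llms.txt do not, presumably because those files are served through the withNextMd config wrapper instead. In this approximation, any path that merely starts with "api" (for example /apiary) is excluded as well.

```typescript
// The matcher string from middleware.ts, copied verbatim.
const MATCHER =
  "/((?!api|_next|static|favicon.ico|.*\\.(?:js|css|json|xml|txt|map|webmanifest|png|jpg|jpeg|gif|svg|ico|woff|woff2|ttf|eot)$).*)";

// Approximation: treat the matcher as a regular expression over the full path.
const matcherRegex = new RegExp(`^${MATCHER}$`);

// True if a request for this pathname would be routed through the middleware.
function reachesMiddleware(pathname: string): boolean {
  return matcherRegex.test(pathname);
}
```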
Note that the App Router treats folders whose names start with an underscore as private and excludes them from routing, so don't use such a name for the API route folder.
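Because the handler lives under a catch-all segment, `[...path]`, Next.js passes the remaining URL segments to it as an array of strings. A hypothetical helper for turning those segments back into the page path to fetch might look like this. It assumes the middleware forwards `/docs/intro.md` as `/api/site-md/docs/intro`, which is a guess about site-md's internals rather than documented behavior.

```typescript
// Hypothetical: rebuild the page path from catch-all segments such as
// ["docs", "intro"] (from /api/site-md/docs/intro) or ["index"].
function pagePathFromSegments(segments: string[]): string {
  const parts = segments.filter((segment) => segment.length > 0);
  const last = parts[parts.length - 1];
  // Tolerate a trailing ".md" on the last segment as well.
  if (last !== undefined && last.endsWith(".md")) {
    parts[parts.length - 1] = last.slice(0, -".md".length);
  }
  // "index" (or nothing at all) maps to the site root.
  if (parts.length === 0 || (parts.length === 1 && parts[0] === "index")) return "/";
  return "/" + parts.join("/");
}
```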
Verdict
site-md is a pragmatic solution for Next.js projects wanting to serve Markdown versions of pages to AI agents and crawlers without the overhead of managing separate Markdown content. Its approach of middleware interception and dynamic HTML-to-Markdown conversion is elegant and lowers maintenance costs.
The main limitation is the added latency from internal self-fetching, which could become a bottleneck under very high volumes of Markdown requests. Also, the package is designed specifically for Next.js projects using the App Router, so it isn't applicable to other frameworks or older Next.js versions.
For teams focused on improving AI agent access to their content, or who want to adopt the emerging /llms.txt convention, site-md offers a clean, low-friction way to add this capability. It's also a good example of creative middleware use to extend Next.js.
If you maintain a Next.js site and want to provide Markdown-friendly endpoints without rewriting your content pipeline, site-md is worth trying out.
GitHub repo: yazinsai/site-md (TypeScript, 48 stars)