Noureddine RAMDI / site-md: Next.js middleware serving Markdown to AI agents without content duplication

Created Tue, 05 May 2026 13:37:39 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

yazinsai/site-md

site-md solves a problem many developers face when optimizing web content for AI agents and crawlers: providing clean Markdown versions of pages while still serving rich HTML to human visitors. It uses Next.js middleware to intercept requests, fetch the HTML internally, and convert it to Markdown on the fly, so there is no need for separate Markdown sources or content duplication.

How site-md serves content for AI agents and humans

site-md is a middleware package for Next.js applications that automatically detects requests from AI agents, crawlers, or large language models (LLMs) and serves them clean Markdown versions of your pages. Meanwhile, regular users receive the standard HTML pages.

Under the hood, it hooks into the Next.js middleware layer to intercept requests that match certain patterns: URLs ending in .md, requests with an Accept: text/markdown header, query parameters, or known bot user agents. When it detects such a request, site-md fetches the HTML version of the same page from its own server (a self-fetch), and that inner request bypasses the middleware to avoid recursion.
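The detection rules listed above can be sketched as a single predicate. This is a minimal illustration, not site-md's actual code: the function name, the `format` query parameter, and the short bot list are all assumptions made for the example.

```typescript
// Illustrative only: names, the query parameter, and the bot list are
// assumptions for this sketch, not site-md's real implementation.
const BOT_PATTERN = /GPTBot|ClaudeBot|PerplexityBot/i; // example crawler tokens
const FORMAT_PARAM = "format"; // hypothetical query parameter name

interface IncomingRequest {
  url: string;
  accept?: string;
  userAgent?: string;
}

function wantsMarkdown(req: IncomingRequest): boolean {
  const url = new URL(req.url);
  // 1. Explicit .md suffix on the path
  if (/\.md$/.test(url.pathname)) return true;
  // 2. Content negotiation through the Accept header
  if (/text\/markdown/i.test(req.accept ?? "")) return true;
  // 3. Opt-in through a query parameter (assumed to be ?format=md here)
  if (url.searchParams.get(FORMAT_PARAM) === "md") return true;
  // 4. Known bot user agents
  return BOT_PATTERN.test(req.userAgent ?? "");
}
```

Any one signal is enough to switch to Markdown, so an ordinary browser request with none of them falls through to the regular HTML page.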

After fetching the HTML, it converts the content to Markdown on the fly using an internal HTML-to-Markdown converter. Your site therefore maintains a single source of truth (the HTML pages) but can still expose a Markdown version to agents that prefer or require it.
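To make the conversion step concrete, here is a deliberately tiny converter. It is a toy built on regular expressions and is not the converter site-md ships; it only handles headings, links, bold, italics, and paragraphs in simple, well-formed HTML.

```typescript
// Toy HTML-to-Markdown converter for illustration only.
// It is NOT site-md's internal converter and ignores lists, tables, code, etc.
function htmlToMarkdown(html: string): string {
  return html
    // <h1>..<h6> become lines prefixed with 1..6 '#' characters
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m: string, level: string, text: string) =>
      `\n${"######".slice(0, Number(level))} ${text.trim()}\n`)
    // <a href="..."> becomes [text](href)
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // bold and italic
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "*$2*")
    // paragraphs become blocks separated by blank lines
    .replace(/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, "\n$1\n")
    // drop any tags that are left
    .replace(/<[^>]+>/g, "")
    // collapse runs of blank lines and trim the ends
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

const sampleHtml = '<h1>Title</h1><p>Hello <strong>world</strong>, see <a href="/docs">docs</a>.</p>';
console.log(htmlToMarkdown(sampleHtml));
// # Title
//
// Hello **world**, see [docs](/docs).
```

A production converter would parse the HTML properly instead of pattern-matching it, but the input and output shapes are the same: one HTML string in, one Markdown string out.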

The package also generates /llms.txt and /llms-full.txt index files to aid LLM discovery, following the emerging standard for machine-readable site indexes.
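For readers who have not seen the format, an llms.txt file is itself Markdown: a top-level heading with the site name, a blockquote summary, and sections of links. The sketch below builds such a file from a list of pages. The function, the `PageEntry` type, and the "Pages" section name are assumptions for illustration; what site-md actually emits may differ.

```typescript
// Hypothetical helper showing the general shape of an llms.txt index.
// The section name and page list are assumptions, not site-md's output.
interface PageEntry {
  title: string;
  url: string;
  description?: string;
}

function buildLlmsTxt(siteTitle: string, siteDescription: string, pages: PageEntry[]): string {
  const lines: string[] = [`# ${siteTitle}`, "", `> ${siteDescription}`, "", "## Pages", ""];
  for (const page of pages) {
    // Each entry is a Markdown link, optionally followed by ": description"
    const suffix = page.description ? `: ${page.description}` : "";
    lines.push(`- [${page.title}](${page.url})${suffix}`);
  }
  return lines.join("\n") + "\n";
}

console.log(
  buildLlmsTxt("My Site", "Public docs for AI agents", [
    { title: "About", url: "https://example.com/about.md", description: "Who we are" },
    { title: "Docs", url: "https://example.com/docs.md" },
  ]),
);
```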

The stack is TypeScript-based and built for Next.js 13+ environments that use the App Router convention. It relies on the Next.js middleware API, API route handlers, and a config wrapper to integrate cleanly into your app.

The technical strength: middleware interception and zero duplication

What sets site-md apart is its method of serving Markdown without duplicating content or forcing developers to maintain separate Markdown files.

Many tools that offer Markdown output require you to write Markdown separately or maintain a separate content pipeline. site-md avoids this by using Next.js middleware to intercept requests and perform an internal self-fetch of the original HTML. This self-fetch uses special bypass headers to prevent infinite loops and middleware re-execution.
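The recursion guard can be sketched as follows. The header name `x-md-bypass`, the function name, and the path mapping are assumptions for this example; the source does not name the bypass headers site-md really uses. The fetch function is passed in so the logic can be exercised without a running server.

```typescript
// Sketch of a self-fetch with a bypass header. The header name is hypothetical.
const BYPASS_HEADER = "x-md-bypass";

type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ text(): Promise<string> }>;

// Returns the page's HTML, or null when the request is itself the inner
// self-fetch and must pass through untouched.
// incomingHeaders is assumed to use lower-case header names.
async function fetchHtmlForMarkdown(
  requestUrl: string,
  incomingHeaders: Record<string, string>,
  fetcher: Fetcher,
): Promise<string | null> {
  // The inner request carries the bypass header: do nothing, so the normal
  // HTML page renders and the middleware does not call itself again.
  if (incomingHeaders[BYPASS_HEADER] === "1") return null;

  // Assumed mapping: /about.md -> /about, /index.md -> /
  const url = new URL(requestUrl);
  url.pathname = url.pathname.replace(/\.md$/, "").replace(/\/index$/, "/");

  // Second request to the same server, marked so it is not intercepted.
  const response = await fetcher(url.toString(), {
    headers: { [BYPASS_HEADER]: "1", accept: "text/html" },
  });
  return response.text();
}
```

In a real middleware the `fetcher` argument would typically be the global `fetch`. Without the header check, the inner request would match the same rules as the outer one and loop indefinitely.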

Once it has the HTML, the package converts it to Markdown dynamically. This on-the-fly conversion means no extra build steps or content regeneration are needed, which reduces maintenance overhead.

When a middleware.ts file already exists, the installer injects the proxy through an AST merge: it edits the file's syntax tree instead of replacing the file. This lets it add its logic without overwriting your custom middleware or matchers, preserving your existing routing logic.

The tradeoff here is performance: the internal self-fetch adds latency because the server must handle a second request per Markdown request. However, since requests for Markdown are typically from bots or AI agents, this overhead is manageable and isolated from the human user experience.

The codebase is surprisingly clean and modular. It separates concerns well between middleware proxying, API route handling for the Markdown endpoints, and configuration wrapping for Next.js. The use of TypeScript adds reliability and clarity.

Quick start: one command install and usage

Installing site-md is straightforward and can be done with a single command:

npx site-md

This command detects your package manager (npm, pnpm, yarn, or bun) and your src/ directory layout. It then installs the site-md package and wires up everything automatically:

  • Creates or AST-merges into middleware.ts or src/middleware.ts to add the proxy middleware.
  • Creates the API route handler at app/api/site-md/[...path]/route.ts.
  • Wraps your existing next.config.{ts,mjs,js,cjs} with withNextMd to enable /llms.txt and /llms-full.txt generation.

After installation, restart your Next.js dev server.

You can test the functionality with commands like:

curl http://localhost:3000/               # Returns HTML
curl http://localhost:3000/index.md       # Returns Markdown
curl http://localhost:3000/llms.txt       # Returns Markdown site index

For CI or automated scripts, a non-interactive mode is available:

npx site-md --title "My Site" --description "Public docs for AI agents" --yes

If you prefer a manual setup, these are the three files the CLI generates:

middleware.ts

export { proxy as middleware } from "site-md/proxy";

export const config = {
  matcher: [
    "/((?!api|_next|static|favicon.ico|.*\\.(?:js|css|json|xml|txt|map|webmanifest|png|jpg|jpeg|gif|svg|ico|woff|woff2|ttf|eot)$).*)",
  ],
};
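To see which paths this matcher lets through, the pattern can be approximated with a plain RegExp. This is a sketch under an assumption: Next.js compiles matcher strings itself, so the anchored expression below mirrors the pattern's intent rather than reproducing the framework's exact behavior.

```typescript
// Plain-RegExp approximation of the matcher above (anchoring added by hand).
const MATCHER = new RegExp(
  "^/((?!api|_next|static|favicon.ico|.*\\.(?:js|css|json|xml|txt|map|webmanifest|png|jpg|jpeg|gif|svg|ico|woff|woff2|ttf|eot)$).*)$",
);

function runsMiddleware(pathname: string): boolean {
  return MATCHER.test(pathname);
}

const samplePaths = [
  "/",                      // true:  home page
  "/about",                 // true:  regular page
  "/index.md",              // true:  .md is not in the excluded extension list
  "/llms.txt",              // false: .txt is excluded
  "/api/site-md/about",     // false: api prefix is excluded
  "/_next/static/chunk.js", // false: Next.js internals are excluded
  "/logo.png",              // false: static asset
];

for (const samplePath of samplePaths) {
  console.log(samplePath, runsMiddleware(samplePath));
}
```

Pages and .md URLs reach the proxy, while static assets, Next.js internals, and the API route that serves the Markdown do not. Because .txt is excluded, /llms.txt never passes through the middleware, which is consistent with the config wrapper being responsible for those files.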

API route handler (app/api/site-md/[...path]/route.ts)

export { GET } from "site-md/handler";

next.config.mjs (optional)

import { withNextMd } from "site-md/config";

export default withNextMd(
  {
    /* your existing config */
  },
  {
    llmsTxt: {
      title: "My Site",
      description: "Public docs for AI agents",
    },
  },
);

Note that the Next.js App Router treats folders whose names start with an underscore as private and excludes them from routing, so avoid such folder names for the API route.

Verdict

site-md is a pragmatic solution for Next.js projects wanting to serve Markdown versions of pages to AI agents and crawlers without the overhead of managing separate Markdown content. Its approach of middleware interception and dynamic HTML-to-Markdown conversion is elegant and lowers maintenance costs.

The main limitation is the added latency from internal self-fetching, which could become a bottleneck under very high volumes of Markdown requests. The package is also designed specifically for Next.js environments using the App Router, so it is not applicable to other frameworks or older Next.js versions.

For teams focused on improving AI agent access to their content, or who want to adopt the emerging /llms.txt standard, site-md offers a clean, low-friction way to add this capability. It’s also a good example of creative middleware use to extend Next.js.

If you maintain a Next.js site and want to provide Markdown-friendly endpoints without rewriting your content pipeline, site-md is worth trying out.


→ GitHub Repo: yazinsai/site-md ⭐ 48 · TypeScript