MD-This-Page solves a common pain point: extracting meaningful content from cluttered web pages and converting it into a format that large language models (LLMs) can digest efficiently.
What MD-This-Page does and how it works
MD-This-Page is a Chrome extension built with the Plasmo framework and React. Its primary function is to transform any webpage you visit into clean, well-structured Markdown with a single click. This Markdown output is specifically tailored to be “LLM-ready,” meaning it strips away unnecessary clutter like navigation menus, ads, and boilerplate content to preserve only the main article or content body.
Under the hood, the extension relies on two main libraries for its extraction pipeline. First, it uses Mozilla’s Readability library to identify and isolate the core content of the webpage. Readability parses the DOM and heuristically removes extraneous elements, leaving behind just the main article or relevant text.
Once Readability extracts the cleaned HTML, MD-This-Page converts this HTML into Markdown using Turndown, a well-known HTML-to-Markdown converter. This two-step process—content extraction followed by format conversion—is a clean pattern that effectively prepares web content for AI workflows where Markdown is preferred for its simplicity and readability.
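The two-step pattern described above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the names pageToMarkdown, ExtractStage, and ConvertStage are hypothetical, and the two stages are passed in as plain functions so the sketch stays self-contained.

```typescript
// Minimal structural type for the extraction result. It loosely mirrors the
// fields of a Readability parse() result, but is declared locally
// (an assumption made to keep the sketch dependency-free).
interface ExtractedArticle {
  title?: string | null;
  content?: string | null; // cleaned HTML of the main article body
}

// Stage 1: isolate the main content; returns null when nothing is found.
type ExtractStage<Doc> = (doc: Doc) => ExtractedArticle | null;

// Stage 2: convert an HTML string into Markdown.
type ConvertStage = (html: string) => string;

// Hypothetical helper: run extraction first, then format conversion.
function pageToMarkdown<Doc>(
  doc: Doc,
  extract: ExtractStage<Doc>,
  convert: ConvertStage,
): string | null {
  const article = extract(doc);
  if (!article || !article.content) return null; // nothing extractable

  const body = convert(article.content).trim();
  // Prepend the title as a top-level heading when one is available.
  return article.title ? `# ${article.title}\n\n${body}` : body;
}
```

In a real extension, the stages would plausibly be wired to the two libraries, roughly as (doc) => new Readability(doc).parse() and (html) => new TurndownService().turndown(html). Readability modifies the document it is given, so passing a clone of the page's document is the usual precaution.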
The extension also offers customizable output options, letting users toggle whether to include images, links, and metadata in the exported Markdown. Export methods include copying the Markdown to the clipboard, saving it as a .md file, or generating a prompt-format version optimized for feeding into LLMs.
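One way such toggles could be applied is as a post-processing pass over the generated Markdown. The sketch below is an assumption-laden illustration: the option names, the PageMeta shape, and the YAML-style header are all hypothetical, and the extension may well implement these options differently (for example, inside the HTML-to-Markdown conversion itself).

```typescript
// Hypothetical option names; the extension's real settings may differ.
interface OutputOptions {
  includeImages: boolean;
  includeLinks: boolean;
  includeMetadata: boolean;
}

// Hypothetical metadata shape.
interface PageMeta {
  title: string;
  url: string;
}

function applyOutputOptions(
  markdown: string,
  meta: PageMeta,
  options: OutputOptions,
): string {
  // Matches inline images ![alt](src) and inline links [text](href).
  // Simplification: nested constructs, reference-style links, and URLs
  // containing ")" are not handled.
  const inlineRef = /(!?)\[([^\]]*)\]\([^)]*\)/g;

  let out = markdown.replace(
    inlineRef,
    (match: string, bang: string, text: string): string => {
      if (bang === "!") return options.includeImages ? match : "";
      return options.includeLinks ? match : text; // keep the link text only
    },
  );

  if (options.includeMetadata) {
    // Prepend a small YAML-style header (an assumed format).
    const title = JSON.stringify(meta.title);
    out = `---\ntitle: ${title}\nsource: ${meta.url}\n---\n\n${out}`;
  }
  return out;
}
```

Handling images and links in a single pass avoids the second replacement accidentally rewriting image syntax that the first one left in place.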
MD-This-Page is built as a Manifest V3 Chrome extension, reflecting the latest Chrome extension standards and security requirements. Styling is handled with Tailwind CSS, which keeps the UI simple and responsive.
Technical strengths and tradeoffs
The standout technical feature of MD-This-Page is the two-stage extraction pipeline combining Mozilla Readability and Turndown. When extraction succeeds, the output is both free of page clutter and structured with Markdown headings, lists, and links, which is typically more compact and easier for LLMs to consume than raw HTML.
The decision to build the extension with Plasmo and React offers a modern developer experience. Plasmo streamlines building Manifest V3 extensions with React, providing hot reloading, easy packaging, and a well-structured project setup. The codebase is surprisingly clean, with a clear separation between content extraction logic and UI components.
However, Manifest V3 introduces some constraints: permissions are stricter, and the persistent background page is replaced by a service worker that is event-driven, can be shut down when idle, and has no DOM access. These constraints can complicate extension behavior and performance. MD-This-Page handles them gracefully, but it’s a tradeoff developers should understand when building similar extensions.
Customization options for output give the extension versatility but also add complexity to the UI and code paths. The code manages this well, but configurability comes at the cost of simplicity: users who want a no-frills experience may find the toggles distracting.
One limitation is that content extraction relies heavily on Readability’s heuristics, which, while effective for many articles, may occasionally misidentify the main content or omit important context. This is a common challenge with any automated content extraction.
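A caller that wants to detect obviously poor extractions could add a simple sanity check before trusting the result. The sketch below is purely illustrative and not something the extension is known to do: the function name looksTruncated and both thresholds are arbitrary assumptions that would need tuning on real pages.

```typescript
// Hypothetical guard: flag extraction results that look suspiciously small.
// extractedText is the plain text of the extracted article (null when
// extraction failed); fullPageText is the plain text of the whole page.
function looksTruncated(
  extractedText: string | null,
  fullPageText: string,
  minChars = 200, // assumed minimum length for a real article
  minRatio = 0.1, // assumed minimum share of the page's text
): boolean {
  if (extractedText === null) return true;

  const extractedLen = extractedText.trim().length;
  const fullLen = fullPageText.trim().length;

  if (extractedLen < minChars) return true;
  // A tiny share of the page's text suggests the heuristics picked a
  // sidebar or teaser instead of the main content.
  return fullLen > 0 && extractedLen / fullLen < minRatio;
}
```

A check like this cannot tell whether the right content was chosen; it only catches the cases where very little was extracted, so a flagged page would still need a fallback such as converting the full page body or asking the user to select the content manually.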
Getting started with MD-This-Page
This extension is built with Plasmo and React.
Prerequisites
- Node.js
- pnpm (or npm, yarn)
Installation & Development
Clone the repository and navigate to the project directory:
cd md-this-page
Install dependencies:
pnpm install
Run the development server:
pnpm dev
This will run the Plasmo dev server and generate a build/chrome-mv3-dev directory.
Load the extension in Chrome:
- Go to chrome://extensions/
- Enable Developer mode
- Click Load unpacked
- Select the build/chrome-mv3-dev directory from this project.
Building for Production
To create a production build of the extension:
pnpm build
This will output the production-ready extension into build/chrome-mv3-prod.
Verdict: who should consider MD-This-Page?
MD-This-Page is a practical tool for developers and AI practitioners who frequently feed web content into language models and want that content in a clean Markdown format. Its approach to content extraction and conversion is straightforward and effective for many common article-style pages.
That said, it’s not a silver bullet for all types of web content. Pages with complex layouts, dynamic content, or non-article formats may see less reliable extraction results due to the reliance on Readability heuristics.
If you build or maintain AI tools that require efficient ingestion of web data, or if you want a quick way to generate Markdown from webpages without manual cleanup, this extension is worth a look. It also serves as a solid example of how to combine existing libraries into a cohesive, user-friendly developer tool.
For browser extension developers, the repo is useful as a hands-on reference for Manifest V3 development with Plasmo and React, showcasing a clean architecture and thoughtful UX design.
Overall, MD-This-Page fills a real need with a practical, well-engineered solution that balances functionality and developer experience.
→ GitHub Repo: Ademking/MD-This-Page ⭐ 636 · TypeScript