On-page SEO auditing often involves analyzing hundreds of pages with dozens of metrics, which can lead to long-running audits and a clunky user experience if progress isn’t surfaced live. This repo tackles that pain point head-on by streaming audit progress in real time via Server-Sent Events (SSE), making it a solid example of how to handle long-running background tasks with live UI updates.
What on-page-seo offers and how it’s built
This project is a full-stack TypeScript application for on-page SEO auditing. It’s structured as a monorepo that cleanly separates a React 19 frontend from an Express backend. The frontend uses TanStack Router for routing, React Query for data fetching and caching, and Shadcn/ui components for UI consistency and speed. On the backend, Express handles API routes, orchestrates auditing jobs, and persists data in a SQLite database located in the data/ directory.
One key architectural detail is the use of a shared/ directory containing TypeScript types that are shared between client and server, ensuring type safety across the stack without duplicating types. This is a common, recommended pattern in modern full-stack TypeScript apps to reduce bugs and improve DX.
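As a rough illustration of the shared-types pattern, the sketch below defines a progress payload once and uses it from both sides. The names `AuditStatus`, `AuditProgress`, `makeProgress`, and `describeProgress` are hypothetical and not taken from the repo; in a monorepo like this one, the type definitions would live under shared/ and be imported by both packages.

```typescript
// Hypothetical contents of a file under shared/ (all names are illustrative).
export type AuditStatus = "pending" | "running" | "completed" | "failed";

export interface AuditProgress {
  auditId: string;
  status: AuditStatus;
  pagesDone: number;
  pagesTotal: number;
}

// Server side: build a payload; the compiler rejects missing or misspelled fields.
export function makeProgress(
  auditId: string,
  pagesDone: number,
  pagesTotal: number,
): AuditProgress {
  const status: AuditStatus = pagesDone >= pagesTotal ? "completed" : "running";
  return { auditId, status, pagesDone, pagesTotal };
}

// Client side: consume the same type, so a renamed field fails at compile time.
export function describeProgress(p: AuditProgress): string {
  const percent =
    p.pagesTotal === 0 ? 0 : Math.round((p.pagesDone / p.pagesTotal) * 100);
  return `${p.status}: ${p.pagesDone}/${p.pagesTotal} pages (${percent}%)`;
}
```

Because both functions depend on the same `AuditProgress` interface, changing a field in one place surfaces every affected call site on the client and the server during type checking.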
The app supports two modes: development, where client and server run on separate ports (3005 and 3001 respectively), and production, where a single server serves the built React client, simplifying deployment.
Under the hood, the backend integrates with two external APIs to handle the heavy lifting of SEO auditing: Firecrawl for page discovery (crawling the site to find pages) and DataForSEO for detailed analysis covering 74 SEO metrics. This division of responsibility means the app focuses on orchestrating audits and presenting results rather than building crawling or SEO analysis from scratch.
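The two-stage flow described above (discover pages, then analyze each one) can be sketched as an orchestration function that receives both steps as injected dependencies. Everything here is an assumption made for illustration: `PageDiscoverer`, `PageAnalyzer`, `PageResult`, and `runAudit` are hypothetical names, and the real Firecrawl and DataForSEO client calls are deliberately left out rather than guessed at.

```typescript
// Hypothetical interfaces standing in for the Firecrawl and DataForSEO clients.
type PageDiscoverer = (siteUrl: string, limit: number) => Promise<string[]>;
type PageAnalyzer = (pageUrl: string) => Promise<Record<string, number>>;

interface PageResult {
  url: string;
  metrics?: Record<string, number>;
  error?: string;
}

async function runAudit(
  siteUrl: string,
  discover: PageDiscoverer,
  analyze: PageAnalyzer,
  maxPages = 500, // assumption: an illustrative per-audit page cap
): Promise<PageResult[]> {
  // Stage 1: discovery. Deduplicate and cap the list defensively.
  const discovered = await discover(siteUrl, maxPages);
  const urls = Array.from(new Set(discovered)).slice(0, maxPages);

  // Stage 2: analysis. A failing page is recorded instead of aborting the audit.
  const results: PageResult[] = [];
  for (const url of urls) {
    try {
      results.push({ url, metrics: await analyze(url) });
    } catch (err) {
      results.push({ url, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}
```

Injecting the two steps keeps the orchestration testable without network access, and it makes the external dependencies easy to swap or mock.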
What makes this repo technically interesting
The standout technical feature is the real-time progress updates delivered via a Server-Sent Events (SSE) endpoint at /api/audits/:id/progress. SSE is a lightweight, HTTP-based mechanism that lets the server push events to the client over a single long-lived connection without the complexity of WebSockets. This pattern is well suited to streaming audit progress as pages are analyzed, providing a smooth, live UI experience even for audits covering up to 500 pages.
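To make the wire format concrete: an SSE response is served as `text/event-stream`, and each event is a group of `field: value` lines terminated by a blank line. The sketch below serializes events in that format and writes them to a minimal sink. The names `SseSink`, `formatSseEvent`, and `sendProgress`, as well as the `progress` event name and payload fields, are assumptions for illustration; the repo's actual route handler may be organized differently.

```typescript
// Headers commonly sent before the first event of an SSE response.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};

// Minimal writable shape (assumed); a Node or Express response has a compatible write().
interface SseSink {
  write(chunk: string): unknown;
}

// Serialize one event in the text/event-stream format.
function formatSseEvent(data: string, event?: string, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  // A payload containing newlines must be sent as one "data:" line per line.
  for (const line of data.split(/\r?\n/)) {
    lines.push(`data: ${line}`);
  }
  // The trailing blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

// Push one progress update; the event name and fields are hypothetical.
function sendProgress(sink: SseSink, pagesDone: number, pagesTotal: number): void {
  const payload = JSON.stringify({ pagesDone, pagesTotal });
  sink.write(formatSseEvent(payload, "progress", String(pagesDone)));
}
```

Setting an `id` on each event lets a reconnecting browser report the last event it received through the `Last-Event-ID` request header, which a server can use to resume from the right point.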
Implementing SSE properly involves managing the connection lifecycle, handling retries, and sending events in a format the client can consume, all of which this repo handles in its backend route implementations. On the frontend, React Query and TanStack Router work together to keep UI state consistent and updated as events stream in.
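On the client, the browser's `EventSource` takes care of parsing and automatic reconnection, so most of the work is subscribing, validating payloads, and closing the stream at the right time. The following is a minimal sketch of that lifecycle, not the repo's actual frontend code: `EventSourceLike`, `ProgressUpdate`, and `subscribeToProgress` are hypothetical names, and the `progress` event name and payload fields are assumed.

```typescript
// Minimal subset of the browser EventSource API used here (assumed shape).
interface EventSourceLike {
  addEventListener(type: string, listener: (ev: { data: string }) => void): void;
  close(): void;
}

interface ProgressUpdate {
  pagesDone: number;
  pagesTotal: number;
}

// Subscribe to "progress" events and close the stream once the audit finishes.
function subscribeToProgress(
  source: EventSourceLike,
  onUpdate: (update: ProgressUpdate) => void,
): () => void {
  source.addEventListener("progress", (ev) => {
    let update: ProgressUpdate;
    try {
      update = JSON.parse(ev.data) as ProgressUpdate;
    } catch {
      return; // ignore malformed payloads instead of crashing the UI
    }
    onUpdate(update);
    // Close explicitly: an EventSource otherwise reconnects when the server ends the stream.
    if (update.pagesDone >= update.pagesTotal) source.close();
  });
  // Cleanup function, e.g. to return from a React effect.
  return () => source.close();
}
```

In a browser this would be called with something like `new EventSource("/api/audits/123/progress")`, where `123` is a placeholder audit id; the `onUpdate` callback is then a natural place to update cached query data so the UI re-renders.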
Another strength is the monorepo TypeScript setup with shared types, which reduces friction between frontend and backend teams or codebases. It enforces contract correctness at compile time, preventing common bugs that arise from mismatched API expectations.
The integration with Firecrawl and DataForSEO offloads the complexity of crawling and SEO metric computation, but this also introduces external dependencies and potential rate limits or costs. The app mitigates this by caching results in SQLite and structuring audit jobs efficiently.
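One way to picture the caching idea is a cache-aside wrapper around an expensive API call. This is a sketch under stated assumptions rather than the repo's implementation: `ResultCache` and `withCache` are hypothetical names, and a plain `Map` stands in for the SQLite-backed store.

```typescript
// Minimal cache shape; a Map satisfies it, as could a SQLite-backed store (assumption).
interface ResultCache<T> {
  get(key: string): T | undefined;
  set(key: string, value: T): unknown;
}

// Wrap an expensive call so a successful result is fetched only once per URL.
function withCache<T>(
  fetcher: (url: string) => Promise<T>,
  cache: ResultCache<T>,
): (url: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>();
  return async (url) => {
    const hit = cache.get(url);
    if (hit !== undefined) return hit;
    // Share one pending request among concurrent callers for the same URL.
    const pending = inFlight.get(url);
    if (pending) return pending;
    const request = fetcher(url)
      .then((value) => {
        cache.set(url, value);
        return value;
      })
      .finally(() => inFlight.delete(url)); // failed calls are not cached and may be retried
    inFlight.set(url, request);
    return request;
  };
}
```

For example, `const analyzeCached = withCache(analyze, new Map())` would wrap a hypothetical `analyze` function so that repeated audits of the same URL reuse the stored result instead of spending another API request.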
The UI uses Shadcn/ui components, which provide a clean, consistent design system with minimal custom styling. This choice balances developer speed and UI quality without heavy CSS frameworks.
Tradeoffs include the reliance on SSE, which is simpler than WebSockets but only carries messages from server to client, and the use of SQLite, which is lightweight but might become a bottleneck under heavy concurrent audit loads. However, these choices fit well for a tool focused on audit orchestration rather than large-scale distributed crawling.
Quick start with on-page-seo
The README provides clear, step-by-step commands to get the project running locally. Here’s the exact sequence to start developing:
# 1. Clone the repository
git clone https://github.com/AgriciDaniel/on-page-seo.git
cd on-page-seo
# 2. Install dependencies
npm run install:all
# 3. Configure environment
cp .env.example .env
# Edit .env with your Firecrawl API key and DataForSEO account details
# 4. Start development servers
npm run dev
This starts both the client on http://localhost:3005 and the server on http://localhost:3001. The dual server setup in development allows quick iteration and debugging.
You can also run client or server individually with npm run dev:client or npm run dev:server.
To build and run in production mode, use:
npm run build
npm run start
This compiles both client and server and serves the frontend from the backend server.
Who should consider using on-page-seo?
This project is relevant for developers or small teams looking to build or extend a full-stack SEO auditing tool with real-time feedback on audit progress. It’s especially useful if you want a TypeScript monorepo architecture that shares types between frontend and backend, and if you want a working example of streaming long-running task progress with SSE.
The repo’s reliance on external SEO APIs means it’s not a standalone crawler or SEO analyzer. It depends on Firecrawl and DataForSEO accounts, which may incur costs or hit rate limits. Also, while SQLite is fine for moderate workloads, running many audits concurrently might require swapping to a more robust database.
The SSE pattern here is a practical example worth adopting in any app dealing with long-running jobs and UI progress indicators. It’s simpler and easier to maintain than WebSockets for one-way streaming.
Overall, this codebase is a solid foundation for an on-page SEO audit tool that balances modern frontend stack choices with a pragmatic backend and clear developer experience. It’s worth studying the SSE implementation and the API integration pattern even if you’re building a different type of long-running task manager.
→ GitHub Repo: AgriciDaniel/on-page-seo ⭐ 88 · TypeScript