Web scraping is usually more complicated than it looks at first glance. Between dynamic content, browser automation, and data extraction, the process often leads to brittle scripts and tangled code. Ferret aims to cut through that complexity with a declarative query language and a Go-native engine designed for embedding in your apps. Its recent v2 release introduces an architectural shift that balances backward compatibility with a modernized API, a challenge many mature projects face.
What ferret does and how it works
Ferret is a declarative system for web data extraction, querying, and structuring. It targets workflows like testing, analytics, and machine learning that require programmatic access to web page content. Instead of scripting browser interactions imperatively, you write queries in Ferret’s own declarative language to describe what data to extract or how to transform it.
Under the hood, Ferret abstracts away browser automation, DOM traversal, and dynamic page interaction. It supports running these queries against live web pages, handling JavaScript rendering and complex page states transparently.
The project is written entirely in Go, emphasizing portability, speed, and embeddability. This makes it a good fit if you want to integrate scraping capabilities directly into Go applications without relying on external browser drivers or heavyweight frameworks.
The v2 release is a significant architectural revision. It introduces a new internal engine architecture and a revamped public API. The core flow in the new API is:
Engine -> compile query -> create session -> run
This design separates query compilation from execution: a query is compiled into a plan, and the plan is then run in sessions whose lifecycle, and the resources they hold, are managed explicitly.
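To make the flow above concrete, here is a minimal, self-contained sketch of the compile-once, run-in-sessions pattern. It does not use Ferret at all: `Engine`, `Plan`, `Session`, and `runTwice` are hypothetical stand-ins written for illustration, and the "query language" is just a toy sum of integers. The point is only to show why splitting compilation from execution can be useful, not how Ferret implements it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Engine, Plan, and Session are hypothetical stand-ins, not Ferret's types.
// Engine counts how many times a query has been compiled.
type Engine struct{ compiled int }

// Plan is the reusable result of compiling a query once.
type Plan struct{ operands []int }

// Session holds per-run state and must be closed by the caller.
type Session struct {
	plan   *Plan
	closed bool
}

// Compile parses a toy "a + b + c" expression into a Plan.
func (e *Engine) Compile(query string) (*Plan, error) {
	var ops []int
	for _, part := range strings.Split(query, "+") {
		n, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return nil, fmt.Errorf("compile %q: %w", query, err)
		}
		ops = append(ops, n)
	}
	e.compiled++
	return &Plan{operands: ops}, nil
}

// NewSession creates an independent execution context for the plan.
func (p *Plan) NewSession() *Session { return &Session{plan: p} }

// Run evaluates the plan; it fails if the session is closed or ctx is done.
func (s *Session) Run(ctx context.Context) (int, error) {
	if s.closed {
		return 0, errors.New("session is closed")
	}
	if err := ctx.Err(); err != nil {
		return 0, err
	}
	sum := 0
	for _, n := range s.plan.operands {
		sum += n
	}
	return sum, nil
}

// Close marks the session as finished.
func (s *Session) Close() { s.closed = true }

// runTwice compiles the query once and runs the plan in two separate sessions.
// It returns both results and the number of compilations performed.
func runTwice(query string) ([]int, int, error) {
	eng := &Engine{}
	plan, err := eng.Compile(query)
	if err != nil {
		return nil, eng.compiled, err
	}
	var results []int
	for i := 0; i < 2; i++ {
		s := plan.NewSession()
		v, err := s.Run(context.Background())
		s.Close()
		if err != nil {
			return nil, eng.compiled, err
		}
		results = append(results, v)
	}
	return results, eng.compiled, nil
}

func main() {
	results, compiles, err := runTwice("1 + 1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(results, compiles)
}
```

In this toy version the parsing cost is paid once, while each session carries only its own short-lived state. Whether and how Ferret reuses compiled plans across sessions is something to confirm against its own documentation.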
What makes ferret v2’s architecture and API stand out
The standout aspect of Ferret v2 is its approach to evolving the API while supporting existing users. The v1 API was tightly coupled and less modular, which made extending or embedding it in complex applications harder.
With v2, the team rewrote the internal architecture to deliver a native Go API that is more idiomatic and flexible. This API lets you create an engine instance, compile queries into execution plans, and run those plans in sessions with explicit lifecycle management. This approach provides clearer separation of concerns and better resource control.
To ease migration, Ferret v2 introduces a compat module that exposes the old v1-style API on top of the new engine. This compatibility layer lets existing projects switch imports and get back to a working state without a full rewrite. Over time, developers can incrementally migrate to the new API.
This dual-API strategy is a pragmatic tradeoff. It prevents breaking users immediately while encouraging adoption of a cleaner and more maintainable API. However, it also means maintaining two sets of interfaces, which can increase complexity in the codebase and documentation.
Code quality in the v2 engine is solid, with clear modularization and straightforward error handling. The query language compiler and runtime are well-factored, enabling easy extension and debugging. The project is actively developed, and some APIs are still labeled alpha, so expect potential changes as the team iterates.
Getting started with ferret v2 in Go
The project provides a simple way to pull in v2 using Go modules:
go get github.com/MontFerret/ferret/v2@latest
For new projects, the recommended path is using the native v2 API. Here’s a minimal example that compiles and runs a trivial query:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/MontFerret/ferret/v2/pkg/engine"
)

func main() {
	ctx := context.Background()

	// Create the engine and release its resources on exit.
	eng, err := engine.New()
	if err != nil {
		log.Fatal(err)
	}
	defer eng.Close()

	// Compile the query into an execution plan.
	plan, err := eng.Compile(`RETURN 1 + 1`)
	if err != nil {
		log.Fatal(err)
	}

	// Create a session for this run and close it when done.
	session, err := plan.NewSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Run the plan and print the result.
	result, err := session.Run(ctx)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(result.Content)
}
For existing v1 users, the compat module lets you switch imports to github.com/MontFerret/ferret/v2/compat and get a familiar API surface. This incremental migration path reduces pressure to refactor everything at once.
Who should consider using ferret
Ferret is a solid choice if you want a Go-native, embeddable engine for scraping and querying web pages declaratively. Its design fits well in scenarios where you want to build data extraction into Go apps without spinning up separate browser automation infrastructure.
The v2 release is still evolving. It is usable for experimentation and early adoption, but expect some API churn and improvements before a stable v2 release.
If you’re coming from v1, the compatibility module offers a pragmatic way to keep working while gradually moving to the new architecture.
Limitations include the alpha status of parts of the v2 API and the ongoing need to maintain two APIs during migration. Scraping highly dynamic sites also remains challenging, and Ferret’s declarative language has a learning curve.
Overall, Ferret v2 balances embeddability, performance, and API evolution thoughtfully. It’s worth understanding if your projects involve Go-based web data extraction and you want more control than typical scraping frameworks provide.
→ GitHub Repo: MontFerret/ferret ⭐ 5,971 · Go