bopscrk is a Python CLI tool for targeted password wordlist generation, combining user input and scraped song lyrics with mutations. It is aimed at pentesting and red teaming.
linkedin_scraper is a Python library using Playwright and async/await for structured LinkedIn scraping with typed Pydantic models, session management, and progress callbacks.
Sherlock is a Python CLI tool that checks username availability across 400+ social networks using a modular JSON-driven detection system, which keeps the list of supported sites easy to extend.
Firecrawl Web Agent combines LangChain’s Deep Agents with Firecrawl’s web tools in a layered architecture from Next.js UI to raw API, enabling autonomous web research.
Open Deep Research is a TypeScript Next.js 16 app that uses an LLM to plan, execute, and iterate web research via Exa and Upstash QStash, producing sourced reports with images.
xcrawl-skills defines standardized AI agent skills with normalized inputs/outputs for web data tasks like scraping and crawling via the XCrawl API. It enables multi-agent orchestration with minimal integration effort.
open-researcher is a TypeScript app combining AI APIs and web scraping to assist research workflows. It offers an extensible setup and local dev server for experimentation.
goscrapy is a Go framework for web scraping that includes a CLI scaffolding tool. It requires Go 1.23+ and offers a minimal setup for building scraping projects.
A Tampermonkey userscript adds a download button for IKEA’s 3D models, hooking into their viewer to fetch GLB URLs and enable direct downloads with smart filenames.
ai-marketplace-monitor automates Facebook Marketplace searches using Python and Playwright, enabling personalized item monitoring with notifications. Legal constraints limit its use to hobbyist scenarios.
Otakuapuri is a Python Tkinter app that combines manga download, reading, and anime streaming, using scraping that bypasses Cloudflare and a multithreaded UI that stays responsive.
Apify MCP Server exposes 8,000+ web automation tools to AI agents as MCP tools. It features agentic payments, which let agents pay for tool execution autonomously in crypto, and supports HTTPS and local modes.
A Python Playwright scraper automates Google Maps data extraction using XPath selectors. It reveals the real maintenance cost of brittle DOM scraping and dependency pinning.
AIHawk offers an open-source AI agent that automates job applications. Its architecture balances open AI automation with the legal realities of third-party integrations.
AutoScraper automates web scraping by learning extraction rules from sample data, avoiding manual CSS selectors. This Python tool simplifies scraping pages with repetitive, similarly structured content.
This curated, multi-language list of web scraping tools, services, and resources serves as a reference for developers building scraping infrastructure.
Colly is a Go web scraping framework offering fast, concurrent crawlers with a clean API. It handles cookies, sessions, delays, and caching, and is suited to data mining and archiving.
Crawlee Python offers a dual approach to web scraping with lightweight HTML parsing and headless browser automation, balancing speed and interactivity for diverse scraping needs.
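
To make the idea in the bopscrk entry concrete (combining input words and applying mutations), here is a minimal Python sketch. It is not bopscrk's code or API: the names `LEET`, `leet`, `mutate`, and `generate_wordlist`, the substitution table, and the separator set are all assumptions chosen for illustration, and lyric scraping is left out entirely.

```python
from itertools import permutations

# Hypothetical leet-speak substitution table (an assumption, not bopscrk's).
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def leet(word):
    """Replace each mapped letter (case-insensitively) with its leet digit."""
    return "".join(LEET.get(ch.lower(), ch) for ch in word)


def mutate(word):
    """Return case variants of a word plus the leet form of each variant."""
    variants = {word.lower(), word.upper(), word.capitalize()}
    variants |= {leet(v) for v in variants}
    return variants


def generate_wordlist(words, separators=("", "_", ".")):
    """Mutate single words and ordered pairs of words joined by a separator."""
    candidates = set()
    for w in words:
        candidates |= mutate(w)
    for a, b in permutations(words, 2):
        for sep in separators:
            candidates |= mutate(a + sep + b)
    return sorted(candidates)


if __name__ == "__main__":
    for candidate in generate_wordlist(["anna", "rex"]):
        print(candidate)
```

The candidate set grows quickly with the number of input words, since every ordered pair is combined with every separator before mutation. A real tool would likely cap lengths or filter candidates.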