<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Web-Scraping on Noureddine RAMDI</title><link>https://ramdi.fr/tags/web-scraping/</link><description>Recent content in Web-Scraping on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/web-scraping/index.xml" rel="self" type="application/rss+xml"/><item><title>bopscrk: targeted password wordlist generation with lyric-based OSINT</title><link>https://ramdi.fr/github-stars/bopscrk-targeted-password-wordlist-generation-with-lyric-based-osint/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/bopscrk-targeted-password-wordlist-generation-with-lyric-based-osint/</guid><description>bopscrk is a Python CLI tool for targeted password wordlist generation, combining user input and scraped song lyrics with mutations. 
Useful in pentesting and red teaming.</description></item><item><title>linkedin_scraper: async Playwright-powered LinkedIn scraping with typed data models</title><link>https://ramdi.fr/github-stars/linkedin-scraper-async-playwright-powered-linkedin-scraping-with-typed-data-models/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/linkedin-scraper-async-playwright-powered-linkedin-scraping-with-typed-data-models/</guid><description>linkedin_scraper is a Python library using Playwright and async/await for structured LinkedIn scraping with typed Pydantic models, session management, and progress callbacks.</description></item><item><title>Sherlock: A modular Python CLI tool for username reconnaissance across 400+ social networks</title><link>https://ramdi.fr/github-stars/sherlock-a-modular-python-cli-tool-for-username-reconnaissance-across-400-social-networks/</link><pubDate>Tue, 05 May 2026 18:13:32 +0000</pubDate><guid>https://ramdi.fr/github-stars/sherlock-a-modular-python-cli-tool-for-username-reconnaissance-across-400-social-networks/</guid><description>Sherlock is a Python CLI tool that checks username availability across 400+ social networks using a modular JSON-driven detection system. 
Practical, extensible, and flexible.</description></item><item><title>Exploring Firecrawl Web Agent: A layered autonomous web research agent built on LangChain Deep Agents</title><link>https://ramdi.fr/github-stars/exploring-firecrawl-web-agent-a-layered-autonomous-web-research-agent-built-on-langchain-deep-agents/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/exploring-firecrawl-web-agent-a-layered-autonomous-web-research-agent-built-on-langchain-deep-agents/</guid><description>Firecrawl Web Agent combines LangChain&amp;rsquo;s Deep Agents with Firecrawl&amp;rsquo;s web tools in a layered architecture from Next.js UI to raw API, enabling autonomous web research.</description></item><item><title>Open Deep Research: A Next.js 16 agentic AI assistant for iterative web research</title><link>https://ramdi.fr/github-stars/open-deep-research-a-next-js-16-agentic-ai-assistant-for-iterative-web-research/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/open-deep-research-a-next-js-16-agentic-ai-assistant-for-iterative-web-research/</guid><description>Open Deep Research is a TypeScript Next.js 16 app that uses an LLM to plan, execute, and iterate web research via Exa and Upstash QStash, producing sourced reports with images.</description></item><item><title>Standardizing AI agent workflows with xcrawl-skills for web data APIs</title><link>https://ramdi.fr/github-stars/standardizing-ai-agent-workflows-with-xcrawl-skills-for-web-data-apis/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/standardizing-ai-agent-workflows-with-xcrawl-skills-for-web-data-apis/</guid><description>xcrawl-skills defines standardized AI agent skills with normalized inputs/outputs for web data tasks like scraping and crawling via the XCrawl API. 
It enables multi-agent orchestration with minimal integration.</description></item><item><title>open-researcher: AI-powered web research assistant with integrated scraping and summarization</title><link>https://ramdi.fr/github-stars/open-researcher-ai-powered-web-research-assistant-with-integrated-scraping-and-summarization/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/open-researcher-ai-powered-web-research-assistant-with-integrated-scraping-and-summarization/</guid><description>open-researcher is a TypeScript app combining AI APIs and web scraping to assist research workflows. It offers an extensible setup and local dev server for experimentation.</description></item><item><title>Fire Enrich: a sequential multi-agent pipeline for enriched company profiles from emails</title><link>https://ramdi.fr/github-stars/fire-enrich-a-sequential-multi-agent-pipeline-for-enriched-company-profiles-from-emails/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/fire-enrich-a-sequential-multi-agent-pipeline-for-enriched-company-profiles-from-emails/</guid><description>Fire Enrich orchestrates 5 specialized AI agents in sequence to enrich company profiles from email addresses, using Next.js, OpenAI, and Firecrawl.</description></item><item><title>goscrapy: a Go-based web scraping framework with CLI scaffolding</title><link>https://ramdi.fr/github-stars/goscrapy-a-go-based-web-scraping-framework-with-cli-scaffolding/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/goscrapy-a-go-based-web-scraping-framework-with-cli-scaffolding/</guid><description>goscrapy is a Go framework for web scraping that includes a CLI scaffolding tool. 
It requires Go 1.23+ and offers a minimal setup for building scraping projects.</description></item><item><title>How IKEA-3D-Model-Download-Button extracts and downloads GLB models from IKEA product pages</title><link>https://ramdi.fr/github-stars/how-ikea-3d-model-download-button-extracts-and-downloads-glb-models-from-ikea-product-pages/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/how-ikea-3d-model-download-button-extracts-and-downloads-glb-models-from-ikea-product-pages/</guid><description>A Tampermonkey userscript adds a download button for IKEA&amp;rsquo;s 3D models, hooking into their viewer to fetch GLB URLs and enable direct downloads with smart filenames.</description></item><item><title>Automating Facebook Marketplace searches with ai-marketplace-monitor</title><link>https://ramdi.fr/github-stars/automating-facebook-marketplace-searches-with-ai-marketplace-monitor/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/automating-facebook-marketplace-searches-with-ai-marketplace-monitor/</guid><description>ai-marketplace-monitor automates Facebook Marketplace searches using Python and Playwright, enabling personalized item monitoring with notifications. 
Legal constraints limit its use to hobbyist scenarios.</description></item><item><title>Otakuapuri: a Python desktop app for manga and anime with Cloudflare-bypass scraping and responsive Tkinter UI</title><link>https://ramdi.fr/github-stars/otakuapuri-a-python-desktop-app-for-manga-and-anime-with-cloudflare-bypass-scraping-and-responsive-tkinter-ui/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/otakuapuri-a-python-desktop-app-for-manga-and-anime-with-cloudflare-bypass-scraping-and-responsive-tkinter-ui/</guid><description>Otakuapuri is a Python Tkinter app combining manga download, reading, and anime streaming with Cloudflare-bypass scraping and multithreaded UI for responsiveness.</description></item><item><title>Apify MCP Server: Enabling autonomous AI agent payments for web automation tools</title><link>https://ramdi.fr/github-stars/apify-mcp-server-enabling-autonomous-ai-agent-payments-for-web-automation-tools/</link><pubDate>Mon, 04 May 2026 10:09:30 +0000</pubDate><guid>https://ramdi.fr/github-stars/apify-mcp-server-enabling-autonomous-ai-agent-payments-for-web-automation-tools/</guid><description>Apify MCP Server exposes 8,000+ web automation tools as MCP tools to AI agents, featuring agentic payments allowing autonomous crypto payments for tool execution. Supports HTTPS and local modes.</description></item><item><title>Google Maps Scraper: navigating the fragility of XPath-based browser automation</title><link>https://ramdi.fr/github-stars/google-maps-scraper-navigating-the-fragility-of-xpath-based-browser-automation/</link><pubDate>Mon, 04 May 2026 10:07:00 +0000</pubDate><guid>https://ramdi.fr/github-stars/google-maps-scraper-navigating-the-fragility-of-xpath-based-browser-automation/</guid><description>A Python Playwright scraper automates Google Maps data extraction using XPath selectors. 
It reveals the real maintenance cost of brittle DOM scraping and dependency pinning.</description></item><item><title>AIHawk: An open-source AI agent tackling automated job applications under copyright constraints</title><link>https://ramdi.fr/github-stars/aihawk-an-open-source-ai-agent-tackling-automated-job-applications-under-copyright-constraints/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/aihawk-an-open-source-ai-agent-tackling-automated-job-applications-under-copyright-constraints/</guid><description>AIHawk offers an open-source AI agent that automates job applications. Its architecture balances open AI automation with the legal realities of third-party integrations.</description></item><item><title>AutoScraper: simplifying web scraping through example-driven rule learning</title><link>https://ramdi.fr/github-stars/autoscraper-simplifying-web-scraping-through-example-driven-rule-learning/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/autoscraper-simplifying-web-scraping-through-example-driven-rule-learning/</guid><description>AutoScraper automates web scraping by learning extraction rules from sample data, avoiding manual CSS selectors. 
This Python tool eases scraping repetitive, similar web content.</description></item><item><title>awesome-web-scraping: a curated hub for web scraping tools and resources</title><link>https://ramdi.fr/github-stars/awesome-web-scraping-a-curated-hub-for-web-scraping-tools-and-resources/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/awesome-web-scraping-a-curated-hub-for-web-scraping-tools-and-resources/</guid><description>A comprehensive, multi-language curated list of web scraping tools, services, and resources that acts as a vital reference for developers building scraping infrastructure.</description></item><item><title>Camoufox: a stealthy Firefox fork for AI agents and web scraping</title><link>https://ramdi.fr/github-stars/camoufox-a-stealthy-firefox-fork-for-ai-agents-and-web-scraping/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/camoufox-a-stealthy-firefox-fork-for-ai-agents-and-web-scraping/</guid><description>Camoufox is a Firefox fork optimized for AI agents and web scraping with stealth fingerprint injection at the C++ level and Python API support.</description></item><item><title>Colly: high-performance web scraping in Go with concurrency and ease</title><link>https://ramdi.fr/github-stars/colly-high-performance-web-scraping-in-go-with-concurrency-and-ease/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/colly-high-performance-web-scraping-in-go-with-concurrency-and-ease/</guid><description>Colly is a Go web scraping framework offering fast, concurrent crawlers with a clean API. 
It handles cookies, sessions, delays, and caching, suited for data mining and archiving.</description></item><item><title>Crawlee Python: a flexible dual-crawler framework for web scraping and automation</title><link>https://ramdi.fr/github-stars/crawlee-python-a-flexible-dual-crawler-framework-for-web-scraping-and-automation/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/crawlee-python-a-flexible-dual-crawler-framework-for-web-scraping-and-automation/</guid><description>Crawlee Python offers a dual approach to web scraping with lightweight HTML parsing and headless browser automation, balancing speed and interactivity for diverse scraping needs.</description></item><item><title>Crawlee: a TypeScript library for stealthy web scraping and browser automation</title><link>https://ramdi.fr/github-stars/crawlee-a-typescript-library-for-stealthy-web-scraping-and-browser-automation/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/crawlee-a-typescript-library-for-stealthy-web-scraping-and-browser-automation/</guid><description>Crawlee is a TypeScript library for web scraping and browser automation with human-like stealth. Supports Playwright, Puppeteer, proxy rotation, and persistent queues.</description></item><item><title>Ferret v2: A declarative Go engine for web data extraction with a new API architecture</title><link>https://ramdi.fr/github-stars/ferret-v2-a-declarative-go-engine-for-web-data-extraction-with-a-new-api-architecture/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/ferret-v2-a-declarative-go-engine-for-web-data-extraction-with-a-new-api-architecture/</guid><description>Ferret v2 is a Go-based declarative system for web scraping that introduces a native Go API and a compatibility layer to ease migration from v1. 
It balances embeddability, speed, and API evolution.</description></item><item><title>Headless Chrome Crawler: Simplifying Dynamic Web Scraping with Puppeteer</title><link>https://ramdi.fr/github-stars/headless-chrome-crawler-simplifying-dynamic-web-scraping-with-puppeteer/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/headless-chrome-crawler-simplifying-dynamic-web-scraping-with-puppeteer/</guid><description>Headless Chrome Crawler offers a high-level API on Puppeteer for scraping dynamic JS-heavy websites with concurrency, caching, and jQuery injection. Ideal for complex scraping tasks.</description></item><item><title>Maigret: A resilient OSINT username scraper across thousands of sites</title><link>https://ramdi.fr/github-stars/maigret-a-resilient-osint-username-scraper-across-thousands-of-sites/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/maigret-a-resilient-osint-username-scraper-across-thousands-of-sites/</guid><description>Maigret is a Python-based OSINT tool that scrapes public profiles by username from 3,000+ sites without API keys. It features adaptive scraping, anti-blocking, and a web interface.</description></item><item><title>Pydoll: Async-native Chromium automation with typed extraction for web scraping</title><link>https://ramdi.fr/github-stars/pydoll-async-native-chromium-automation-with-typed-extraction-for-web-scraping/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/pydoll-async-native-chromium-automation-with-typed-extraction-for-web-scraping/</guid><description>Pydoll is a Python library for Chromium automation using Chrome DevTools Protocol. 
It offers async-native APIs and Pydantic-powered data extraction for structured, validated scraping.</description></item><item><title>undetected-chromedriver: patching Selenium to evade anti-bot detection</title><link>https://ramdi.fr/github-stars/undetected-chromedriver-patching-selenium-to-evade-anti-bot-detection/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/undetected-chromedriver-patching-selenium-to-evade-anti-bot-detection/</guid><description>undetected-chromedriver patches Selenium&amp;rsquo;s Chromedriver to bypass anti-bot defenses like Distill Network and DataDome. It supports Chrome beta and Chromium-based browsers with ease.</description></item><item><title>Scrapy: a modular Python framework for scalable web scraping</title><link>https://ramdi.fr/github-stars/scrapy-a-modular-python-framework-for-scalable-web-scraping/</link><pubDate>Sun, 26 Apr 2026 23:47:28 +0000</pubDate><guid>https://ramdi.fr/github-stars/scrapy-a-modular-python-framework-for-scalable-web-scraping/</guid><description>Scrapy is a Python framework designed for efficient and extensible web scraping, featuring a powerful selector system and item pipelines for data extraction and processing.</description></item><item><title>Requests-HTML: Pythonic web scraping with built-in JavaScript rendering</title><link>https://ramdi.fr/github-stars/requests-html-pythonic-web-scraping-with-built-in-javascript-rendering/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/requests-html-pythonic-web-scraping-with-built-in-javascript-rendering/</guid><description>Requests-HTML extends Python&amp;rsquo;s Requests library with Chromium-based JavaScript rendering, CSS/XPath selectors, and async support for scraping dynamic web pages easily.</description></item><item><title>Scrapling: adaptive web scraping with AI integration for resilient data extraction</title><link>https://ramdi.fr/github-stars/scrapling-adaptive-web-scraping-with-ai-integration-for-resilient-data-extraction/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/scrapling-adaptive-web-scraping-with-ai-integration-for-resilient-data-extraction/</guid><description>Scrapling offers an adaptive web scraping framework with AI integration to handle site changes and anti-bot systems, supporting large-scale concurrent crawling with proxy rotation.</description></item></channel></rss>