Web-Scraping on Noureddine RAMDI

https://ramdi.fr/tags/web-scraping/
Recent content in Web-Scraping on Noureddine RAMDI
Generator: Hugo. Language: en.
Sat, 23 May 2026 20:41:27 +0000

bopscrk: targeted password wordlist generation with lyric-based OSINT
https://ramdi.fr/github-stars/bopscrk-targeted-password-wordlist-generation-with-lyric-based-osint/
Sat, 23 May 2026 20:41:14 +0000
bopscrk is a Python CLI tool for targeted password wordlist generation, combining user input and scraped song lyrics with mutations. Useful in pentesting and red teaming.

linkedin_scraper: async Playwright-powered LinkedIn scraping with typed data models
https://ramdi.fr/github-stars/linkedin-scraper-async-playwright-powered-linkedin-scraping-with-typed-data-models/
Sat, 23 May 2026 20:41:14 +0000
linkedin_scraper is a Python library using Playwright and async/await for structured LinkedIn scraping with typed Pydantic models, session management, and progress callbacks.

Sherlock: A modular Python CLI tool for username reconnaissance across 400+ social networks
https://ramdi.fr/github-stars/sherlock-a-modular-python-cli-tool-for-username-reconnaissance-across-400-social-networks/
Tue, 05 May 2026 18:13:32 +0000
Sherlock is a Python CLI tool that checks username availability across 400+ social networks using a modular JSON-driven detection system. Practical, extensible, and flexible.

Exploring Firecrawl Web Agent: A layered autonomous web research agent built on LangChain Deep Agents
https://ramdi.fr/github-stars/exploring-firecrawl-web-agent-a-layered-autonomous-web-research-agent-built-on-langchain-deep-agents/
Tue, 05 May 2026 16:46:42 +0000
Firecrawl Web Agent combines LangChain’s Deep Agents with Firecrawl’s web tools in a layered architecture from Next.js UI to raw API, enabling autonomous web research.

Open Deep Research: A Next.js 16 agentic AI assistant for iterative web research
https://ramdi.fr/github-stars/open-deep-research-a-next-js-16-agentic-ai-assistant-for-iterative-web-research/
Tue, 05 May 2026 16:46:42 +0000
Open Deep Research is a TypeScript Next.js 16 app that uses an LLM to plan, execute, and iterate web research via Exa and Upstash QStash, producing sourced reports with images.

Standardizing AI agent workflows with xcrawl-skills for web data APIs
https://ramdi.fr/github-stars/standardizing-ai-agent-workflows-with-xcrawl-skills-for-web-data-apis/
Tue, 05 May 2026 16:46:42 +0000
xcrawl-skills defines standardized AI agent skills with normalized inputs/outputs for web data tasks like scraping and crawling via the XCrawl API. It enables multi-agent orchestration with minimal integration.

open-researcher: AI-powered web research assistant with integrated scraping and summarization
https://ramdi.fr/github-stars/open-researcher-ai-powered-web-research-assistant-with-integrated-scraping-and-summarization/
Tue, 05 May 2026 13:37:39 +0000
open-researcher is a TypeScript app combining AI APIs and web scraping to assist research workflows. It offers an extensible setup and local dev server for experimentation.

Fire Enrich: a sequential multi-agent pipeline for enriched company profiles from emails
https://ramdi.fr/github-stars/fire-enrich-a-sequential-multi-agent-pipeline-for-enriched-company-profiles-from-emails/
Mon, 04 May 2026 10:23:02 +0000
Fire Enrich orchestrates 5 specialized AI agents in sequence to enrich company profiles from email addresses, using Next.js, OpenAI, and Firecrawl.

goscrapy: a Go-based web scraping framework with CLI scaffolding
https://ramdi.fr/github-stars/goscrapy-a-go-based-web-scraping-framework-with-cli-scaffolding/
Mon, 04 May 2026 10:23:02 +0000
goscrapy is a Go framework for web scraping that includes a CLI scaffolding tool. It requires Go 1.23+ and offers a minimal setup for building scraping projects.

How IKEA-3D-Model-Download-Button extracts and downloads GLB models from IKEA product pages
https://ramdi.fr/github-stars/how-ikea-3d-model-download-button-extracts-and-downloads-glb-models-from-ikea-product-pages/
Mon, 04 May 2026 10:23:02 +0000
A Tampermonkey userscript adds a download button for IKEA’s 3D models, hooking into their viewer to fetch GLB URLs and enable direct downloads with smart filenames.

Automating Facebook Marketplace searches with ai-marketplace-monitor
https://ramdi.fr/github-stars/automating-facebook-marketplace-searches-with-ai-marketplace-monitor/
Mon, 04 May 2026 10:23:01 +0000
ai-marketplace-monitor automates Facebook Marketplace searches using Python and Playwright, enabling personalized item monitoring with notifications. Legal constraints limit its use to hobbyist scenarios.

Otakuapuri: a Python desktop app for manga and anime with Cloudflare-bypass scraping and responsive Tkinter UI
https://ramdi.fr/github-stars/otakuapuri-a-python-desktop-app-for-manga-and-anime-with-cloudflare-bypass-scraping-and-responsive-tkinter-ui/
Mon, 04 May 2026 10:23:01 +0000
Otakuapuri is a Python Tkinter app combining manga download, reading, and anime streaming with Cloudflare-bypass scraping and multithreaded UI for responsiveness.

Apify MCP Server: Enabling autonomous AI agent payments for web automation tools
https://ramdi.fr/github-stars/apify-mcp-server-enabling-autonomous-ai-agent-payments-for-web-automation-tools/
Mon, 04 May 2026 10:09:30 +0000
Apify MCP Server exposes 8,000+ web automation tools as MCP tools to AI agents, featuring agentic payments allowing autonomous crypto payments for tool execution. Supports HTTPS and local modes.

Google Maps Scraper: navigating the fragility of XPath-based browser automation
https://ramdi.fr/github-stars/google-maps-scraper-navigating-the-fragility-of-xpath-based-browser-automation/
Mon, 04 May 2026 10:07:00 +0000
A Python Playwright scraper automates Google Maps data extraction using XPath selectors. It reveals the real maintenance cost of brittle DOM scraping and dependency pinning.

AIHawk: An open-source AI agent tackling automated job applications under copyright constraints
https://ramdi.fr/github-stars/aihawk-an-open-source-ai-agent-tackling-automated-job-applications-under-copyright-constraints/
Sat, 02 May 2026 20:07:04 +0000
AIHawk offers an open-source AI agent that automates job applications. Its architecture balances open AI automation with the legal realities of third-party integrations.

AutoScraper: simplifying web scraping through example-driven rule learning
https://ramdi.fr/github-stars/autoscraper-simplifying-web-scraping-through-example-driven-rule-learning/
Sat, 02 May 2026 20:07:04 +0000
AutoScraper automates web scraping by learning extraction rules from sample data, avoiding manual CSS selectors. This Python tool eases scraping repetitive, similar web content.

awesome-web-scraping: a curated hub for web scraping tools and resources
https://ramdi.fr/github-stars/awesome-web-scraping-a-curated-hub-for-web-scraping-tools-and-resources/
Sat, 02 May 2026 20:07:04 +0000
A comprehensive, multi-language curated list of web scraping tools, services, and resources that acts as a vital reference for developers building scraping infrastructure.

Camoufox: a stealthy Firefox fork for AI agents and web scraping
https://ramdi.fr/github-stars/camoufox-a-stealthy-firefox-fork-for-ai-agents-and-web-scraping/
Sat, 02 May 2026 20:07:04 +0000
Camoufox is a Firefox fork optimized for AI agents and web scraping with stealth fingerprint injection at the C++ level and Python API support.

Colly: high-performance web scraping in Go with concurrency and ease
https://ramdi.fr/github-stars/colly-high-performance-web-scraping-in-go-with-concurrency-and-ease/
Sat, 02 May 2026 20:07:04 +0000
Colly is a Go web scraping framework offering fast, concurrent crawlers with a clean API. It handles cookies, sessions, delays, and caching, suited for data mining and archiving.

Crawlee Python: a flexible dual-crawler framework for web scraping and automation
https://ramdi.fr/github-stars/crawlee-python-a-flexible-dual-crawler-framework-for-web-scraping-and-automation/
Sat, 02 May 2026 20:07:04 +0000
Crawlee Python offers a dual approach to web scraping with lightweight HTML parsing and headless browser automation, balancing speed and interactivity for diverse scraping needs.

Crawlee: a TypeScript library for stealthy web scraping and browser automation
https://ramdi.fr/github-stars/crawlee-a-typescript-library-for-stealthy-web-scraping-and-browser-automation/
Sat, 02 May 2026 20:07:04 +0000
Crawlee is a TypeScript library for web scraping and browser automation with human-like stealth. Supports Playwright, Puppeteer, proxy rotation, and persistent queues.

Ferret v2: A declarative Go engine for web data extraction with a new API architecture
https://ramdi.fr/github-stars/ferret-v2-a-declarative-go-engine-for-web-data-extraction-with-a-new-api-architecture/
Sat, 02 May 2026 20:07:04 +0000
Ferret v2 is a Go-based declarative system for web scraping that introduces a native Go API and a compatibility layer to ease migration from v1. It balances embeddability, speed, and API evolution.

Headless Chrome Crawler: Simplifying Dynamic Web Scraping with Puppeteer
https://ramdi.fr/github-stars/headless-chrome-crawler-simplifying-dynamic-web-scraping-with-puppeteer/
Sat, 02 May 2026 20:07:04 +0000
Headless Chrome Crawler offers a high-level API on Puppeteer for scraping dynamic JS-heavy websites with concurrency, caching, and jQuery injection. Ideal for complex scraping tasks.

Maigret: A resilient OSINT username scraper across thousands of sites
https://ramdi.fr/github-stars/maigret-a-resilient-osint-username-scraper-across-thousands-of-sites/
Sat, 02 May 2026 20:07:04 +0000
Maigret is a Python-based OSINT tool that scrapes public profiles by username from 3,000+ sites without API keys. It features adaptive scraping, anti-blocking, and a web interface.

Pydoll: Async-native Chromium automation with typed extraction for web scraping
https://ramdi.fr/github-stars/pydoll-async-native-chromium-automation-with-typed-extraction-for-web-scraping/
Sat, 02 May 2026 20:07:04 +0000
Pydoll is a Python library for Chromium automation using Chrome DevTools Protocol. It offers async-native APIs and Pydantic-powered data extraction for structured, validated scraping.

undetected-chromedriver: patching Selenium to evade anti-bot detection
https://ramdi.fr/github-stars/undetected-chromedriver-patching-selenium-to-evade-anti-bot-detection/
Sat, 02 May 2026 20:07:04 +0000
undetected-chromedriver patches Selenium’s Chromedriver to bypass anti-bot defenses like Distill Network and DataDome. It supports Chrome beta and Chromium-based browsers with ease.

Scrapy: a modular Python framework for scalable web scraping
https://ramdi.fr/github-stars/scrapy-a-modular-python-framework-for-scalable-web-scraping/
Sun, 26 Apr 2026 23:47:28 +0000
Scrapy is a Python framework designed for efficient and extensible web scraping, featuring a powerful selector system and item pipelines for data extraction and processing.

Requests-HTML: Pythonic web scraping with built-in JavaScript rendering
https://ramdi.fr/github-stars/requests-html-pythonic-web-scraping-with-built-in-javascript-rendering/
Sun, 26 Apr 2026 17:51:11 +0000
Requests-HTML extends Python’s Requests library with Chromium-based JavaScript rendering, CSS/XPath selectors, and async support for scraping dynamic web pages easily.

Scrapling: adaptive web scraping with AI integration for resilient data extraction
https://ramdi.fr/github-stars/scrapling-adaptive-web-scraping-with-ai-integration-for-resilient-data-extraction/
Sun, 26 Apr 2026 17:51:11 +0000
Scrapling offers an adaptive web scraping framework with AI integration to handle site changes and anti-bot systems, supporting large-scale concurrent crawling with proxy rotation.