Google Maps Scraper: navigating the fragility of XPath-based browser automation

Google Maps Scraper stands as a practical example of the challenges in automating data extraction from dynamic web applications. It uses Playwright to control a browser and XPath selectors to parse Google Maps listings — but the tradeoff is clear: the scraping logic is fragile and constantly at risk from Google’s DOM changes. This repo’s approach, including maintaining multiple branches to handle dependency and OS issues, offers a window into the real overhead of browser automation scrapers.

how Google Maps Scraper works under the hood

At its core, this project is a Python-based scraper that automates a Chromium or Chrome browser session using Playwright. Instead of relying on an official API (which Google Maps doesn’t provide for full business data extraction), it simulates user browsing to collect business listings data.

The scraper navigates Google Maps’ web interface and extracts details such as business names, contact details, reviews, operating hours, and service types. It does this by querying the page’s DOM with XPath selectors, a common way to target HTML elements precisely, but one that breaks easily when the page structure changes.
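The flow described above can be sketched roughly as follows. This is not the repo's code: the search URL pattern, the `RESULT_LINKS_XPATH` selector, the `aria-label` attribute, and the function names are all assumptions chosen for illustration, and a real run may also have to deal with consent dialogs and scrolling.

```python
from urllib.parse import quote_plus

# Assumption: a Maps search can be opened directly by URL. The repo may
# instead type the query into the search box.
SEARCH_BASE = "https://www.google.com/maps/search/"

# Hypothetical XPath for result links. The real selectors live in the repo
# and need updating whenever the Maps DOM changes.
RESULT_LINKS_XPATH = "//a[contains(@href, '/maps/place/')]"


def build_search_url(query: str) -> str:
    """Turn a free-text query into a search URL (assumed URL pattern)."""
    return SEARCH_BASE + quote_plus(query)


def collect_result_labels(query: str, headless: bool = False, limit: int = 10) -> list:
    """Open a browser, run the search, and read a label from each result link."""
    # Imported lazily so the URL helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto(build_search_url(query))
        links = page.locator(f"xpath={RESULT_LINKS_XPATH}")
        links.first.wait_for(timeout=15_000)  # wait until a result is rendered
        labels = []
        for i in range(min(links.count(), limit)):
            # Assumption: the link's aria-label carries the business name.
            labels.append(links.nth(i).get_attribute("aria-label"))
        browser.close()
    return labels
```

Calling `collect_result_labels("coffee in Berlin")` would open a visible Chromium window, since `headless` defaults to `False` in this sketch.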

The output is cleaned and saved as CSV files, making it easy to plug into data analysis or CRM workflows.
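A minimal sketch of what a clean-and-save step could look like, using only the standard library. The column names in `FIELDS` and the de-duplication rule are hypothetical; the repo's actual columns and cleaning logic may differ.

```python
import csv
import io

FIELDS = ["name", "phone", "rating"]  # hypothetical column set


def clean_rows(rows):
    """Collapse whitespace, drop rows without a name, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {f: " ".join(str(row.get(f) or "").split()) for f in FIELDS}
        key = (record["name"].lower(), record["phone"])
        if not record["name"] or key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


def write_csv(rows, handle):
    """Write cleaned rows to any file-like object, header first."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


raw = [
    {"name": "  Cafe   Example ", "phone": "555-0100", "rating": "4.5"},
    {"name": "cafe example", "phone": "555-0100", "rating": "4.5"},
    {"name": "", "phone": "555-0199", "rating": None},
]
buffer = io.StringIO()
write_csv(clean_rows(raw), buffer)
```

To write to disk instead of a buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.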

The stack here is straightforward: Python 3.8 or 3.9 (Python 3.10 and later may be incompatible with some dependencies), Playwright for browser automation, and standard CSV handling. The repo uses XPath heavily for DOM parsing, reflecting a choice of precision over robustness.

The repo maintains three branches: main, latest-libs, and linux. This branching strategy addresses dependency version differences and OS-specific issues, highlighting the maintenance challenges imposed by upstream library churn and platform quirks.

technical strengths and tradeoffs in the scraper’s design

The use of Playwright is a strong point — it offers a stable, well-maintained API for controlling browsers programmatically. It supports Chromium, Firefox, and WebKit, though this repo primarily targets Chrome/Chromium.

Choosing XPath selectors enables targeting elements with fine granularity in the complex Google Maps DOM. However, XPath’s precision is also its Achilles’ heel: Google frequently updates the Maps UI, changing element hierarchies or class names. This breaks XPath queries, requiring manual updates.
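The breakage is easy to reproduce on a toy document. The sketch below uses the standard library's limited XPath support on two made-up snippets (nothing here is the real Maps DOM): one positional path and one attribute-based path, run before and after a wrapper element is added.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for a listing before and after a UI change.
OLD_DOM = (
    "<div class='listing'>"
    "<div class='header'><span class='title'>Cafe Example</span></div>"
    "</div>"
)
NEW_DOM = (
    "<div class='listing'><div class='wrapper'>"
    "<div class='header'><span class='title'>Cafe Example</span></div>"
    "</div></div>"
)

POSITIONAL = "./div/span"               # depends on the exact nesting depth
ATTRIBUTE = ".//span[@class='title']"   # depends only on the class name


def extract(dom: str, xpath: str):
    """Return the text of the first match, or None if the path no longer matches."""
    node = ET.fromstring(dom).find(xpath)
    return node.text if node is not None else None


results = {
    "positional_old": extract(OLD_DOM, POSITIONAL),
    "positional_new": extract(NEW_DOM, POSITIONAL),
    "attribute_old": extract(OLD_DOM, ATTRIBUTE),
    "attribute_new": extract(NEW_DOM, ATTRIBUTE),
}
```

Here the positional path returns `None` once the wrapper appears, while the attribute-based path still matches. The attribute-based path is not immune either: it fails the same way as soon as the class name changes.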

This fragility means the scraper is not “set and forget.” It demands ongoing maintenance, testing, and adjustments. The repo’s three-branch structure is a direct response to this overhead, isolating fixes for library updates and platform-specific bugs. While this improves stability for certain environments, it adds complexity for developers trying to keep the project running.

Additionally, the repo runs the browser in non-headless (headed) mode by default. This is often necessary when scraping complex JavaScript-driven sites, where headless mode can trigger bot detection or lead to incomplete rendering.

Python 3.8 or 3.9 is explicitly required as dependencies are not guaranteed compatible with newer Python versions. This is a reminder that dependency management is a significant concern in production scraping tools.
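One way to surface this constraint early is a version guard at the top of a script. This is a hypothetical helper, not something the repo is known to ship; the supported set simply mirrors the versions named above.

```python
import sys

SUPPORTED = {(3, 8), (3, 9)}  # versions the article names as supported


def is_supported(version_info=sys.version_info) -> bool:
    """Check only the major and minor components of the version."""
    return (version_info[0], version_info[1]) in SUPPORTED


def require_supported_python() -> None:
    """Exit with a readable message instead of failing later on an import."""
    if not is_supported():
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        raise SystemExit(f"Python 3.8 or 3.9 is expected, found {found}")
```

Calling `require_supported_python()` before importing the heavier dependencies turns an obscure install-time or import-time error into a one-line explanation.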

The repo’s code is surprisingly clear for a scraper of this nature, with well-defined steps for browser setup, navigation, extraction, and data cleaning. However, the reliance on brittle XPath selectors and pinned dependencies limits its long-term robustness.

quick start with Google Maps Scraper

Prerequisites

Python 3.8 or 3.9 (Python 3.10+ may not be compatible with some dependencies)
Google Chrome or Chromium browser installed (for Playwright)

Installation

Clone this repository:

git clone https://github.com/zohaibbashir/Google-Maps-Scrapper.git
cd Google-Maps-Scrapper

Install Python dependencies:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install
```

After these steps, you can run the scraper scripts as described in the repo documentation to start extracting business data.

who should consider using this scraper

This tool is relevant for developers or data engineers who need to extract business listing data from Google Maps without access to an official API. It demonstrates a working browser automation approach using Playwright and Python.

However, it’s best suited for scenarios where you can tolerate some maintenance overhead. The scraper’s XPath-based extraction is fragile and prone to breakage with changes in Google Maps’ DOM. Expect to spend time debugging and updating selectors.
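Part of that debugging time can be reduced by trying several selectors in order and reporting which ones have stopped matching. The sketch below shows the idea on made-up markup with the standard library; the selectors, the sample document, and the helper names are all hypothetical, and the repo does not necessarily work this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors, ordered from most to least preferred.
TITLE_SELECTORS = [
    ".//h1[@class='title']",
    ".//span[@class='title']",
    "./div/span",
]


def first_match(root, selectors):
    """Return (selector, text) for the first selector that matches, else (None, None)."""
    for selector in selectors:
        node = root.find(selector)
        if node is not None and node.text:
            return selector, node.text.strip()
    return None, None


def check_selectors(root, selectors):
    """Map each selector to True/False so dead selectors are visible after a UI change."""
    return {s: root.find(s) is not None for s in selectors}


SAMPLE = "<div><div><span class='title'>Cafe Example</span></div></div>"
root = ET.fromstring(SAMPLE)
used, title = first_match(root, TITLE_SELECTORS)
report = check_selectors(root, TITLE_SELECTORS)
```

A fallback chain does not remove the maintenance work; it only delays a hard failure and makes it clearer which selector needs attention.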

If you need a more robust, scalable solution, consider API-based approaches where possible, or tools that use CSS selectors or other selection strategies that depend less on exact DOM structure. This repo is a good learning resource and starting point for browser automation scraping, but it is not a turnkey production scraper.

In production, running a scraper like this means weighing the precision XPath offers against its brittleness. The project’s multi-branch maintenance approach is an honest acknowledgment of the real-world cost of keeping such scrapers alive.

Overall, Google Maps Scraper is a practical example of browser automation scraping with clear tradeoffs — valuable to practitioners who want to understand the maintenance realities behind the scenes.


→ GitHub Repo: zohaibbashir/Google-Maps-Scrapper ⭐ 624 · Python

Noureddine RAMDI