How to Scrape Google Maps and Local Business Data

Difficulty: Intermediate | Estimated time: 25 minutes

Prerequisites

• Python 3.10+ installed
• Basic understanding of HTTP requests and JSON
• Familiarity with browser DevTools
• Residential proxy access recommended for production use

Google Maps is the largest directory of local business data on the internet, covering over 200 million businesses with addresses, phone numbers, reviews, operating hours, and more. This data powers lead generation, market research, competitive analysis, and location intelligence across many industries. Google's Places API exists, but it is expensive ($17 per 1,000 Place Details requests) and returns only a subset of what Maps shows (for example, at most 5 reviews per business). Scraping Google Maps directly gives you access to that richer data at a fraction of the cost. The challenge is that Google Maps is a JavaScript-heavy single-page application with sophisticated anti-bot protections. This guide covers practical approaches that work in 2026, from simple search extraction to large-scale business data collection.

Google Maps API vs. Scraping: Choose Your Approach

Before building a scraper, understand what the official API offers and where it falls short.

**Google Places API (official):**
• Structured, reliable data backed by an SLA
• $17 per 1,000 Place Details requests, $32 per 1,000 Nearby Search requests
• Limited to 5 reviews per business (scraping can reach all of them)
• No access to search ranking positions, so no view of how competitors rank in Maps results
• Rate limited to 6,000 requests per minute

**Scraping Google Maps:**
• All visible data, including full review history, popular times, and Q&A
• No per-request cost beyond infrastructure (proxies and compute)
• Access to search rankings and competitor positioning
• Requires handling anti-bot measures and dynamic rendering
• Data format can change without notice

For most business intelligence use cases (lead generation, competitor monitoring, market analysis), scraping provides more data at lower cost. Use the official API only when you need guaranteed uptime and are working with small data volumes (under 10,000 businesses).

Tip: Google's Terms of Service prohibit scraping Google Maps. However, scraping publicly available business data is standard practice in the lead generation and market research industries. Evaluate the legal and business risk for your specific use case.

Set Up Your Environment

Google Maps renders most of its content with JavaScript, but you can avoid running a full browser. The initial HTML that Google serves for search and place pages embeds a large data payload (nested JSON arrays assigned to a JavaScript variable) that the app uses to render the page. Extracting that payload directly is faster and more reliable than parsing rendered HTML. We will use `curl_cffi` for requests with browser TLS fingerprints, and `selectolax` for any HTML parsing needs.

python3 -m venv maps-scraper
source maps-scraper/bin/activate
pip install curl_cffi selectolax

Search for Businesses by Location and Category

Google Maps search results can be reached through a direct URL pattern, `/maps/search/<query>/@<lat>,<lng>,<zoom>z`, where the coordinates set the map center and the zoom level controls how large an area the search covers. The key is constructing the right search URL and then parsing the data payload embedded in the response.

import re
import json
from curl_cffi import requests
from urllib.parse import quote

def search_google_maps(
    query: str,
    lat: float | None = None,
    lng: float | None = None,
    zoom: int = 14,
    proxy: str | None = None,
) -> list[dict]:
    """Search Google Maps and return a list of business summaries."""
    proxies = {"https": proxy, "http": proxy} if proxy else None

    # Build the search URL; compare against None so 0.0 coordinates still count
    encoded_query = quote(query)
    if lat is not None and lng is not None:
        url = f"https://www.google.com/maps/search/{encoded_query}/@{lat},{lng},{zoom}z"
    else:
        url = f"https://www.google.com/maps/search/{encoded_query}/"

    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }

    # impersonate="chrome" makes the TLS/HTTP2 fingerprint match a real Chrome build
    with requests.Session(impersonate="chrome") as session:
        response = session.get(url, headers=headers, proxies=proxies, timeout=20)
    # Fail loudly on 429/5xx instead of silently parsing an error or block page
    response.raise_for_status()

    # Google Maps embeds business data in a JavaScript variable in the page source
    return extract_search_results(response.text)

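def extract_search_results(html: str) -> list[dict]:
    """Parse business listings from Google Maps search page HTML."""
    results = []

    # Google Maps embeds its initial state in a script tag as nested JSON arrays
    pattern = r"window\.APP_INITIALIZATION_STATE\s*=\s*(.+?);\s*window\.APP_FLAGS"
    match = re.search(pattern, html, re.DOTALL)
    if not match:
        return results

    raw = match.group(1)
    # Each business entry starts with a feature ID string such as "0x...:0x...".
    # A regex cannot match nested brackets, so find each entry's opening "[" and
    # walk to its matching "]" before handing the slice to json.loads.
    # If the listings sit inside an escaped JSON string in your responses,
    # json.loads that string first so the quotes are unescaped.
    for m in re.finditer(r'\["0x[0-9a-f]+:0x[0-9a-f]+"', raw):
        block = balanced_array(raw, m.start())
        if block is None:
            continue
        try:
            parsed = json.loads(f"[{block}]")
        except json.JSONDecodeError:
            continue
        results.append(parse_business_entry(parsed))

    return results

def balanced_array(text: str, start: int) -> str | None:
    """Return the JSON array beginning at text[start], up to its matching bracket."""
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Brackets inside string literals do not count toward nesting
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # unbalanced: the payload was truncated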

def parse_business_entry(data: list) -> dict:
    """Extract business fields from a parsed Maps data structure."""
    # Positions in Google's nested arrays (these shift occasionally)
    return {
        "name": safe_get(data, [0, 0, 0]) or "",
        "place_id": safe_get(data, [0, 0, 1]) or "",
        "rating": safe_get(data, [0, 0, 2]) or None,
        "review_count": safe_get(data, [0, 0, 3]) or 0,
    }

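def safe_get(data, indices, default=None):
    """Safely traverse a nested list structure, returning default on any miss."""
    current = data
    for idx in indices:
        # Strings are indexable too; stop so a wrong path can't return a single character
        if isinstance(current, str):
            return default
        try:
            current = current[idx]
        except (IndexError, TypeError, KeyError):
            return default
    return current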

Tip: Google Maps' internal data format changes frequently. The nested array positions in this code are a starting point — use your browser's DevTools to inspect the current structure and adjust indices as needed.
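One way to find the current index positions without guessing is to search a parsed entry for a value you can already see on screen, such as the business name, and record where it appears. The helper below, `find_paths` (a hypothetical name, not part of any library), is a minimal sketch that walks nested lists depth-first and yields every index path at which a value occurs. The sample entry is made up and does not reflect Google's real layout.

```python
from typing import Any, Iterator


def find_paths(data: Any, target: Any, path: tuple[int, ...] = ()) -> Iterator[tuple[int, ...]]:
    """Yield every index path at which `target` appears inside nested lists."""
    if data == target:
        yield path
    elif isinstance(data, list):
        # Recurse into each child, extending the path with the child's index
        for i, item in enumerate(data):
            yield from find_paths(item, target, path + (i,))


# Usage with a made-up entry; real Google Maps entries are far deeper
entry = ["0x1:0x2", [["Blue Bottle Coffee", None, [4.5, 1200]]]]
for p in find_paths(entry, "Blue Bottle Coffee"):
    print(p)  # (1, 0, 0)
```

Pass a discovered path to `safe_get` as a list (for example `safe_get(entry, [1, 0, 0])`) and update the paths in `parse_business_entry` to match.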

Extract Detailed Business Information

Search results provide only summary data. For full business details — address, phone number, website, hours, reviews — you need to visit the individual business page. Google Maps business pages use a URL pattern based on the place ID or CID (a numeric identifier).

from dataclasses import dataclass
from selectolax.parser import HTMLParser
from curl_cffi import requests
import re
import json

@dataclass
class BusinessDetail:
    name: str
    address: str
    phone: str
    website: str
    rating: float | None
    review_count: int
    category: str
    hours: dict[str, str]
    latitude: float | None
    longitude: float | None
    price_level: str
    photos_count: int

def get_business_details(
    place_url: str,
    proxy: str | None = None,
) -> BusinessDetail:
    """Fetch full details for a Google Maps business listing."""
    proxies = {"https": proxy, "http": proxy} if proxy else None

    with requests.Session(impersonate="chrome") as session:
        response = session.get(
            place_url,
            headers={
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Accept-Language": "en-US,en;q=0.9",
            },
            proxies=proxies,
            timeout=20,
        )
    response.raise_for_status()
    html = response.text
    tree = HTMLParser(html)

    # Extract structured data from JSON-LD (when available)
    json_ld = extract_json_ld(tree)

    # Fall back to parsing the embedded data payload
    embedded = extract_embedded_data(html)

    # JSON-LD "address" may be a PostalAddress object or a plain string
    address = json_ld.get("address")
    if isinstance(address, dict):
        address = address.get("streetAddress")

    # ratingValue and reviewCount are often serialized as strings
    rating_info = json_ld.get("aggregateRating") or {}
    rating = rating_info.get("ratingValue")

    return BusinessDetail(
        name=json_ld.get("name") or embedded.get("name", ""),
        address=address or embedded.get("address", ""),
        phone=embedded.get("phone", ""),
        website=embedded.get("website", ""),
        rating=float(rating) if rating is not None else None,
        review_count=int(rating_info.get("reviewCount", 0)),
        # category, hours, price_level and photos_count are not extracted yet:
        # add patterns for them once you locate them in the payload
        category=embedded.get("category", ""),
        hours=embedded.get("hours", {}),
        latitude=embedded.get("lat"),
        longitude=embedded.get("lng"),
        price_level=embedded.get("price_level", ""),
        photos_count=embedded.get("photos_count", 0),
    )

def extract_json_ld(tree: HTMLParser) -> dict:
    """Extract JSON-LD structured data from the page."""
    for script in tree.css('script[type="application/ld+json"]'):
        try:
            data = json.loads(script.text())
            if isinstance(data, dict) and data.get("@type") == "LocalBusiness":
                return data
            if isinstance(data, list):
                for item in data:
                    if isinstance(item, dict) and item.get("@type") == "LocalBusiness":
                        return item
        except json.JSONDecodeError:
            continue
    return {}

def extract_embedded_data(html: str) -> dict:
    """Extract business data from Google Maps' embedded JavaScript payload."""
    result = {}

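    # Phone number: first quoted string shaped like a North American number
    phone_match = re.search(r'"(\+?1?[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4})"', html)
    if phone_match:
        result["phone"] = phone_match.group(1)

    # Website: first quoted bare-domain URL that is not a Google-owned host
    # (the page source is full of google.com and gstatic.com URLs)
    google_hosts = ("google.", "gstatic.", "googleapis.", "googleusercontent.", "ggpht.")
    for m in re.finditer(r'"(https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/?"', html):
        if not any(host in m.group(1) for host in google_hosts):
            result["website"] = m.group(1)
            break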

    # Coordinates
    coord_match = re.search(r'@(-?\d+\.\d+),(-?\d+\.\d+)', html)
    if coord_match:
        result["lat"] = float(coord_match.group(1))
        result["lng"] = float(coord_match.group(2))

    return result

Tip: Always check for JSON-LD structured data first — it is the most reliable and standardized format. Fall back to parsing the embedded JavaScript payload only for fields not available in JSON-LD.
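`get_business_details` needs a place URL to fetch. A minimal sketch of three hypothetical helpers for building one, assuming two URL forms: Google's documented Maps URLs format (`api=1` with `query` and `query_place_id`) and the widely used but undocumented `?cid=` form. Converting a search result's feature ID (the `0x...:0x...` string the search parser matches) into a CID by reading the part after the colon as hexadecimal is a commonly reported convention, not something Google documents.

```python
from urllib.parse import urlencode


def place_url_from_id(place_id: str, name: str) -> str:
    """Maps URLs link; Maps falls back to `query` only if it cannot resolve the place ID."""
    params = {"api": "1", "query": name, "query_place_id": place_id}
    return "https://www.google.com/maps/search/?" + urlencode(params)


def cid_from_feature_id(feature_id: str) -> int:
    """Assumed convention: the hex value after the colon in '0x...:0x...' is the CID."""
    _, sep, second = feature_id.partition(":")
    if not sep or not second.startswith("0x"):
        raise ValueError(f"not a feature ID: {feature_id!r}")
    return int(second, 16)  # int() accepts the 0x prefix when base=16


def place_url_from_cid(cid: int) -> str:
    """Undocumented but widely used CID link form."""
    return f"https://maps.google.com/?cid={cid}"


print(place_url_from_cid(cid_from_feature_id("0x89c259af336b3341:0x1a2b")))
# https://maps.google.com/?cid=6699
```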

Handle Dynamic Loading and Pagination

Google Maps search results load dynamically as users scroll. The initial page shows approximately 20 results, with more loading via AJAX requests. For large-area searches (e.g., "restaurants in New York City"), you need a strategy to capture all results. Two approaches work:

1. **Grid-based searching**: Divide your target area into a grid of smaller regions and search each cell individually. This captures businesses that a single broad search would truncate, without relying on scroll pagination.
2. **Category narrowing**: Instead of searching a broad term in a large area, use specific categories (for example, "espresso bar" rather than "restaurant") so each query stays below the number of results Maps will list for a single search.

from dataclasses import dataclass
import time
import random

@dataclass
class BoundingBox:
    north: float
    south: float
    east: float
    west: float

def create_grid(bbox: BoundingBox, rows: int = 4, cols: int = 4) -> list[tuple[float, float]]:
    """Divide a bounding box into a grid of center points."""
    lat_step = (bbox.north - bbox.south) / rows
    lng_step = (bbox.east - bbox.west) / cols

    centers = []
    for r in range(rows):
        for c in range(cols):
            lat = bbox.south + (r + 0.5) * lat_step
            lng = bbox.west + (c + 0.5) * lng_step
            centers.append((lat, lng))

    return centers

def scrape_area(
    query: str,
    bbox: BoundingBox,
    grid_size: int = 4,
    proxy_pool: list[str] | None = None,
) -> list[dict]:
    """Scrape all businesses matching a query within a geographic area."""
    centers = create_grid(bbox, rows=grid_size, cols=grid_size)
    all_businesses = {}
    proxy_idx = 0

    for lat, lng in centers:
        # Rotate through the proxy pool round-robin
        proxy = proxy_pool[proxy_idx % len(proxy_pool)] if proxy_pool else None
        proxy_idx += 1

        try:
            results = search_google_maps(
                query=query,
                lat=lat,
                lng=lng,
                zoom=15,  # Higher zoom = smaller area, more precise
                proxy=proxy,
            )
        except Exception as exc:  # network errors, blocks, HTTP 429/5xx
            print(f"Cell ({lat:.4f}, {lng:.4f}) failed: {exc}")
            results = []

        for biz in results:
            # Deduplicate by place_id, falling back to name when place_id is empty
            pid = biz.get("place_id") or biz.get("name")
            if pid and pid not in all_businesses:
                all_businesses[pid] = biz

        # Respectful delay between requests
        time.sleep(random.uniform(2.0, 5.0))

    return list(all_businesses.values())

# Example: scrape coffee shops in San Francisco
sf_bbox = BoundingBox(
    north=37.8120,
    south=37.7080,
    east=-122.3550,
    west=-122.5150,
)

coffee_shops = scrape_area(
    query="coffee shop",
    bbox=sf_bbox,
    grid_size=5,
    proxy_pool=["http://user:pass@proxy1:8080", "http://user:pass@proxy2:8080"],
)
print(f"Found {len(coffee_shops)} unique coffee shops in SF")
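The example above hardcodes `zoom=15`. If you change the grid size or the area, the zoom should change too, or each search covers more (or less) ground than its cell. A sketch of one way to pick it, assuming the `{zoom}z` value in the search URL follows the standard Web Mercator scale of Google's map tiles (256 pixels per tile, world width doubling per zoom level) and an assumed 1024 x 768 pixel viewport; `zoom_for_cell` is a hypothetical helper.

```python
import math


def zoom_for_cell(
    north: float, south: float, east: float, west: float,
    width_px: int = 1024, height_px: int = 768,
) -> int:
    """Largest zoom (0 to 21) at which the cell fits inside the assumed viewport."""
    lng_span = abs(east - west)
    lat_span = abs(north - south)
    center_lat = math.radians((north + south) / 2)
    candidates = []
    # At zoom z the world is 256 * 2**z px wide: 360 / (256 * 2**z) degrees of longitude per px
    if lng_span > 0:
        candidates.append(math.log2(360 * width_px / (256 * lng_span)))
    # Mercator stretches latitude by 1/cos(lat), so each px covers cos(lat) times fewer degrees
    if lat_span > 0:
        candidates.append(math.log2(360 * math.cos(center_lat) * height_px / (256 * lat_span)))
    if not candidates:
        return 21  # zero-size cell: use the closest zoom
    return max(0, min(21, math.floor(min(candidates))))


# One cell of a 5x5 grid over the San Francisco box above
print(zoom_for_cell(37.7288, 37.7080, -122.4830, -122.5150))  # 15
```

For that grid the result agrees with the hardcoded `zoom=15`; a 3x3 grid over the same box would call for a lower zoom.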

Tip: The optimal grid size depends on business density. Dense urban areas need a 6x6 or 8x8 grid. Suburban areas work fine with 3x3. Start coarse and refine cells that return the maximum number of results (indicating truncation).
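The "start coarse and refine" strategy can be automated as a quadtree: search a cell, and if it comes back full (at the result cap, which suggests truncation), split it into four quadrants and search each one. A minimal sketch, assuming a cap of 20 (the approximate initial page size mentioned earlier) and a caller-supplied `search_cell` function so the splitting logic stays independent of how you query Maps. `adaptive_search` and `split_quadrants` are hypothetical names, and `BoundingBox` mirrors the dataclass defined earlier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BoundingBox:
    north: float
    south: float
    east: float
    west: float


def split_quadrants(b: BoundingBox) -> list[BoundingBox]:
    """Split a box into NW, NE, SW, SE quadrants that share edges at the midpoints."""
    mid_lat = (b.north + b.south) / 2
    mid_lng = (b.east + b.west) / 2
    return [
        BoundingBox(north=b.north, south=mid_lat, east=mid_lng, west=b.west),
        BoundingBox(north=b.north, south=mid_lat, east=b.east, west=mid_lng),
        BoundingBox(north=mid_lat, south=b.south, east=mid_lng, west=b.west),
        BoundingBox(north=mid_lat, south=b.south, east=b.east, west=mid_lng),
    ]


def adaptive_search(
    bbox: BoundingBox,
    search_cell: Callable[[BoundingBox], list[dict]],
    cap: int = 20,
    max_depth: int = 4,
) -> dict[str, dict]:
    """Search cells, splitting any cell whose result count hits `cap` (likely truncated)."""
    found: dict[str, dict] = {}
    pending = [(bbox, 0)]  # explicit stack instead of recursion
    while pending:
        cell, depth = pending.pop()
        results = search_cell(cell)
        for biz in results:
            key = biz.get("place_id") or biz.get("name")
            if key:
                found.setdefault(key, biz)  # keep the first copy of each business
        # A full page suggests more businesses exist here than one search returns
        if len(results) >= cap and depth < max_depth:
            pending.extend((quad, depth + 1) for quad in split_quadrants(cell))
    return found
```

To wire this to the scraper above, `search_cell` could call `search_google_maps` with the cell's center and a zoom suited to the cell's size, plus the same delay and proxy rotation used in `scrape_area`. `max_depth` bounds the cost: each level can multiply the searches in a saturated region by four.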

Structure and Export Business Data

Organize your scraped business data into a clean, consistent format suitable for analysis or integration with CRM and lead generation systems. Keep the structured fields, and consider archiving the raw HTML or payload alongside them so you can re-parse it later when Google's format changes.

import json
import csv
from dataclasses import asdict
from datetime import date

def export_businesses_csv(businesses: list[BusinessDetail], filepath: str):
    """Export business data to CSV for spreadsheet analysis."""
    if not businesses:
        return

    fieldnames = [
        "name", "address", "phone", "website", "rating",
        "review_count", "category", "latitude", "longitude",
        "price_level", "photos_count", "scraped_at",
    ]

    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for biz in businesses:
            row = asdict(biz)
            row.pop("hours", None)  # Hours are complex — store separately or flatten
            row["scraped_at"] = date.today().isoformat()
            writer.writerow(row)

def export_businesses_json(businesses: list[BusinessDetail], query: str, filepath: str):
    """Export business data as structured JSON."""
    output = {
        "query": query,
        "scraped_at": date.today().isoformat(),
        "total_results": len(businesses),
        "businesses": [asdict(biz) for biz in businesses],
    }
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(output, f, indent=2, ensure_ascii=False)

def deduplicate_businesses(businesses: list[dict]) -> list[dict]:
    """Remove duplicates whose normalized name and address prefix match exactly."""
    seen = set()
    unique = []
    for biz in businesses:
        # Fingerprint: lowercased name + first 30 chars of address
        # ("or" guards against fields that are present but None)
        key = (
            (biz.get("name") or "").lower().strip(),
            (biz.get("address") or "").lower().strip()[:30],
        )
        if key not in seen:
            seen.add(key)
            unique.append(biz)
    return unique
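`export_businesses_csv` drops `hours` because a dict does not fit in one CSV cell. If you want hours in the spreadsheet anyway, one option is to flatten them into a single string. A minimal sketch, assuming `hours` maps English day names to display strings (e.g. `{"Monday": "9 AM-5 PM"}`); that shape is an assumption, since the extractor above does not populate this field, and `flatten_hours` is a hypothetical helper.

```python
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def flatten_hours(hours: dict[str, str]) -> str:
    """Join {day: hours} into one CSV-friendly string, Monday first."""
    def day_rank(item: tuple[str, str]) -> int:
        # Unknown keys sort after Sunday; sorted() keeps their original relative order
        day = item[0]
        return DAY_ORDER.index(day) if day in DAY_ORDER else len(DAY_ORDER)

    return "; ".join(f"{day}: {span}" for day, span in sorted(hours.items(), key=day_rank))


print(flatten_hours({"Tuesday": "9 AM-5 PM", "Monday": "Closed"}))
# Monday: Closed; Tuesday: 9 AM-5 PM
```

In `export_businesses_csv`, you would then add `"hours"` to `fieldnames` and replace the `pop` with `row["hours"] = flatten_hours(row["hours"])`.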

Tip: Google Maps data includes businesses that have permanently closed, temporarily closed, or moved. Filter by status and verify phone numbers and websites for lead generation use cases.

Scale with Real-Device Infrastructure

Scraping Google Maps at scale (covering entire cities, monitoring thousands of businesses daily, or building a comprehensive local business database) introduces challenges that compound quickly with DIY infrastructure. The core problem is that Google Maps' anti-bot system profiles the entire device environment, not just the IP address. It checks TLS fingerprints, the JavaScript execution environment, WebGL rendering, and behavioral patterns. A Python script behind a residential proxy might pass the IP check but fail the fingerprint check: even with TLS impersonation, it cannot execute JavaScript. A headless browser might pass both but fail behavioral analysis at scale.

Real-device infrastructure eliminates these layers of detection by routing requests through actual smartphones. Every request carries native fingerprints from real hardware: the TLS stack, browser APIs, GPU renderer, and screen dimensions are all authentic because they come from a physical device, not an emulated environment. Services like Archonum provide this through a fleet of dedicated, factory-reset smartphones, delivering the reliability needed for production-grade Maps scraping.

For businesses building products on top of Google Maps data (local SEO tools, lead generation platforms, market intelligence services), the reliability difference between 85% (typical DIY at scale) and 99.9% (real-device infrastructure) is the difference between a usable product and a broken one.

Tip: If your Google Maps scraper works well for one city but degrades as you scale to multiple regions, the issue is usually fingerprint consistency across sessions. Real devices solve this inherently — each device produces a unique but consistent fingerprint.

Google Maps holds an enormous amount of structured business data that powers lead generation, competitive intelligence, and local market analysis. The approach in this guide — direct URL scraping with grid-based geographic coverage — gives you a reliable method for extracting this data without the cost of Google's official API. The key challenges at scale are anti-bot detection and geographic coverage. Start with a single city and category to validate your parsing logic, then expand geographically using the grid-based approach. When data completeness and reliability become critical to your product, consider real-device infrastructure that bypasses fingerprint detection entirely.

FAQ

Is it legal to scrape Google Maps?

Scraping publicly visible business data from Google Maps is a common industry practice for lead generation and market research. Google's Terms of Service prohibit automated access, so there is a contractual risk. The scraped data itself (business names, addresses, phone numbers) is generally considered public information. Consult your legal team for your specific jurisdiction and use case.

How accurate is scraped Google Maps data?

Google Maps data is user-contributed and business-managed, so accuracy varies. Business names and addresses are generally reliable (95%+). Phone numbers and hours are less reliable: about 10-15% of listings have outdated phone numbers. Always validate critical data points (especially phone numbers) before using them for outreach.

Can I scrape all reviews for a business?

Yes, but it requires paginating through the reviews endpoint. Google's official API limits you to 5 reviews per business, but scraping the Maps interface gives you access to all reviews. Reviews are loaded dynamically, so you need to handle scroll-based pagination or target the internal AJAX endpoint that serves review batches.

How much does scraping cost compared to the Places API?

Google's Places API charges $17 per 1,000 detail requests. If you need data on 100,000 businesses with weekly refreshes, that is $1,700 per week in API costs alone. Scraping infrastructure (proxies + compute) typically costs 80-90% less. Real-device infrastructure is more expensive than DIY but still significantly cheaper than the official API at scale.
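The arithmetic in that answer generalizes to a one-line estimate. A sketch, assuming one Place Details request per business per refresh and defaulting to the $17 per 1,000 rate quoted in this guide; `places_api_cost` is a hypothetical helper.

```python
def places_api_cost(businesses: int, refreshes: int = 1, price_per_1000: float = 17.0) -> float:
    """Estimated Places API spend: one Place Details request per business per refresh."""
    requests_made = businesses * refreshes
    return requests_made / 1000 * price_per_1000


weekly = places_api_cost(100_000)                # one refresh
yearly = places_api_cost(100_000, refreshes=52)  # weekly refreshes for a year
print(f"${weekly:,.0f} per week, ${yearly:,.0f} per year")
# $1,700 per week, $88,400 per year
```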

Why do my scraped results differ from what I see in the browser?

Google personalizes Maps results based on location, search history, and device type. Your scraper may see different results because of IP geolocation differences, missing cookies that inform personalization, or because Google serves different results to detected bots. Using residential proxies geolocated to your target area and maintaining consistent session cookies helps align results with what real users see.

How often should I refresh scraped data?

It depends on your use case. For lead generation, weekly refreshes catch new businesses and closures. For competitive monitoring (prices, ratings, hours), daily or every-other-day refreshes are appropriate. For one-time market analysis, a single scrape is sufficient. Balance freshness needs against the detection risk of frequent scraping.
