How to Scrape Google Maps and Local Business Data
Prerequisites
- Python 3.10+ installed
- Basic understanding of HTTP requests and JSON
- Familiarity with browser DevTools
- Residential proxy access recommended for production use
Google Maps API vs. Scraping: Choose Your Approach
Tip: Google's Terms of Service prohibit scraping Google Maps. However, scraping publicly available business data is standard practice in the lead generation and market research industries. Evaluate the legal and business risk for your specific use case.
Set Up Your Environment
python3 -m venv maps-scraper
source maps-scraper/bin/activate
pip install curl_cffi selectolax
Search for Businesses by Location and Category
import re
import json
from curl_cffi import requests
from urllib.parse import quote


def search_google_maps(
    query: str,
    lat: float | None = None,
    lng: float | None = None,
    zoom: int = 14,
    proxy: str | None = None,
) -> list[dict]:
    """Search Google Maps and return a list of business summaries."""
    session = requests.Session(impersonate="chrome")
    proxies = {"https": proxy, "http": proxy} if proxy else None

    # Build the search URL; compare against None so 0.0 coordinates still count
    encoded_query = quote(query)
    if lat is not None and lng is not None:
        url = f"https://www.google.com/maps/search/{encoded_query}/@{lat},{lng},{zoom}z"
    else:
        url = f"https://www.google.com/maps/search/{encoded_query}/"

    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    response = session.get(url, headers=headers, proxies=proxies, timeout=20)
    html = response.text

    # Google Maps embeds business data in a JavaScript variable;
    # look for the data payload in the page source
    return extract_search_results(html)


def extract_search_results(html: str) -> list[dict]:
    """Parse business listings from Google Maps search page HTML."""
    results = []
    # Google Maps embeds structured data in script tags as JSON arrays;
    # find the main data payload
    pattern = r"window\.APP_INITIALIZATION_STATE\s*=\s*(.+?);\s*window\.APP_FLAGS"
    match = re.search(pattern, html, re.DOTALL)
    if not match:
        return results
    raw = match.group(1)

    # The data is a nested array structure. Each business entry is anchored
    # here by its hex feature ID ("0x...:0x..."), followed by nested arrays.
    data_blocks = re.findall(r'\["0x[0-9a-f]+:[0-9a-fx]+".*?\]\]\]', raw)
    for block in data_blocks:
        try:
            parsed = json.loads(f"[{block}]")
        except json.JSONDecodeError:
            # The non-greedy regex can cut an entry mid-array; skip those
            continue
        if parsed:
            results.append(parse_business_entry(parsed))
    return results


def parse_business_entry(data: list) -> dict:
    """Extract business fields from a parsed Maps data structure."""
    # Positions in Google's nested arrays (these shift occasionally);
    # treat them as placeholders to verify in DevTools
    return {
        "name": safe_get(data, [0, 0, 0]) or "",
        "place_id": safe_get(data, [0, 0, 1]) or "",
        "rating": safe_get(data, [0, 0, 2]),
        "review_count": safe_get(data, [0, 0, 3]) or 0,
    }


def safe_get(data, indices, default=None):
    """Safely traverse a nested list structure."""
    current = data
    for idx in indices:
        # Never index into a string: a wrong path would return a single character
        if isinstance(current, str):
            return default
        try:
            current = current[idx]
        except (IndexError, TypeError, KeyError):
            return default
    return current
Tip: Google Maps' internal data format changes frequently. The nested array positions in this code are a starting point; use your browser's DevTools to inspect the current structure and adjust the indices as needed.
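Finding the right indices by eye in a deeply nested array is slow. One option, sketched below as a hypothetical helper (`find_paths` is not part of any library), is to search a parsed array for a value you can already see on screen, such as a business name, and list every index path where it occurs. Each path is a list in the same form `safe_get` accepts.

```python
from typing import Any, Iterator


def find_paths(data: Any, target: Any, path: tuple[int, ...] = ()) -> Iterator[list[int]]:
    """Yield every index path at which `target` appears in a nested list."""
    if data == target:
        yield list(path)
        return
    if isinstance(data, list):
        # Recurse into each element, extending the path with its index
        for i, item in enumerate(data):
            yield from find_paths(item, target, path + (i,))
```

Run it on a block parsed in `extract_search_results`, then copy a returned path into `parse_business_entry`. Matching is exact, so copy the value exactly as it appears in the payload.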
Extract Detailed Business Information
from dataclasses import dataclass
import re
import json

from curl_cffi import requests
from selectolax.parser import HTMLParser


@dataclass
class BusinessDetail:
    name: str
    address: str
    phone: str
    website: str
    rating: float | None
    review_count: int
    category: str
    hours: dict[str, str]
    latitude: float | None
    longitude: float | None
    price_level: str
    photos_count: int


def get_business_details(
    place_url: str,
    proxy: str | None = None,
) -> BusinessDetail:
    """Fetch full details for a Google Maps business listing."""
    session = requests.Session(impersonate="chrome")
    proxies = {"https": proxy, "http": proxy} if proxy else None
    response = session.get(
        place_url,
        headers={
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        },
        proxies=proxies,
        timeout=20,
    )
    html = response.text
    tree = HTMLParser(html)

    # Extract structured data from JSON-LD (when available)
    json_ld = extract_json_ld(tree)
    # Fall back to parsing the embedded data payload
    embedded = extract_embedded_data(html)

    # JSON-LD "address" can be a PostalAddress object or a plain string
    address = json_ld.get("address", "")
    if isinstance(address, dict):
        address = address.get("streetAddress", "")
    # ratingValue and reviewCount may be serialized as strings
    rating_info = json_ld.get("aggregateRating", {})
    rating = rating_info.get("ratingValue")

    return BusinessDetail(
        name=json_ld.get("name", embedded.get("name", "")),
        address=address or embedded.get("address", ""),
        phone=embedded.get("phone", ""),
        website=embedded.get("website", ""),
        rating=float(rating) if rating is not None else None,
        review_count=int(rating_info.get("reviewCount", 0)),
        category=embedded.get("category", ""),
        hours=embedded.get("hours", {}),
        latitude=embedded.get("lat"),
        longitude=embedded.get("lng"),
        price_level=embedded.get("price_level", ""),
        photos_count=embedded.get("photos_count", 0),
    )


def extract_json_ld(tree: HTMLParser) -> dict:
    """Extract JSON-LD structured data from the page."""
    # Only an exact "LocalBusiness" @type matches; add schema.org subtypes
    # such as "Restaurant" to this check if you need them
    for script in tree.css('script[type="application/ld+json"]'):
        try:
            data = json.loads(script.text())
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "LocalBusiness":
            return data
        if isinstance(data, list):
            for item in data:
                if isinstance(item, dict) and item.get("@type") == "LocalBusiness":
                    return item
    return {}


def extract_embedded_data(html: str) -> dict:
    """Extract business data from Google Maps' embedded JavaScript payload."""
    result = {}
    # Phone number pattern (10-digit North American formats)
    phone_match = re.search(r'"(\+?1?[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4})"', html)
    if phone_match:
        result["phone"] = phone_match.group(1)
    # Website pattern: skip Google's own domains, which appear throughout the page
    website_pattern = r'"(https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"'
    for website_match in re.finditer(website_pattern, html):
        url = website_match.group(1)
        if not re.search(r"google|gstatic|ggpht", url):
            result["website"] = url
            break
    # Coordinates from the first "@lat,lng" pair in the page
    coord_match = re.search(r'@(-?\d+\.\d+),(-?\d+\.\d+)', html)
    if coord_match:
        result["lat"] = float(coord_match.group(1))
        result["lng"] = float(coord_match.group(2))
    return result
Tip: Always check for JSON-LD structured data first, because it is the most reliable and standardized format. Fall back to parsing the embedded JavaScript payload only for fields that JSON-LD does not provide.
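`get_business_details` only fills `hours` when the embedded payload provides them. When a page does include JSON-LD, schema.org defines `openingHoursSpecification` for this purpose: objects with `dayOfWeek`, `opens`, and `closes`. The hypothetical helper below flattens those entries into the `dict[str, str]` shape that `BusinessDetail.hours` expects. It assumes times should be kept as the page writes them and ignores the free-text `openingHours` form (for example `Mo-Fr 09:00-17:00`).

```python
def parse_opening_hours(json_ld: dict) -> dict[str, str]:
    """Flatten schema.org openingHoursSpecification into {"Monday": "09:00-17:00"}."""
    specs = json_ld.get("openingHoursSpecification", [])
    if isinstance(specs, dict):
        specs = [specs]
    hours: dict[str, str] = {}
    for spec in specs:
        if not isinstance(spec, dict):
            continue
        days = spec.get("dayOfWeek", [])
        if isinstance(days, str):
            days = [days]
        opens, closes = spec.get("opens"), spec.get("closes")
        if not opens or not closes:
            continue
        span = f"{opens}-{closes}"
        for day in days:
            # Accept both "Monday" and "https://schema.org/Monday"
            name = day.rstrip("/").rsplit("/", 1)[-1]
            # Split shifts on the same day are joined with a comma
            hours[name] = f"{hours[name]}, {span}" if name in hours else span
    return hours
```

Inside `get_business_details`, you could then pass `hours=parse_opening_hours(json_ld) or embedded.get("hours", {})`.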
Handle Dynamic Loading and Pagination
from dataclasses import dataclass
import time
import random


@dataclass
class BoundingBox:
    north: float
    south: float
    east: float
    west: float


def create_grid(bbox: BoundingBox, rows: int = 4, cols: int = 4) -> list[tuple[float, float]]:
    """Divide a bounding box into a grid of center points."""
    lat_step = (bbox.north - bbox.south) / rows
    lng_step = (bbox.east - bbox.west) / cols
    centers = []
    for r in range(rows):
        for c in range(cols):
            lat = bbox.south + (r + 0.5) * lat_step
            lng = bbox.west + (c + 0.5) * lng_step
            centers.append((lat, lng))
    return centers


def scrape_area(
    query: str,
    bbox: BoundingBox,
    grid_size: int = 4,
    proxy_pool: list[str] | None = None,
) -> list[dict]:
    """Scrape all businesses matching a query within a geographic area."""
    centers = create_grid(bbox, rows=grid_size, cols=grid_size)
    all_businesses = {}
    for i, (lat, lng) in enumerate(centers):
        # Rotate through the proxy pool, one proxy per grid cell
        proxy = proxy_pool[i % len(proxy_pool)] if proxy_pool else None
        results = search_google_maps(
            query=query,
            lat=lat,
            lng=lng,
            zoom=15,  # Higher zoom = smaller area, more precise
            proxy=proxy,
        )
        for biz in results:
            # Deduplicate by place_id; fall back to name when place_id is empty
            pid = biz.get("place_id") or biz.get("name", "")
            if pid and pid not in all_businesses:
                all_businesses[pid] = biz
        # Respectful delay between requests
        time.sleep(random.uniform(2.0, 5.0))
    return list(all_businesses.values())


# Example: scrape coffee shops in San Francisco
sf_bbox = BoundingBox(
    north=37.8120,
    south=37.7080,
    east=-122.3550,
    west=-122.5150,
)
coffee_shops = scrape_area(
    query="coffee shop",
    bbox=sf_bbox,
    grid_size=5,
    proxy_pool=["http://user:pass@proxy1:8080", "http://user:pass@proxy2:8080"],
)
print(f"Found {len(coffee_shops)} unique coffee shops in SF")
Tip: The optimal grid size depends on business density. Dense urban areas need a 6x6 or 8x8 grid; suburban areas work fine with 3x3. Start coarse, then refine any cell that returns the maximum number of results, since hitting the cap indicates the results were truncated.
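The "start coarse and refine" advice can be automated with a quadtree: search a cell, and only if it returns as many results as the cap, split it into four quadrants and search each. The sketch below repeats the `BoundingBox` dataclass so it runs on its own. `split_quadrants`, `adaptive_search`, and the `search_cell` callback are hypothetical names, and `cap` is whatever per-search result limit you observe yourself, not a documented Google value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BoundingBox:
    north: float
    south: float
    east: float
    west: float


def split_quadrants(b: BoundingBox) -> list[BoundingBox]:
    """Split a box into four equal quadrants (NW, NE, SW, SE)."""
    mid_lat = (b.north + b.south) / 2
    mid_lng = (b.east + b.west) / 2
    return [
        BoundingBox(b.north, mid_lat, mid_lng, b.west),  # NW
        BoundingBox(b.north, mid_lat, b.east, mid_lng),  # NE
        BoundingBox(mid_lat, b.south, mid_lng, b.west),  # SW
        BoundingBox(mid_lat, b.south, b.east, mid_lng),  # SE
    ]


def adaptive_search(
    bbox: BoundingBox,
    search_cell: Callable[[BoundingBox], list[dict]],
    cap: int,
    max_depth: int = 4,
    depth: int = 0,
) -> list[dict]:
    """Search a cell; if it hits the result cap, recurse into its quadrants."""
    results = search_cell(bbox)
    if len(results) < cap or depth >= max_depth:
        return results
    # The cell looks truncated: keep its results and refine into four sub-cells
    refined = list(results)
    for quad in split_quadrants(bbox):
        refined.extend(adaptive_search(quad, search_cell, cap, max_depth, depth + 1))
    return refined
```

A `search_cell` could call `search_google_maps` at the cell's center with a higher `zoom` at each depth. A parent cell's results reappear in its children, so deduplicate the combined list by `place_id`, as `scrape_area` does.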
Structure and Export Business Data
import json
import csv
from dataclasses import asdict
from datetime import date

# BusinessDetail is the dataclass defined in the previous section


def export_businesses_csv(businesses: list[BusinessDetail], filepath: str):
    """Export business data to CSV for spreadsheet analysis."""
    if not businesses:
        return
    fieldnames = [
        "name", "address", "phone", "website", "rating",
        "review_count", "category", "latitude", "longitude",
        "price_level", "photos_count", "scraped_at",
    ]
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for biz in businesses:
            row = asdict(biz)
            row.pop("hours", None)  # Hours are nested; store separately or flatten
            row["scraped_at"] = date.today().isoformat()
            writer.writerow(row)


def export_businesses_json(businesses: list[BusinessDetail], query: str, filepath: str):
    """Export business data as structured JSON."""
    output = {
        "query": query,
        "scraped_at": date.today().isoformat(),
        "total_results": len(businesses),
        "businesses": [asdict(biz) for biz in businesses],
    }
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(output, f, indent=2, ensure_ascii=False)


def deduplicate_businesses(businesses: list[dict]) -> list[dict]:
    """Remove duplicate businesses by normalized name + address prefix."""
    seen = set()
    unique = []
    for biz in businesses:
        # Create a fingerprint from normalized name + address (None-safe)
        key = (
            (biz.get("name") or "").lower().strip(),
            (biz.get("address") or "").lower().strip()[:30],  # First 30 chars of address
        )
        if key not in seen:
            seen.add(key)
            unique.append(biz)
    return unique
Tip: Google Maps data includes businesses that have permanently closed, temporarily closed, or moved. Filter by status and verify phone numbers and websites for lead generation use cases.
Scale with Real-Device Infrastructure
Tip: If your Google Maps scraper works well for one city but degrades as you scale to multiple regions, the issue is usually fingerprint consistency across sessions. Real devices solve this inherently — each device produces a unique but consistent fingerprint.
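If you stay on DIY infrastructure, one partial mitigation is to keep each region on a single, stable identity instead of rotating fingerprints at random. The sketch below deterministically maps a region key to one profile from a pool you define. `SessionProfile`, `profile_for`, and the example pool values are hypothetical, and the `impersonate` strings must be checked against the targets your curl_cffi version supports.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    impersonate: str  # curl_cffi impersonation target (check supported names)
    proxy: str  # sticky proxy endpoint geolocated to the region (placeholder)
    accept_language: str  # keep consistent with the proxy's locale


def profile_for(region: str, profiles: list[SessionProfile]) -> SessionProfile:
    """Map a region key to the same profile on every run."""
    if not profiles:
        raise ValueError("profiles must not be empty")
    # Python's built-in hash() is salted per process for str, so use a stable digest
    digest = hashlib.sha256(region.encode("utf-8")).digest()
    return profiles[int.from_bytes(digest[:8], "big") % len(profiles)]


# Placeholder pool: replace with your own proxies and locales
POOL = [
    SessionProfile("chrome", "http://user:pass@proxy1:8080", "en-US,en;q=0.9"),
    SessionProfile("chrome", "http://user:pass@proxy2:8080", "en-US,en;q=0.9"),
]
```

For a region, create the session with `requests.Session(impersonate=p.impersonate)` and send `p.proxy` and `p.accept_language` on every request for that region. Because the mapping uses a modulo, resizing the pool reassigns most regions.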
FAQ
Is it legal to scrape Google Maps?
Scraping publicly visible business data from Google Maps is a common industry practice for lead generation and market research. Google's Terms of Service prohibit automated access, so there is a contractual risk. The scraped data itself (business names, addresses, phone numbers) is generally considered public information. Consult your legal team for your specific jurisdiction and use case.
How accurate is scraped Google Maps data?
Google Maps data is user-contributed and business-managed, so accuracy varies. Business names and addresses are generally reliable (95%+). Phone numbers and hours are less reliable: about 10-15% of listings have outdated phone numbers. Always validate critical data points (especially phone numbers) before using them for outreach.
Can I scrape Google Maps reviews?
Yes, but it requires paginating through the reviews endpoint. Google's official API limits you to 5 reviews per business, but scraping the Maps interface gives you access to all reviews. Reviews are loaded dynamically, so you need to handle scroll-based pagination or target the internal AJAX endpoint that serves review batches.
Is scraping cheaper than the Google Places API?
Google's Places API charges $17 per 1,000 detail requests. If you need data on 100,000 businesses with weekly refreshes, that is $1,700 per week in API costs alone. Scraping infrastructure (proxies + compute) typically costs 80-90% less. Real-device infrastructure is more expensive than DIY but still significantly cheaper than the official API at scale.
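The arithmetic behind those figures, as two small helpers: the $17 price and the 80-90% savings range are the numbers quoted above, passed in as parameters because Google's pricing changes.

```python
def places_api_cost(businesses: int, price_per_1000: float, refreshes: int = 1) -> float:
    """API cost for one detail request per business per refresh."""
    return businesses / 1000 * price_per_1000 * refreshes


def scraping_cost_range(
    api_cost: float, savings: tuple[float, float] = (0.80, 0.90)
) -> tuple[float, float]:
    """Apply a (min, max) savings fraction to an API cost; returns (low, high)."""
    min_saving, max_saving = savings
    # The largest saving leaves the lowest remaining cost
    return api_cost * (1 - max_saving), api_cost * (1 - min_saving)
```

`places_api_cost(100_000, 17.0)` gives 1700.0 per weekly refresh, and `refreshes=52` gives 88400.0 for a year. Applying `scraping_cost_range` to 1700.0 gives roughly $170 to $340 per week.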
Why do my scraped results differ from what I see in my browser?
Google personalizes Maps results based on location, search history, and device type. Your scraper may see different results because of IP geolocation differences, missing cookies that inform personalization, or because Google serves different results to detected bots. Using residential proxies geolocated to your target area and maintaining consistent session cookies helps align results with what real users see.
How often should I refresh scraped data?
It depends on your use case. For lead generation, weekly refreshes catch new businesses and closures. For competitive monitoring (prices, ratings, hours), daily or every-other-day refreshes are appropriate. For one-time market analysis, a single scrape is sufficient. Balance freshness needs against the detection risk of frequent scraping.
Build Reliable Local Business Data Pipelines
Archonum's real-device infrastructure delivers 99.9% success rates on Google Maps with native smartphone fingerprints. Extract business data at scale without detection or infrastructure maintenance.