Case study - Targeted leads (updated May 2026)

From 9,400 places to 2,500 cold-call-ready leads, without scraping the whole internet.

For a US SaaS that helps local service businesses capture anonymous customer feedback via QR codes, Scrapingdome built a targeted lead-extraction pipeline across 5 US states, 8 verticals, and 20 metro areas. Filtered by rating, review volume, operating hours, and franchise status, the outreach list lands clean and HubSpot-ready.

Problem

Off-the-shelf lead lists fail in three predictable ways.

A US SaaS for customer-feedback collection via QR codes needed a high-quality cold-outreach list to fuel its sales pipeline. The off-the-shelf options had three failure modes, and they compound at scale.

Stale data. Purchased lists carry phone numbers that no longer route to the business, addresses for locations that closed, and active flags from a year ago.

Wrong signal. Generic exports mix multi-billion-dollar national chains with the independent operators who actually have the autonomy to buy a SaaS subscription.

No qualifying detail. Without rating, review volume, and operating hours, every lead looks the same, with no way to tell whether the business has an active customer base worth surveying.

The client needed a list where every row already passed a sanity check: phone valid, currently operating, mid-range rating with room to improve (a 5-star place has nothing to gain from a feedback tool), enough monthly traffic to make the SaaS pay back, and independently operated so the decision-maker actually picks up the phone.

Approach

Four layers, each narrowing the funnel with a different signal.

Aggregate discovery

The Google Area Insights surface (region or circle filters with rating and price-tier predicates) returns lists of place IDs that already satisfy the geographic and quality criteria. Twenty metro circles cover NY, NJ, PA, MA, and CT, paired with eight verticals (restaurants, cafes, hair salons, barber shops, gyms, car washes, dentists, beauty salons). The API caps each query result set at roughly 450 places, so larger metros are subdivided by rating band (0.3-star windows, then 0.1-star if needed) until every slice fits under the cap. Output: roughly 9,400 candidate place IDs, deduplicated across overlapping metro areas.
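The subdivision step can be sketched as a small recursive function. This is a minimal illustration under stated assumptions, not the production code: `query` is a hypothetical stand-in for one metro-plus-vertical API call restricted to a rating band, ratings are expressed in tenths of a star to avoid floating-point boundaries, and a result set that reaches the cap is treated as possibly truncated.

```python
from typing import Callable, List, Set, Tuple

CAP = 450  # approximate per-query cap quoted in the text

# Hypothetical query function: (lo, hi) in tenths of a star, half-open
# [lo, hi), returning the place IDs the API hands back for that band.
QueryFn = Callable[[int, int], List[str]]


def collect_place_ids(
    query: QueryFn,
    lo: int = 10,
    hi: int = 51,  # 51 so that a 5.0 rating falls inside [lo, hi)
    widths: Tuple[int, ...] = (3, 1),  # 0.3-star windows, then 0.1-star
    cap: int = CAP,
) -> Set[str]:
    """Query one rating band; if the result reaches the cap, split it."""
    ids = query(lo, hi)
    if len(ids) < cap or not widths:
        # Either the slice fits under the cap, or no finer window is left.
        # In the second case the slice may still be truncated.
        return set(ids)
    width, finer = widths[0], widths[1:]
    if hi - lo <= width:
        # Band is already this narrow; move on to the next finer width.
        return collect_place_ids(query, lo, hi, finer, cap)
    found: Set[str] = set()
    for start in range(lo, hi, width):
        found |= collect_place_ids(query, start, min(start + width, hi), finer, cap)
    return found
```

Because each call returns a set, taking the union of the per-metro results also handles the deduplication across overlapping circles.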

Detail extraction (resilient, multi-source)

Each place ID is enriched with full business detail: business name, phone, address, website, rating, review count, operating hours, primary category. Two extraction paths run in parallel for resilience: a headless, anti-bot-resistant browser session (UC mode for page load, CDP for stealth DOM reads) handles the bulk, while a structured API path covers cases where the browser stalls or is rate-limited. Checkpoints save every ten records so a crash never costs more than a few minutes of work. Result: 8,449 fully enriched business records out of the roughly 9,400 candidates.
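The checkpoint-and-fallback behaviour can be illustrated with a short loop. This is a simplified sketch: `browser_fetch` and `api_fetch` are hypothetical callables standing in for the two extraction paths, the fallback is tried sequentially rather than in parallel, and the JSON checkpoint format is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

Record = Dict[str, object]
Fetcher = Callable[[str], Optional[Record]]  # hypothetical extraction path


def _attempt(fetch: Fetcher, place_id: str) -> Optional[Record]:
    """Run one extraction path, treating any exception as a miss."""
    try:
        return fetch(place_id)
    except Exception:
        return None


def _save(path: Path, done: Dict[str, Record]) -> None:
    """Write to a temp file first so a crash never leaves a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(done), encoding="utf-8")
    os.replace(tmp, path)


def enrich(
    place_ids: Iterable[str],
    browser_fetch: Fetcher,
    api_fetch: Fetcher,
    checkpoint: Path,
    every: int = 10,  # checkpoint interval quoted in the text
) -> Dict[str, Record]:
    done: Dict[str, Record] = {}
    if checkpoint.exists():
        # Resume: anything already saved is skipped below.
        done = json.loads(checkpoint.read_text(encoding="utf-8"))
    unsaved = 0
    for pid in place_ids:
        if pid in done:
            continue
        record = _attempt(browser_fetch, pid) or _attempt(api_fetch, pid)
        if record is None:
            continue  # neither path produced a record
        done[pid] = record
        unsaved += 1
        if unsaved >= every:
            _save(checkpoint, done)
            unsaved = 0
    if unsaved:
        _save(checkpoint, done)
    return done
```

Rerunning `enrich` with the same checkpoint path after a crash picks up where the last save left off, which is what bounds the lost work to a handful of records.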

Quality filtering

The enriched set runs through a stack of programmatic filters: US phone-number format validation (regex against the digit pattern, rejecting placeholders and non-North-American numbers); operating hours must be present (a business with no posted hours is unlikely to be actively operating); major-franchise name matching (Subway, Domino's, 7-Eleven, Starbucks, McDonald's, etc.), with matches excluded by default but preserved in a separate bonus dataset; review-count band 60 to 800 (enough to prove ongoing traffic, not so many that the listing has saturated); rating band 3.3 to 4.2 (mid-range businesses where a feedback tool has room to move the needle). This stack cuts the set from 8,449 enriched records to 2,498 cold-call-ready leads, plus 641 franchise locations kept aside as an optional secondary dataset.
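A compact sketch of that filter stack follows. The bands and the franchise names come from the text; everything else is an assumption made for illustration: the record field names, the exact regex, the "all digits identical" placeholder rule, and the choice to apply only the phone and hours checks to franchise locations before routing them to the bonus set.

```python
import re
from typing import Dict, Optional

# Only the brands named in the text; a real list would be longer.
FRANCHISE_NAMES = ("subway", "domino's", "7-eleven", "starbucks", "mcdonald's")

# Assumed North American pattern: optional +1, then a 3-3-4 digit layout
# where area code and exchange both start with 2-9.
US_PHONE = re.compile(
    r"^(?:\+?1[\s.-]?)?\(?([2-9]\d{2})\)?[\s.-]?([2-9]\d{2})[\s.-]?(\d{4})$"
)


def is_valid_us_phone(phone: str) -> bool:
    match = US_PHONE.match(phone.strip())
    if match is None:
        return False
    digits = "".join(match.groups())
    # Assumed placeholder rule: reject numbers made of one repeated digit.
    return len(set(digits)) > 1


def is_franchise(name: str) -> bool:
    normalized = name.lower().replace("\u2019", "'")  # curly to straight apostrophe
    return any(brand in normalized for brand in FRANCHISE_NAMES)


def classify(record: Dict[str, object]) -> Optional[str]:
    """Return 'lead', 'franchise', or None when the record is dropped."""
    if not is_valid_us_phone(str(record.get("phone", ""))):
        return None
    if not record.get("hours"):
        return None  # no posted hours: treated as not actively operating
    if is_franchise(str(record.get("name", ""))):
        return "franchise"  # kept aside as the secondary dataset
    if not 60 <= record.get("review_count", 0) <= 800:
        return None
    if not 3.3 <= record.get("rating", 0.0) <= 4.2:
        return None
    return "lead"
```

Substring matching on brand names is deliberately crude; it will also catch an independent business whose name happens to contain a brand, which is one reason to keep the franchise set for review rather than discard it.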

Format and delivery

Final output is a HubSpot-ready CSV (UTF-8) with the columns the client's CRM consumes directly: business name, phone, address, city, state, zip, website, rating, review count, operating hours, category. Records are pre-sorted by rating descending so the best leads land at the top of the import. A second CSV holds the franchise locations, useful if the client wants to test outreach to franchisees that have local marketing autonomy.
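The delivery step amounts to a sort and a CSV write. The sketch below assumes snake_case header labels derived from the column list above; the actual header names the client's HubSpot import expects are not stated in this case study.

```python
import csv
from typing import Dict, Iterable, TextIO

# Assumed header labels, in the column order listed in the text.
COLUMNS = [
    "business_name", "phone", "address", "city", "state", "zip",
    "website", "rating", "review_count", "operating_hours", "category",
]


def write_leads_csv(records: Iterable[Dict[str, object]], fh: TextIO) -> None:
    """Write records sorted by rating, highest first, keeping only COLUMNS."""
    writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["rating"], reverse=True):
        writer.writerow(record)  # missing columns are written as empty cells
```

When writing to disk, opening the file with `open(path, "w", newline="", encoding="utf-8")` gives the UTF-8 output described above and lets the `csv` module control line endings.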

Scale and outcome

Five states, eight verticals, one CSV ready for cold outreach.

The same pipeline scales horizontally. Adding a new state is a config change (a few more metro circles), not a code rewrite. Switching verticals is also config: the Google Maps type catalogue covers everything from auto repair to law firms to hotels. The extraction and filtering layers stay identical.
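To make "config, not code" concrete, here is one way such a configuration could look. The class, field names, and radii are hypothetical, and the vertical labels are plain illustrative strings rather than confirmed Google type identifiers; only the eight verticals themselves come from the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MetroCircle:
    name: str
    state: str
    lat: float
    lng: float
    radius_m: int  # hypothetical radius, not taken from the engagement


# The eight verticals named in the text, as illustrative labels.
VERTICALS = [
    "restaurant", "cafe", "hair_salon", "barber_shop",
    "gym", "car_wash", "dentist", "beauty_salon",
]

# Two example circles; the engagement used twenty.
CIRCLES = [
    MetroCircle("New York City", "NY", 40.7128, -74.0060, 15_000),
    MetroCircle("Philadelphia", "PA", 39.9526, -75.1652, 12_000),
]


def query_plan(
    circles: Sequence[MetroCircle], verticals: Sequence[str]
) -> List[Tuple[MetroCircle, str]]:
    """Every (circle, vertical) pair becomes one top-level discovery query."""
    return list(product(circles, verticals))
```

With twenty circles and eight verticals this yields 160 top-level queries before any rating-band subdivision; adding a state means appending circles to the list and nothing else.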

9,400 candidate place IDs discovered across 20 metro circles

8,449 records enriched with full business detail

2,498 cold-call-ready leads delivered (plus 641 franchise bonus)

States live: NY, NJ, PA, MA, CT. Eight verticals in a single batch. Twenty metro circles covering roughly 85 percent of the population in scope. CSV pre-sorted by rating descending so the best leads land at the top of the import.

What this proves

Layer signals, do not buy volume.

The Google Maps surface holds a high-quality lead signal if it is mined carefully. The trick is layering signals, not buying volume.

Geographic filtering at the API layer keeps the cost of extraction down by an order of magnitude versus blanket scraping. Rating and review-band filters do the work that no purchased list can do: they identify the businesses that have something to gain from the product being sold and quietly drop the ones that do not. Requiring posted operating hours is a free liveness check. Franchise exclusion is a one-line filter that removes most of the noise in food and retail verticals.

New cities, new verticals, new countries with Google Maps coverage are all config changes against the same extraction core. The pipeline reruns end-to-end on a monthly or quarterly cadence; turnover is small and the rerun is cheap.

Questions answered in this engagement

How this pipeline works in practice.

How do you avoid hitting Google's bot defenses?

The browser layer runs SeleniumBase in UC mode for the page-load step (where most anti-bot checks fire) and then switches into CDP mode for DOM reads. Sessions are recycled every 50 requests, the inter-request delay is randomized in the 1 to 2 second band, and incognito mode keeps cookies clean between place IDs. For the volumes in this engagement (roughly 9,000 page loads over a multi-day window) there were no captcha pages and no IP-level blocks.
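The pacing logic is independent of the browser library and can be sketched on its own. In this illustration `open_session` and `fetch` are hypothetical callables that would wrap the SeleniumBase session setup and the per-place page read; the SeleniumBase calls themselves are not shown. The 50-request recycle and the 1 to 2 second delay are the figures quoted above.

```python
import random
import time
from typing import Callable, Dict, Iterable, Tuple


def crawl(
    place_ids: Iterable[str],
    open_session: Callable[[], object],       # hypothetical: returns an object with .close()
    fetch: Callable[[object, str], object],   # hypothetical: reads one place page
    per_session: int = 50,
    delay: Tuple[float, float] = (1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Fetch each place, recycling the session and pausing between requests."""
    results: Dict[str, object] = {}
    session = None
    used = 0
    try:
        for pid in place_ids:
            if session is None or used >= per_session:
                if session is not None:
                    session.close()
                session = open_session()  # fresh session, fresh cookies
                used = 0
            results[pid] = fetch(session, pid)
            used += 1
            sleep(random.uniform(*delay))  # randomized inter-request delay
    finally:
        if session is not None:
            session.close()
    return results
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` applies.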

Why filter by review-count band rather than 'more is better'?

Because saturation matters. A salon with 50 reviews is plausibly an operator who wants to grow their customer voice. A salon with 4,000 reviews is national-chain adjacent, where local management has no purchasing authority and a feedback channel is irrelevant. The 60 to 800 band is calibrated for SaaS outreach to local services; it would shift higher for an enterprise tool and lower for early-stage referral campaigns.

What happens when a business closes or changes phone number?

The pipeline does not promise eternal accuracy; it captures the state of the business at extraction time. For monthly or quarterly refreshes the entire pipeline reruns end-to-end against the same metro circles and verticals. The delta is small (1 to 3 percent turnover per quarter in observed data) and the rerun is cheap.
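One way to measure that delta is to compare two snapshots keyed by place ID. The sketch below assumes a particular definition of turnover (businesses that disappeared or changed phone number, as a share of the previous snapshot); the case study does not say how the 1 to 3 percent figure was computed.

```python
from typing import Dict, List, Union

Snapshot = Dict[str, Dict[str, object]]  # place ID -> record


def turnover(previous: Snapshot, current: Snapshot) -> Dict[str, Union[List[str], float]]:
    """Compare two extraction runs and summarize what changed."""
    closed = [pid for pid in previous if pid not in current]
    phone_changed = [
        pid
        for pid in previous
        if pid in current
        and previous[pid].get("phone") != current[pid].get("phone")
    ]
    new = [pid for pid in current if pid not in previous]
    # Assumed definition: closed plus changed, relative to the earlier run.
    rate = (len(closed) + len(phone_changed)) / len(previous) if previous else 0.0
    return {"closed": closed, "phone_changed": phone_changed, "new": new, "rate": rate}
```

The `closed` and `phone_changed` lists are the rows to retire or update in the CRM, and `new` holds businesses that entered the filtered set since the last run.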

Can this run for other countries or non-Google data sources?

Yes for both. Anywhere Google Maps has business coverage (most of North America, Western Europe, most APAC capitals) the discovery and extraction layers run unchanged; only the metro circles need editing. For non-Google sources the discovery layer is swapped (public business registries, Yelp, regional chambers of commerce) and the filtering and delivery layers stay identical.

Why exclude franchises by default?

Cold outreach to franchise locations has a structurally lower conversion rate for most SaaS, because the location manager rarely has purchasing authority. We exclude them at the filter layer rather than upstream, so the client can still receive them as a secondary dataset and decide case by case (some franchise systems do grant marketing autonomy to local owners).
