Scrapingdome
Available for new engagements

Scraping infrastructure you don't have to think about.

Resilient extraction systems for protected platforms, hidden APIs, and complex sources. Built, monitored, and maintained for teams that need data, not another system to babysit.

150K+
retail products monitored every 2 to 5 hours behind enterprise bot management
125K+
automotive parts synchronized daily from protected marketplaces
100K+
real estate listings deduplicated and enriched with tax records
96K+
business entities extracted, classified with AI, and enriched with contact data
50K+
companies classified by LLM through a configurable framework
225+
counties monitored for court filings with ownership distress signals
What clients say

Voices from production engagements.

5 stars across 21 engagements

This guy really knows his stuff. We had used many different people before, but he really broke down the issue and thought of a long-term scalable solution. He will be a very important part of our project.

Founder, UK grocery intelligence platform

They get it done. They had an answer for every random question I threw at them, and a solution for every tricky problem I wanted to address.

Real estate investor, US forensic title research

An absolute expert in their field. They brought new ideas to the table and delivered way ahead of schedule.

Founder, US event intelligence and CAC tooling

Delivered on time. Any corrections needed, they quickly resolved per our timeline. Would hire again.

Operations lead, US municipal government data

From the proposal till delivery they were very professional, creative, and acted like part of the project bringing new ideas.

Founder, real estate with anti-bot circumvention

They patiently listened, answered very quickly, and delivered the project better than my initial projection, with additional ideas.

Founder, real estate intelligence engine

Focuses on defining successful outcome, which helped me think through my needs more.

Operations lead, US storage tech POC

Their ability to explain technical concepts in a way I could easily understand was invaluable, especially for someone like me who isn't a coding expert.

Founder, real-time game data extraction

The deliverables have been of excellent quality, as well as the speed of the work.

Founder, real estate properties extraction
Selected work

Production systems running today.

Each engagement is a system delivered into the tools the client already uses: database, dashboard, sheet, or scheduled report.

Government and public records

Court filings, business entities, municipal monitoring

Multi-jurisdiction court-filing monitoring with ownership distress signals, live across NC and TX. 1,453 municipal meetings analyzed and indexed across five CMS platforms. The same platform-adapter architecture scales to 40+ counties without code changes.

Read the county minutes case study

Commercial data at scale

Retail, automotive, real estate

150K+ grocery products monitored every 2 to 5 hours across UK retailers behind enterprise bot management. 125K+ automotive parts synchronized daily. 100K+ real estate listings deduplicated and enriched with tax records.

Read the UK grocery case study

AI-augmented processing

Classification, normalization, entity extraction

50K+ companies classified via LLM with a configurable framework. 1,000+ medical clinic sources normalized to a standard terminology. Entity extraction, validation, and classification pipelines in production.

Read the targeted leads case study
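A configurable LLM classification framework typically keeps the label set in configuration and treats the model's answer as untrusted input. A minimal sketch, assuming a prompt that asks for a JSON answer; the CONFIG contents, the prompt wording, and the llm callable are hypothetical, and no specific LLM provider's API is implied: any function that takes a prompt string and returns the model's text can be passed in.

```python
import json
from typing import Callable

# Hypothetical per-client configuration; a real framework would load this from a file.
CONFIG = {
    "labels": {
        "saas": "Sells software delivered over the internet on a subscription.",
        "agency": "Sells services such as marketing, design, or development.",
        "manufacturer": "Produces physical goods.",
    },
    "fallback": "unclassified",
}


def build_prompt(config: dict, company_description: str) -> str:
    """Render the configured labels and their definitions into a single prompt."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in config["labels"].items())
    return (
        "Classify the company into exactly one category.\n"
        f"Categories:\n{options}\n"
        'Answer with JSON only: {"label": "<category>"}\n\n'
        f"Company: {company_description}"
    )


def classify(config: dict, company_description: str, llm: Callable[[str], str]) -> str:
    """Ask the model, then validate its answer against the configured label set."""
    raw = llm(build_prompt(config, company_description))
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back instead of failing the batch.
        return config["fallback"]
    if isinstance(label, str) and label in config["labels"]:
        return label
    return config["fallback"]  # unknown or malformed label
```

Changing the taxonomy for a new client means editing CONFIG, not the code, and any answer outside the configured labels is routed to the fallback for review rather than written to the dataset.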

Open-data property intelligence

National AVM on open public data, France

End-to-end, address-to-valuation automated valuation model (AVM) covering all of France, built in six weeks: 4.5M cleaned DVF transaction records (mutations), 30K+ addressable communes, three submodels reconciled by inverse-variance weighting, and 12.0 to 12.5 percent MdAPE (median absolute percentage error) on urban apartments. No commercial data licensing cost.

Read the France AVM case study
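Inverse-variance weighting gives each submodel a weight of 1/σ², so the most precise submodel dominates the reconciled value, and MdAPE is the median of |predicted − actual| / actual. A minimal sketch of both, assuming each submodel returns a point estimate and a variance for the same property and that the submodels' errors are independent; the function names are illustrative, not taken from the France AVM itself.

```python
from statistics import median


def inverse_variance_combine(estimates, variances):
    """Pool independent estimates, weighting each by 1 / variance.

    Returns (pooled_estimate, pooled_variance); the pooled variance is
    1 / sum(weights), which assumes the submodel errors are independent.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def mdape(predicted, actual):
    """Median absolute percentage error, in percent."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * median(errors)
```

For example, inverse_variance_combine([300, 320, 340], [1, 1, 2]) returns (316.0, 0.4): the third submodel, with twice the variance, gets half the weight of the other two. The median in MdAPE keeps a handful of badly recorded sales from dominating the headline error figure.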

Authenticated research datasets

Classical music conductor population, end to end

~10K conductor profiles harvested from a signed JavaScript application behind authentication. Nine parallel workers across three nodes, authenticated with three one-time passwords. Two days of wall-clock time for the full conductor category, zero unhandled errors, a relational parent-child schema, and fully resumable runs.

Read the Operabase case study
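Resumability of this kind usually comes down to a durable checkpoint per item. A minimal sketch using Python's built-in sqlite3, assuming each profile has a stable ID and that fetch is a caller-supplied function (hypothetical here; the real harvester, its authentication, and its worker layout are not shown): finished IDs are recorded, failures are logged instead of crashing the run, and a restart processes only what is not yet done.

```python
import sqlite3


def open_checkpoint(path=":memory:"):
    """Open (or create) the progress table that makes a run resumable."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        " item_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL CHECK (status IN ('pending', 'done', 'failed')))"
    )
    return conn


def enqueue(conn, item_ids):
    # INSERT OR IGNORE keeps existing statuses when the same IDs are enqueued again on restart.
    conn.executemany(
        "INSERT OR IGNORE INTO progress (item_id, status) VALUES (?, 'pending')",
        [(item_id,) for item_id in item_ids],
    )
    conn.commit()


def pending(conn):
    """IDs still to process: never attempted, or failed on a previous run."""
    rows = conn.execute("SELECT item_id FROM progress WHERE status != 'done' ORDER BY item_id")
    return [row[0] for row in rows]


def run(conn, fetch):
    for item_id in pending(conn):
        try:
            fetch(item_id)
        except Exception:
            # Record the failure and keep going; failed items are retried on the next run.
            status = "failed"
        else:
            status = "done"
        conn.execute("UPDATE progress SET status = ? WHERE item_id = ?", (status, item_id))
        conn.commit()  # commit per item, so a crash costs at most a re-fetch of the in-flight item
```

With several workers on several nodes, the same table would live in a shared database, and each worker would claim pending rows rather than iterate one local list.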

High-defense extraction

On-demand inventory API, ticket resale marketplace

Queryable inventory API against a Tier-1 secondary ticketing marketplace defended by DataDome, AWS WAF and Forter. 50K events per day, ~50ms POST round-trip once cookies are warm, 700+ POSTs per DataDome cookie, zero browser automation in the production hot path.

Read the ticketing case study

Anti-bot expertise

Protected platforms behave predictably.

Nine years in data extraction with a network security background. The systems we build keep running because the protections in front of them are part of the design, not an afterthought.

Cloudflare

Bot Management, Turnstile, JS challenges. Approached at the network layer when possible, through a browser when necessary.

Akamai Bot Manager

Sensor data analysis and sustained extraction at retail scale. Currently running 150K+ products at 2- to 5-hour intervals.

DataDome

Device fingerprinting, behavioral signals, request shaping. In production for daily synchronization at six-figure volumes.

PerimeterX / HUMAN

Persistent session strategy, environmental signal stability, and Human Challenge handling without breaking schedules.

CAPTCHA workflows

reCAPTCHA v2 and v3, hCaptcha, Turnstile. Solver integration where appropriate, avoidance strategies where possible.

Hidden APIs

Mobile and web reverse engineering. Direct platform connections in preference to headless browsers when the protocol allows it.

Productized initiative (in production)

CivicMine: US government data, on tap.

A library of platform adapters for the 90,000+ local governments in the United States. New government, same adapter. New data type, same adapters with different classifiers. The product is the accumulated knowledge of how these platforms work, packaged for reuse.

5
verticals validated with real clients in production
300+
Socrata data portals identified across the country
7K+
Granicus organizations covered by a single adapter
Platforms covered or in active mapping
  • Tyler Odyssey
  • Socrata
  • Granicus
  • PrimeGov
  • Legistar
  • CivicPlus
  • AgendaCenter
  • CivicClerk
  • eScribe
  • AgendaLink
  • Laserfiche
  • SwagIT
  • Hyland OnBase
  • IQM2
  • CivicWeb
  • BoardDocs
  • Diligent
  • Revize
  • Municode
  • Destiny
  • VEconnect
  • Clerk of Courts
  • Property Appraisers
  • Tyler EnerGov
  • ArcGIS Hub
  • CKAN
  • Accela
  • OpenGov

Permits in Florida. Court filings in North Carolina. Meeting minutes across Utah. Business registrations in New York. Different verticals, same pattern: identify the platform, activate the adapter, configure filters and classifiers, deliver.

Explore the CivicMine hub
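The identify, activate, configure, deliver pattern can be sketched as a registry of adapters keyed by URL pattern, with filters and classifiers passed in as configuration rather than written as code. A minimal sketch; the Adapter interface, DemoAgendaAdapter, its URL pattern, and its stub records are all hypothetical stand-ins for real platform adapters, which would fetch and parse live pages.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Record:
    source_url: str
    title: str
    text: str
    labels: list[str] = field(default_factory=list)


class Adapter:
    """Hypothetical base interface: one subclass per platform."""

    url_pattern: re.Pattern

    def fetch(self, url: str) -> Iterable[Record]:
        raise NotImplementedError


ADAPTERS: list[type[Adapter]] = []


def register(cls):
    """Class decorator that adds a platform adapter to the registry."""
    ADAPTERS.append(cls)
    return cls


def identify(url: str) -> Adapter:
    """Identify the platform: return the first adapter whose URL pattern matches."""
    for cls in ADAPTERS:
        if cls.url_pattern.search(url):
            return cls()
    raise LookupError(f"no adapter registered for {url}")


def run_source(
    url: str,
    filters: list[Callable[[Record], bool]],
    classifiers: dict[str, Callable[[Record], bool]],
) -> list[Record]:
    """Activate the adapter, keep records passing every filter, attach matching labels."""
    delivered = []
    for record in identify(url).fetch(url):
        if all(keep(record) for keep in filters):
            record.labels = [name for name, matches in classifiers.items() if matches(record)]
            delivered.append(record)
    return delivered


@register
class DemoAgendaAdapter(Adapter):
    """Hypothetical adapter that yields stub records instead of fetching pages."""

    url_pattern = re.compile(r"\.agendas\.example\.com/")

    def fetch(self, url: str) -> Iterable[Record]:
        yield Record(url, "City council, March session", "Approved a zoning permit variance for 12 Main St.")
        yield Record(url, "Parks board", "Reviewed the annual maintenance budget.")
```

In this shape, a new government on an already-supported platform is a new URL, not a new adapter, and a new data type is a different set of filters and classifiers passed to run_source.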

How it works

You describe the outcome. We deliver the system.

Best fit for operations leads, founders, research teams, and technical leaders who want to delegate the problem, not collaborate on the solution.

01

You describe the problem

Outcome, scale, deadline. We do not need a specification; we need to understand the result you are after.

02

We design and build

Architecture, stack, anti-bot strategy, scheduling, delivery format. If your stated approach has a better alternative, you hear it before any quote.

03

You receive the system running

Database, dashboard, sheet, scheduled report. Delivered running, in the format your team already uses, not as code thrown over a wall.

Stack and capabilities

Built around the platform, not the framework.

Direct platform connections in preference to browsers when possible. The result matters; the plumbing does not.

Languages
Python, TypeScript and JavaScript, with the right tool for each layer of the system.
Anti-bot
Cloudflare, Akamai, DataDome, PerimeterX. CAPTCHA and reCAPTCHA workflows. Network-level reverse engineering.
AI integration
LLM-based classification, entity extraction, validation, and normalization. Production pipelines, not prototypes.
Data and delivery
PostgreSQL, Supabase, structured pipelines, scheduled extraction, dashboards, and reports your team already reads.
Contact

Tell us what you are trying to figure out.