Scrapingdome
Case study - Tier-1 ticket resale inventory API

Three anti-bot layers mapped, two cookies required, one queryable inventory API.

For a North American ticket data buyer, Scrapingdome delivered an on-demand inventory API against a Tier-1 secondary ticketing marketplace protected by DataDome, AWS WAF and Forter. Send an event identifier, receive the full live inventory snapshot as JSON, at a target of 50,000 events per day. No browser in the production hot path.

Problem

The signal is public; nobody outside the marketplace packages it end to end.

The buyer needed the same per-listing detail a human shopper sees on the event page: section, row, seat range, available quantity, raw price in USD, face value, delivery method, seller notes, listing creation time, and the seat-view image URL where the seller provided one. Across 50,000 events per day, that is roughly 700,000 individual paginated requests if the upstream cooperates, multiplied by N retries when it does not.

The marketplace cooperates with no one outside its own browsers. Three independent defensive systems run in parallel on every page load.

DataDome v2 issues a signed cookie after the client passes a JavaScript challenge. The challenge endpoint is proxied through a subdomain of the marketplace itself, not the public DataDome CDN - a custom integration that defeats off-the-shelf bypass libraries, which target the default endpoint.

AWS WAF runs a Silent Challenge via an edge SDK script. The script issues an aws-waf-token cookie only after a successful client-side execution. The CDN in front of the marketplace rejects any inventory POST that arrives without that token.

Forter v2 fingerprints the device for transactional fraud detection. It refuses to issue its token to IP pools it recognizes as residential proxy rotations, regardless of how clean the browser fingerprint is.

Layered above all three sits CloudFront, which short-circuits with a generic 403 against any request its rules reject, masking which downstream layer actually flagged the traffic. There is no documented inventory endpoint. The event detail page is a React single-page application served on a SEO-friendly URL. Inventory does not render server-side, does not flow through GraphQL, and is not embedded in any hydration blob. The standard scraping playbooks return nothing useful.

Approach

Each block closes one unknown in the cost model before the next begins.

01

Endpoint discovery

One day with a real browser, network panel open, walking through a sample concert event end-to-end. Catalogued every Fetch/XHR call, every script tag, every cookie set during a normal session, then repeated the walkthrough with mobile device emulation. The marketplace uses an unusual pattern: a POST against the same URL as the event page, with a small JSON body declaring page size and sort order, returns the full inventory as JSON. The server-side route inspects the content type and branches between HTML and JSON output. No /api/v1/ prefix, no GraphQL, no separate microservice host. Once cookies are in hand, the extractor is plain requests.

02

Anti-bot stack mapping

Which cookies does the POST actually require? Public scraping discussions said "all three". We refused to take that on faith because it changed the cost basis by two orders of magnitude. A controlled test replayed the inventory POST with three progressively larger cookie subsets: datadome only, datadome plus session, everything. All three failed with HTTP 403 and the CloudFront error page rather than the DataDome challenge HTML, which placed the block at the WAF layer, above DataDome. aws-waf-token was mandatory; forterToken was not, because Forter does not enforce on read-only inventory routes. One test eliminated an entire class of expensive solutions.

03

Proxy provider qualification

Every residential proxy provider has its own destination blocklist. The first provider trialled returned "CONNECT tunnel failed, response 403" on every request to the marketplace's domain - the block was at the provider's egress, not the marketplace's edge. A 30-second curl probe became the first step of any provider evaluation: confirm the proxy passes the target domain at the TCP/TLS layer before any production integration. The second provider tested cleared the domain cleanly, issued a datadome cookie reliably under a real headless browser session, and cost about a quarter of the premium tier we would have needed for Forter compatibility.

04

Two-stage cookie warmer plus browserless extractor

A warmer opens one headless browser session through the qualified residential proxy. It navigates to the event URL, lets the DataDome JavaScript challenge complete, lets AWS WAF deliver its Silent Challenge token, and harvests both cookies. One warmer pays the cost of running a browser exactly once. A browserless extractor takes those cookies and a freshly minted aws-waf-token from a CAPTCHA solver service running the AWS WAF browserless task, attaches them to a requests.Session, and walks the paginated inventory POST through the same residential proxy. The solver contract is URL in, cookie out. No browser, no JavaScript runtime, no Chromedriver overhead inside the hot loop.

05

Cookie reuse and the unit economics

A single browser-issued datadome cookie sustained 700 consecutive inventory POSTs over twelve minutes against a clean residential egress, with zero blocks. A single solver-issued aws-waf-token sustained at least 120 consecutive POSTs in the same conditions; the upper bound was not exhausted in testing because the practical reuse rate already drove the per-event solver cost into the rounding error. The final operating model: one warmer session per ~700 paginated requests, one solver token per ~120 paginated requests, one residential proxy session pinned for the lifetime of the cookies, exponential backoff on transient CloudFront 503s, and an idempotency cache against the marketplace's listing IDs so duplicate work across overlapping queries does not double-bill the buyer.
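The idempotency cache in the operating model above can be pictured as a set of listing IDs that have already been delivered. The following is a minimal in-memory sketch, not the production code: the `listingId` key and the `ListingCache` class are hypothetical names, and a real deployment would presumably persist the set and expire old entries.

```python
from typing import Iterable, Iterator


class ListingCache:
    """Remembers which listing IDs have already been delivered."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_only(self, listings: Iterable[dict]) -> Iterator[dict]:
        """Yield listings whose ID has not been seen yet and record them."""
        for listing in listings:
            listing_id = str(listing["listingId"])  # hypothetical field name
            if listing_id in self._seen:
                continue  # already delivered by an earlier, overlapping query
            self._seen.add(listing_id)
            yield listing


cache = ListingCache()
first = list(cache.new_only([{"listingId": 1}, {"listingId": 2}]))
second = list(cache.new_only([{"listingId": 2}, {"listingId": 3}]))
```

In this example the second query overlaps the first on listing 2, so only listing 3 is passed through the second time.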

Scale and outcome

In production: 50,000 events per day, no browser in the hot path.

The validation event was a 70,000-seat stadium concert with 693 active listings, paginated across 70 batches of 10 listings each. The extractor fetched the full inventory snapshot in 5.8 seconds of solver time plus ~42 seconds of pagination. Each listing returned fifty-plus fields including section, row, raw price in USD, face value, ticket class, delivery method, seller notes, and the seat-view image URL.

~50 ms: average inventory POST round-trip once cookies are warm
700+: consecutive POSTs reusable per datadome cookie
0: browser automation in the production hot path

At target volume of 50,000 events per day, the system runs as roughly 700,000 inventory POSTs per day, served by approximately 1,000 warmer sessions (700,000 / 700) and approximately 6,000 solver tokens (700,000 / 120) per day - both well within the headroom of a single small VPS and a pay-as-you-go account on each external service. Three anti-bot layers mapped, of which two are actively defeated (DataDome, AWS WAF) and one is bypassed by route selection (Forter). Fourteen documented validation tests across the engagement, each closing a single unknown in the cost model.
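The daily budget follows from dividing the request volume by the observed reuse figures. The sketch below is only that arithmetic, using the numbers quoted in this case study (roughly 700 POSTs per datadome cookie, at least 120 per solver token); the variable names are illustrative.

```python
import math

EVENTS_PER_DAY = 50_000
DAILY_POSTS = 700_000              # paginated inventory POSTs per day
POSTS_PER_DATADOME_COOKIE = 700    # observed reuse per warmer session
POSTS_PER_WAF_TOKEN = 120          # observed lower bound per solver token

# Average pagination depth per event.
posts_per_event = DAILY_POSTS / EVENTS_PER_DAY

# Round up: a partially used cookie or token still has to be obtained.
warmer_sessions = math.ceil(DAILY_POSTS / POSTS_PER_DATADOME_COOKIE)
solver_tokens = math.ceil(DAILY_POSTS / POSTS_PER_WAF_TOKEN)

print(posts_per_event, warmer_sessions, solver_tokens)
```

Because 120 is a lower bound on token reuse, the solver figure computed this way is an upper bound.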

What this proves

Anti-bot stack mapping is a discipline, not a guess.

Three defensive layers do not require three solutions. Each one is calibrated for a specific traffic class - checkout fraud, scraping pressure, edge filtering - and answering precisely which layer enforces on which route is what eliminated the most expensive component from the build. The buyer paid for one solver dependency, not three.

Validation-first cost engineering. No proxy was purchased, no solver balance was funded, no production architecture was committed before a short, focused test had ruled it in or out. The first residential proxy we trialled cost us 30 seconds of curl to disqualify. The three-cookie architecture was eliminated by one controlled test that replayed the inventory POST with three cookie subsets. Every direction we did not take was disqualified explicitly and on the record.

Provider-agnostic architecture. The production code does not know which residential proxy provider it talks to, which CAPTCHA solver service issues the WAF token, or which headless browser library runs the warmer. Each is behind a one-function interface. A buyer who already has a contract with a different proxy provider can drop in their credentials. A buyer who later wants to switch solver vendors when one offers a better rate can do so without touching the extractor.
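One way to picture the one-function interfaces is as three plain callables bundled together. This is a hedged sketch of the idea rather than the delivered code: the type aliases, the `Providers` dataclass and `build_cookie_jar` are hypothetical names, and the providers shown are stubs with placeholder values and `.invalid` hosts.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Each external dependency is reduced to a single callable (names are illustrative).
ProxyFor = Callable[[str], str]                      # session id -> proxy URL
SolveWafToken = Callable[[str], str]                 # page URL -> token value
WarmCookies = Callable[[str, str], Dict[str, str]]   # (page URL, proxy URL) -> cookies


@dataclass(frozen=True)
class Providers:
    proxy_for: ProxyFor
    solve_waf_token: SolveWafToken
    warm_cookies: WarmCookies


def build_cookie_jar(providers: Providers, url: str, session_id: str) -> Dict[str, str]:
    """Assemble the cookies for one run without naming any vendor."""
    proxy = providers.proxy_for(session_id)
    cookies = dict(providers.warm_cookies(url, proxy))
    cookies["aws-waf-token"] = providers.solve_waf_token(url)
    return cookies


# Stub providers stand in for real vendors; swapping a vendor means swapping a callable.
stub = Providers(
    proxy_for=lambda sid: f"http://proxy.invalid:8000/{sid}",
    solve_waf_token=lambda url: "stub-waf-token",
    warm_cookies=lambda url, proxy: {"datadome": "stub-datadome"},
)
jar = build_cookie_jar(stub, "https://marketplace.invalid/event/123", "s1")
```

The design choice this illustrates is that the extractor depends only on the three call signatures, so changing a vendor does not touch its code.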

Questions answered in this engagement

How this pipeline works in practice.

How long does one extraction take, end-to-end?

For a freshly warmed session: about 6 seconds of solver wait plus 40 to 60 seconds of paginated extraction for a large event with 500+ listings, scaling linearly with listing count. Once the warmer cookies and a solver token are in hand, subsequent events that share the warmer session add only the per-event pagination time.
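The linear scaling can be made concrete with the validation event's own figures: 693 listings in pages of 10, 5.8 seconds of solver time, and about 42 seconds across 70 pages. The sketch below assumes the same page size and the same wall-clock cost per page for other events, which is an extrapolation; the function name is illustrative.

```python
import math

PAGE_SIZE = 10              # listings per POST in the validation event
SOLVER_WAIT_S = 5.8         # solver time reported for the validation event
SECONDS_PER_PAGE = 42 / 70  # about 0.6 s wall-clock, from 42 s over 70 pages


def estimate_extraction_seconds(listing_count: int, include_solver: bool = True) -> float:
    """Estimate end-to-end time for one event, assuming linear scaling."""
    pages = math.ceil(listing_count / PAGE_SIZE)
    total = pages * SECONDS_PER_PAGE
    if include_solver:
        # Only needed when no reusable solver token is already in hand.
        total += SOLVER_WAIT_S
    return total


estimate = estimate_extraction_seconds(693)
```

Passing `include_solver=False` models a subsequent event that shares an existing session and token, where only pagination time is added.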

What happens when the marketplace updates its anti-bot stack?

Each layer is monitored independently. A change in DataDome's challenge endpoint, AWS WAF's challenge script URL, or Forter's fingerprint behaviour triggers a contained re-validation of one component, not a full rebuild. The discovery scripts we used to map the stack in the first place are reusable as monitoring probes.

How is the output normalized?

The marketplace returns prices in the visitor's locale currency in the formatted string fields, but the canonical rawPrice field is always USD regardless of the proxy geography. The extractor uses rawPrice and re-formats currency strings on the buyer's side. The buyer never receives the marketplace's internal telemetry fields (deal scoring, internal hash IDs, sponsored flags); those are filtered before output.
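A normalization step along these lines might look like the sketch below. Only `rawPrice` is a field name taken from the text; the telemetry and locale field names, the output keys and the sample record are hypothetical.

```python
# Hypothetical names for the internal telemetry fields described above.
TELEMETRY_FIELDS = {"dealScore", "internalHash", "isSponsored"}
# Hypothetical name for a locale-formatted price string.
LOCALE_STRING_FIELDS = {"price"}


def normalize_listing(raw: dict) -> dict:
    """Drop telemetry and locale strings; derive display price from rawPrice."""
    dropped = TELEMETRY_FIELDS | LOCALE_STRING_FIELDS
    cleaned = {k: v for k, v in raw.items() if k not in dropped}
    cleaned["priceUsd"] = float(raw["rawPrice"])  # canonical USD value
    cleaned["priceDisplay"] = f"${cleaned['priceUsd']:,.2f}"
    return cleaned


sample = {
    "section": "112",
    "row": "F",
    "rawPrice": 1234.5,
    "price": "1.150,00 EUR",  # locale-dependent string, ignored
    "dealScore": 8.1,
    "isSponsored": False,
}
result = normalize_listing(sample)
```

Here the locale string is discarded and the display price is rebuilt from the USD value, so the output does not depend on proxy geography.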

Can this approach be reused on adjacent ticketing marketplaces?

The discovery methodology - endpoint surface mapping, cookie-subset gating, provider qualification, two-stage warmer plus extractor split - generalizes. The specific anti-bot stack will differ. We treat each new marketplace as a fresh 2-to-3-day discovery engagement before quoting a build, because skipping that step is what causes scraping projects to overrun by 5x.

What about rate limits at scale?

The marketplace does not publish rate limits and does not return them in response headers. Empirical testing showed transient CloudFront 503s occur sporadically and recover within seconds; they are not predictive of cookie invalidation. The production code treats 503 as a retryable transient and uses exponential backoff. Cookie invalidation, when it does occur, surfaces as a clean 403 from the WAF or a DataDome challenge response and triggers a re-warm.
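The retry policy described above can be sketched independently of any HTTP library by passing in a callable that performs the request and returns a status code. The attempt limit, base delay and use of jitter are assumptions, not values from the engagement, and detection of a DataDome challenge body is left out; `RewarmRequired` and `post_with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable


class RewarmRequired(Exception):
    """Raised when a response suggests the cookies are no longer valid."""


def post_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 503.

    send is any zero-argument callable that returns an HTTP status code.
    """
    for attempt in range(max_attempts):
        status = send()
        if status == 403:
            # Treated as cookie invalidation: the caller should re-warm.
            raise RewarmRequired("403 received; cookies need re-warming")
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise TimeoutError(f"still 503 after {max_attempts} attempts")


# Usage with a fake sender: two transient 503s, then success.
responses = iter([503, 503, 200])
delays: list[float] = []
status = post_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` keeps the example testable without real waiting; in normal use the default `time.sleep` applies.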

What is the buyer responsible for operating?

A small Linux VPS, an account with a residential proxy provider, and an account with a CAPTCHA solver vendor. The code, the configuration, the discovery documentation, and the operational runbook are delivered as part of the engagement. The buyer's monthly run cost is the sum of those three external services and is dominated by proxy bandwidth, not by solver fees, which sit in the single-digit dollars per day at target volume.

Contact

Need an on-demand inventory API behind a triple-layer anti-bot stack? Tell us about it.