Indexing Residential Proxy Services
You've probably seen services like Synthient, ipinfo.io and Spur.us advertise that they can tell you, in real time, whether an IP address belongs to a residential proxy network. It sounds impressive — and expensive. How do you even build that? You'd need to somehow observe millions of IPs across dozens of proxy providers, keep the data fresh, and still turn a profit.
The answer is surprisingly elegant, and it costs almost nothing.
A lesser-known fact: even hCaptcha runs its own operation along similar lines, using it to score IPs as part of its challenge decisions.
What Is a Residential Proxy, Briefly
A residential proxy routes traffic through a real consumer device — a laptop, phone, or home router — on a real ISP like Spectrum or AT&T. The exit IP looks indistinguishable from an ordinary household connection. That's the whole value proposition: you get a "clean" IP that fraud filters don't recognize as a datacenter.
The devices in these networks are either knowingly opted-in (a bandwidth-sharing SDK bundled in a free app) or quietly compromised. Either way, the IP belongs to a real person's broadband account, which makes it hard to block without collateral damage.
This is why ad verification companies, bot detection vendors, and fraud teams are willing to pay for a reliable proxy database: they need to know which residential IPs are currently acting as proxy nodes, so they can challenge or block those without penalizing ordinary households.
The Indexing Technique
Here's where it gets clever. You buy access to the proxy network yourself.
Not much bandwidth. Around $5 worth of traffic from a given provider is enough to index a substantial chunk of their pool.
The Setup
You run your own client and point it at the proxy network you want to index. Your client opens connections through the provider's nodes using HTTP CONNECT tunnels, which is exactly how proxy clients are supposed to work. Nothing suspicious.
The destination you're tunnelling to, however, is a server you control.
Your Client
|
|-- [HTTP CONNECT] Residential Proxy Node (the exit IP you want to log)
|
|-- (tunnel) Your Server
Once the CONNECT tunnel is established, the proxy's job is done. It's just forwarding raw bytes between your client and your server.
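To make the tunnel step concrete, here is a minimal Python sketch of a client asking a proxy gateway for a CONNECT tunnel. The function names, the `collector.example` hostname, and the optional Basic credentials are illustrative assumptions; real providers each document their own gateway address and authentication scheme.

```python
import base64
import socket
from typing import Optional


def build_connect_request(dest_host: str, dest_port: int,
                          proxy_user: Optional[str] = None,
                          proxy_password: Optional[str] = None) -> bytes:
    """Build the raw CONNECT request sent to the proxy gateway."""
    target = f"{dest_host}:{dest_port}"
    lines = [f"CONNECT {target} HTTP/1.1", f"Host: {target}"]
    if proxy_user is not None:
        # Assumption: the provider accepts HTTP Basic proxy credentials.
        raw = f"{proxy_user}:{proxy_password or ''}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def connect_succeeded(response_head: bytes) -> bool:
    """Return True if the proxy answered the CONNECT with a 2xx status."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return (len(parts) >= 2
            and parts[0].startswith(b"HTTP/")
            and parts[1][:1] == b"2")


def open_tunnel(proxy_host: str, proxy_port: int,
                dest_host: str, dest_port: int,
                timeout: float = 10.0) -> socket.socket:
    """Open a TCP connection to the proxy and request a tunnel.

    On success the returned socket carries raw bytes to dest_host.
    """
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        sock.sendall(build_connect_request(dest_host, dest_port))
        head = b""
        # Read until the end of the proxy's response headers.
        while b"\r\n\r\n" not in head:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("proxy closed before replying")
            head += chunk
            if len(head) > 65536:
                raise ConnectionError("proxy response headers too large")
        if not connect_succeeded(head):
            raise ConnectionError("proxy refused the CONNECT request")
        return sock
    except Exception:
        sock.close()
        raise
```

Anything written to the socket returned by `open_tunnel` after that point goes straight to the destination server, with the proxy node in the middle.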
Writing Identifying Bytes
Instead of sending HTTPS traffic through the tunnel, your client writes a short identifying payload — something like:
MAGIC_BYTES || PROVIDER_ID
A few bytes is all it takes. The magic bytes let your server distinguish this traffic from noise or accidental connections. The provider ID tells you which network this node belongs to, since you're running the same indexing operation across multiple providers simultaneously.
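One possible encoding of that payload, as a rough sketch: the 4-byte magic value `RPX1` and the 2-byte big-endian provider ID are assumptions made for illustration, not a format the technique requires.

```python
import struct
from typing import Optional

# Hypothetical magic value; any fixed byte string unlikely to occur
# in stray traffic would do.
MAGIC = b"RPX1"


def encode_payload(provider_id: int) -> bytes:
    """MAGIC followed by the provider ID as an unsigned 16-bit integer."""
    return MAGIC + struct.pack("!H", provider_id)


def decode_payload(data: bytes) -> Optional[int]:
    """Return the provider ID, or None if the bytes are not ours."""
    if len(data) < len(MAGIC) + 2 or not data.startswith(MAGIC):
        return None
    (provider_id,) = struct.unpack_from("!H", data, len(MAGIC))
    return provider_id
```

With this layout the whole message is six bytes, and anything that does not start with the magic value (a port scanner, a stray HTTP request) decodes to `None` and can be dropped.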
Your server receives the payload, reads the peer address — the IP of the proxy node that opened the tunnel — and writes it to a database along with the provider tag and a timestamp.
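The server side can be sketched in a few dozen lines of Python. This is an illustration under assumptions, not a production collector: the `RPX1` magic value, the 2-byte provider ID, the SQLite schema, and port 9000 are all made up for the example.

```python
import socket
import sqlite3
import struct
import time

MAGIC = b"RPX1"               # hypothetical magic bytes
PAYLOAD_LEN = len(MAGIC) + 2  # magic + 16-bit provider ID


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the observations table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "ip TEXT NOT NULL, provider_id INTEGER NOT NULL, seen_at REAL NOT NULL)"
    )
    return db


def read_exact(conn: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early only if the peer closes."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def handle_connection(conn: socket.socket, peer: tuple,
                      db: sqlite3.Connection) -> bool:
    """Validate the payload and log the peer IP. Returns True if logged."""
    data = read_exact(conn, PAYLOAD_LEN)
    if len(data) != PAYLOAD_LEN or not data.startswith(MAGIC):
        return False  # noise or an accidental connection
    (provider_id,) = struct.unpack_from("!H", data, len(MAGIC))
    # peer[0] is the address the connection arrived from: the exit IP.
    db.execute("INSERT INTO observations VALUES (?, ?, ?)",
               (peer[0], provider_id, time.time()))
    db.commit()
    return True


def serve(host: str = "0.0.0.0", port: int = 9000) -> None:
    """Accept connections forever, one at a time (sketch only)."""
    db = init_db("observations.db")
    with socket.create_server((host, port)) as srv:
        while True:
            conn, peer = srv.accept()
            with conn:
                conn.settimeout(5.0)
                try:
                    handle_connection(conn, peer, db)
                except OSError:
                    pass  # timeouts and resets are expected; move on
```

Calling `serve()` starts the listener. A real deployment would likely handle connections concurrently and deduplicate repeat sightings, but the core of it is just `accept()`, a six-byte read, and an insert.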
That's the whole operation: simple, and only a handful of bytes per connection.
Worth noting: the proxy node can observe how many bytes were transferred per request, so an unusually small payload is a detectable signal. Padding your payload out to a more believable size reduces that fingerprint considerably.
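A minimal sketch of that padding step, in Python. The 512 to 2048 byte range is an arbitrary assumption chosen for illustration; what counts as a "believable" size depends on the traffic you want to resemble.

```python
import os
import random


def pad_payload(payload: bytes, min_size: int = 512,
                max_size: int = 2048) -> bytes:
    """Append random filler so the total length varies per connection.

    The identifying bytes stay at the front, so a server that reads a
    fixed-length header can still parse them and discard the rest.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    low = max(min_size, len(payload))
    high = max(max_size, low)
    target = random.randint(low, high)
    return payload + os.urandom(target - len(payload))
```

Randomizing the length on every connection avoids swapping one fixed-size fingerprint for another. The trade-off is cost: padding eats into the bandwidth you paid for, so the range is worth tuning.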
Putting It Together
The full pipeline, start to finish:
- Buy a small amount of traffic from a target provider — $5 goes a long way when you're sending a handful of bytes per connection
- Open HTTP CONNECT tunnels through provider nodes to a server you control
- Write a short identifying payload (MAGIC || PROVIDER_ID) through the tunnel, with no TLS or HTTP required
- Server logs the peer address (the exit IP) along with provider tag and timestamp
- Enrich and score with ASN, WHOIS, recency, and observation count
- Publish as a queryable API with confidence scores and provider attribution
- Repeat continuously to keep the dataset fresh as pools rotate
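The scoring step is the least specified part of the pipeline above, so here is one way it could look, purely as an illustration. The formula, the one-week half-life, and the saturation constant are all assumptions; it uses only recency and observation count and leaves ASN and WHOIS enrichment aside.

```python
import math

WEEK_SECONDS = 7 * 24 * 3600


def confidence(observation_count: int, seconds_since_last_seen: float,
               half_life_seconds: float = WEEK_SECONDS,
               count_saturation: float = 5.0) -> float:
    """Hypothetical confidence score in the range [0, 1).

    - recency halves every half_life_seconds since the last sighting
    - volume rises with repeat sightings but flattens out
    """
    if observation_count <= 0:
        return 0.0
    age = max(seconds_since_last_seen, 0.0)
    recency = 0.5 ** (age / half_life_seconds)
    volume = 1.0 - math.exp(-observation_count / count_saturation)
    return recency * volume
```

The decay term matters because pools rotate: an IP seen once a month ago is far weaker evidence than one seen ten times this morning, and a score that never decays would keep flagging households long after they left the pool.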
If you're running detection infrastructure and doing something similar — or defending against it — I'd be interested to hear how your approach differs.
Author contact: dan@sys32.dev