Glossary

What Is Rate Limiting?

Rate limiting is a server-side defense that caps how many requests a single client (identified by IP, account, or session) can make in a given time window, used to protect infrastructure, prevent abuse, and ensure fair resource usage.

Understand how websites and APIs throttle high-volume clients, the algorithms behind rate limits (token bucket, sliding window, fixed window), and how rotating proxies defeat per-IP caps.

Explained

Rate limiting is how servers protect themselves from abuse. Every public-facing API and website caps the number of requests a single client can send in a window — `100 requests per IP per minute`, `5000 requests per API key per hour`, `30 logins per account per day`. When a client exceeds the cap, the server returns a 429 Too Many Requests response (or in some cases just silently slows replies, queues the request, or starts serving CAPTCHAs).
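On the client side, a 429 is a signal to back off before retrying. The sketch below shows one way a client might decide how long to wait, assuming the server may send the standard `Retry-After` header in either its seconds form or its HTTP-date form. The function name `retry_delay_seconds` and the one-second fallback are illustrative assumptions, not part of any particular API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay_seconds(status, headers, now=None, default=1.0):
    """Return how long to wait before retrying, or 0.0 if no wait is needed.

    `headers` is a plain dict here, so the lookup is case-sensitive;
    real HTTP client libraries usually expose case-insensitive headers.
    """
    if status != 429:
        return 0.0
    value = headers.get("Retry-After")
    if value is None:
        return default  # the header is optional; fall back to an assumed default
    value = value.strip()
    if value.isdigit():
        return float(value)  # seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value; use the assumed default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would sleep for the returned number of seconds before resending the request.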

For scraping and data-collection workloads, rate limiting is the operational ceiling that determines how fast you can go. The naive solution, sending fewer requests, caps your throughput. The usual workaround is to distribute requests across many identifiers (IPs, accounts, session IDs) so that no single identifier exceeds its cap. That is a large part of why residential proxies exist as a product category.

Server-side rate limiting algorithms come in a few shapes. Token bucket lets the client accumulate tokens at a steady rate up to a cap, allowing bursts up to the cap size. Sliding window counts requests in a moving time window, smoothing the cap. Fixed window resets the count at clock boundaries (per minute, per hour). Each has different implications for how scrapers should pace requests.

How It Works

On every incoming request, the server identifies the client (by IP, API key, account ID, or session token) and checks a counter for that identifier in the current window. If the counter is under the limit, the request goes through and the counter increments. If the counter has reached the limit, the server returns 429, often with a `Retry-After` header indicating how long to wait.
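The check described above can be sketched in a few lines. This is a minimal in-memory version, assuming a cap of 100 requests per 60-second window; the names `handle_request`, `LIMIT`, and `WINDOW_SECONDS` are hypothetical, and a production system would typically keep the counters in a shared store and expire old windows.

```python
import math
import time
from collections import defaultdict

LIMIT = 100          # assumed cap: requests per window
WINDOW_SECONDS = 60  # assumed window length

# (client_id, window_index) -> request count; old windows are never purged here
counters = defaultdict(int)

def handle_request(client_id, now=None):
    """Return (status_code, headers) for one incoming request."""
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS)
    key = (client_id, window)
    if counters[key] >= LIMIT:
        # Tell the client how many seconds remain until the counter resets
        window_end = (window + 1) * WINDOW_SECONDS
        return 429, {"Retry-After": str(math.ceil(window_end - now))}
    counters[key] += 1
    return 200, {}
```

The `now` parameter exists only so the behavior can be exercised with fixed timestamps; in normal use it is left out and the wall clock is used.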

Many large APIs combine identifiers: a single request can count against both a per-IP counter and a per-account counter, each with its own limit. Cloudflare's rate-limit rules, for example, can scope a limit by IP, by URL path, by session, or by any combination. Some advanced systems use leaky-bucket or sliding-window-counter variants for smoother enforcement.
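To illustrate how one request can count against several scopes at once, here is a rough sketch with two hypothetical scopes, `ip` and `account`. The limits, the `allow` function, and the `window` argument (an integer window index) are assumptions made for the example and do not reflect any specific vendor's rule engine.

```python
from collections import defaultdict

# Hypothetical per-scope limits (requests per window)
LIMITS = {"ip": 100, "account": 5000}

# (scope, identifier, window_index) -> request count
counters = defaultdict(int)

def allow(ip, account, window):
    """Allow the request only if every scope it touches is under its cap."""
    keys = [("ip", ip, window), ("account", account, window)]
    # Reject if any scope is already at its limit; nothing is counted in that case
    if any(counters[key] >= LIMITS[key[0]] for key in keys):
        return False
    for key in keys:
        counters[key] += 1
    return True
```

With this layout, spreading one account's traffic across many IPs still runs into the account counter, which is why layered scopes are harder to sidestep than a single per-IP cap.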

Types

Token Bucket

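A bucket fills with tokens at a steady rate up to a cap. Each request consumes one token, and a request that finds the bucket empty is rejected or delayed. This allows bursts up to the bucket size while holding the long-run average to the refill rate. One of the most common algorithms in modern API rate limiting.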
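A minimal token bucket might look like the following. The class name and the `rate` and `capacity` parameters are illustrative; this version refills lazily on each call rather than with a background timer, and it rejects rather than delays when the bucket is empty.

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity, now=None):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum tokens, i.e. the burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available and report whether the request passes."""
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, never exceeding capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For example, `TokenBucket(rate=1, capacity=5)` lets five requests through back to back, then roughly one per second after that.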

Leaky Bucket

Requests enter a queue (bucket) that drains at a fixed rate. Excess requests overflow and are rejected. Smooths bursts more strictly than token bucket.
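The queue form of a leaky bucket can be sketched as below. The names `LeakyBucket`, `offer`, and `leak_rate` are assumptions for illustration; draining happens lazily when a new request arrives, whereas a real server would process each request as it leaves the queue.

```python
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity    # maximum queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = now

    def _leak(self, now):
        # Whole requests that should have drained since the last update
        drained = int((now - self.last_leak) * self.leak_rate)
        for _ in range(min(drained, len(self.queue))):
            self.queue.popleft()  # a real server would handle the request here
        if not self.queue:
            self.last_leak = now  # idle bucket: restart the drain clock
        else:
            self.last_leak += drained / self.leak_rate

    def offer(self, request, now):
        """Queue the request, or return False if the bucket would overflow."""
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True
```

Unlike a token bucket, the output here never exceeds `leak_rate`, no matter how bursty the input is.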

Fixed Window

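The counter resets at clock boundaries (every minute, every hour). Simple to implement, but bursts can slip through at window boundaries: a client that sends a full cap at the end of one window and another full cap at the start of the next gets up to 2x the cap through in a short span.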
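The boundary problem is easy to reproduce. The helper below is a small sketch (the function name and the sample timestamps are made up for the example) that counts how many requests a fixed-window limiter would let through.

```python
def fixed_window_allowed(timestamps, limit, window):
    """Count how many requests at `timestamps` (in seconds) are allowed."""
    counts = {}  # window index -> requests counted so far
    allowed = 0
    for t in timestamps:
        index = int(t // window)
        if counts.get(index, 0) < limit:
            counts[index] = counts.get(index, 0) + 1
            allowed += 1
    return allowed

# 100 requests at 0:59 and 100 more at 1:00, with a cap of 100 per minute
burst = [59.0] * 100 + [60.0] * 100
print(fixed_window_allowed(burst, limit=100, window=60))  # 200
```

All 200 requests pass within about one second, even though the nominal cap is 100 per minute.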

Sliding Window

Counter looks at the last N seconds rather than discrete windows. Smoother enforcement than fixed window. Sliding-window log and sliding-window counter are common variants.
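As a sketch of the sliding-window log variant, the class below keeps a timestamp for each accepted request and discards entries once they age out. `SlidingWindowLog` and its parameters are illustrative names; the log costs memory proportional to the limit, which is one reason the counter variant is often preferred at scale.

```python
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit, window):
        self.limit = limit    # maximum requests in any `window` seconds
        self.window = window  # window length in seconds
        self.log = deque()    # timestamps of accepted requests, oldest first

    def allow(self, now):
        """Accept the request if fewer than `limit` were accepted in the last window."""
        # Drop timestamps that have aged out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

With a limit of 100 per 60 seconds, a client that sends 100 requests at 0:59 is refused at 1:00 and only admitted again once those entries are 60 seconds old.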

Adaptive / Behavioral Rate Limiting

Limits adjust based on request patterns and risk score. Used by anti-bot vendors (Cloudflare, DataDome): the same client might be allowed 1000 requests/minute or 10 depending on how 'real' the traffic looks.
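How vendors compute risk scores is proprietary, so the following is only a toy illustration of the idea, not a description of any real product. It assumes a hypothetical `risk_score` between 0 (looks human) and 1 (looks automated) and interpolates linearly between the two example limits mentioned above.

```python
def allowed_per_minute(risk_score, max_limit=1000, min_limit=10):
    """Map a risk score in [0, 1] to a per-minute request cap.

    The linear mapping is an assumption made for illustration only.
    """
    score = min(1.0, max(0.0, risk_score))  # clamp out-of-range scores
    return round(max_limit - score * (max_limit - min_limit))
```

The resulting cap would then feed into whichever counting algorithm the system uses, such as a token bucket or sliding window.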

Common Use Cases

Protecting public APIs from abuse

Preventing credential-stuffing attacks (login rate limits)

Managing infrastructure cost at scale

Enforcing fair usage tiers across customers

Slowing down scrapers and content abuse

Throttling retry storms during outages

FAQ

Frequently asked questions

Common questions about rate limiting.

How do rotating proxies defeat IP rate limits?

By distributing requests across many IPs. If a target rate-limits at 60 requests per IP per minute and you rotate per-request through a residential pool of 10,000 IPs, you can theoretically hit 600,000 requests per minute against the target without any single IP tripping its cap. In practice, behavioral signals also matter, but IP rotation is the foundational technique.
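The arithmetic behind that figure, and its inverse, can be written out as a small sketch. Both function names are hypothetical, and the numbers are theoretical ceilings that ignore behavioral detection and any limits scoped to something other than the IP.

```python
import math

def aggregate_ceiling(per_ip_limit, pool_size):
    """Upper bound on requests per window across the whole pool."""
    return per_ip_limit * pool_size

def pool_size_needed(target_rate, per_ip_limit):
    """Smallest pool that keeps every IP at or under its cap."""
    return math.ceil(target_rate / per_ip_limit)

print(aggregate_ceiling(60, 10_000))  # 600000
print(pool_size_needed(1_000, 60))    # 17
```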

How do rotating proxies defeat IP rate limits?

What does a 429 response mean?

Can I bypass rate limits without proxies?

How do I detect rate limits without hitting them?

What's the difference between rate limiting and a CAPTCHA?

Does Shifter help with rate limit avoidance?