Explained
Rate limiting is how servers protect themselves from abuse. Most public-facing APIs and websites cap the number of requests a single client can send in a given window, for example `100 requests per IP per minute`, `5000 requests per API key per hour`, or `30 logins per account per day`. When a client exceeds the cap, the server typically returns a 429 Too Many Requests response. Some servers instead silently slow their replies, queue the request, or start serving CAPTCHAs.
For scraping and data-collection workloads, rate limiting is the operational ceiling that determines how fast you can go. The naive solution, sending fewer requests, caps your throughput. The scalable solution is to distribute requests across many identifiers (IPs, accounts, session IDs) so that no single identifier exceeds its cap. That is a major reason residential proxies exist as a product category.
Server-side rate limiting algorithms come in a few shapes. Token bucket gives each client tokens that refill at a steady rate up to a cap; each request spends a token, so bursts up to the cap size are allowed. Sliding window counts requests in a moving time window, which avoids the spike a fixed window permits when a client sends a full quota just before a reset and another just after it. Fixed window resets the count at clock boundaries (per minute, per hour). Each has different implications for how scrapers should pace requests.
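To make the difference between these shapes concrete, here is a minimal Python sketch of a token bucket and a sliding window. The class names, the parameters, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are illustrative assumptions, not any particular server's implementation.

```python
from collections import deque


class TokenBucket:
    """Illustrative token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Credit tokens earned since the last call, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Illustrative sliding window: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of admitted requests, oldest first

    def allow(self, now: float) -> bool:
        # Forget requests that have slid out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3.0)
print([bucket.allow(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(now=1.0))  # True: one token was refilled after one second
```

For a scraper, the practical difference is that a token bucket tolerates a short burst followed by steady pacing at the refill rate, while a sliding window rejects any request that would push the count for the trailing window over the limit.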
How It Works
On every incoming request, the server identifies the client (by IP, API key, account ID, or session token) and checks a counter for that identifier in the current window. If the counter is below the limit, the request goes through and the counter increments. If the counter has already reached the limit, the server returns 429, often with a `Retry-After` header indicating how long to wait. The header is optional, so clients should not assume it is always present.
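The check described above can be sketched as a fixed-window counter. This is a simplified illustration under stated assumptions: `FixedWindowLimiter`, its `check` method, and the in-memory dictionary are hypothetical names, and a production system would usually keep the counters in a shared store rather than in process memory.

```python
import math


class FixedWindowLimiter:
    """Illustrative fixed-window limiter keyed by a client identifier."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # client_id -> (window_index, count)

    def check(self, client_id: str, now: float):
        """Return (status_code, headers) for one incoming request."""
        window_index = int(now // self.window)
        seen_index, count = self.counters.get(client_id, (window_index, 0))
        if seen_index != window_index:
            count = 0  # a new window has started, so the count resets

        if count < self.limit:
            self.counters[client_id] = (window_index, count + 1)
            return 200, {}

        # At the limit: tell the client how many seconds remain in this window.
        seconds_left = (window_index + 1) * self.window - now
        return 429, {"Retry-After": str(max(1, math.ceil(seconds_left)))}


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print(limiter.check("203.0.113.7", now=0))   # (200, {})
print(limiter.check("203.0.113.7", now=10))  # (200, {})
print(limiter.check("203.0.113.7", now=45))  # (429, {'Retry-After': '15'})
print(limiter.check("203.0.113.7", now=60))  # (200, {}) because the window reset
```

A client that honors the `Retry-After` value in the third response would wait 15 seconds and succeed on its next attempt.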
Most large APIs use a combination of identifiers: a single request is counted against both a per-IP counter and a per-account counter, each with its own limit. Cloudflare's rate-limit rules, for example, can scope a limit by IP, by URL path, by session, or by any combination. Some advanced systems use leaky-bucket or sliding-window-counter variants for smoother enforcement.
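As a rough sketch of how combined identifiers behave, the example below counts each request against both a per-IP and a per-account limit and admits it only if every scope has room. The `ScopedLimiter` class, the scope names, and the limits are hypothetical values chosen for illustration; they do not reflect Cloudflare's or any other vendor's actual rule engine.

```python
from collections import defaultdict


class ScopedLimiter:
    """Illustrative limiter with one fixed-window counter per (scope, identifier)."""

    def __init__(self, limits: dict, window_seconds: int = 60):
        self.limits = limits          # e.g. {"ip": 3, "account": 5}
        self.window = window_seconds
        # (scope, identifier, window_index) -> count; old windows are never
        # purged in this sketch, which a real system would need to handle.
        self.counters = defaultdict(int)

    def allow(self, identifiers: dict, now: float) -> bool:
        window_index = int(now // self.window)
        keys = [(scope, ident, window_index) for scope, ident in identifiers.items()]
        # Reject if any single scope has already reached its limit.
        if any(self.counters[key] >= self.limits[key[0]] for key in keys):
            return False
        # Only admitted requests are counted against each scope.
        for key in keys:
            self.counters[key] += 1
        return True


limiter = ScopedLimiter({"ip": 3, "account": 5})
alice_on_a = {"ip": "198.51.100.1", "account": "alice"}
alice_on_b = {"ip": "198.51.100.2", "account": "alice"}

print([limiter.allow(alice_on_a, now=0) for _ in range(4)])  # [True, True, True, False]
print([limiter.allow(alice_on_b, now=0) for _ in range(3)])  # [True, True, False]
```

The second line shows why rotating IPs alone is not always enough: the account counter keeps accumulating across IPs, so the per-account limit of 5 is hit even though the second IP is still under its own limit.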