
What Is a CAPTCHA?

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response system websites use to distinguish human visitors from automated bots, typically by asking the visitor to solve a puzzle that's easy for humans but difficult for software.

Understand how CAPTCHAs work, why they appear, the modern variants (image, audio, invisible, behavioral), and how to avoid triggering them when scraping.

Explained

A CAPTCHA is the bot-detection challenge you see when a website thinks your traffic might be automated. The classic form is a distorted-text image you have to read; modern forms are image-grid puzzles ('select all squares with traffic lights'), audio challenges, and behavioral / invisible CAPTCHAs that watch how you interact with the page and silently grade you as human or bot before you even know there's a check.

The major CAPTCHA vendors today are Google reCAPTCHA (v2, v3, and the Enterprise tier) and hCaptcha, which sites often deploy alongside anti-bot stacks such as Cloudflare and Akamai. Cloudflare also runs its own Turnstile challenge as an alternative to reCAPTCHA and hCaptcha. Each system uses a different blend of image puzzles, browser fingerprinting, mouse/keyboard movement analysis, and IP reputation to estimate how likely a visitor is to be human.

For scraping and data-collection workflows, the right answer to a CAPTCHA challenge isn't to solve it; it's to avoid triggering it in the first place. CAPTCHAs fire when a request looks suspicious (datacenter IP, missing headers, mismatched fingerprint, bursty pacing), so the cleanest fix is hygiene: residential or mobile IPs, modern browser headers, realistic timing, and TLS fingerprints that match the browser you claim to be. When a CAPTCHA does appear, rotating to a fresh IP often lets the next request through without a challenge.
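The pacing and rotation ideas above can be sketched in a few lines of Python. This is only an illustration: `jittered_delay`, `looks_like_captcha`, and `ProxyRotator` are hypothetical helpers, the default timings are arbitrary, and the marker strings are the widget class names commonly seen in reCAPTCHA, hCaptcha, and Turnstile embeds, so the check is a heuristic rather than a reliable detector.

```python
import itertools
import random
from typing import Optional, Sequence

# Class names commonly found in CAPTCHA widget markup (heuristic only).
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile")


def jittered_delay(base: float = 2.0, jitter: float = 1.5,
                   rng: Optional[random.Random] = None) -> float:
    """Return a delay in seconds between base and base + jitter."""
    rng = rng or random.Random()
    return base + rng.uniform(0.0, jitter)


def looks_like_captcha(body: str) -> bool:
    """Guess whether an HTML body contains a CAPTCHA widget."""
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


class ProxyRotator:
    """Cycle through a fixed list of proxy URLs (hypothetical values)."""

    def __init__(self, proxies: Sequence[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self.current = next(self._cycle)

    def rotate(self) -> str:
        """Switch to the next proxy, e.g. after a CAPTCHA response."""
        self.current = next(self._cycle)
        return self.current
```

In a fetch loop, one would sleep for `jittered_delay()` seconds between requests and call `rotate()` only when `looks_like_captcha(response_body)` is true, rather than switching IPs on every request.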

How It Works

When a request arrives, the website (or its anti-bot vendor) computes a risk score from signals such as source IP reputation (datacenter? recent abuse? country?), browser fingerprint (User-Agent, sec-ch-ua, screen size, plugins), TLS handshake fingerprint (JA3/JA4), behavioral signals (mouse movement, key timing, time spent on page), and the historical reputation of cookies and tokens.
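As a rough illustration of how such signals might be combined, the sketch below adds up fixed weights into a risk score between 0 and 1. The signal names, the weights, and the `RequestSignals` type are all hypothetical; real vendors use proprietary models over far more inputs, not a small lookup table like this.

```python
from dataclasses import dataclass

# Hypothetical weights chosen only for illustration.
WEIGHTS = {
    "datacenter_ip": 0.35,
    "recent_abuse": 0.25,
    "fingerprint_mismatch": 0.20,
    "unknown_tls_fingerprint": 0.10,
    "no_behavioral_signals": 0.10,
}


@dataclass
class RequestSignals:
    datacenter_ip: bool = False
    recent_abuse: bool = False
    fingerprint_mismatch: bool = False     # e.g. User-Agent disagrees with sec-ch-ua
    unknown_tls_fingerprint: bool = False  # JA3/JA4 not associated with a real browser
    no_behavioral_signals: bool = False    # no mouse or key events observed


def risk_score(signals: RequestSignals) -> float:
    """Return a risk score in [0, 1]; higher means more bot-like."""
    total = sum(weight for name, weight in WEIGHTS.items()
                if getattr(signals, name))
    return min(total, 1.0)
```

Under these assumed weights, a request from a datacenter IP with a mismatched fingerprint scores higher than one that only lacks behavioral signals, which is the kind of ordering a real system would also aim for.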

If the score falls short of the site's human-confidence threshold (in other words, the risk is too high), the system inserts a CAPTCHA challenge into the response. The challenge requires the client to perform a task (read text, click matching images, pass an invisible behavioral check) and submit the resulting token back to the server. Without a valid token, the server rejects subsequent requests.
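The token step can be pictured with a minimal sketch, assuming a server that signs its own tokens with HMAC. This is not how any particular vendor implements it (sites using reCAPTCHA, hCaptcha, or Turnstile verify tokens through the vendor's API); `SECRET_KEY`, `TOKEN_TTL_SECONDS`, `issue_token`, and `verify_token` are hypothetical names used only to show the issue-then-verify pattern.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"demo-secret"  # hypothetical; a real key stays server-side
TOKEN_TTL_SECONDS = 120      # assumed lifetime, not taken from any vendor


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(client_id: str, now: Optional[float] = None) -> str:
    """Issue a signed token once the client has passed the challenge."""
    issued_at = int(now if now is not None else time.time())
    payload = f"{client_id}:{issued_at}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, client_id: str,
                 now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired token issued to this client."""
    try:
        token_client, issued_at_str, signature = token.rsplit(":", 2)
        issued_at = int(issued_at_str)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{token_client}:{issued_at_str}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # signature does not match
    if token_client != client_id:
        return False  # token was issued to someone else
    current = now if now is not None else time.time()
    return 0 <= current - issued_at <= TOKEN_TTL_SECONDS
```

A request handler would call `verify_token` on each protected request and serve the challenge again whenever it returns `False`.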

Types

Text-Based CAPTCHA

Distorted-text images the user has to read and type. The classic form, mostly retired today because OCR and ML can solve them.

Image-Grid CAPTCHA (reCAPTCHA v2)

'Select all squares containing traffic lights / crosswalks / fire hydrants.' Common as a fallback when invisible reCAPTCHA flags the request.

Invisible / Behavioral CAPTCHA (reCAPTCHA v3, Turnstile)

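No user interaction required in most cases. The system observes the browser and visitor passively. reCAPTCHA v3 returns a score from 0.0 (likely bot) to 1.0 (likely human), and the site decides what threshold to enforce. Turnstile instead issues a token that the site verifies server-side as pass or fail.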
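A site-side threshold check might look like the sketch below. It assumes the verification result has already been fetched and parsed into a dict with the `success`, `score`, and `action` fields that reCAPTCHA v3 verification responses carry; the `decide` function, its return labels, and the 0.5 default threshold are illustrative choices, not a prescribed policy.

```python
def decide(result: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Map a parsed verification result to 'allow', 'challenge', or 'reject'."""
    if not result.get("success"):
        return "reject"      # token invalid, expired, or already used
    if result.get("action") != expected_action:
        return "reject"      # token was minted for a different page action
    score = float(result.get("score", 0.0))
    # Below the threshold, fall back to a visible challenge instead of blocking.
    return "allow" if score >= threshold else "challenge"
```

For example, a login form could call `decide(result, "login", threshold=0.7)` to be stricter than a newsletter signup that keeps the default.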

Audio CAPTCHA

Spoken-digit or word challenges, mostly an accessibility fallback for visual CAPTCHAs.

hCaptcha

An alternative to reCAPTCHA used by many privacy-focused sites, and by Cloudflare before it introduced Turnstile. Functionally similar (image-grid challenges plus behavioral signals), but with a different privacy policy and commercial terms.

Common Use Cases

Form-spam prevention on signup and contact forms
Login throttling against credential stuffing
Comment / forum bot defense
Booking-bot defense (tickets, sneakers, reservations)
Scraper deterrence on protected APIs and pages

FAQ

Frequently asked questions

Common questions about CAPTCHAs.

Why do I keep getting CAPTCHAs?

Your traffic is triggering a 'looks like a bot' signal. The most common causes are datacenter IPs (often flagged outright), missing or mismatched browser headers, no JavaScript engine when the site expects one, bursty pacing from a single IP, and stale TLS fingerprints. Switch to residential proxies, send modern headers, and pace your requests.