Explained
The User-Agent header is one of the oldest pieces of identification on the web. Nearly every HTTP request a client sends includes a `User-Agent` header that names the client software (browser or library), its version, the operating system, and often the rendering engine. Servers use it to choose content (mobile vs desktop layouts), for analytics, and increasingly for bot detection.
For scraping, the User-Agent is the easiest signal to get wrong. Default User-Agents from common HTTP libraries (`python-requests/2.31.0`, `axios/1.5.0`, `okhttp/4.10.0`) are dead giveaways. Even Playwright and Puppeteer are exposed: when they drive Chromium in headless mode, the default User-Agent typically contains `HeadlessChrome`, which anti-bot systems flag immediately.
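To make the point concrete, here is a minimal sketch of how a server-side filter might spot those defaults. The function name `looks_like_default_client` and the pattern list are hypothetical and cover only the examples named above; real anti-bot systems use far larger and more nuanced rule sets.

```python
import re

# Patterns for the default User-Agents mentioned above.
# This list is illustrative only, not a complete blocklist.
DEFAULT_CLIENT_PATTERNS = [
    re.compile(r"^python-requests/", re.IGNORECASE),
    re.compile(r"^axios/", re.IGNORECASE),
    re.compile(r"^okhttp/", re.IGNORECASE),
    re.compile(r"HeadlessChrome/"),
]


def looks_like_default_client(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known library or headless default."""
    return any(p.search(user_agent) for p in DEFAULT_CLIENT_PATTERNS)
```

A check like this is cheap to run on every request, which is why leaving a library default in place tends to be caught first.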
The naive fix is to set a Chrome UA on every request. The catch is that the User-Agent alone isn't enough: modern fingerprinting cross-references the UA against `sec-ch-ua` client hints, the TLS fingerprint, the JavaScript `navigator` object, and the Accept-Language header. Setting a Chrome UA while leaving Python's TLS handshake exposed can be a bigger red flag than leaving the default UA in place, because the request now contradicts itself.
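The naive fix looks roughly like this with Python's standard-library `urllib`. The helper name `build_request` is hypothetical and the Chrome string is only an example value. The sketch builds the request without sending it, and the comments mark what the override does not change.

```python
import urllib.request

# Example Chrome-on-Windows User-Agent; the version is illustrative.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = CHROME_UA) -> urllib.request.Request:
    """Build a request whose User-Agent header claims to be Chrome.

    Only the header changes. The TLS handshake, header order, and the
    absence of sec-ch-ua hints still look like Python, not Chrome.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Without the override, `urllib` would identify itself as `Python-urllib/<version>`. With it, the header says Chrome while every other layer of the connection still says Python.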
How It Works
When your client sends an HTTP request, the request line is followed by a block of headers. The `User-Agent` header is one of them. Servers parse it (or just log it) and may route or shape the response based on what they see. A modern Chrome User-Agent on Windows looks like: `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36`. The `Mozilla/5.0` prefix is historical; virtually every major browser still sends it for legacy compatibility.
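As a rough illustration of the parsing a server might do, the sketch below pulls the platform token and the Chrome major version out of a User-Agent string. The names `ParsedUA` and `parse_user_agent` are hypothetical, and the two regular expressions are a simplification: production parsers handle many more browsers and edge cases.

```python
import re
from typing import NamedTuple, Optional


class ParsedUA(NamedTuple):
    platform: str                # contents of the first parenthesized group
    chrome_major: Optional[int]  # None when no Chrome token is present


def parse_user_agent(ua: str) -> ParsedUA:
    """Extract the platform token and Chrome major version, if any."""
    platform_match = re.search(r"\(([^)]*)\)", ua)
    platform = platform_match.group(1) if platform_match else ""
    chrome_match = re.search(r"Chrome/(\d+)", ua)
    major = int(chrome_match.group(1)) if chrome_match else None
    return ParsedUA(platform, major)
```

For the Chrome string above, this yields the platform `Windows NT 10.0; Win64; x64` and the major version `120`. A library default such as `python-requests/2.31.0` yields an empty platform and no Chrome version.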
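Chromium-based browsers also send User-Agent Client Hints over HTTPS: the `sec-ch-ua`, `sec-ch-ua-mobile`, and `sec-ch-ua-platform` headers, which carry structured browser, version, and platform data. Firefox and Safari do not send them, so a Firefox or Safari UA accompanied by these hints is inconsistent in its own right. Anti-bot systems compare the hints to the User-Agent string and flag mismatches. User-Agent rotation therefore has to be paired with consistent `sec-ch-ua-*` values and a matching TLS fingerprint.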
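The comparison can be sketched as a small consistency check. This is a simplified illustration, not how any particular anti-bot vendor works: the function name `find_ua_mismatches`, the platform mapping, and the messages are all hypothetical, and only the Chrome version and platform are checked.

```python
import re
from typing import Dict, List

# Maps sec-ch-ua-platform values to a token expected in the User-Agent.
# Illustrative and incomplete (no Chrome OS, iOS, and so on).
PLATFORM_TOKENS = {
    "Windows": "Windows NT",
    "macOS": "Macintosh",
    "Linux": "Linux",
    "Android": "Android",
}


def find_ua_mismatches(headers: Dict[str, str]) -> List[str]:
    """Return a list of inconsistencies between User-Agent and sec-ch-ua-* headers."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")
    problems = []

    ua_major = re.search(r"Chrome/(\d+)", ua)
    hint_major = re.search(r'"Chromium";v="(\d+)"', h.get("sec-ch-ua", ""))
    if ua_major and not hint_major:
        problems.append("Chrome User-Agent without a sec-ch-ua header")
    elif ua_major and hint_major and ua_major.group(1) != hint_major.group(1):
        problems.append("Chrome version differs between User-Agent and sec-ch-ua")

    platform = h.get("sec-ch-ua-platform", "").strip('"')
    token = PLATFORM_TOKENS.get(platform)
    if token and token not in ua:
        problems.append("sec-ch-ua-platform does not match the User-Agent")

    return problems
```

A Chrome 120 Windows User-Agent sent with `sec-ch-ua` naming Chromium 120 and `sec-ch-ua-platform: "Windows"` passes this check. The same User-Agent sent with no hints, or with hints for another version or platform, does not. When rotating User-Agents, it helps to store each UA together with its matching hint values and switch them as a unit.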