This article is general educational information, not legal advice. Laws differ by country and change over time, and how they apply depends on your specific facts. For your situation, consult a qualified lawyer.
“Is web scraping legal?” is one of the most-searched questions in the data world, and the honest answer is: usually yes, but it depends. Web scraping itself, the act of programmatically reading publicly available web pages, is broadly lawful in many jurisdictions. What turns a scrape from clearly fine into legally risky is the combination of three things: what you collect, how you collect it, and where you and your targets are located.
This is a plain-English overview of the legal landscape: the principles that decide most cases, the landmark rulings worth knowing, and the practical habits that keep scraping on the right side of the line. It won’t replace a lawyer, but it’ll help you ask the right questions.
The short version
For most scraping of publicly available, non-personal data, done without bypassing access controls and without overloading the target, courts in the US and elsewhere have generally been permissive. The risk rises sharply when you cross into any of these:
- Scraping personal data (names, emails, profiles), which triggers privacy law.
- Scraping behind a login or paywall you’re not authorized to bypass.
- Republishing copyrighted content rather than extracting facts.
- Degrading the target’s servers with excessive load.
- Violating a site’s Terms of Service in a way that creates contract liability.
Stay in the safe zone (public, non-personal, respectful, factual) and you’re on solid ground in most places. Step into the risk zone and “is it legal” becomes a real, fact-specific question.
The frameworks that actually decide it
Legality isn’t governed by one law. Several overlapping bodies of law apply, and a given scrape can touch more than one.
1. Computer-access laws (e.g. the US CFAA). The Computer Fraud and Abuse Act penalizes “unauthorized access” to computer systems. The key question is whether scraping public pages is “unauthorized.” Recent US case law has narrowed this considerably (see the cases below): public data accessible to anyone with a browser is generally not treated as “unauthorized access.” Accessing data behind authentication you’re not entitled to use is a different story.
2. Contract law / Terms of Service. Many sites’ ToS prohibit automated access. Breaching ToS is generally a contract matter, not a crime, but it can expose you to civil liability for breach of contract. Courts treat “clickwrap” terms (you clicked “I agree”) more seriously than “browsewrap” terms (a link in the footer you never interacted with).
3. Copyright. Facts and data are not copyrightable; creative expression is. Extracting prices, specs, or statistics is far safer than copying and republishing articles, photos, or other original content. If you reproduce copyrighted material, you’re in copyright territory, where fair use / fair dealing and licensing come into play.
4. Database rights (especially in the EU). The EU’s sui generis database right protects substantial investment in compiling a database, even when the individual facts aren’t copyrightable. Scraping and re-using a substantial part of a protected database can infringe this right in the EU. The US has no equivalent right.
5. Privacy / data-protection law (GDPR, CCPA, and others). This is the big one for personal data. The GDPR can apply to personal data of people in the EU regardless of where you scrape from, and it generally requires a lawful basis, transparency, and respect for individuals’ rights. Scraping personal data (faces, profiles, contact details) is the highest-risk category, and several regulators have issued large fines over it. California’s CCPA/CPRA and a growing list of other privacy laws add their own requirements.
6. Trespass to chattels. An older doctrine that can apply when scraping harms the target’s systems, for example by overloading servers. The harm, not the access, is the trigger.
The takeaway: there’s no single “scraping law.” Whether a scrape is lawful depends on which of these it touches, and that’s driven by what data, how, and where.
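On the “what data” question, one habit can be built directly into a pipeline: keep only the fields you actually need, so personal data doesn’t reach storage by accident. Below is a minimal Python sketch of that idea. The field names (`product_name`, `seller_email`, and so on) and the `minimize_record` helper are hypothetical, chosen only for illustration; which fields count as personal data in your case is a legal question the code cannot answer.

```python
from typing import Any, Mapping

# Hypothetical allow-list: the only fields this project has a documented need for.
ALLOWED_FIELDS = frozenset({"product_name", "price", "currency", "sku"})


def minimize_record(
    record: Mapping[str, Any],
    allowed: frozenset = ALLOWED_FIELDS,
) -> dict:
    """Return a copy of `record` containing only allow-listed fields.

    An allow-list (rather than a block-list) means a new, unexpected field
    is dropped by default instead of being stored by default.
    """
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    scraped = {
        "product_name": "Example Widget",
        "price": "19.99",
        "currency": "EUR",
        "seller_name": "Jane Doe",           # not needed for this project, dropped
        "seller_email": "jane@example.com",  # not needed for this project, dropped
    }
    print(minimize_record(scraped))
```

Dropping a field after fetching it reduces what you store. It does not, on its own, establish a lawful basis for collecting that field in the first place.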
Landmark cases worth knowing
A few rulings have shaped how this plays out in practice. (Case law evolves; treat these as orientation, not the current final word.)
hiQ Labs v. LinkedIn (US, 9th Circuit). hiQ scraped public LinkedIn profiles. The courts indicated that scraping publicly available data is unlikely to be “unauthorized access” under the CFAA, a major signal that public-data scraping isn’t criminal hacking. Notably, hiQ later faced liability on contract grounds for breaching LinkedIn’s terms, illustrating that the CFAA and ToS are separate questions.
Van Buren v. United States (US Supreme Court, 2021). The Court narrowed the CFAA’s “exceeds authorized access” clause: using access you legitimately have for an improper purpose isn’t automatically a CFAA violation. This reduced CFAA exposure for many scraping scenarios.
Meta v. Bright Data (US, N.D. Cal., 2024). A court found that scraping public Facebook and Instagram data did not breach Meta’s terms, in part because the scraper wasn’t logged in when collecting public data. Another data point that public, logged-out scraping sits on firmer ground than scraping behind authentication.
Clearview AI (EU/UK regulators). Regulators fined Clearview for scraping facial images, which are personal data, to build a recognition database without a lawful basis. A clear illustration that personal data scraping is governed by privacy law, where the rules are strict.
The pattern across these: public, logged-out, non-personal, factual scraping is the safest ground; authentication, personal data, and republished content are where the legal risk concentrates.
Where do proxies fit in?
A common misconception is that using a proxy changes the legal picture. It doesn’t, in either direction.
A residential proxy is a routing tool, the same kind of infrastructure that powers CDNs, VPNs, and corporate networks. Using one is lawful. But a proxy doesn’t launder legality: routing an unlawful scrape through a proxy doesn’t make it lawful, and routing a lawful scrape through a proxy doesn’t make it unlawful. Proxies change which IP a request comes from, not whether you should be making it.
What proxies legitimately help with is operating responsibly at scale: distributing load so you don’t hammer a single endpoint, and reaching geo-appropriate content. The legality of the underlying activity is unchanged. (Our acceptable use policy sets out what’s permitted on Shifter, and it tracks exactly these principles.)
Practical best practices to stay on the right side
You can’t get legal certainty from a blog post, but you can dramatically lower your risk by building these habits in. They also happen to be good engineering.
- Scrape public data, not data behind a login. Authentication is a bright line. If you have to log in or bypass an access control to reach it, treat it as high-risk and get advice.
- Avoid personal data unless you have a lawful basis. Names, emails, profiles, and especially biometric or sensitive data invoke privacy law. If you don’t need personal data, don’t collect it. If you do, get proper advice on your basis and obligations.
- Respect robots.txt. It isn’t a law, but honoring robots.txt and a site’s other stated wishes is strong evidence of good faith, and it’s the norm.
- Don’t degrade the target. Rate-limit, scrape during off-peak hours where reasonable, and never let your collection harm the site’s performance. Server harm is what trespass claims are built on. (Good scraping practices and lawful behavior overlap heavily.)
- Extract facts, don’t republish creative content. Prices, specs, and data points are far safer than copying articles, images, or other original expression.
- Read the Terms of Service. Know what you’re agreeing to, especially clickwrap terms, and weigh the contract risk of breaching them.
- Mind jurisdiction. EU data subjects bring GDPR into play wherever you operate; EU databases bring database rights; your own country’s laws apply too. Cross-border scraping multiplies the rulebooks.
- Document your purpose and process. Legitimate, well-documented use (price comparison, research, monitoring) is easier to defend than vague or aggressive collection.
These principles are the same ones behind responsible dataset building and training-data collection: compliance and quality pull in the same direction.
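Two of the habits above, honoring robots.txt and rate-limiting, translate fairly directly into code. Here is a minimal Python sketch using the standard library’s `urllib.robotparser`. The user-agent string `example-bot`, the sample robots.txt content, the one-second fallback delay, and the `PoliteThrottle` class are all assumptions made for illustration, not recommendations for any particular site.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-bot"  # hypothetical; use a string that identifies you
DEFAULT_DELAY_S = 1.0       # assumed fallback when robots.txt sets no Crawl-delay


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def delay_for(parser: RobotFileParser, user_agent: str = USER_AGENT) -> float:
    """Use the site's Crawl-delay if it declares one, else the assumed default."""
    declared = parser.crawl_delay(user_agent)
    return float(declared) if declared is not None else DEFAULT_DELAY_S


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one site."""

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_request_at: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `min_interval_s` has passed since the last call."""
        if self._last_request_at is not None:
            elapsed = self._clock() - self._last_request_at
            if elapsed < self.min_interval_s:
                self._sleep(self.min_interval_s - elapsed)
        self._last_request_at = self._clock()


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n"
    robots = parse_robots(sample)
    throttle = PoliteThrottle(delay_for(robots))
    for url in ("https://example.com/products", "https://example.com/private/x"):
        if not robots.can_fetch(USER_AGENT, url):
            print("skipping (disallowed by robots.txt):", url)
            continue
        throttle.wait()
        print("would fetch:", url)  # the actual HTTP request is left out
```

Passing the clock and sleep functions in as parameters is a design choice that lets the throttle be tested without real waiting. Downloading robots.txt and making the HTTP requests are deliberately omitted from the sketch.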
FAQ
Is web scraping legal? In general, scraping publicly available, non-personal data without bypassing access controls or harming the target is broadly lawful in many jurisdictions. It becomes legally risky when it involves personal data, authentication/paywalls, copyrighted content, server harm, or a Terms-of-Service breach. It always depends on the specific facts and the jurisdiction.
Is scraping public data legal? Public, logged-out data is the safest ground. US case law has repeatedly indicated that scraping publicly accessible pages is unlikely to be “unauthorized access.” But public doesn’t mean unrestricted: if that public data is personal data, privacy law still applies, and republishing copyrighted public content still raises copyright issues.
Does violating Terms of Service make scraping illegal? Not criminal, but potentially a civil problem. Breaching ToS is generally a contract matter that can expose you to breach-of-contract liability, separate from computer-access laws. Clickwrap terms (you actively agreed) carry more weight than browsewrap (a footer link).
Is scraping personal data legal? This is the highest-risk category. Personal data triggers privacy laws like the GDPR (for people in the EU, wherever you scrape from) and CCPA, which generally require a lawful basis and impose obligations. Several regulators have fined companies for scraping personal data without a lawful basis. Get legal advice before scraping personal data.
Does using a proxy make scraping legal or illegal? Neither. A proxy is a lawful routing tool; it changes which IP a request comes from, not whether the underlying activity is permitted. It can’t make an unlawful scrape lawful, and it doesn’t make a lawful scrape unlawful.
Is it legal to scrape copyrighted content? Extracting facts (prices, specs, numbers) is generally safe because facts aren’t copyrightable. Copying and republishing original creative content (articles, photos, video) can infringe copyright unless covered by fair use / fair dealing or a license.
The bottom line
Web scraping is, for the most part, legal, especially when you collect public, non-personal, factual data without bypassing access controls or harming the site. The legal risk lives at the edges: personal data, authentication, copyrighted content, server overload, and contract terms. Most of staying compliant is simply staying out of those edges and acting in good faith.
None of this is a substitute for advice on your specific project; when in doubt, talk to a lawyer. But the principles are consistent and learnable: scrape what’s public, take only what you need, don’t hurt the target, respect privacy and copyright, and know your jurisdictions. Do that, and a quality residential proxy network is just responsible infrastructure for collecting public data at scale, the same way it’s meant to be used. For more on web scraping itself, start with what web scraping is and how it supports a business.