2026-03-28

10 min read

Web Scraping, Legal, GDPR, CFAA, Guide

Is Web Scraping Legal in 2026? What the Google vs SerpApi Lawsuit Means for You

"Is web scraping legal?" is the most asked question on every scraping forum, Quora thread, and Reddit post about the topic. The answer hasn't changed: it depends. But the details of what it depends on shifted significantly in 2025-2026.

Google's DMCA lawsuit against SerpApi (which pulled 93 upvotes on r/webscraping and sparked weeks of debate) put a new wrinkle into the picture. So let's walk through what's actually legal, what's risky, and what will get you sued.

Disclaimer: This article is informational. It is not legal advice. If you're building a scraping operation with real legal exposure, talk to an attorney who specializes in internet law. Every jurisdiction and use case is different.

The short answer

Three rules cover most situations:

Public data is generally fair game. If anyone with a browser can see it without logging in, courts have repeatedly said scraping it is not unauthorized access.
Data behind authentication is risky. Logging into someone else's account, scraping behind paywalls, or circumventing technical access controls gets into CFAA and DMCA territory.
Personal data has its own rules. If you're scraping names, emails, or other PII -- especially of EU residents -- GDPR applies regardless of where your servers sit.

That's the framework. Everything below is the detail.

Key legal precedents you need to know

hiQ v LinkedIn (2022) -- public data wins

This is still the most important scraping case in US law. hiQ scraped public LinkedIn profiles to build workforce analytics tools. LinkedIn sent a cease-and-desist, then blocked hiQ's IP addresses. hiQ sued.

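The Ninth Circuit ruled in 2019 that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA). The logic: the CFAA prohibits access "without authorization," but you can't "lack authorization" to access something that's open to everyone. The Supreme Court vacated that decision in 2021 and sent it back for reconsideration in light of Van Buren, and the Ninth Circuit reaffirmed its conclusion in 2022. hiQ did not walk away clean, though: the district court later found it had breached LinkedIn's User Agreement, and the case ended in a settlement.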

What it means for you: scraping public web pages -- data that any visitor can see -- is not a CFAA violation under current Ninth Circuit precedent.

Google v SerpApi (2025-2026) -- DMCA enters the chat

This is the case that changed the conversation. Google sued SerpApi not under the CFAA, but under the DMCA's anti-circumvention provisions (Section 1201). Google's argument: SerpApi bypassed technical measures designed to restrict automated access to search results, and that constitutes circumvention of a "technological protection measure."

This is a different legal theory from the ones used in previous scraping cases. The CFAA asks "did you have authorization?" The DMCA asks "did you bypass a technical barrier?" Those are fundamentally different questions.

The case is still being litigated as of early 2026, so there's no final ruling. But the legal theory is already having a chilling effect. If courts accept that anti-bot measures are "technological protection measures" under the DMCA, then bypassing CAPTCHAs, fingerprinting, or rate limits could carry statutory damages of $200 to $2,500 per act of circumvention.
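To see why that theory worries scraping operators, the per-act range can be multiplied out. The sketch below is plain arithmetic on the $200 to $2,500 statutory range this article cites. The function name and the figure of 10,000 acts are hypothetical, and how a court would actually count "acts" of circumvention is unsettled.

```python
# Statutory damages range per act of circumvention, as cited in this article.
MIN_PER_ACT_USD = 200
MAX_PER_ACT_USD = 2_500


def statutory_exposure(acts: int) -> tuple[int, int]:
    """Return the (low, high) statutory damages in USD for a number of acts."""
    if acts < 0:
        raise ValueError("acts must be non-negative")
    return acts * MIN_PER_ACT_USD, acts * MAX_PER_ACT_USD


# Hypothetical: a court treats 10,000 bypassed challenges as 10,000 acts.
low, high = statutory_exposure(10_000)
print(f"${low:,} to ${high:,}")  # prints "$2,000,000 to $25,000,000"
```

Even at the bottom of the range, exposure grows linearly with request volume, which is why the counting question matters so much.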

Van Buren v United States (2021) -- narrowing the CFAA

The Supreme Court ruled that the CFAA's "exceeds authorized access" provision covers people who enter parts of a computer system that are off-limits to them -- not people who have legitimate access and misuse the data they obtain. This narrowed the CFAA significantly and made it harder to use as a weapon against scraping of public data.

US vs EU: different rules, different risks

United States

Two federal laws matter:

CFAA (Computer Fraud and Abuse Act) -- prohibits accessing a computer "without authorization" or "exceeding authorized access." After hiQ and Van Buren, this is hard to apply against scraping of public data. Still very much applies to scraping behind authentication.
DMCA Section 1201 -- prohibits circumventing technological protection measures. This is the new frontier, thanks to Google v SerpApi. If anti-bot systems count as TPMs, the DMCA could become the primary legal weapon against scraping.

State laws also matter. California's CCPA gives consumers rights over their personal data. Illinois's BIPA covers biometric data specifically. Virginia, Colorado, Connecticut, and others have their own privacy laws.

European Union

GDPR -- if you're scraping personal data of EU residents, GDPR applies. You need a legal basis for processing (legitimate interest is the usual claim, but it requires a balancing test). Data subjects have the right to object to the processing of their personal data, which includes scraping it. Fines can reach EUR 20 million or 4% of global annual turnover, whichever is higher.
Database Directive -- the EU gives database creators a sui generis right to prevent extraction of a substantial part of their database, even if the individual data points aren't copyrighted. This has no US equivalent and is a real risk for large-scale scraping of EU-based databases.
AI Act -- the EU's AI Act includes provisions about data used for training AI models. If you're scraping data to feed into AI training, transparency and documentation requirements apply.
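The GDPR ceiling mentioned above is easier to reason about with numbers. For the most serious infringements the cap is the higher of EUR 20 million or 4% of worldwide annual turnover. The sketch below only computes that ceiling; the function name and turnover figures are hypothetical, and actual fines are set case by case and are usually well below the maximum.

```python
FIXED_CAP_EUR = 20_000_000   # fixed component of the upper-tier cap
TURNOVER_PERCENT = 4         # percentage of worldwide annual turnover


def gdpr_max_fine(annual_turnover_eur: int) -> int:
    """Return the upper-tier GDPR fine ceiling in EUR for a given turnover."""
    if annual_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    percentage_cap = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, percentage_cap)


# Hypothetical companies: the fixed cap dominates until turnover passes EUR 500M.
print(gdpr_max_fine(100_000_000))    # prints 20000000
print(gdpr_max_fine(1_000_000_000))  # prints 40000000
```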

What robots.txt actually means (legally)

Robots.txt is a voluntary standard. It's a request, not an access control mechanism. No court has ruled that violating robots.txt alone constitutes unauthorized access.

But courts do reference it. In several cases, judges have looked at whether a scraper respected or ignored robots.txt as evidence of good or bad faith. Ignoring robots.txt won't get you convicted, but it can make you look worse in front of a judge.

The practical advice: respect robots.txt when you can. If you have a legitimate reason to scrape pages blocked by robots.txt (and the data is public), document that reason. It's a factor, not a rule.
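Checking robots.txt before a crawl takes only a few lines. The sketch below uses Python's standard-library `urllib.robotparser` against an inline sample file so it runs without network access. The sample rules, the `example-bot` user agent, and the `is_allowed` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "example-bot", "https://example.com/products/1"))
print(is_allowed(SAMPLE_ROBOTS, "example-bot", "https://example.com/private/report"))
```

The first call prints `True` and the second prints `False`. Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of a string. Storing the result alongside each request gives you the documentation trail this section recommends.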

Terms of Service: contract law, not criminal law

Almost every website's ToS says "no scraping." Does that matter?

Legally, a ToS is a contract between you and the site operator. Violating it is a breach of contract -- a civil matter. It is not a criminal offense. After Van Buren, courts have largely rejected the idea that ToS violations equal CFAA violations.

That said, breach of contract is still a lawsuit. If you scrape a site whose ToS prohibits it, the site owner can sue you for breach. They'd need to show damages, which is often hard for public data. But the lawsuit itself is expensive to defend.

The takeaway: ToS violations are a civil risk, not a criminal one. The risk scales with how much money is at stake and how aggressively the site owner enforces their terms.

The "public data" safe harbor

This is the principle that matters most for web scraping in practice. If data is visible to any unauthenticated browser user, scraping it is generally defensible because:

The CFAA doesn't apply to accessing public data (hiQ v LinkedIn)
There's no "circumvention" if you're accessing data that's freely available
Copyright claims are weak on factual data (names, prices, addresses aren't copyrightable -- the arrangement might be)

This doesn't mean you can scrape anything that's technically reachable. "Public" means available without authentication, not "I found a way to access it."
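One rough way to apply that definition in a crawler is to look at what a request with no cookies and no credentials actually gets back. The sketch below is a heuristic only, not a legal test: the function name, the login-path hints, and the rule that only a plain 200 response counts are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Assumption: illustrative substrings that suggest a login page, not an exhaustive list.
LOGIN_HINTS = ("login", "signin", "sign-in", "auth")


def looks_public(status: int, requested_url: str, final_url: str) -> bool:
    """Heuristic: did an unauthenticated request receive the page itself?

    status        -- HTTP status of the final response
    requested_url -- URL the crawler asked for
    final_url     -- URL after any redirects were followed
    """
    if status != 200:
        # 401/403 and other non-200 responses are treated as not public.
        return False
    requested_path = urlparse(requested_url).path.lower()
    final_path = urlparse(final_url).path.lower()
    redirected_to_login = final_path != requested_path and any(
        hint in final_path for hint in LOGIN_HINTS
    )
    return not redirected_to_login


print(looks_public(200, "https://example.com/item/1", "https://example.com/item/1"))
print(looks_public(200, "https://example.com/account", "https://example.com/login"))
print(looks_public(403, "https://example.com/item/1", "https://example.com/item/1"))
```

The three calls print `True`, `False`, and `False`. A page that passes this check can still be off-limits for other reasons covered below, such as personal data or access controls.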

When scraping gets risky

Here's where people get into trouble:

Behind authentication

Logging into accounts (yours or others') to scrape data behind a login wall is risky. It can trigger CFAA claims because you're accessing data that isn't public. If you're using credentials you aren't authorized to use, it's even worse.

Personal data at scale

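Scraping and storing personal data -- names, emails, phone numbers, addresses -- puts you squarely in GDPR territory (for EU data) and state privacy law territory in the US. Clearview AI is the cautionary tale: it scraped face images from public sites and drew biometric privacy litigation in the US and fines from several European data protection authorities.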
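A practical way to reduce this exposure is to drop personal data before anything is written to storage. The sketch below keeps only an allow-list of fields and redacts email addresses from free text. The field names, the sample record, and the email pattern are assumptions, and filtering like this lowers risk without making processing GDPR-compliant on its own.

```python
import re

# Hypothetical schema: the only fields this pipeline is allowed to store.
ALLOWED_FIELDS = {"title", "price", "currency", "url"}

# Simple pattern for common email shapes; it will not catch every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def minimize(record: dict) -> dict:
    """Return a copy of record containing only allow-listed fields."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[redacted]", text)


scraped = {"title": "Mug", "price": 12.5, "seller_email": "jane.doe@example.com"}
print(minimize(scraped))
print(redact_emails("Contact jane.doe@example.com today"))
```

The first call drops `seller_email` entirely; the second prints `Contact [redacted] today`. An allow-list is the safer default because a new field on the source page is ignored until someone decides to collect it.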

Circumventing access controls

This is the Google v SerpApi risk. If a site uses CAPTCHAs, browser fingerprinting, or other technical measures to restrict access, bypassing them could be DMCA circumvention. The law is unsettled, but the risk is real.

Rate abuse and denial of service

Slamming a site with thousands of requests per second isn't just bad manners -- it can be a crime. The CFAA covers "intentionally causing damage to a protected computer," and taking down a site with aggressive scraping could qualify.
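Keeping request volume reasonable is mostly a matter of enforcing a minimum gap between requests. The sketch below is one minimal way to do that. The `Throttle` class, the `fetch_politely` helper, and the caller-supplied `fetch` function are hypothetical names, and the right interval for a given site is a judgment call rather than a legal threshold.

```python
import time
from typing import Callable, Iterable, List, Optional


class Throttle:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep until min_interval has elapsed since the last call; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


def fetch_politely(
    urls: Iterable[str], fetch: Callable[[str], str], throttle: Throttle
) -> List[str]:
    """Call the caller-supplied fetch(url) for each URL, pausing between calls."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

A crawler would create one `Throttle` per target host, for example `Throttle(min_interval=2.0)`, and pass in whatever HTTP function it already uses as `fetch`. The clock and sleep functions are injectable only so the behavior can be tested without real waiting.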

Competitive harm

If you're scraping a competitor to replicate their core product or database, expect legal action. Even if the scraping itself is legal, using scraped data to build a competing database can trigger trade secret, unfair competition, or database right claims.

Best practices for staying on the right side

Only scrape public data. If it requires a login, don't scrape it. If it requires solving a CAPTCHA to view, think carefully.
Respect rate limits. Make requests at a pace that doesn't impact the site. A good rule: no faster than a human could browse.
Don't scrape PII unless you have a clear legal basis. If you must collect personal data, have a GDPR-compliant purpose and process.
Document your purpose. Courts look at intent. "We scraped product prices for market research" is more defensible than "we scraped their entire database to build a clone."
Check robots.txt. Respect it when possible. Document your reasoning if you don't.
Prefer APIs when available. If a site offers an API for the data you want, use it. Scraping around an available API looks bad in court.
Don't store more than you need. Collect the data you actually use. Don't build a stockpile "just in case."
Keep records. Log what you scraped, when, and why. If you're ever challenged, this documentation matters.
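The record-keeping advice above is cheap to automate. The sketch below writes one JSON line per request with what was fetched, when, and why. The field names, the `log_entry` and `append_log` helpers, and the sample values are assumptions; adapt them to whatever your counsel says should be retained.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def log_entry(
    url: str,
    purpose: str,
    robots_allowed: bool,
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON log line describing a single scrape request."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "url": url,
        "purpose": purpose,                # why the data was collected
        "robots_allowed": robots_allowed,  # result of the robots.txt check
        "fetched_at": timestamp,           # UTC time of the request
    }
    return json.dumps(record, sort_keys=True)


def append_log(path: str, line: str) -> None:
    """Append a log line to a JSON Lines file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


print(log_entry("https://example.com/p/1", "price research", True))
```

An append-only file like this is easy to hand over if the scraping is ever challenged, and the `purpose` field forces the question of intent to be answered before the request is made.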

How scraping APIs affect the legal picture

Here's where tools like Last Crawler and similar headless browser APIs come into play.

When you use a browser-based scraping API, a real browser is making real HTTP requests -- the same kind your Chrome browser makes when you visit a page. The argument is that this involves no "circumvention" in the DMCA sense, because the browser loads the page the way the site intended it to be loaded: it renders JavaScript, loads assets, and processes the page like a visitor's browser would. Whether courts accept that argument is still open, since the law here is unsettled.

This is meaningfully different from reverse-engineering an API, cracking an encrypted feed, or spoofing authentication tokens. A real browser session accessing a public page is the same thing you do when you open Chrome and navigate somewhere.

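That distinction could matter under the DMCA's anti-circumvention provisions: the claim is that you're not bypassing a protection measure, you're using the front door. It holds only as long as the tool isn't also solving CAPTCHAs or evading fingerprinting, which is the circumvention risk described above.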

Of course, this doesn't override other legal concerns. If the page requires authentication, the data contains PII, or you're hammering the site with abusive request volumes, a browser-based approach doesn't make it legal. The tool doesn't change the law -- it just avoids one specific legal tripwire.

What the Google v SerpApi case means going forward

If Google wins on the DMCA theory, it sets a precedent that anti-bot measures are "technological protection measures." That would mean:

Bypassing CAPTCHAs, browser fingerprinting, and bot detection could carry statutory damages
Sites could use DMCA takedowns against scraping services
The legal risk shifts from "unauthorized access" (hard to prove for public data) to "circumvention" (easier to prove if you're bypassing bot detection)

If SerpApi wins (or if the case settles without establishing precedent), the status quo continues: scraping public data through normal means remains broadly legal, and the DMCA stays focused on DRM and encryption rather than bot detection.

Either way, the safest approach is to scrape public data using methods that don't bypass technical protections. That's been the safe harbor since hiQ, and it's the part of scraping law that's most settled.

FAQ

Is it legal to scrape Google search results?

It's complicated. Google's ToS prohibit automated access, and they've now sued SerpApi under the DMCA. Scraping Google at a small scale for personal research is unlikely to draw legal action. Building a commercial product that scrapes Google at scale is exactly what the SerpApi lawsuit is about.

Can I scrape Amazon product listings?

Amazon's product pages are public. Under hiQ, scraping publicly accessible product data (prices, titles, descriptions) is likely legal. Amazon's ToS prohibit it, which is a civil contract issue. Scraping at abusive volumes or using the data to build a competing marketplace increases your legal risk.

Is scraping social media legal?

Public posts on Twitter/X, Reddit, or Instagram are generally scrapeable under the hiQ precedent. Data behind privacy settings is not public and should not be scraped. Personal data (usernames, bios, photos) triggers GDPR if you're collecting EU user data. Each platform also has specific API terms that affect commercial use.

Does GDPR apply if my company is in the US?

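Yes, it can. GDPR's reach depends on where the data subjects are, not where you are incorporated. A company outside the EU falls under it when it offers goods or services to people in the EU or monitors their behavior, and collecting and profiling EU residents' personal data can fall into that second category.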

Can a website sue me for ignoring robots.txt?

Not for violating robots.txt specifically -- there's no law that makes robots.txt legally binding. But a site could use your disregard of robots.txt as evidence in a broader claim of unauthorized access or bad faith.

What's the penalty for illegal web scraping?

It depends on the legal theory. CFAA violations carry fines and up to 10 years in prison (for criminal cases -- most scraping cases are civil). DMCA anti-circumvention carries statutory damages of $200 to $2,500 per act. GDPR fines can reach 4% of global annual revenue. Civil breach of contract damages vary by case.

Is scraping for AI training legal?

This is the biggest open question in 2026. Several lawsuits (NYT v OpenAI, Getty v Stability AI, and others) are testing whether scraping for AI training constitutes fair use. The answer will likely depend on whether the training data is copyrighted, whether the outputs compete with the original works, and what jurisdiction you're in. The EU's AI Act adds additional requirements around documentation and transparency.

Last Crawler
