Meta, Zuckerberg Sued by Publishers and Scott Turow Over Copyright


A group of major publishers and author Scott Turow have filed a proposed class-action lawsuit against Meta and CEO Mark Zuckerberg in the Southern District of New York. They allege extensive copyright infringement tied to the training of Meta's Llama AI models.

The complaint claims Meta downloaded millions of copyrighted works from pirate sites, scraped data without permission, and used those materials to train its AI model. Plaintiffs say Meta tried to conceal where the material came from and to undermine licensing protections.

A separate ruling earlier this year found that some uses of books to train Llama could qualify as fair use. The new suit argues that deliberate circumvention of copyright protections, carried out at the direction of Meta's leadership, places Meta's conduct outside those boundaries.

What the lawsuit claims and who is involved

The plaintiffs include five major publishers (Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage) along with author Scott Turow. They accuse Meta and its CEO of orchestrating a massive program to harvest copyrighted content for training Llama.

The lawsuit says Meta used torrenting and automated scraping to build a dataset totaling hundreds of terabytes. Plaintiffs claim Meta stripped copyright management information from the works to hide their origins and enable unauthorized use.

Internal discussions allegedly referred to LibGen as "a dataset we know to be pirated." According to the filing, Meta briefly considered licensing the content and even budgeted up to $200 million for dataset licensing in early 2023.

Leadership allegedly ordered the company to drop licensing and instead pursue a fair-use defense. The suit claims Llama now produces verbatim and near-verbatim copies, derivative works, and other outputs that mimic authors’ writings.

Meta has pushed back, arguing that using the material to build AI is transformative. The company points out that courts have recognized training on copyrighted material as fair use in some cases and says it will "fight this lawsuit aggressively."

Allegations in detail

Plaintiffs argue that piracy, data scraping, and deliberate circumvention add up to unlawful, systemic infringement with real market harm. They point to internal memos acknowledging pirated sources and say the training data substitutes for licensed content, undermining rights holders’ control.

Context: fair use, the legal backdrop, and Meta’s response

The legal landscape around AI training and copyright is still unsettled and changing quickly. In June 2025, a federal judge ruled that Meta’s use of about 200,000 books to train Llama qualified as fair use.

The new suit tries to set itself apart by arguing that Meta's conduct involved circumvention and leadership directives that go beyond fair-use protections. Meta maintains that training on copyrighted material can be fair use because it is transformative and advances innovation and productivity.

Meta says it will defend its position in court. The plaintiffs seek unspecified monetary damages and claim the alleged actions have hurt the market for licensed content and rights holders’ control over their works.

Implications for AI development, publishing, and policy

This case raises big questions for developers, publishers, and policymakers about how AI systems should be trained, licensed, and governed. The outcome could affect licensing standards, data provenance practices, and the economics of AI research.

  • Licensing and data provenance: Publishers might push for clearer licensing rules and stricter attribution requirements to stop unauthorized data use.
  • Risk management for AI developers: Companies building foundation models could face higher licensing costs, new compliance demands, or restrictions on data sources.
  • Open access and research practices: The tension between open datasets and copyright protections might change how researchers access and use text corpora.
  • Legal clarity on fair use: Courts may need to define more precise boundaries for training data and derivative outputs to reduce industry uncertainty.

What happens next and potential outcomes

As the case moves forward in the SDNY, the court will have to resolve questions of both fact and law. It will examine whether Meta's actions actually amount to circumvention of copyright protections and whether licensing options were truly foreclosed.

The outcome could go a few ways. The court might uphold fair use in some situations, or it might find infringement, award damages, and issue injunctions that could significantly change how AI models are trained and used.

The parties may discuss a settlement at some point, though the plaintiffs appear determined to secure remedies that protect rights holders and preserve a market for licensed content.

For researchers, educators, and tech developers, the lawsuit highlights the need for clear rules on copyright compliance in AI training. It also points to the need for a workable dataset licensing framework that supports innovation without leaving creators out in the cold.

 
Here is the source article for this story: Meta, Zuckerberg Sued Over Alleged Copyright Infringement by Book Publishers and Scott Turow
