Why “intermediate copying” is becoming the most important copyright issue of the AI age.
By Lili Kazemi, General Counsel & AI Policy Leader at Anant Corporation, writer of The Human Edge of AI, and legal-tech analyst focused on AI governance, copyright, and the emerging law of model training.
Artificial intelligence has revived a legal concept that predates the internet, predates software, and even predates the modern fair use doctrine. You see it at the center of lawsuits involving Meta, Anthropic, Google Books, shadow libraries, and now—in one of the most consequential legal-tech cases in decades—Thomson Reuters v. ROSS Intelligence.
The problem is deceptively simple:
What happens when a company copies a protected work to build a tool, but the output of that tool never displays that work?
Courts have a name for this. And it’s the doctrine driving every major AI copyright fight today.
It’s called intermediate copying.
What Is Intermediate Copying?
Intermediate copying refers to:
The act of reproducing a copyrighted work as an internal or developmental step—even if the copied work never appears in the end product delivered to the user.
This doctrine emerged decades ago in the context of reverse engineering and search indexing. Yet it’s now through this lens that courts must decide whether training a model on copyrighted text is lawful.
You even see a version of it in music: in Tracy Chapman’s lawsuit against Nicki Minaj over the song “Sorry,” the court ultimately held that Minaj’s studio creation of a demo that interpolated “Baby Can I Hold You” was fair use as intermediate copying—even though any later leak or distribution of that demo would be analyzed separately. The law is increasingly willing to separate internal experimentation from external exploitation.
To understand the stakes for AI, it helps to trace the universe of cases where intermediate copying has shaped the legal landscape.
1. The Classic Software Cases: Sega and Sony
The Ninth Circuit developed the modern version of intermediate copying in two foundational cases, beginning in the early 1990s:
- Sega v. Accolade (1992)
- Sony v. Connectix (2000)
In both, the defendant temporarily copied copyrighted software code to learn how it worked. The goal wasn’t to redistribute Sega’s or Sony’s code—it was to build compatible systems.
The courts held that intermediate copying can be fair use when:
- The copying is necessary to access functional elements.
- The purpose is legitimate (compatibility, interoperability).
- The final product does not substitute for the original.
These cases established the principle that copying something internally is still copying, but not all such copying is infringing.
That distinction is precisely what today’s AI companies rely on.
2. The Shadow Library and Mass Digitization Cases
A century after the first West headnotes appeared, the courts were suddenly confronted with Google scanning millions of books.
- Authors Guild v. Google (Google Books)
- Authors Guild v. HathiTrust
- The shadow library cases (Library Genesis, Z-Library, etc.)
Google created full digital copies of books—but only used them to:
- enable search
- create indexes
- display small snippets
- support research functions
Google did not distribute books to users.
The Second Circuit concluded that even though Google made complete unauthorized copies, the use was highly transformative and created no market substitute for the original works. Therefore, intermediate copying for search indexing was fair use.
Shadow libraries, on the other hand, host and distribute entire works—so the intermediate-copying defense collapses because their use is not transformative and directly competes with the market for books.
The line:
Copy for indexing = sometimes fair.
Copy for substitution = infringing.
That distinction is now the center of the AI debate.
3. The AI Training Cases: Meta, Anthropic, and Beyond
Fast-forward to 2023–2025. Courts are applying the same doctrine to LLM training.
In Kadrey v. Meta, the court acknowledged that training a model on copyrighted books could harm the market, even if the model never outputs the books themselves. Why? Because the model could generate competing text or reduce the economic value of the original works. Because the plaintiffs did not sufficiently articulate or prove that harm, the argument was left unresolved. It may yet get its day in the Third Circuit, in the ROSS appeal.
In Bartz v. Anthropic, Judge Alsup distinguished between:
- generative AI, where the model outputs new text, and
- legal search engines, which replicate the function of Westlaw.
The Bartz court emphasized that copying to build a substitute is not transformative—even if the output doesn’t display the copyrighted material.
This brings us to the most direct application yet.
For a deeper analysis of the Meta and Anthropic cases, please see my previous blog post.
4. Thomson Reuters v. ROSS Intelligence: Is Competition the Main Question?
In the ROSS case, intermediate copying is not abstract. It is documented.
ROSS did not show Westlaw headnotes in its search engine outputs. But ROSS (through a contractor) copied:
- West headnotes
- West synopses
- West Key Number classifications
- linked case passages selected by West editors
These were used to create training data in the form of 25,000+ “bulk memos.”
Once training was finished, the system no longer displayed headnotes. But every legal question-and-answer pair in the training corpus originated from West’s editorial work.
What the Appellant (ROSS) Argued in Its Opening Brief
ROSS’s core arguments are:
A. Headnotes Are Not Copyrightable, or Even Creative
Underpinning this entire controversy is a critical question: Are headnotes really that special? Anyone who has been a law clerk frantically trying to assemble or navigate the never-ending branches of case law can relate: headnotes serve their purpose and are helpful for what they are. Nothing replaces reading a case and actually understanding the flow of the legal analysis, but headnotes function as significant guideposts in what can sometimes feel like a foreign country, whether you are a clerk or practice across a wide variety of issues. For instance, I am a tax lawyer, yet contract law questions often arise in my work. For one of my cases, I had to research what a novation truly meant, and you would not believe the rabbit holes I went down. So headnotes, especially when you find cases that are binding or precedential, really help give you an idea of where you should dig deeper. Whether they have independent creative value is a separate question. From my own experience with Westlaw headnotes, I can say that they are definitely not verbatim copies of the opinions, but I can also attest that in some instances the headnotes have misled me about what the case actually states.
ROSS argues that:
- Headnotes are “facts,” “data,” or mere paraphrases of judicial opinions.
- They are dictated by legal conventions.
- Under the “merger doctrine,” the legal ideas and the headnote expression merge.
- TR is attempting to monopolize the legal research market through anti-competitive practices, specifically by bundling its public law database with its proprietary search tools and using copyright law to stifle competition.
ROSS asks for a categorical rule: headnotes are not copyrightable.
B. ROSS’s Use Was Fair Use
ROSS argues:
- The use was transformative because it trained an AI system.
- ROSS did not provide Westlaw headnotes to users.
- Any copying was “intermediate copying,” which is allowed under software-reverse-engineering cases.
- ROSS’s model outputs were not infringing.
- There is no cognizable licensing market for “headnotes as a dataset.”
- AI innovation requires this type of training.
ROSS’s theme: AI needs this data, we didn’t substitute for Westlaw, and the district court’s ruling threatens AI development.
What the Appellee (Thomson Reuters “TR”) Argues in Response
A. Headnotes Are Copyrightable, and This Is Black-Letter Law
TR argues:
- The Supreme Court has already held headnotes copyrightable in Callaghan v. Myers (1888) and reaffirmed that holding in Georgia v. Public.Resource.Org (2020).
- Even the dissent in Georgia agreed headnotes are copyrightable.
- ROSS’s own expert admitted thousands of headnotes differ from case language—contradicting ROSS’s claim they “parrot” the cases.
- The district court excluded any headnote that was verbatim from a case.
- ROSS ignores that Lexis creates different headnotes, which is proof of creativity, not industry constraints.
- Headnotes involve creative choices—selection, synthesis, phrasing, arrangement—and easily pass Feist’s minimal originality test.
Bottom line: Headnotes have been copyrightable for 137 years. ROSS is trying to re-litigate settled law.
B. ROSS’s Use Was Not Fair Use
TR argues all four factors weigh against ROSS.
1. Factor Four (Market Harm) – The “Most Important” Factor
TR says:
- ROSS created a direct commercial substitute for Westlaw.
- ROSS marketed itself as a “replacement for Westlaw” and undercut pricing.
- ROSS actually took customers from Westlaw.
- ROSS harmed TR’s ability to license its content for AI training.
- ROSS deprived TR of exclusivity, which reduces the value of the content.
- If ROSS’s behavior became widespread, the entire incentive to create legal editorial systems would collapse.
This factor alone defeats fair use.
2. Factor One (Purpose/Character)
- ROSS is a for-profit company that copied to sell a rival product.
- ROSS acted in bad faith—TR rejected its repeated requests for access, so ROSS hired LegalEase to secretly obtain Westlaw content.
- The use was not transformative:
  - TR used headnotes to help users find relevant case law.
  - ROSS copied them to train a system to help users… find relevant case law.
  - Same purpose = not transformative (per Warhol, Texaco, Video Pipeline).
TR points out that even the Bartz v. Anthropic court said:
Using Westlaw to build a competitor to Westlaw is not transformative.
3. Factor Two (Nature of the Work)
- Even though law is “factual,” the editorial expression in headnotes is creative.
- TR invested years of attorney-editor work to synthesize legal concepts.
- Courts routinely find factual compilations to be creative for factor two.
4. Factor Three (Amount/Substantiality)
- ROSS copied “the heart of Westlaw”—its editorial analysis.
- The entire training dataset consisted of infringing Westlaw-derived memos.
- 25,000+ headnotes; 17,000 scraped annotated cases; hundreds of thousands copied indirectly via LegalEase.
What the District Court Already Found (and What the Third Circuit Is Reviewing)
The district court already held:
(1) 2,243 headnotes were original and were copied by ROSS
→ Summary judgment for TR.
The court determined that the legal summaries were copyrightable because the editors exercised distinct creative judgment in deciding exactly which points to highlight and how to articulate them. This discretionary process of selecting and arranging specific information was sufficient to establish originality, distinguishing the summaries from unprotectable facts or public laws.
(2) ROSS’s fair use defense fails as a matter of law
→ Summary judgment for TR.
Judge Bibas rejected ROSS’s fair use defense by identifying commercial competition as the fatal flaw in its strategy, noting that ROSS used Westlaw’s proprietary content to build a rival legal research tool. The court was ultimately swayed by the fact that ROSS’s AI was not “transformative” but rather a direct market substitute, serving the same purpose as the original work without adding new meaning. Consequently, the fourth fair use factor, market harm, became the deciding element: allowing a competitor to bypass licensing fees to create a substitute product would fundamentally undermine Thomson Reuters’ business model.
This appeal decides whether those rulings stand.
ROSS’s argument essentially boiled down to:
“Our output didn’t contain headnotes, so this should be allowed.”
The court’s response:
Copying is copying. Output is irrelevant if the copying serves the same purpose as the original work.
This is the exact opposite fact pattern from Google Books. Google copied to help people find books. ROSS copied to build a product that would replace Westlaw.
That difference—indexing vs. substituting—is everything.
There is another major tension at play here.
Unlike Meta or Anthropic — both well-capitalized frontier-model companies — ROSS was a small legal-tech startup with limited resources.
They did not have the scale, infrastructure, or capital base of the major AI players now defending their models in federal court.
The lawsuit forced ROSS to shutter operations entirely back in 2020. The founders publicly stated that the cost and burden of the litigation made it impossible to continue operating.
For those interested in why this matters for market forces, power asymmetry, and AI licensing economics, see my earlier article on arm’s-length settlements and AI value allocation, where I break down how scale and bargaining power shape these disputes in practice.
Why Intermediate Copying Is the Defining Issue of AI Copyright
This doctrine forces courts to confront a hard truth:
Most AI systems necessarily copy copyrighted works during training.
The question is not whether the model shows those works later.
The question is:
Did the intermediate copying create a market substitute?
Or did it create something truly new?
This is where courts divide:
| Case | Intermediate copying? | Copied work shown in output? | Fair use? | Why |
|---|---|---|---|---|
| Sega | Yes | No | Yes | Needed for compatibility |
| Sony | Yes | No | Yes | Transformative purpose |
| Google Books | Yes | Snippets only | Yes | Search index, not substitute |
| Shadow libraries | Yes | Yes | No | Literal substitution |
| Meta (Kadrey) | Yes | No | Yes | Possible market harm not sufficiently articulated or proven |
| Anthropic (Bartz) | Yes | No | Yes for lawfully acquired, no for pirated | Depends on whether use competes |
| ROSS | Yes | No | No | Built a competing legal research product |
The key pattern:
Intermediate copying becomes infringement when it is used to build a substitute rather than a tool.
What Happens Next
As AI companies push the boundaries of what can be ingested, indexed, or learned, intermediate copying is becoming the most consequential doctrine in copyright law. It draws the legal line between:
- training a model to create something new, and
- training a model to replace the creator of the input.
The Third Circuit’s decision in Thomson Reuters v. ROSS will be the first major federal appellate ruling to apply the doctrine directly to an AI-enabled legal research tool.
It will influence:
- AI training practices
- data licensing markets
- LLM infrastructure
- legal research platforms
- and the next wave of AI copyright litigation
Intermediate copying is no longer an obscure concept buried in Ninth Circuit software cases. It is the defining legal question of generative AI. And we’re way past code now.
And its resolution will shape what AI companies can—and cannot—build in the decade ahead.
Lili Kazemi is General Counsel and AI Policy Leader at Anant Corporation, where she advises on the intersection of global law, tax, and emerging technology. She brings over 20 years of combined experience from leading roles in Big Law and Big Four firms, with a deep background in international tax, regulatory strategy, and cross-border legal frameworks. Lili is also the founder of DAOFitLife, a wellness and performance platform for high-achieving professionals navigating demanding careers.