Hearings to examine the AI industry's mass ingestion of copyrighted works for AI training.
Senate Judiciary Subcommittee on Crime and Counterterrorism
2025-07-16
Source: Congress.gov
Transcript
hearing today, which is entitled Too Big to Prosecute: Examining the AI Industry's Mass Ingestion of Copyrighted Works for AI Training. This is the third hearing of the Senate Judiciary Committee's Subcommittee on Crime and Counterterrorism, which I'm delighted to work on with my colleague, Ranking Member Durbin. I want to say a special thank you to the witnesses for being here. Many of you, I think all of you, traveled in order to be here today. Thanks to everybody for accommodating our change in time. The Senate floor is going to be tied up later today, and thus no committee business is happening. So thanks to all of you for being here and for accommodating us. I'm going to make just a few opening remarks, Senator Durbin will do the same, then we'll swear in the witnesses and be off to the races.

Let me just start by saying that today's hearing is about the largest intellectual property theft in American history. For all of the talk about artificial intelligence and innovation and the future that comes out of Silicon Valley, here's the truth that nobody wants to admit: AI companies are training their models on stolen material, period. That is just the fact of the matter. And we're not talking about these companies simply scouring the internet for what's publicly available. We're talking about piracy. We're talking about theft.

For years, AI companies have stolen massive amounts of copyrighted material from illegal online repositories. Now, the FBI and the Department of Homeland Security regularly prosecute individuals who engage in exactly the same kind of behavior, using platforms like LimeWire or Napster in the old days, using a process called torrenting. But have these big tech companies been prosecuted? No. Of course not. They're getting off scot-free.
And this hearing will show us that Meta and Anthropic and other AI companies are willfully using these illegal networks, these torrenting networks as they're called, to steal vast swaths of copyrighted materials.
The amount of material that we're talking about is absolutely mind-boggling. We're talking about every book and every academic article ever written. Let me say that again: every book and every article ever written. Billions of pages of copyrighted works, enough to fill 22 libraries the size of the Library of Congress. Think about that, 22 Libraries of Congress full of works. That is how much has been stolen, and this theft was not some innocent mistake. They knew exactly what they were doing. They pirated these materials willfully.

As the idea of pirating copyrighted works percolated through Meta, to take one example, employee after employee warned management that what they were doing was illegal. One Meta employee told management that, and I quote now, this is not trivial, and he shared an article asking what is the probability of getting arrested for using torrents, illegal downloads, in the United States. Another Meta employee shared a different article saying that downloading from illegal repositories would open Meta up to legal ramifications. That's a nice way of saying that what they were doing was exactly, totally, 100% barred by copyright law.

Did Meta management listen? No. They bulldozed straight ahead. We'll see evidence today that Mark Zuckerberg himself approved the decision to use these pirated materials. And then, the best part: Meta management tried to hide it. They tried to hide the fact that they were engaged in the illegal download of pirated works, and not just the illegal download, but the illegal distribution of these same works. They tried to hide it by using non-company servers. They went so far as to train their AI model. Get this. Meta trained its AI model to lie to users about what data it had been trained on. I mean, you talk about an inception-level-worthy deception, training the AI model to lie about what its own sources were.