Hearings to examine the AI industry's mass ingestion of copyrighted works for AI training.
Senate Judiciary Subcommittee on Crime and Counterterrorism
2025-07-16
Source: Congress.gov
Transcript
hearing today, which is entitled Too Big to Prosecute: Examining the AI Industry's Mass Ingestion of Copyrighted Works for AI Training. This is the third hearing of the Senate Judiciary Committee's Subcommittee on Crime and Counterterrorism, which I'm delighted to work on with my colleague, Ranking Member Durbin. I want to say a special thank you to the witnesses for being here. Many of you, I think all of you, traveled in order to be here today. Thanks to everybody for accommodating our change in time. The Senate floor is going to be tied up later today, and thus no committee business is happening. So thanks to all of you for being here and for accommodating us. I'm going to make just a few opening remarks, Senator Durbin will do the same, then we'll swear in the witnesses and be off to the races.

Let me just start by saying that today's hearing is about the largest intellectual property theft in American history. For all of the talk about artificial intelligence and innovation and the future that comes out of Silicon Valley, here's the truth that nobody wants to admit: AI companies are training their models on stolen material, period. That is just the fact of the matter. And we're not talking about these companies simply scouring the internet for what's publicly available. We're talking about piracy. We're talking about theft.

For years, AI companies have stolen massive amounts of copyrighted material from illegal online repositories. Now, the FBI and the Department of Homeland Security regularly prosecute individuals who engage in exactly the same kind of behavior, using platforms like LimeWire or Napster in the old days, using a process called torrenting. But have these big tech companies been prosecuted? No. Of course not. They're getting off scot-free.
And this hearing will show us that Meta and Anthropic and other AI companies are willfully using these illegal networks, these torrenting networks as they're called, to steal vast swaths of copyrighted materials.
The amount of material that we're talking about is absolutely mind-boggling. We're talking about every book and every academic article ever written. Let me say that again: every book and every article ever written. Billions of pages of copyrighted works, enough to fill 22 libraries the size of the Library of Congress. Think about that, 22 Libraries of Congress full of works. That is how much has been stolen, and this theft was not some innocent mistake. They knew exactly what they were doing. They pirated these materials willfully.

As the idea of pirating copyrighted works percolated through Meta, to take one example, employee after employee warned management that what they were doing was illegal. One Meta employee told management that, and I quote now, this is not trivial, and he shared an article asking what is the probability of getting arrested for using torrents, illegal downloads, in the United States. Another Meta employee shared a different article saying that downloading from illegal repositories would open Meta up to legal ramifications. That's a nice way of saying that what they were doing was exactly, totally, 100% barred by copyright law.

Did Meta management listen? No. They bulldozed straight ahead. We'll see evidence today that Mark Zuckerberg himself approved the decision to use these pirated materials. And then, the best part: Meta management tried to hide it. They tried to hide the fact that they were engaged in the illegal download of pirated works, and not just the illegal download, but the illegal distribution of these same works. They tried to hide it by using non-company servers. They went so far as to train their AI model. Get this. Meta trained its AI model to lie to users about what data it had been trained on. I mean, you talk about an inception-level-worthy deception, training the AI model to lie about what its own sources were.