Introduction to the Case
Recent court filings in an AI copyright lawsuit against Meta lend further support to earlier reports that the company temporarily halted discussions with book publishers about licensing deals that would have supplied training data for some of its generative AI models.
Background on the Case
These filings pertain to Kadrey v. Meta Platforms, one of numerous similar cases making their way through the U.S. court system in which authors and other intellectual property holders are suing AI companies. Generally, the defendants in these cases, the AI companies, argue that training on copyrighted content constitutes “fair use.” The plaintiffs, the copyright holders, strongly disagree.
New Filings and Their Implications
New filings submitted to the court on Friday, which include partial transcripts of depositions of Meta employees taken by the plaintiffs’ attorneys, suggest that certain Meta staff members felt that negotiating licenses to use books as AI training data might not be feasible at scale.
Testimony from Meta Employee
According to one transcript, Sy Choudhury, who is responsible for Meta’s AI partnership initiatives, stated that Meta’s outreach to various publishers was met with a “very slow uptake in engagement and interest.”
Details of the Outreach Efforts
“I don’t recall the entire list, but I remember we had compiled a long list from scouring the internet for top publishers, etc.,” Choudhury said, as per the transcript. “We didn’t get contact and feedback from a lot of our cold call outreaches to try to establish contact.”
Engagement with Publishers
Choudhury added, “There were a few that did engage, but not many.”
Pausing of Licensing Efforts
According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury stated that some publishers, particularly fiction publishers, turned out not to have the rights to the content that Meta was considering licensing.
Rights to License Content
“I’d like to point out that, in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to were representing that they did not have the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”
Previous Licensing Efforts
Choudhury noted during his deposition that Meta has, on at least one other occasion, paused licensing efforts related to AI development, according to a transcript.
Alternative Solutions
“I am aware of licensing efforts, such as when we tried to license 3D worlds from different game engine and game manufacturers for our AI research team,” Choudhury said. “And in the same way that I’m describing here for fiction and textbook data, we got very little engagement to even have a conversation […] We decided to build our own solution in that case.”
Plaintiffs’ Allegations
The counsel for the plaintiffs, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, have amended their complaint several times since the case was filed in the U.S. District Court for the Northern District of California, San Francisco Division in 2023. The latest amended complaint submitted by the plaintiffs’ counsel alleges that Meta, among other offenses, cross-referenced certain pirated books with copyrighted books available for license to determine whether it made sense to pursue a licensing agreement with a publisher.
Use of Shadow Libraries
The complaint also accuses Meta of using “shadow libraries” containing pirated e-books to train several of the company’s AI models, including its popular Llama series of “open” models. According to the complaint, Meta may have secured some of these libraries via torrenting. Torrenting, a method of distributing files across the web, requires that users simultaneously “seed,” or upload, the files they’re trying to obtain, which the plaintiffs assert is a form of copyright infringement.