
Introduction to the Issue

Free access to books is under pressure from two opposing sides. On one side is the U.S. government, now heavily influenced by tech oligarchs. On the other are big tech companies, including Meta and OpenAI, which have used millions of books from piracy sites to develop their AI technology. And for readers who would rather not consume AI-generated content, the administration of President Donald Trump is seeking to dismantle a significant source of funding for public libraries. It is a challenging time for book enthusiasts.

Analysis of the Problem

Over the past two years, The Atlantic has been examining and compiling repositories of publicly accessible data used to train AI models. Its latest investigation focused on LibGen, an archive of pirated media that encompasses millions of books, academic papers, and articles. Recently, The Atlantic released its findings along with a tool for searching the archive of pirated works. The tool lets users look up their favorite authors and see whether their work has been used to train AI models from companies like OpenAI, Mistral, and Meta.

Understanding LibGen

LibGen, short for Library Genesis, is referred to as a "shadow library" due to its illicit yet open nature. It contains nearly 7.5 million books and 81 million academic papers, according to The Atlantic’s report. While it holds a vast amount of copyrighted material, this does not diminish its actual benefits to society. Library Genesis has been utilized by scientists to access academic works without incurring exorbitant fees from publishers. Other shadow libraries, like Sci-Hub, have been recognized by groups such as the Electronic Frontier Foundation as a positive force for scientific progress.

Big Tech’s Involvement

Gizmodo reached out to Meta for comment but did not receive an immediate response. When asked to comment on its use of LibGen, OpenAI stated that the models powering ChatGPT and its API were not developed using these datasets. A former OpenAI employee has previously alleged that the company was breaking copyright law. OpenAI has defended itself in court, arguing that using copyrighted works for AI training constitutes fair use.

The Law and AI-Generated Content

While the courts have not yet determined whether AI’s consumption of copyrighted data is legal, it is clear where the creative community stands on this issue. Many authors, including Michael Chabon, have sued Meta for using their copyrighted work to train AI. The Atlantic’s latest revelations have left authors displeased, with some discovering that their books and articles were used to train Llama 3 without their consent.

Impact on Public Libraries

The irony of pirating books to train AI becomes more pronounced as the Trump administration works to undermine the financial support for public libraries while relying on AI for services traditionally performed by humans. On March 14, Trump issued an executive order that would effectively kill the Institute of Museum and Library Services, an agency that provides grants and funding to public libraries across the U.S. This move could lead to libraries scaling back or eliminating their digital services, resulting in longer wait times for e-books and reduced availability of certain titles.

Conclusion

The interference from tech oligarchs such as Musk, along with DOGE, will likely suppress access to literature: first by stealing authors’ work and hurting the book industry, and then by limiting people’s access to books altogether. The pursuit of AI efficiency may come at the cost of the very services and resources that make literature accessible to the public.

