At the SXSW conference in Austin on Monday, Jay Graber, CEO of Bluesky, announced that the social network is developing a framework to allow users to control how their data is used for generative AI.
Because Bluesky’s network is public, others can already use its content to train AI systems. 404 Media, for example, discovered a dataset on Hugging Face built from 1 million Bluesky posts for machine learning research.
Meanwhile, Bluesky’s competitor X has been using user posts to train its AI chatbot Grok, and last fall it updated its privacy policy to allow third parties to train AI on user posts. This move, combined with the U.S. elections, led to an exodus of users from X to Bluesky, which has grown to over 32 million users in just two years.
Although Bluesky does not plan to train its own AI systems on user posts, it recognizes the need for a clear AI policy given the high demand for AI training data.
Graber explained at SXSW that Bluesky has collaborated with partners to develop a framework for user consent, allowing users to specify how they want their data to be used for generative AI.
“We believe in user choice,” Graber said, emphasizing that users will be able to decide how their Bluesky content is used.
Graber compared the proposed framework to the robots.txt file, which websites use to tell search engine crawlers which pages they may scrape. The file is not enforceable, so search engines can still scrape a site that opts out, but many respect it, and Graber hopes the proposed framework will see similar adoption.
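To illustrate the robots.txt comparison, here is a minimal sketch of how a well-behaved crawler might consult a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `ExampleBot`, the rules, and the `example.com` URLs are made up for illustration. Nothing here is specific to Bluesky.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one named crawler is kept out of /private/,
# and every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/private/page"),
        ("ExampleBot", "https://example.com/public/page"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

The check is purely voluntary: a crawler that never calls `may_fetch` can still request the page, which is why this kind of signal depends on broad adoption.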
The proposal, available on GitHub, involves obtaining user consent at the account or post level and asking other companies to respect that setting.
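As a rough illustration of consent at the account or post level, the sketch below models one possible way a third party could resolve the two settings. It is not the schema from the GitHub proposal: the class names, the `allow_generative_ai` field, and the rule that a post-level choice overrides the account-level one are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    handle: str
    allow_generative_ai: bool  # hypothetical account-wide preference


@dataclass
class Post:
    author: Account
    text: str
    # Hypothetical per-post preference; None means "defer to the account".
    allow_generative_ai: Optional[bool] = None


def generative_ai_allowed(post: Post) -> bool:
    """Resolve consent, assuming a post-level choice overrides the account's."""
    if post.allow_generative_ai is not None:
        return post.allow_generative_ai
    return post.author.allow_generative_ai


if __name__ == "__main__":
    alice = Account(handle="alice.example", allow_generative_ai=False)
    posts = [
        Post(author=alice, text="Defers to the account setting"),
        Post(author=alice, text="Explicitly opted in", allow_generative_ai=True),
    ]
    for post in posts:
        print(post.text, generative_ai_allowed(post))
```

As with robots.txt, a function like this only matters if the company collecting the data chooses to call it and honor the result.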
Graber expressed optimism about the proposal, which has been developed in collaboration with others in the industry concerned about the impact of AI on data. “I think it’s a positive direction to take,” she added.