
Bluesky, a social network, recently released a proposal on GitHub outlining new options that would let users indicate whether their posts and data can be used for purposes such as generative AI training and public archiving.

CEO Jay Graber discussed the proposal earlier this week at South by Southwest, and it drew renewed attention after she shared it on Bluesky. Some users objected, seeing the plan as a departure from Bluesky’s earlier assurances that it would not sell user data to advertisers or train AI on user posts.

A user named Sketchette responded strongly, saying “Oh, hell no! The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.”

Graber replied that since everything on Bluesky is public, like content on a website, generative AI companies are already scraping it. The aim, she said, is to establish a new standard, similar to robots.txt, that governs how that scraping happens.

The debate around AI training and copyright has highlighted the limitations of robots.txt, including its lack of legal enforceability. Bluesky’s proposed standard seeks to provide a similar mechanism, offering a machine-readable format that “good actors” are expected to respect, though it would not be legally binding.
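For context, robots.txt is a plain text file served from a site’s root that asks crawlers not to fetch certain paths; honoring it is entirely voluntary, which is exactly the limitation described above. A site that wanted to opt out of one AI crawler while permitting others might serve something like the following (GPTBot is OpenAI’s published crawler user agent; the domain is a placeholder):

```
# https://example.com/robots.txt
# Ask OpenAI's GPTBot crawler to skip the entire site
User-agent: GPTBot
Disallow: /

# All other crawlers may fetch everything
User-agent: *
Disallow:
```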

Under the proposal, users of Bluesky, or of other apps built on the AT Protocol, could allow or disallow the use of their data across four categories: generative AI, protocol bridging, bulk datasets, and web archiving (for example, by the Wayback Machine).

If a user disallows the use of their data for generative AI training, companies and research teams would be expected to respect that choice when scraping or transferring data in bulk via the protocol.
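To make the mechanism concrete, here is a minimal sketch of how a compliant bulk scraper might honor such signals. Only the four categories come from the proposal; everything else here (the record shape, field names, the example DID, and the filterByConsent helper) is hypothetical, invented for illustration rather than taken from the actual AT Protocol schema:

```typescript
// Hypothetical sketch of a per-user intent record and a scraper-side check.
// The four categories mirror Bluesky's proposal; every name below is
// illustrative, not the real schema from the GitHub proposal.

type IntentCategory =
  | "generativeAI"     // training generative AI models
  | "protocolBridging" // bridging posts to other protocols
  | "bulkDatasets"     // inclusion in bulk research datasets
  | "webArchiving";    // public archives such as the Wayback Machine

// One record per user: each category explicitly allowed or disallowed.
type UserIntents = Record<IntentCategory, boolean>;

interface Post {
  authorDid: string; // the author's decentralized identifier
  text: string;
}

// A "good actor" bulk scraper would drop posts whose authors have
// opted out of the relevant use before ingesting them.
function filterByConsent(
  posts: Post[],
  intents: Map<string, UserIntents>, // authorDid -> declared intents
  purpose: IntentCategory,
): Post[] {
  return posts.filter((post) => {
    const declared = intents.get(post.authorDid);
    // Arbitrary policy choice for this sketch: with no record on file,
    // exclude the post from generative AI training but allow other uses.
    if (!declared) return purpose !== "generativeAI";
    return declared[purpose];
  });
}

// Example: keep only posts usable for generative AI training.
const intents = new Map<string, UserIntents>([
  ["did:plc:alice", {
    generativeAI: false,
    protocolBridging: true,
    bulkDatasets: true,
    webArchiving: true,
  }],
]);
const posts: Post[] = [{ authorDid: "did:plc:alice", text: "hello" }];
console.log(filterByConsent(posts, intents, "generativeAI")); // []
```

What happens for users who never publish an intent record is a real design question the sketch sidesteps with an arbitrary default; in practice the standard itself would have to specify it.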

Molly White, author of the Citation Needed newsletter and the Web3 is Going Just Great blog, called the proposal a positive step, saying it was odd to see criticism directed at Bluesky: the proposal adds a consent mechanism to scraping that already happens rather than inviting new AI scraping.

White noted, however, that Bluesky’s proposal shares a weakness with similar initiatives, such as Creative Commons’ ‘preference signals’: all of them rely on scrapers’ goodwill, and some companies have already shown they will ignore robots.txt or pirate material outright.

