OpenAI has revamped its Preparedness Framework, which is the company’s internal system for evaluating the safety of AI models and determining necessary safeguards during development and deployment. The update includes a statement that OpenAI may “modify” its safety requirements if a competing AI lab releases a “high-risk” system without similar protections in place.

This change is a response to the increasing competitive pressure on commercial AI developers to deploy models quickly. OpenAI has faced criticism for allegedly lowering its safety standards in favor of faster releases and for failing to deliver timely reports detailing its safety testing. Recently, 12 former OpenAI employees filed a brief in Elon Musk's case against OpenAI, arguing that the company would be encouraged to cut even more corners on safety if it completes its planned corporate restructuring.

Anticipating criticism, OpenAI says it would not make these policy adjustments lightly and would keep its safeguards at a level it describes as “more protective.”

In a blog post published on Tuesday, OpenAI stated, “If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements. However, we would first rigorously confirm that the risk landscape has actually changed, publicly acknowledge that we are making an adjustment, assess that the adjustment does not meaningfully increase the overall risk of severe harm, and still keep safeguards at a level more protective.”

The revised Preparedness Framework also emphasizes OpenAI’s increased reliance on automated evaluations to accelerate product development. Although the company has not abandoned human-led testing entirely, it has developed “a growing suite of automated evaluations” that can supposedly “keep up with [a] faster [release] cadence.”

However, some reports contradict this claim. According to the Financial Times, OpenAI gave testers less than a week to conduct safety checks for an upcoming major model, which is a compressed timeline compared to previous releases. The publication’s sources also alleged that many of OpenAI’s safety tests are now conducted on earlier versions of models rather than the versions released to the public.

In response to these allegations, OpenAI has disputed the notion that it is compromising on safety.

Other updates to OpenAI’s framework relate to how the company categorizes models according to risk, including models that can conceal their capabilities, evade safeguards, prevent their shutdown, and even self-replicate. OpenAI will now focus on whether models meet one of two thresholds: “high” capability or “critical” capability.

OpenAI defines the former as models that could “amplify existing pathways to severe harm.” The latter refers to models that “introduce unprecedented new pathways to severe harm,” according to the company.

“Covered systems that reach high capability must have safeguards that sufficiently minimize the associated risk of severe harm before they are deployed,” OpenAI stated in its blog post. “Systems that reach critical capability also require safeguards that sufficiently minimize associated risks during development.”

These updates are the first changes OpenAI has made to the Preparedness Framework since 2023.
