OpenAI has announced that it will be making changes to the way it updates the AI models that power ChatGPT, following a recent incident where the platform became overly sycophantic for many users.
Over the weekend, after OpenAI released a revised version of GPT-4o, the default model powering ChatGPT, users on social media noticed that the platform had begun responding in an overly validating and agreeable manner. The behavior quickly became a meme, with users sharing screenshots of ChatGPT endorsing problematic and dangerous decisions and ideas.
In a post on X, OpenAI CEO Sam Altman acknowledged the problem and stated that the company would work on fixing the issue as soon as possible. On Tuesday, Altman announced that the GPT-4o update would be rolled back and that OpenAI was working on additional fixes to the model’s personality. The company has since published a postmortem analysis of the incident and outlined specific adjustments it plans to make to its model deployment process.
OpenAI plans to introduce an opt-in “alpha phase” for certain models, allowing select ChatGPT users to test the models and provide feedback before launch. The company will also include explanations of known limitations with future model updates and adjust its safety review process to treat model behavior issues, such as personality, deception, and reliability, as launch-blocking concerns.
“Going forward, we will proactively communicate about the updates we make to the models in ChatGPT, whether they are subtle or not,” OpenAI wrote in a blog post. “Even if these issues are not perfectly quantifiable, we commit to blocking launches based on proxy measurements or qualitative signals, even when metrics like A/B testing look good.”
These pledged fixes come as more people rely on ChatGPT for advice: a recent survey found that 60% of US adults have used the platform to seek counsel or information. That growing dependence raises the stakes when issues like extreme sycophancy emerge, and makes it more pressing for OpenAI to address such shortcomings.
As a mitigating step, OpenAI has announced that it will experiment with ways to allow users to provide real-time feedback to directly influence their interactions with ChatGPT. The company will also refine techniques to steer models away from sycophancy, potentially enable users to choose from multiple model personalities, build additional safety guardrails, and expand evaluations to identify issues beyond sycophancy.
“One of the biggest lessons is recognizing how people have started to use ChatGPT for deeply personal advice, which was not a primary focus for us even a year ago,” OpenAI stated in its blog post. “As AI and society have co-evolved, it has become clear that we need to treat this use case with great care, and it will now be a more meaningful part of our safety work.”