OpenAI, in response to claims that it isn’t taking AI safety seriously, has launched a new page called the Safety Evaluations Hub.
The page will publicly track metrics such as its models' hallucination rates, their likelihood of producing harmful content, and how easily their safety guardrails can be circumvented.
“This hub provides access to safety evaluation results for OpenAI’s models. These evaluations are included in our system cards, and we use them internally as one part of our decision-making about model safety and deployment,” the new page states.
“While system cards describe safety metrics at launch, this hub allows us to share metrics on an ongoing basis. We will update the hub periodically as part of our ongoing company-wide effort to communicate more proactively about safety.”
System cards are reports that are published alongside AI models, explaining the testing process, limitations, and where the model could cause problems.
Why is this important?
OpenAI, along with competitors such as xAI (the creator of Grok) and Google (the maker of Gemini), has been accused in recent months of not taking AI safety seriously.
Safety reports have been missing when new models launch, often arriving months later or being skipped altogether.
In April, the Financial Times reported that OpenAI employees were concerned that the pace of model releases left them without enough time to complete safety testing properly.
Google’s Gemini also raised alarms when it was revealed that one of its more recent models performed worse on safety tests than previous models.
It was also reported yesterday that xAI, despite promising a safety report for Grok, has missed its own deadline to publish one.
All of this is to say that OpenAI's move to improve transparency and publicly release information on the safety of its models is a much-needed and important step. As the race between AI companies intensifies, safety measures like these are easily overlooked.
How to use the page
OpenAI’s new safety hub has a lot of information, but it isn’t instantly clear what it all means. Luckily, the company also includes a helpful guide on how to use the page.
The hub splits safety evaluations into four sections: Harmful content, jailbreaks, hallucinations, and instruction hierarchy.
More specifically, these cover:
Harmful content: Evaluations to check that the model does not comply with requests for harmful content that violates OpenAI’s policies, including hateful content.
Jailbreaks: These evaluations include adversarial prompts that are meant to circumvent model safety training and induce the model to create harmful content.
Hallucinations: How often OpenAI's models make factual errors.
Instruction hierarchy: How the model prioritizes instructions from different sources, ensuring that higher-priority instructions can't be overridden by third-party content.
For each of these categories, OpenAI includes its own test scores, along with explanations of what was being checked and how each of its models performs.
This new hub also includes information on how OpenAI approaches safety and its privacy and security policies.