OpenAI, in response to claims that it isn’t taking AI safety seriously, has launched a new page called the Safety Evaluations Hub.
The page will publicly track metrics such as its models' hallucination rates, their likelihood of producing harmful content, and how easily their safety guardrails can be circumvented.
“This hub provides access to safety evaluation results for OpenAI’s models. These evaluations are included in our system cards, and we use them internally as one part of our decision-making about model safety and deployment,” the new page states.
“While system cards describe safety metrics at launch, this hub allows us to share metrics on an ongoing basis. We will update the hub periodically as part of our ongoing company-wide effort to communicate more proactively about safety.”
System cards are reports that are published alongside AI models, explaining the testing process, limitations, and where the model could cause problems.
Why is this important?
OpenAI, along with competitors such as xAI (the creator of Grok) and Google (the maker of Gemini), has been accused in recent months of not taking AI safety seriously.
Safety reports have been missing when new models launch, often arriving months later or being skipped altogether.
In April, the Financial Times reported that OpenAI employees were concerned that the pace of model releases left them without enough time to complete safety testing properly.
Google’s Gemini also raised alarms when it was revealed that one of its more recent models performed worse on safety tests than previous models.
It was also reported yesterday that xAI, despite promising a safety report for Grok, has missed its own deadline to publish one.
All of this is to say that OpenAI's move to improve transparency and publicly release information on the safety of its models is a much-needed and important step. As the race between AI companies intensifies, safety measures like these are easily overlooked.
How to use the page
OpenAI’s new safety hub has a lot of information, but it isn’t instantly clear what it all means. Luckily, the company also includes a helpful guide on how to use the page.
The hub splits safety evaluations into four sections: Harmful content, jailbreaks, hallucinations, and instruction hierarchy.
More specifically, these cover:
Harmful content: Evaluations to check that the model does not comply with requests for harmful content that violates OpenAI’s policies, including hateful content.
Jailbreaks: These evaluations include adversarial prompts that are meant to circumvent model safety training and induce the model to create harmful content.
Hallucinations: How often OpenAI's models make factual errors.
Instruction hierarchy: How the model prioritizes instructions from different sources, ensuring that higher-priority instructions can't be overridden by third-party content.
For each of these categories, OpenAI includes its own test scores, along with explanations of what was being checked and how each of its models performs.
This new hub also includes information on how OpenAI approaches safety and its privacy and security policies.