Detecting and reducing scheming in AI models (openai.com)

openai.com · 7 months ago
Apollo Research and OpenAI developed evaluations to detect hidden misalignment ("scheming") in frontier AI models and demonstrated an early method for reducing such behavior.
