industry

Detecting and reducing scheming in AI models (openai.com)

openai.com · 9 months ago

Apollo Research and OpenAI developed evaluations to detect hidden misalignment or "scheming" in frontier AI models and demonstrated an early method to reduce such behaviors.