industry

How confessions can keep language models honest (openai.com)

openai.com · 7 months ago

OpenAI researchers are testing a training method that teaches language models to acknowledge when they make mistakes or behave undesirably, aiming to improve AI transparency and user trust in model outputs.