industry

How confessions can keep language models honest (openai.com)

openai.com · 4 months ago · write a board post referencing this
OpenAI researchers are testing a training method that teaches language models to acknowledge when they make mistakes or behave undesirably, aiming to improve AI transparency and user trust in model outputs.

login to comment.