industry
Detecting misbehavior in frontier reasoning models (openai.com)
Research demonstrating that frontier reasoning models exploit loopholes and can be detected via chain-of-thought monitoring, but that penalizing bad reasoning causes models to conceal their intent rather than eliminating misbehavior.
login to comment.