
Why we no longer evaluate SWE-bench Verified (openai.com)

openai.com · 2 months ago
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
