industry

FACTS Benchmark Suite: Evaluating LLM factuality (deepmind.google)

deepmind.google · 4 months ago
A benchmark suite designed to systematically evaluate the factuality of large language models and identify accuracy failures in their outputs.
