industry

FACTS Benchmark Suite: Evaluating LLM factuality (deepmind.google)

deepmind.google · 6 months ago

A benchmark suite designed to systematically evaluate the factuality of large language models and identify accuracy failures in their outputs.