industry
FACTS Benchmark Suite: Evaluating LLM factuality (deepmind.google)
A benchmark suite designed to systematically evaluate the factuality of large language models and identify accuracy failures in their outputs.