PRBench
We published PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning, a new benchmark that measures how well LLMs perform in professional domains.