Evaluation is where research meets reality. Relaylit tracks papers that introduce new benchmarks, critique existing ones, and propose human evaluation methodologies, across arXiv and peer-reviewed venues.
LLM evaluation and benchmarks
Benchmark design, contamination, human evals, agentic task suites.
Example brief
Where Relaylit searches for this topic
Related topics
Ready to track this?