I’ve always been skeptical of benchmarks in software engineering and in technology in general. Over my 5+ decades in software engineering, I’ve followed the industry’s benchmarking activities and reports. You have probably read technical articles, reviews, and comparison ads that cite benchmark results. During hyperactive periods of hardware and software advances, benchmarks have had to keep adjusting their use cases and methods to stay current, so any published result is only a point-in-time snapshot.
Now that we are in the era of modern AI, what are some representative AI benchmarks? What use cases do they cover? And, given the pace of AI advances, how does anyone keep their articles, reports, and results up to date?
A short list of LLM benchmarking articles and YouTube videos
What are Large Language Model (LLM) Benchmarks? (IBM Technology – YouTube video)
https://youtu.be/kDY4TodQwbg
What Makes a Good AI Benchmark? (Stanford Institute for Human-Centered AI – HAI)
https://hai.stanford.edu/policy/what-makes-a-good-ai-benchmark
The AI Index: A Compass for Navigating AI’s Future (Stanford HAI YouTube video)
https://youtu.be/ABxQBIBsBHY
The 2025 Stanford Institute AI Index Report (Stanford Institute for Human-Centered AI – HAI)
https://hai.stanford.edu/ai-index/2025-ai-index-report
Key findings from Stanford’s 2025 AI Index Report (IBM Technology article)
https://www.ibm.com/think/news/stanford-hai-2025-ai-index-report
Additional AI benchmarking sites and leaderboards
MLCommons benchmarks
https://mlcommons.org/benchmarks/
AI Benchmarking – Epoch.ai database of benchmark results
https://epoch.ai/benchmarks
LiveBench – A Challenging, Contamination-Free LLM Benchmark
https://livebench.ai/#/
BetterBench – repository of AI benchmark assessments
https://betterbench.stanford.edu/
LLM Leaderboard – updated November 25, 2025 (vellum.ai)
https://www.vellum.ai/llm-leaderboard
Artificialanalysis.ai – includes leaderboards for models, API endpoints, text-to-image, music, and more
https://artificialanalysis.ai/leaderboards/models
