ARC Prize - Leaderboard

Understanding the Leaderboard

ARC-AGI has evolved from its first version (ARC-AGI-1), which measured basic fluid intelligence, to ARC-AGI-2, which challenges systems to demonstrate both high adaptability and high efficiency.

The scatter plot above visualizes the relationship between cost per task and performance, a key measure of intelligence efficiency. True intelligence isn't just about solving problems, but about solving them efficiently with minimal resources.

Interpreting the data

Human Performance solutions represent first-hand data collected by the Prize Foundation, showing performance across different human groups including PhD graduates, PhD students, and members of the general public. These data points establish important benchmarks for human-level problem-solving capabilities.
Reasoning Systems Trend Line solutions display connected points representing the same model at different reasoning levels. These trend lines illustrate how increased reasoning time affects performance, typically showing asymptotic behavior: gains level off as thinking time increases.
Base LLMs solutions represent single-shot inference from standard language models like GPT-4.5 and Claude 3.7, without extended reasoning capabilities. These points demonstrate raw model performance without additional reasoning enhancements.
Kaggle Systems solutions showcase competition-grade submissions from the Kaggle challenge, operating under strict computational constraints ($50 compute budget for 120 evaluation tasks). These represent purpose-built, efficient methods specifically designed for the ARC Prize.
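
To make the Kaggle constraint concrete, here is a minimal sketch of the arithmetic behind it. It assumes the $50 budget is spread evenly over the 120 evaluation tasks, which the text above does not state explicitly. The function names are hypothetical and used only for illustration.

```python
# Figures taken from the Kaggle Systems description above.
KAGGLE_BUDGET_USD = 50.0
KAGGLE_EVAL_TASKS = 120


def cost_per_task(total_cost_usd: float, num_tasks: int) -> float:
    """Average cost per task, assuming cost is spread evenly (an assumption)."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_cost_usd / num_tasks


def within_kaggle_budget(system_cost_per_task: float) -> bool:
    """True if a system's per-task cost fits the implied Kaggle per-task limit."""
    limit = cost_per_task(KAGGLE_BUDGET_USD, KAGGLE_EVAL_TASKS)
    return system_cost_per_task <= limit


if __name__ == "__main__":
    limit = cost_per_task(KAGGLE_BUDGET_USD, KAGGLE_EVAL_TASKS)
    print(f"Implied Kaggle limit: ${limit:.2f} per task")  # prints $0.42
```

Under that even-split assumption, a Kaggle submission has roughly $0.42 per task to work with, which gives a sense of where these systems fall on the plot's cost axis.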

For more information on our reporting process, see our testing policy.
