AGI remains unsolved.
New ideas still needed.

ARC-AGI — the only unbeaten benchmark that's easy for humans, but hard for AI.

ARC Prize 2025 is Now Open

Join the $1M+ global competition and compete with thousands of AI researchers and frontier AI labs on the world's most important unbeaten benchmark, ARC-AGI-2. Open-source your progress towards AGI and win prizes!

Grand Prize: $700,000
Paper Awards: $75,000
Top Scores: $50,000
To Be Announced: $175,000

ARC-AGI-2 Leaderboard

AI System           Score    $/Task
o3 (low)*           < 5.0%   $200.00
o1-pro*             < 5.0%   $39.00
o1 (high)           3.0%     $4.50
ARChitects (2024)   2.5%     $0.20
o3-mini             1.7%     $0.28
Icecuber            1.6%     $0.13
DeepSeek R1         1.3%     $0.08
Gemini 2.0 Flash    1.3%     $0.004

* Estimate based on partial testing results and o1-pro pricing.

Introducing ARC-AGI-2

A new benchmark that challenges frontier AI reasoning systems.

ARC-AGI-1 was created in 2019 (before LLMs even existed). It endured 5 years of global competitions and more than a 50,000x increase in AI scale, and saw little progress until late 2024, when test-time adaptation methods pioneered by ARC Prize 2024 and OpenAI broke through.

ARC-AGI-2, the next iteration of the benchmark, is designed to stress-test the efficiency and capability of state-of-the-art AI reasoning systems, provide useful signal towards AGI, and re-inspire researchers to work on new ideas.

Pure LLMs score 0%, AI reasoning systems score only single-digit percentages, yet extensive testing shows that humans can solve every task.

Can you create a system that can reach 85% accuracy?
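To get a feel for the problem, here is a minimal sketch of evaluating a solver on the publicly released ARC-AGI task format, where each task is a JSON file with "train" and "test" lists of input/output grid pairs (grids are lists of lists of integers 0-9). The file path and the identity-function solver below are placeholders, not part of the benchmark.

```python
import json

def solve(grid):
    # Placeholder solver: returns the input grid unchanged.
    return [row[:] for row in grid]

def task_solved(task):
    # Exact-match scoring: every cell of every test output grid must be correct.
    return all(solve(pair["input"]) == pair["output"] for pair in task["test"])

# Illustrative path; point it at any task file from the public dataset.
with open("path/to/task.json") as f:
    task = json.load(f)

print("demonstration pairs:", len(task["train"]))
print("solved:", task_solved(task))
```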

Efficiency Test

ARC-AGI-2: Scale is Not Enough

Log-linear scaling is insufficient to beat ARC-AGI-2.

New test-time adaptation algorithms or novel AI systems are needed to bring AI efficiency in line with human performance.
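As a back-of-the-envelope illustration (not a forecast), fitting a log-linear trend to the numeric rows of the leaderboard above and extrapolating to 85% accuracy implies a cost per task far beyond anything practical. The data points below are copied from that table; the fit is ordinary least squares on log10(cost).

```python
import math

# Numeric rows of the ARC-AGI-2 leaderboard above: (system, $/task, score %).
data = [
    ("o1 (high)",         4.50,  3.0),
    ("ARChitects (2024)", 0.20,  2.5),
    ("o3-mini",           0.28,  1.7),
    ("Icecuber",          0.13,  1.6),
    ("DeepSeek R1",       0.08,  1.3),
    ("Gemini 2.0 Flash",  0.004, 1.3),
]

xs = [math.log10(cost) for _, cost, _ in data]
ys = [score for _, _, score in data]
n = len(data)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares fit of score (%) against log10 of cost per task.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

print(f"~{slope:.2f} points of accuracy per 10x increase in cost per task")

# Naive extrapolation of the log-linear trend out to 85% accuracy.
target_log_cost = (85 - intercept) / slope
print(f"naive extrapolation to 85%: ~$1e{target_log_cost:.0f} per task")
```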


Capability Test

ARC-AGI-2: Symbolic Interpretation

Tasks requiring symbols to be interpreted as having meaning beyond their visual patterns.

Current systems attempt to check symmetry, mirroring, and other transformations, and even recognize connecting elements, but fail to assign semantic significance to the symbols themselves.

Capability Test

ARC-AGI-2: Compositional Reasoning

Tasks requiring the simultaneous application of multiple rules, or the application of rules that interact with each other.

In contrast, if a task has very few global rules, current systems can consistently discover and apply them.
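A toy illustration (not an actual ARC-AGI-2 task) of two interacting rules: because the second rule's effect depends on the output of the first, the pair has to be reasoned about jointly, and the order of application changes the answer.

```python
# Rule 1: every 2 becomes a 3.
def rule_recolor(grid):
    return [[3 if cell == 2 else cell for cell in row] for row in grid]

# Rule 2: any cell horizontally adjacent to a 3 becomes a 3.
def rule_spread(grid):
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == 3:
                for dc in (-1, 1):
                    if 0 <= c + dc < len(row):
                        out[r][c + dc] = 3
    return out

grid = [[0, 2, 0, 3, 0]]
print(rule_spread(rule_recolor(grid)))  # [[3, 3, 3, 3, 3]]
print(rule_recolor(rule_spread(grid)))  # [[0, 3, 3, 3, 3]]
```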

Capability Test

ARC-AGI-2: Contextual Rule Application

Tasks where rules must be applied differently based on context.

Systems tend to fixate on superficial patterns rather than understanding the underlying selection principles.

The North Star to AGI

ARC Prize Foundation

Founded by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), the ARC Prize Foundation is a non-profit organization with the mission to guide researchers, industry, and regulators towards AGI through enduring benchmarks.

Featured Donors
