ARC-AGI has evolved from its first version (ARC-AGI-1), which measured basic fluid intelligence, to ARC-AGI-2, which challenges systems to demonstrate both high adaptability and high efficiency.
The scatter plot above visualizes the critical relationship between cost per task and performance, a key measure of intelligence efficiency. True intelligence isn't just about solving problems, but about solving them efficiently with minimal resources.
For more information on our reporting process, see our testing policy.
Only systems that required less than $10,000 to run are shown.
For models that were not able to produce full test outputs, the remaining tasks were marked as incorrect.
Results marked as "preview" are unofficial and may be based on incomplete testing.
1 ARC-AGI-2 score estimate based on partial testing results and o1-pro pricing.
2 Provisional cost estimates based on Gemini 3 Pro pricing. Model to be retested once released.
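
As a rough illustration of the notes above, the following Python sketch shows one way the stated rules could be applied: tasks without a full output count as incorrect, and only runs under the $10,000 threshold are kept for the plot. The `Run` class, its field names, and the helper functions are hypothetical, and the sketch assumes the threshold applies to the total cost of a run; it is not the actual evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold taken from the note above; applying it to total run cost is an assumption.
COST_CAP_USD = 10_000.0


@dataclass
class Run:
    """Hypothetical record of one system's evaluation run."""
    name: str
    total_cost_usd: float
    # One entry per task: True = solved, False = wrong,
    # None = the system did not produce a full test output.
    results: List[Optional[bool]]


def score(run: Run) -> float:
    """Fraction of tasks solved; missing outputs (None) count as incorrect."""
    if not run.results:
        return 0.0
    solved = sum(1 for r in run.results if r is True)
    return solved / len(run.results)


def cost_per_task(run: Run) -> float:
    """Average cost per task, the horizontal axis of a cost-vs-performance plot."""
    return run.total_cost_usd / len(run.results)


def shown_on_plot(runs: List[Run]) -> List[Run]:
    """Keep only runs that cost less than the cap."""
    return [r for r in runs if r.total_cost_usd < COST_CAP_USD]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    runs = [
        Run("system-a", 8.0, [True, None, False, True]),
        Run("system-b", 12_000.0, [True, True, True, True]),
    ]
    for r in shown_on_plot(runs):
        print(r.name, score(r), cost_per_task(r))
```

Counting a missing output the same as a wrong answer keeps the denominator fixed at the full task count, so partially completed runs cannot inflate their score by skipping hard tasks.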