Measuring interactive human-like intelligence in AI
Coming March 25, 2026
ARC-AGI-3 is the first interactive reasoning benchmark designed to measure human-like intelligence in AI. Launching March 25, 2026, it will include 1,000+ levels across 150+ environments that require agents to explore, learn, plan, and adapt. ARC-AGI-3 will provide the most authoritative evidence of AI generalization to date.
ARC-AGI-3 uses video-game-like environments where agents must act across multiple steps to achieve long-horizon goals. The games provide no instructions, so players must explore and discover the rules to succeed. Each environment is hand-crafted and novel, so systems cannot memorize their way to success.
Every environment (100%) is human-solvable.
When testing AI, the question isn't whether it solves the environment; it's how efficiently it does so. We measure this through action efficiency: how many actions does it take to complete a goal? This shows how effective a test-taker (human or AI) is at converting environment information into a working strategy.
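One way to make the idea concrete is a minimal sketch, assuming action efficiency is scored as the reference action count (e.g. a human expert's) divided by the test-taker's count, capped at 1.0. The exact scoring rule used by ARC-AGI-3 is not specified here; the function name and formula are illustrative assumptions.

```python
def action_efficiency(agent_actions: int, reference_actions: int) -> float:
    """Hypothetical efficiency score in (0, 1].

    Assumes (for illustration only) that a test-taker matching the
    reference action count scores 1.0, and that taking twice as many
    actions scores 0.5. Capped so beating the reference still scores 1.0.
    """
    if agent_actions <= 0:
        raise ValueError("agent must take at least one action")
    return min(1.0, reference_actions / agent_actions)


# An agent needing 120 actions where the reference took 60 scores 0.5.
print(action_efficiency(agent_actions=120, reference_actions=60))  # 0.5
print(action_efficiency(agent_actions=50, reference_actions=60))   # 1.0
```

Under this (assumed) formalization, efficiency directly rewards converting observations into strategy quickly, which is the gap the benchmark highlights between humans and current AI.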
Humans do this well. AI does not.
The ARC-AGI-3 developer toolkit is a set of tools that lets you play and interact with ARC-AGI-3 environments locally (at up to 2000 FPS), online, or via a hosted API. The toolkit is the best way to get started with research on ARC-AGI-3. See the documentation.
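The multi-step, long-horizon interaction described above can be sketched as a simple agent loop. This is NOT the actual ARC-AGI-3 toolkit API; the toy environment, its `step` method, and the policy below are assumptions made purely to illustrate the shape of the interaction (act, observe, count actions, stop at the goal).

```python
# Hypothetical sketch of an agent loop, not the ARC-AGI-3 toolkit API.
class ToyEnv:
    """Stand-in environment: reach position 5 on a number line from 0."""
    GOAL = 5

    def __init__(self):
        self.pos = 0

    def step(self, action: int):
        # action is -1 or +1; returns (observation, done)
        self.pos += action
        return self.pos, self.pos == self.GOAL


def run_agent(env, policy, max_actions=1000):
    """Run policy until the goal is reached, counting actions taken."""
    actions_taken = 0
    done = False
    obs = env.pos
    while not done and actions_taken < max_actions:
        obs, done = env.step(policy(obs))
        actions_taken += 1
    return actions_taken, done


# A policy that always moves toward the goal solves this in 5 actions;
# action efficiency compares that count against a reference solver's.
count, solved = run_agent(ToyEnv(), policy=lambda obs: 1)
print(count, solved)  # 5 True
```

The point of the sketch: because ARC-AGI-3 environments give no instructions, a real agent's policy must first spend actions discovering the rules, and every such action counts against its efficiency.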
Public Environment #1
Navigating conditional interactions
Planning and memory in an environment governed by latent state.
Public Environment #2
Use budgets and logic to complete a puzzle
Navigate budget gates to complete a goal across the map.
Public Environment #3
Abstract logic and pattern matching with new mechanics
Complete the pattern while dealing with unified goals.
Public release of ARC-AGI-3 benchmark and competition.
ARC-AGI-3 Developer Toolkit released. View docs
Presentation at MIT. Watch video
Measuring AGI, Interactive Reasoning Benchmarks presentation at AI World's Fair. Watch video