ARC-AGI Community Leaderboard

ARC-AGI has gained significant popularity over the past two years, and we've been overwhelmed by the number of researchers and builders who want to showcase their work to the community. The ARC-AGI Community Leaderboard provides a landing spot for these submissions, where the community can review, discuss, and verify results together.

Community Leaderboard submissions must be general purpose and reproducible. Results on the ARC-AGI-1 and ARC-AGI-2 Semi-Private sets are run and verified by ARC Prize. All other results are scored on a public set and self-reported. We can't verify the authenticity of self-reported scores and won't independently verify submissions except in extraordinary cases, so we encourage the community to explore each submission and validate the results for themselves. We reserve the right to determine what qualifies. For more on how we approach testing, see our testing policy.

To submit your work, head to the ARC-AGI Community Leaderboard repo on GitHub.

Name	Description	Authors	Benchmark	Score	Cost	Date	Links
Tycho	A multimodal model maintains one growing conversation per game and can delegate a falsification-tested executable world model to a builder for planning.	Jens Lehmann, Andrei Aioanei, Sahar Vahdati	ARC-AGI-3 (Public Demo)	100.0%	$2,986	2026-07-29	code, scorecard
Retrodict	A general LLM agent that logs every frame and requires rule hypotheses to retrodict recorded history before spending live actions.	Ryan Brown	ARC-AGI-3 (Public Demo)	99.9%	$654	2026-07-19	code, scorecard
baseline1	Coding agent that builds and verifies an executable Python world model, then plans through it.	Sergey Rodionov	ARC-AGI-3 (Public Demo)	99.0%	$400	2026-07-15	code, paper, scorecard
NOOA (NVIDIA-labs OO Agents)	A general CodeAct agent that builds reusable NumPy world-model helpers and persists learning through memory or Markdown knowledge files.	Gal Kaplun, Elad Sarafian, Paul Furgale, Ricardo Silveira Cabral, Ron Banner	ARC-AGI-3 (Public Demo)	85.1%	$332	2026-07-09	code, paper, scorecard
Polyphony Agent - ARC	A coding agent that grows a verified per-game Heuristic System for state, dynamics, planning, and action selection as readable Python files.	Ruiyang Yu, Anyang Su, Chenxu Zhao, Tianyu Fu, Shuo Wang, Minghui Wu	ARC-AGI-3 (Public Demo)	19.8%	$115	2026-07-07	code, scorecard
OPINE-World	Two agents iterate in a CEGIS-style loop: one acts while the other rewrites and adversarially tests an executable game engine used for planning.	David Courtis, Wenhao Li, Scott Sanner, D3M Lab	ARC-AGI-3 (Public Demo)	78.4%	$1,040	2026-07-01	code, paper, scorecard
ARM (Abstraction Reasoning Model)	A Tiny Recursive Model variant whose H-level replaces spatial compression with 32 learned Perceiver/Set-Transformer-style latent tokens.	Marcel Siegert	ARC-AGI-1 (Public Train)	40.3%	$3	2026-06-23	code
Continual Harness	A self-improving ARC-AGI-3 orchestrator whose Refiner rewrites its base policy, memory, skills, and subagents from trajectory evidence.	Ruirong Feng, Seth Karten, Wenzhe Li, Chengshuai Shi, Joel Zhang, Tersoo Upaa Jr, Kiran Vodrahalli, Chi Jin	ARC-AGI-3 (Public Demo)	20.5%	$774	2026-06-18	code, paper, scorecard
Vision - Continual Learning v1	Multimodal agent with continual-learning weights carried across games and levels.	Vansh	ARC-AGI-3 (Public Demo)	63.1%	$4,788	2026-05-18	code, scorecard
OpenClaw	The OpenClaw harness, adapted to play ARC-AGI-3, with memory and code-execution tools allowed.	ARC Prize Foundation	ARC-AGI-3 (Public Demo)	5.2%	$2,912	2026-05-15	code, scorecard
DreamTeam	Six fixed agent roles coordinate through a shared file workspace and an executable world model that they build and revise at run time.	Elad Sarafian, Gal Kaplun, Ron Banner, Daniel Soudry, Boris Ginsburg	ARC-AGI-3 (Public Demo)	38.1%	$18,000	2026-05-04	code, paper, scorecard
Human Intelligence Harness	Maximum human intelligence built into an agent harness.	ARC Prize Foundation	ARC-AGI-3 (Public Demo)	95.3%	-	2026-04-14	code, scorecard
Noemon's agentic ARC-AGI 2 Solver (Gemini 3.1 Pro)	A Reasoner → Validator → Refine loop that learns an instruction set with persistent thought history, self-consistency, and a judge for candidate selection.	Noemon	ARC-AGI-2 (Public Train)	92.5%	$4	2026-04-09	code
TELL	Single-conversation agent that compounds confirmed knowledge in a MEMORY.md file.	Dots Post-train Team	ARC-AGI-3 (Public Demo)	43.9%	$1,406	2026-04-09	code, scorecard
a-evolve MAS Evolved	Evolved multi-agent orchestrator with 9 learned skills mined from competition logs.	Zhan Shi, Hanqing Lu, Bing He, Yisi Sang, Minhua Lin	ARC-AGI-3 (Public Demo)	12.3%	$5,300	2026-04-09	code, scorecard
Read-Grep-Bash Agent	A coding agent that uses search and Python scripting over game logs.	Alexis Fox, Junlin Wang, Paul Rosu, Bhuwan Dhingra	ARC-AGI-3 (Public Demo)	50.2%	-	2026-03-13	code, paper, scorecard
Evolutionary Test-Time Compute with Natural Language Instructions	Evolves natural-language instructions instead of code.	Jeremy Berman	ARC-AGI-2 (Semi-Private)	29.4%	$3,648	2025-09-16	code, paper
Efficient Evolutionary Program Synthesis	Evolves a growing library of Python programs with an LLM.	Eric Pang	ARC-AGI-2 (Semi-Private)	26.0%	$476	2025-09-01	code, paper
Tiny Recursive Model (TRM)	7M-parameter recursive model with think-act refinement loops.	Alexia Jolicoeur-Martineau	ARC-AGI-2 (Public Train)	7.8%	$252	2025-07-01	code, paper
Hierarchical Reasoning Model (HRM)	Brain-inspired 27M-parameter model with iterative refinement.	Sapient Intelligence	ARC-AGI-2 (Semi-Private)	2.0%	$201	2025-06-08	code, paper
Evolutionary Test-time Compute	Genetic algorithm over LLM-generated Python transforms.	Jeremy Berman	ARC-AGI-1 (Semi-Private)	53.6%	$2,900	2024-12-18	code, paper
Ryan Greenblatt	LLM generates and refines thousands of candidate programs per task.	Ryan Greenblatt	ARC-AGI-1 (Semi-Private)	43.0%	$40,000	2024-06-17	code, paper
