Published 12 Mar 2025

Benchmarks Matter: ARC Prize Foundation's Recommendations for the U.S. AI Action Plan

AI benchmarks are not just measurement tools. They build ground truth about model capabilities and accelerate progress toward AGI, shining a light on technological gaps. The way we evaluate models shapes investment cycles, research priorities, government procurement, trade policy, and international competition.

At ARC Prize Foundation, we’ve been focused on launching the ARC-AGI-2 benchmark, but we know the next four years will be crucial for pushing AI forward. As part of this, we believe the U.S. has a unique opportunity to strengthen its AI benchmarking infrastructure, ensuring progress is measured in ways that unlock innovation.

That’s why we submitted a formal response to the Office of Science and Technology Policy’s (OSTP) Request for Information (RFI) on the Artificial Intelligence Action Plan. We make the case that benchmarks should be a core part of the U.S. AI strategy and are integral to maintaining AI leadership.

Here are three steps the government should take:

1. Assert U.S. leadership in defining global AI standards

AI standards are being set now — through international bodies like ISO, the ITU, and the IEC — that will shape global markets and regulations for years to come. Rather than adapting to rules created by other countries, the U.S. must drive these discussions to promote open scientific progress and a thriving innovation ecosystem. Benchmarks lay the foundation for AI standards-setting, and our work on ARC-AGI bolsters these efforts.

2. Establish a U.S. AI benchmarking hub

The government needs a dedicated, centralized office with experts who can track, interpret, and apply AI benchmarks to support decision making. We recommend the Administration commit to maintaining an AI benchmarking hub, within AISI/NIST or another agency, to carry out this work.

3. Cultivate an ecosystem of AI benchmarking organizations

AI benchmarking expertise already exists in research labs, independent organizations, academia, and open-source communities; the government should leverage this ecosystem rather than build one from scratch. We propose a public-private collaboration model, similar to how AI R&D is funded through NSF and DARPA. Partnering with external benchmarking organizations ensures evaluations remain credible, adaptive, and rigorous.

These recommendations would influence how AI systems are evaluated, inform policy, and help define international AI frameworks.

A robust benchmarking infrastructure makes each of these outcomes possible.

Read our full OSTP RFI submission here: Development of an Artificial Intelligence Action Plan
