Today we're announcing ARC Prize Verified, a program to increase the rigor of evaluating frontier systems on the ARC-AGI benchmark.
In addition to certified score verification, this program adds a third-party academic panel to audit and open source our testing process.
We are also excited to welcome 5 new AI labs as sponsors of ARC-AGI-3, our third-generation Interactive Reasoning Benchmark.
As a nonprofit organization, ARC Prize Foundation is able to grow our impact because of the generous support of our donors. What started with just our co-founders has grown to include support from mission-aligned individuals such as Tyler Cowen, Dharmesh Shah, and Aaron Levie. Now our donor base is expanding to include the organizations leading the charge to AGI.
We're excited to announce our first external donations from the following AI research labs.
All new funds will specifically be used to:
As always, ARC Prize Foundation remains an independent organization maintaining a bias-free testing policy for models selected for verification. These funds do not influence our testing or score verification. All lab donors have agreed to our standard testing policy. In plain language: donating to the nonprofit does not impact verification scoring for donors.
We are also recruiting notable academic leaders in AI and human psychology to contribute to our mission.
We're excited to welcome Todd Gureckis (Professor of Psychology at NYU), Guy Van den Broeck (Professor of Computer Science at UCLA), Melanie Mitchell (Professor at the Santa Fe Institute), and Vishal Misra (Vice Dean of Computing and AI at Columbia) to serve as independent validators of the ARC Prize testing process. We expect more panel members to be added soon.
The academic panel will:
We're looking for additional academic leaders with expertise in AI evaluation, research methodology, and academic integrity. If you're interested in serving on our academic panel, please reach out to team@arcprize.org.
As ARC Prize Foundation, and especially the ARC-AGI family of benchmarks, has gained popularity, organizations have been eager to announce the ARC-AGI scores of their models (or systems). Reasons for doing so include leveling up open-source state-of-the-art solutions, promoting frontier model performance for product launch events, and even early-stage startup fundraising. Our mission is to drive open AGI progress, so we love seeing the benchmark and our efforts provide so much value to the larger research community.
It's very important to note, however, that self-reported or third-party figures often vary in dataset curation, prompting methods, and many other factors, which prevents an apples-to-apples comparison of results. This causes confusion in the market and ultimately detracts from our goal of measuring frontier AI progress.
From the beginning, ARC Prize Foundation has evaluated select models on a hidden test set so that reported scores reflect generalization rather than overfitting to public tasks. Testing models on tasks they have never seen before ensures that systems demonstrate genuine general reasoning rather than memorization of examples from training data. This process is fundamental to the integrity of the benchmark and is outlined in our testing policy.
If your team is interested in working with us to verify a score, please reach out to team@arcprize.org.