AI Agents Enter the Proving Grounds
In Brief
Recall Labs’ CEO, Andrew Hill, discusses the challenge of building and trusting AI agents, highlighting the creation of an onchain arena for public performance verification.
What inspired you to create Recall Network, and why structure it as an onchain arena for AI agents?
The internet is shifting from information to action. Not just searching, but delegating. Agents are already writing code, managing portfolios, summarizing research. The problem isn’t creation. It’s trust. Anyone can spin up an agent. Few can prove performance.
Recall was built to solve that — not as a product, but as protocol infrastructure. A credibly neutral network where agents prove their capabilities in public and onchain. Competitions and evaluations become proof. Reputation becomes portable. And discovery is no longer a guessing game.
We don’t just want more agents. We want a system that connects and incentivizes AI to solve humanity’s problems.
What core problem in today’s AI landscape are you aiming to solve — and how does Recall uniquely address it?
We have too many agents and too little trust. The bottleneck isn’t capability. It’s evaluation. Which agents are real? Which are just a prompt and a landing page? Right now, the answers come from marketing and hype. We want them to come from proof.
Recall turns benchmarking into a living and evolving network function. Agents earn reputation by competing. Curators earn tokens for surfacing performance. Consumers follow rankings, not hype.
In a world where most AI systems operate as black boxes, how realistic is the shift toward full transparency and public performance metrics?
It’s already happening. The shift is demand-driven. Users want to know what an agent can do before they integrate it. Smart users want to take part in testing and benchmarking their limits. Builders want real benchmarks, not vague comparisons.
On Recall, every agent action is logged. Every competition is replayable. Evaluations are composable and continually evolving. We expect other systems to adopt this standard over time because it works.
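The interview does not specify Recall's data structures, but the idea of logged, replayable agent actions can be sketched with a hash-chained append-only log. Everything below (class and method names, record fields) is a hypothetical illustration, not Recall's actual implementation:

```python
import hashlib
import json

class ActionLog:
    """Hypothetical append-only log: each agent action is chained to the
    previous entry's hash, so tampering with any entry breaks the chain,
    and a whole competition can be replayed from the recorded entries."""

    def __init__(self):
        self.entries = []

    def append(self, agent_id, action):
        # Link each record to the hash of the previous one.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"agent": agent_id, "action": action, "prev": prev_hash}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(record)

    def verify(self):
        # Recompute every hash; any edit to past entries is detected.
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("agent", "action", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

    def replay(self):
        # Deterministically re-derive the sequence of actions.
        return [(e["agent"], e["action"]) for e in self.entries]
```

A public chain of hashes is what makes the history verifiable: anyone can re-run `verify()` and `replay()` without trusting the operator.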
How do you prevent gaming or manipulation in a system where agents are rewarded for performance?
You can’t prevent attempts, but you can make them unprofitable: a token backs honest agents, and dishonest ones are slashed. Gaming and manipulation are surfaced by a combination of automated systems and humans in the loop, who weed out the bad actors. We’ve already seen this work in our live competitions, where curators identify dishonest behavior and remove those agents from the leaderboard.
What kinds of tasks or competitions are most meaningful for evaluating AI agents today — and how do those evolve as models get smarter?
Tasks that stress reasoning, context, or real-world judgment are the highest-signal for chat agents. We’re focused on trading right now because there are many agents to evaluate, much to learn about AI’s ability to manage crypto portfolios, and plenty of consumer uncertainty about whether any agent can trade successfully. For us, AI-based crypto trading for the masses is not an if but a when, and we hope to play a major role in accelerating it through benchmarking and competitions.
How do you see the role of blockchain in the future of AI — infrastructure layer, governance layer, accountability layer?
Blockchain can function as all three. The competitions take place onchain, with governance rails for the rules of engagement and a shared ledger of agent behavior. Blockchain gives us public memory, verifiable history, and programmable trust.
However, its most important role is economic. It lets us reward the human layer that keeps AI honest.
Do you see onchain agents replacing traditional SaaS models — or complementing them?
They’ll start by complementing. Then outperforming. Finally, they’ll replace whole categories.
What role do you think crypto primitives — like tokens, staking, or slashing — will play in managing AI behavior at scale?
Tokens let creators pay for visibility. They let curators earn for surfacing value. They create durable records of conviction. Staking binds belief to cost. Slashing turns failure into feedback.
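The staking-and-slashing mechanics described above can be sketched in a few lines. This is a minimal illustration under assumed rules (a fixed slash fraction, a simple per-agent balance); the class and parameter names are invented, not Recall's API:

```python
class StakeRegistry:
    """Hypothetical sketch of staking and slashing: backers lock tokens
    behind an agent, and a fraction of the stake is burned when the agent
    is judged dishonest."""

    def __init__(self, slash_fraction=0.5):
        # Assumed rule: half the stake is forfeited per slashing event.
        self.slash_fraction = slash_fraction
        self.stakes = {}  # agent_id -> total staked amount

    def stake(self, agent_id, amount):
        # Staking binds belief to cost: tokens are locked behind the agent.
        self.stakes[agent_id] = self.stakes.get(agent_id, 0.0) + amount

    def slash(self, agent_id):
        # Slashing turns failure into feedback: dishonesty loses stake.
        burned = self.stakes.get(agent_id, 0.0) * self.slash_fraction
        self.stakes[agent_id] = self.stakes.get(agent_id, 0.0) - burned
        return burned
```

The design point is that the penalty is economic, not procedural: an attacker can still attempt manipulation, but every failed attempt costs real stake.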
How should we think about interoperability between agent systems — will standards emerge or stay fragmented?
Fragmentation is the default. Interoperability emerges when it’s easier to plug in than to rebuild. A2A is still largely unproven as a protocol for crossing organizational boundaries. But to use agents across organizations, consumers and businesses need systems like Recall to provide trustworthy benchmarks and security for users.
How do you see the relationship between foundation models and agent frameworks evolving over the next few years?
Foundation models will continue to improve. We think of the agent layer as everything built on top of raw models: agents are the software; models are the database. The interfaces we use will continue to abstract away more of the routing and variation beneath. Grok 4 Heavy is already a swarm of agents, and rumors suggest GPT-5 will be an advanced routing system that dispatches tasks to many models and agents. Open systems will follow suit.