Editorial, June 11, 2026

MIT Studied 30 AI Agents. We Review 307. Here Is What They Found and What They Missed.

The MIT 2025 AI Agent Index catalogued 30 agents across 45 fields. We cover over 300 across eight business categories. Here is where the two projects agree, where they differ, and what buyers still need to know.

By Heather MacAvelia

A team of researchers from MIT, Cambridge, Harvard, Stanford, and several other institutions published the 2025 AI Agent Index earlier this year. The paper first appeared in February, was updated in May, and is being presented at the ACM FAccT conference in Montreal later this month. It is a systematic catalogue of 30 AI agents, annotated across 45 fields by seven subject matter experts, covering everything from autonomy levels to safety disclosures to foundation model dependencies.

I have been building a different kind of AI agent index since early 2026. Ours covers over 300 agents across eight business categories, with editorial ratings, verified pricing, integration depth, and buyer decision data. The MIT team and I are clearly looking at the same space, but from very different angles and for very different audiences.

Their work is worth reading carefully. Here is what stands out to me as someone who spends every day inside the data these agents produce, the pricing pages they publish, and the gaps they leave for buyers to fill.

What MIT Got Right

The transparency gap is accurate, and it is worse than most people realize. MIT found that 25 of 30 agents disclose no internal safety evaluation results, and 23 of 30 have no third-party testing information. Developers are happy to publish capability benchmarks but go quiet when asked about safety.

I see a parallel version of this every week when I audit agent listings. Vendors will put detailed feature comparisons on their marketing pages but hide pricing behind a demo request. They will publish a security badge on their homepage, but the actual SOC 2 report is nowhere to be found. They will claim thousands of customers without naming one. Transparency is selective everywhere in this industry, not just on safety.

Foundation model concentration is a structural risk that gets little attention. MIT found that almost all 30 agents depend on GPT, Claude, or Gemini model families. Only frontier labs and Chinese developers run their own models. This creates a dependency chain where a pricing change, a policy shift, or a model deprecation at OpenAI, Anthropic, or Google ripples through hundreds of downstream products overnight.

We track this in our listings. When I audit an agent, I note which models it runs on, because a tool built entirely on one model family carries a different risk profile than one with multi-model architecture. Spellbook, for example, runs on both GPT-5 and Claude Opus. That is a deliberate hedge. Most agents do not make that choice.

The autonomy spectrum is more nuanced than marketing suggests. MIT classified agents into five autonomy levels and found a significant gap between what vendors advertise and what the product actually does. Users configure enterprise platforms at Level 1 or 2, but the deployed agents run at Levels 3 to 5, triggered by events with no human involvement. That disconnect matters for buyers who think they are getting a copilot and are actually deploying an autonomous system.

What They Missed

MIT chose 30 agents based on three criteria: agency, impact, and practicality. Impact required either 10,000+ Google searches, 20,000+ GitHub stars, or a developer valuation over $1 billion. That filter is reasonable for academic research, but it excludes the overwhelming majority of agents that businesses actually evaluate and buy.
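The impact criterion is an either-or threshold, so it can be sketched as a small predicate. This is an illustration only, not the MIT team's code: the function and parameter names are hypothetical, and the search figure is treated as a plain count because the time window is not specified here.

```python
def meets_impact_threshold(searches: int, github_stars: int, valuation_usd: float) -> bool:
    """Return True if any one of the three impact thresholds is met."""
    return (
        searches >= 10_000                # "10,000+ Google searches"
        or github_stars >= 20_000         # "20,000+ GitHub stars"
        or valuation_usd > 1_000_000_000  # "valuation over $1 billion"
    )


# Hypothetical mid-market vendor: modest search volume, no large repo, small valuation.
print(meets_impact_threshold(searches=2_500, github_stars=300, valuation_usd=40_000_000))
```

A vendor with numbers like the hypothetical ones above fails all three tests, which is how a tool with thousands of paying customers can still fall outside the sample.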

The mid-market is invisible in their analysis. Agents like Balto (real-time call guidance for contact centers), Richpanel (self-service for ecommerce support), Custify (customer success for SaaS), or Pipedrive AI (CRM with AI deal coaching) will never hit 10,000 monthly searches. They serve thousands of paying customers who found them through word of mouth, vendor directories, or G2. These tools are where most B2B buying decisions happen, and they are entirely absent from the MIT dataset.

Pricing reality is outside their scope. MIT does not evaluate pricing structures, contract flexibility, or cost-per-seat calculations. For a research index focused on safety and transparency, that makes sense. But a buyer trying to decide among Intercom Fin at $0.99 per resolution, Zendesk AI at $1.00 per automated resolution, and Freshdesk Freddy included with the plan gets no help from the MIT index.
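To make the per-resolution comparison concrete, here is a minimal sketch of the arithmetic. The two prices are the ones quoted above; the monthly volume of 5,000 resolutions is an assumed figure for illustration, and the calculation ignores platform fees, minimums, and plan prices. Freshdesk Freddy is left out because its cost depends on a plan price not given here.

```python
from decimal import Decimal

# Per-resolution prices quoted above, in US dollars.
PRICE_PER_RESOLUTION = {
    "Intercom Fin": Decimal("0.99"),
    "Zendesk AI": Decimal("1.00"),
}


def monthly_cost(vendor: str, resolutions: int) -> Decimal:
    """Usage-based cost for one month at the quoted per-resolution price."""
    return PRICE_PER_RESOLUTION[vendor] * resolutions


ASSUMED_VOLUME = 5_000  # hypothetical monthly resolutions

for vendor in PRICE_PER_RESOLUTION:
    print(vendor, monthly_cost(vendor, ASSUMED_VOLUME))
```

At that assumed volume the two usage-based prices differ by $50 a month, so the decision usually turns on the surrounding plan and contract terms, not the headline rate.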

We publish pricing transparency ratings (public, partial, quote-only), contract types (monthly, annual-only, both), and starting prices for every listing. These are the fields that drive actual purchase decisions.

MCP compatibility is noted but not mapped. MIT found that 20 of 30 agents support MCP for tool integration, with enterprise agents leading at 13 of 13. That is a useful data point, but it does not tell you which agents have official MCP servers you can connect today versus which ones have community wrappers that may break tomorrow. We verify MCP compatibility against official vendor documentation only, and we track it as a filterable field across over 300 agents.

Safety disclosure gaps also exist in how vendors treat buyer data. MIT focused on whether developers publish AI safety frameworks and agentic evaluations. That is important research. But there is a more immediate transparency question for enterprise buyers: does this vendor train on my data? We added a data_training field to every listing (no, opt-out, yes, or not-disclosed) because that is a purchasing blocker for regulated industries. It turns out that "not-disclosed" is the most common value, which tells its own story.
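As a rough sketch of how these buyer-facing fields could be modeled and filtered, consider the record type below. Only the data_training name and the allowed values come from the text above. The class, the other field names, the screening rule, and the sample rows are all hypothetical, and this is not the directory's actual schema.

```python
from dataclasses import dataclass
from typing import Literal

PricingTransparency = Literal["public", "partial", "quote-only"]
ContractType = Literal["monthly", "annual-only", "both"]
DataTraining = Literal["no", "opt-out", "yes", "not-disclosed"]


@dataclass(frozen=True)
class Listing:
    name: str
    pricing_transparency: PricingTransparency  # hypothetical field name
    contract_type: ContractType                # hypothetical field name
    data_training: DataTraining


def passes_regulated_screen(listing: Listing) -> bool:
    """Assumed rule: keep only vendors that explicitly state they do not train on customer data."""
    return listing.data_training == "no"


# Placeholder rows, not real directory data.
listings = [
    Listing("Example Agent A", "public", "monthly", "no"),
    Listing("Example Agent B", "quote-only", "annual-only", "not-disclosed"),
    Listing("Example Agent C", "partial", "both", "opt-out"),
]

shortlist = [item.name for item in listings if passes_regulated_screen(item)]
print(shortlist)
```

Treating "opt-out" and "not-disclosed" as failures is a deliberately strict assumption. A less regulated buyer might accept "opt-out" and only reject "yes" and "not-disclosed".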

Where the Two Indexes Complement Each Other

MIT is asking: are these systems safe, transparent, and accountable? We are asking: which one should your team actually buy, and what will it cost?

Those are not competing questions. A buyer evaluating Salesforce Agentforce needs both answers. From MIT: enterprise agents often operate at higher autonomy levels than their configuration interfaces suggest. From us: Agentforce pricing starts at $2 per conversation, with a $5 per conversation option for sales coaching, and there is no monthly seat fee, but a Salesforce Enterprise or Unlimited license is required.

The MIT team indexed 30 agents in depth across 45 fields with seven expert annotators. We index over 300 agents across a different set of fields, verified weekly through live Chrome audits, with editorial ratings calculated from a locked formula. Their research is a point-in-time snapshot as of December 31, 2025. Our data updates continuously, and every listing shows a "last verified" date so you know exactly how current the information is.

If you are a policymaker or researcher studying the AI agent ecosystem at a structural level, the MIT AI Agent Index is the right starting point. If you are a revenue leader evaluating which AI sales agent to buy for your team, or a support director comparing autonomous resolution rates and per-ticket costs, that is what we built this directory for.

The Bigger Picture

The fact that two separate projects called "AI Agent Index" exist and cover different ground is itself a signal. The agent ecosystem is growing faster than any single effort can document. MIT has seven annotators. I have Viktor and a Chrome browser. Between the two projects, over 300 commercial agents and 30 frontier systems are covered, and there are still hundreds more shipping every month.

The transparency problems MIT identified do not fix themselves. Vendors will not voluntarily publish safety evaluations or stop hiding pricing behind demo requests. Independent documentation, whether it comes from an academic research team or a solo operator with a Supabase database, is what keeps this market honest.

You can explore the MIT AI Agent Index at aiagentindex.mit.edu. You can explore ours at theaiagentindex.com. We cover every agent they studied, plus 277 more.
