Microsoft has taken the unusual step of turning two fierce AI rivals into collaborators. On March 30th, the company unveiled "Critique," a new feature within Microsoft 365 Copilot's Researcher agent that pairs OpenAI's GPT and Anthropic's Claude in a sequential workflow designed to catch errors that neither model reliably catches on its own.
The concept is deceptively simple. When a user submits a research query, GPT handles the initial draft — synthesizing information from web sources and enterprise connectors like Salesforce, ServiceNow, and Confluence. Before the response reaches the user, Claude steps in as an independent reviewer, auditing the draft for factual accuracy, analytical completeness, and citation integrity. Only after passing this second-model review does the response get delivered.
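For readers who think in code, the draft-then-review flow can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation: the names `run_critique`, `Review`, and `Outcome` are hypothetical, the two models are stand-in functions rather than real API calls, and the revision loop is an assumption, since the announcement says only that a response must pass review before delivery.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    approved: bool
    issues: List[str] = field(default_factory=list)


@dataclass
class Outcome:
    text: str
    reviews_run: int
    passed: bool


# Hypothetical signatures: a drafter takes the query plus reviewer feedback,
# a reviewer takes the query plus the current draft.
Drafter = Callable[[str, List[str]], str]
Reviewer = Callable[[str, str], Review]


def run_critique(query: str, drafter: Drafter, reviewer: Reviewer,
                 max_reviews: int = 2) -> Outcome:
    """One model drafts, a second model reviews before anything is delivered."""
    draft = drafter(query, [])
    for review_no in range(1, max_reviews + 1):
        review = reviewer(query, draft)
        if review.approved:
            return Outcome(draft, review_no, True)
        if review_no < max_reviews:
            # Assumed behavior: feed the reviewer's issues back to the drafter.
            draft = drafter(query, review.issues)
    return Outcome(draft, max_reviews, False)


# Stand-in models for demonstration only.
def stub_drafter(query: str, feedback: List[str]) -> str:
    text = f"Summary for: {query}"
    if "missing citation" in feedback:
        text += " [source: example.com]"
    return text


def stub_reviewer(query: str, draft: str) -> Review:
    issues = [] if "[source:" in draft else ["missing citation"]
    return Review(approved=not issues, issues=issues)


if __name__ == "__main__":
    print(run_critique("EU AI Act timeline", stub_drafter, stub_reviewer))
```

In this toy run the first draft fails review for a missing citation, the drafter revises, and the second review passes. Passing the models in as plain functions also means the roles could be swapped without touching the loop.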
It is a pattern familiar to developers who have long used competing models as informal checks on each other. Microsoft has simply industrialized it.
The results appear to justify the approach. On the DRACO benchmark, a 100-task evaluation framework developed by Perplexity AI that measures deep research accuracy, completeness, and objectivity across domains like law, medicine, finance, and technology, Researcher with Critique scored 57.4. That is 14.7 points ahead of Claude Opus 4.6 as a standalone model (42.7) and 7.0 points ahead of Perplexity's own Deep Research mode (50.4), and it tops every other competing research tool from OpenAI, Google, or Anthropic. Because the benchmark was not developed by Microsoft, the numbers carry additional credibility.
"GPT drafts, Claude reviews for accuracy, completeness, and citation integrity before it's delivered," Microsoft said in its announcement. "We expect this workflow to be bidirectional in the future." That last detail matters. Currently, the relationship is fixed — GPT always writes, Claude always reviews. A bidirectional version would let either model take the lead depending on the task, potentially unlocking further performance gains.
Critique was not the only announcement. Microsoft simultaneously introduced "Council," a companion mode that lets users run the same query through both GPT and Claude independently, then view side-by-side reports with a summary highlighting where the models agree and where they diverge. For enterprise users dealing with high-stakes research — legal analysis, medical literature reviews, competitive intelligence — this kind of transparency could prove invaluable.
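The Council idea, two independent answers plus a summary of overlap and divergence, can be sketched the same way. This is a rough illustration under stated assumptions: each report is reduced to a set of claim strings, `run_council` and `CouncilSummary` are hypothetical names, and the models are stand-in functions. Microsoft has not described how its summary is actually computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical signature: a model maps a query to a set of claims.
Model = Callable[[str], Set[str]]


@dataclass
class CouncilSummary:
    agreed: Set[str]                # claims every model made
    divergent: Dict[str, Set[str]]  # per model, claims not shared by all


def run_council(query: str, models: Dict[str, Model]) -> CouncilSummary:
    """Run each model on the same query without showing it the others' output."""
    if not models:
        raise ValueError("at least one model is required")
    reports = {name: model(query) for name, model in models.items()}
    agreed = set.intersection(*reports.values())
    divergent = {name: claims - agreed for name, claims in reports.items()}
    return CouncilSummary(agreed, divergent)


if __name__ == "__main__":
    # Stand-in models with made-up claims, for demonstration only.
    summary = run_council(
        "Is the clause enforceable?",
        {
            "model_a": lambda q: {"clause is enforceable", "notice period is 30 days"},
            "model_b": lambda q: {"clause is enforceable", "notice period is 60 days"},
        },
    )
    print(summary.agreed)
    print(summary.divergent)
```

Matching claims as exact strings is the crudest possible comparison. A real system would need some way to recognize that two differently worded sentences make the same point.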
The company also announced that Copilot Cowork, a Claude-powered agent capable of planning and executing multi-step tasks autonomously, has entered early access through Microsoft's Frontier program. Taken together, the announcements paint a clear picture of where Microsoft's AI strategy is heading: away from single-model dependence and toward a multi-vendor architecture.
The implications extend well beyond Microsoft's product line. For months, the enterprise AI market has operated on an implicit assumption that companies would pick a model provider and build around it. Critique challenges that premise entirely. If the best results come from running rival models against each other, then the competitive moat shifts from model quality alone to orchestration — the ability to combine models intelligently.
Nicole Herskowitz, corporate vice president of Microsoft 365 and Copilot, told Reuters that the strategy goes beyond simply offering multiple models. The goal, she said, is making them actively collaborate to produce outcomes that neither could achieve independently.
For OpenAI and Anthropic, the dynamic is complicated. Both companies benefit from the expanded distribution that comes with deep Copilot integration. But they are also being positioned as complementary halves of a system that Microsoft controls. The more enterprises come to rely on the Critique workflow, the more leverage shifts toward the orchestrator and away from any individual model provider.
Whether Critique represents a genuine inflection point or merely a clever product feature will depend on adoption. But the benchmark numbers are hard to dismiss, and the underlying logic, that two competing perspectives produce better results than one, is about as old as peer review itself. Microsoft has found a way to automate it at enterprise scale, and in doing so, may have quietly redefined what "best AI" means for the businesses that actually pay for it.