When AI companies talk about building autonomous agents — software that can independently research, code, and solve complex problems — they inevitably run into two walls. The first is what researchers call "context explosion": multi-agent systems generate up to fifteen times more data than a standard chatbot conversation, resending full histories, tool outputs, and reasoning chains at every turn. The second is the "thinking tax," where using powerful reasoning models for every subtask makes the whole operation prohibitively slow and expensive.

On Wednesday, NVIDIA launched Nemotron 3 Super, an open-weight model that attacks both problems simultaneously. The result is already turning heads across the industry.

The model contains 120 billion total parameters but activates only 12 billion for any given token, using a mixture-of-experts architecture in which a router sends each token to the most relevant specialist sub-networks. This design choice is not merely about efficiency: it fundamentally changes the economics of running autonomous AI systems at scale. NVIDIA claims the model delivers five times the throughput of its predecessor while doubling accuracy on key benchmarks.
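
For readers who want to see the routing idea concretely, here is a minimal Python sketch of a generic top-k mixture-of-experts layer. It is not NVIDIA's implementation: the expert count, the toy "experts" (simple scaling functions), and the names `route_token` and `moe_layer` are all illustrative assumptions, chosen only so that one tenth of the experts run per token, loosely echoing the 12-of-120-billion ratio.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_layer(token, experts, router_scores, top_k):
    """Weighted sum of the outputs of only the selected experts."""
    weights = route_token(router_scores, top_k)
    return sum(w * experts[i](token) for i, w in weights.items())

# Hypothetical toy setup: 10 experts, 1 active per token. These numbers are
# assumptions for illustration, not the model's real configuration.
NUM_EXPERTS, TOP_K = 10, 1
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
output = moe_layer(2.0, experts, scores, TOP_K)

# Only the chosen experts did any work for this token.
active_fraction = TOP_K / NUM_EXPERTS
print(f"active experts per token: {active_fraction:.0%}")
```

The point of the sketch is that the unselected experts are never called, which is why total parameter count and per-token compute can diverge so sharply.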

What makes Nemotron 3 Super architecturally distinctive is its hybrid backbone. Rather than relying solely on the transformer architecture that powers most large language models, NVIDIA interleaves Mamba layers — based on state space models that process sequences in linear time — with traditional transformer attention layers. The Mamba layers handle the bulk of sequence processing, making the model's one-million-token context window practically usable rather than a theoretical ceiling. The transformer layers, meanwhile, preserve the precise recall capabilities needed when an agent must locate one specific fact buried in thousands of pages of documentation.
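
The linear-versus-quadratic distinction is easy to see with a little arithmetic. The Python sketch below is a toy under stated assumptions: `ssm_scan` is a scalar recurrence with fixed coefficients `a` and `b` (real Mamba layers use learned, input-dependent parameters and much larger states), and `attention_pair_count` simply counts the token pairs that causal self-attention would compare. None of these names come from NVIDIA's code.

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Toy scalar state space recurrence: h_t = a * h_{t-1} + b * x_t.

    One state update per token, so the work grows linearly with length.
    """
    h, outputs = 0.0, []
    for x in xs:
        h = a * h + b * x
        outputs.append(h)
    return outputs

def scan_step_count(n):
    """Number of state updates a recurrence needs for n tokens."""
    return n

def attention_pair_count(n):
    """Causal self-attention compares each token with itself and every earlier token."""
    return n * (n + 1) // 2

for n in (1_000, 1_000_000):
    print(f"{n:>9} tokens: {scan_step_count(n):>9} scan steps, "
          f"{attention_pair_count(n):>15} attention pairs")
```

At one million tokens the pair count is on the order of hundreds of billions per attention layer, which suggests why a design that reserves attention for a minority of layers could make a long context window cheaper to use.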

A new technique called Latent MoE compresses tokens before they reach the expert layers, effectively activating four specialist networks for the computational cost of one. Combined with multi-token prediction, which generates several future tokens in a single forward pass, the model achieves roughly three times faster inference than conventional approaches.
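
A back-of-the-envelope sketch shows why predicting several tokens per pass can cut latency. The figure of three tokens per pass below is a hypothetical chosen for illustration, not a published detail of the model, and the sketch assumes every predicted token is accepted; in practice some predictions are typically rejected, so real gains would be smaller than this upper bound.

```python
import math

def forward_passes(num_tokens, tokens_per_pass):
    """Passes needed if every pass yields tokens_per_pass accepted tokens."""
    return math.ceil(num_tokens / tokens_per_pass)

RESPONSE_TOKENS = 900          # hypothetical response length
baseline = forward_passes(RESPONSE_TOKENS, 1)   # one token per pass
multi = forward_passes(RESPONSE_TOKENS, 3)      # assumed: three tokens per pass
speedup = baseline / multi

print(f"{baseline} passes vs {multi} passes, upper-bound speedup {speedup:.1f}x")
```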

The performance numbers are striking. Nemotron 3 Super has claimed the top position on Artificial Analysis for efficiency among open models and powered NVIDIA's AI-Q research agent to first place on both the DeepResearch Bench and DeepResearch Bench II leaderboards — benchmarks that measure an AI system's ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence. On PinchBench, which evaluates how well language models perform as the reasoning core of autonomous agents, the model scored 85.6 percent, the highest among open models in its class.

Enterprise adoption has been immediate. Perplexity is offering the model to its users for search and as one of twenty orchestrated models powering its Computer product. Software development companies including CodeRabbit, Factory, and Greptile are integrating it into their coding agents. On the enterprise side, Palantir, Siemens, Cadence Design Systems, and Dassault Systèmes are deploying and customizing the model for workflows spanning cybersecurity, semiconductor design, and manufacturing automation.

Perhaps most significant is NVIDIA's decision to release the model with fully open weights under a permissive license. The company is publishing the complete training methodology, including over ten trillion tokens of pre- and post-training datasets, fifteen training environments for reinforcement learning, and full evaluation recipes. In an industry where the most capable models are increasingly locked behind API paywalls, NVIDIA is betting that openness will drive both adoption and GPU sales. That bet is paired with hardware optimization for its Blackwell platform, where the model runs in NVFP4 precision at four times the speed of FP8 on the previous Hopper generation.

The release signals a broader shift in how the industry thinks about AI models. Rather than chasing ever-larger general-purpose systems, NVIDIA has built something purpose-designed for the emerging reality of autonomous agents: efficient enough to run continuously, capable enough to reason through complex multi-step problems, and open enough for developers to adapt to their specific needs. As the AI industry moves from chatbots to systems that can independently carry out hours of work, models like Nemotron 3 Super may prove to be the infrastructure that makes that transition viable.
