When the Robots Run Their Own Experiments
At Nvidia's GEAR Lab, a fleet of eight two-armed robot stations spent the past few weeks figuring out, entirely on their own, how to insert pins, seat graphics cards and cut zip ties. The only humans in the picture were the ones who sat down afterward to write the paper.
That capability came from ENPIRE, a framework laid out in a paper published Tuesday by researchers at Nvidia, Carnegie Mellon University and UC Berkeley. ENPIRE hands the entire job of training a robot over to AI coding agents, the same software that already writes and tests its own code, and then lets that process run directly on physical hardware.
Dragging the Loop Off the Screen
Coding agents such as OpenAI's Codex, Anthropic's Claude Code and Moonshot's Kimi Code have spent the past year doing what researchers call autoresearch: writing code, testing it and rewriting it without a person in the loop. Until now that loop has mostly lived on a screen, where restarting a failed experiment costs next to nothing. ENPIRE pulls it into the physical world, where resetting an experiment means moving a real robot arm.
How ENPIRE Splits the Job in Two
The system breaks the work into two stages. In the first, a human walks the agent through building two permanent tools. One is a reset routine that returns the workspace to a fresh starting position, and the other is a reward function that watches camera footage and scores how well a task went, essentially a referee that never blinks and never breaks for lunch. That groundwork happens just once and is then reused for every attempt that follows.
Once those tools exist, the agent takes over completely. It digs through published research for ideas, chooses among training methods such as imitation learning, reinforcement learning or hand-written rules, rewrites its own code and tests the outcome on the robot. None of that requires a person to watch, which is either liberating or faintly unsettling, depending on how you feel about a robot holding a pair of scissors unsupervised.
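For readers who think in code, the two-stage split can be sketched roughly as follows. This is a toy illustration, not ENPIRE's implementation: the names reset_workspace, reward, evaluate and autoresearch are hypothetical, the "robot" is a single number standing in for how far a block sits from its target, and the agent merely ranks a fixed list of candidate policies rather than rewriting its own code.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the two stage-one tools (not ENPIRE's real API).
def reset_workspace(rng: random.Random) -> float:
    """Return the toy workspace to a fresh start: a random block offset in [-1, 1]."""
    return rng.uniform(-1.0, 1.0)

def reward(final_offset: float, tolerance: float = 0.05) -> float:
    """Score one attempt: 1.0 if the block ended within tolerance of the target, else 0.0."""
    return 1.0 if abs(final_offset) <= tolerance else 0.0

# A policy maps the observed offset to a push (a change in offset).
Policy = Callable[[float], float]

def evaluate(policy: Policy, trials: int, rng: random.Random) -> float:
    """Stage two, inner loop: reset, run the policy, score, repeat; return the success rate."""
    successes = 0.0
    for _ in range(trials):
        offset = reset_workspace(rng)
        for _ in range(10):  # ten pushes per attempt
            offset += policy(offset)
        successes += reward(offset)
    return successes / trials

def autoresearch(candidates: Dict[str, Policy], trials: int = 50,
                 seed: int = 0) -> List[Tuple[str, float]]:
    """Evaluate every candidate with the same fixed tools and rank them best first."""
    rng = random.Random(seed)
    results = [(name, evaluate(policy, trials, rng)) for name, policy in candidates.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)

# Three made-up candidate policies, purely for illustration.
candidates: Dict[str, Policy] = {
    "do_nothing": lambda offset: 0.0,
    "weak_gain": lambda offset: -0.1 * offset,
    "strong_gain": lambda offset: -0.5 * offset,
}

ranking = autoresearch(candidates)
best_name, best_rate = ranking[0]
```

The point of the structure is that reset_workspace and reward are written once and never touched again, while everything inside the search loop is free to change between attempts.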
A Fleet That Shares What It Learns
Nvidia ran the experiment across eight bimanual robot stations, each with its own hardware, computer and coding agent. The stations swap progress through Git, the same tool coders use to merge code, so a winning idea spreads across the whole fleet within minutes.
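The sharing pattern is simple enough to sketch. The snippet below is a minimal stand-in that assumes nothing about how ENPIRE actually lays out its repositories: SharedLog is a hypothetical in-memory substitute for the shared Git history, and the station numbers, idea names and scores are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Result:
    station: int         # which station produced the result
    idea: str            # placeholder label for the approach tried
    success_rate: float  # fraction of successful attempts

class SharedLog:
    """In-memory stand-in for the history the stations share (hypothetical)."""

    def __init__(self) -> None:
        self._entries: List[Result] = []

    def push(self, result: Result) -> None:
        """A station publishes its latest result."""
        self._entries.append(result)

    def best(self) -> Optional[Result]:
        """Any station can pull the best result published so far."""
        return max(self._entries, key=lambda r: r.success_rate, default=None)

log = SharedLog()
log.push(Result(station=0, idea="idea-a", success_rate=0.5))
log.push(Result(station=3, idea="idea-b", success_rate=0.75))
log.push(Result(station=5, idea="idea-c", success_rate=0.25))

best = log.best()  # every other station would build on this one next
```

In the real system, version control adds what this sketch leaves out: a full history of what was tried and a way to merge changes from different stations.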
The Numbers Behind the Speed-Up
Researchers measured the payoff on two tasks. The first was “Push-T,” where a robot slides a T-shaped block into a target zone using only pushes, and the second was pin insertion, where it threads pins into 4-millimeter holes. Going from one robot to eight cut the time to master Push-T from roughly five hours down to two, and pin insertion from more than 90 minutes to about 40.
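A quick back-of-the-envelope calculation puts those figures in perspective. It uses only the rounded numbers quoted above, so the results are approximate; because pin insertion started at "more than 90 minutes," its speed-up is a lower bound. Parallel efficiency here is just speed-up divided by the number of robots, a standard yardstick rather than a metric taken from the paper.

```python
def speedup(time_one_robot: float, time_many_robots: float) -> float:
    """How many times faster the fleet was than a single robot."""
    return time_one_robot / time_many_robots

def parallel_efficiency(speedup_factor: float, robots: int) -> float:
    """Fraction of the ideal linear speed-up actually achieved."""
    return speedup_factor / robots

ROBOTS = 8

push_t = speedup(5.0, 2.0)           # hours: roughly five down to two
pin_insertion = speedup(90.0, 40.0)  # minutes: more than 90 down to about 40

push_t_eff = parallel_efficiency(push_t, ROBOTS)
pin_eff = parallel_efficiency(pin_insertion, ROBOTS)

print(f"Push-T: {push_t:.2f}x faster, {push_t_eff:.0%} of ideal")
print(f"Pin insertion: at least {pin_insertion:.2f}x faster, {pin_eff:.0%} of ideal")
```

Eight robots bought roughly a 2.5-fold speed-up on Push-T and at least 2.25-fold on pin insertion, well short of eightfold, which suggests that much of the work does not parallelize cleanly.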
Across the four real-world tasks tested, the agents drove their policies to a 99% success rate, according to the paper. On pin insertion, the agents reached near-perfect reliability faster than a comparable human-in-the-loop method, the kind that still needs someone to show up every morning.
What the Researchers Say
Nvidia's Jim Fan, the GEAR Lab co-lead who directs the company's AI research, described the project as the first time autoresearch has been made to work in the physical world. Fan said the team handed the agents a fleet of robots, a GPU allocation and a token budget, then stepped back and let the agents take over. On June 16, 2026, he wrote:
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy…
Where Simulation Stops and Reality Begins
The gap between simulation and reality showed up almost immediately. According to the paper, all three coding agents solved Push-T inside a simulator, yet two of the three failed once the same task moved onto a physical robot. Simulators do not have friction problems. Real tables do.
Nvidia also put ENPIRE through RoboCasa, a simulated kitchen benchmark that scores robots by their success rate on chores like opening cabinets or turning off stoves, mercifully with no risk of burning the place down. There, ENPIRE beat both Nvidia's own end-to-end model GR00T and CaP-X, a tool-using agent that skips the autoresearch loop entirely.
From Eureka to Real Hardware
ENPIRE builds on an idea Nvidia first floated with Eureka, a 2023 system that used a language model to write reward functions for robots inside a simulator instead of relying on human engineers to do it by hand. ENPIRE takes that self-improvement loop off the simulator and onto real hardware, with the agent designing its own tests rather than just its own rewards.
A Race Taking Shape Across the Industry
The release arrived the same week Alibaba unveiled its own embodied-AI push, the Qwen-Robot Suite, a trio of foundation models for robot navigation, manipulation and physics simulation. Alibaba is building software brains for robot bodies it does not manufacture, while Nvidia is testing whether agents can run the entire research loop on hardware it owns end to end. Both point to the same trend, that physical robots are fast becoming the next arena for coding agents to compete in.













