TrendKia
2 hours ago· 3

A Free Open-Source Coding Model From DeepReinforce Just Outscored Claude Opus 4.7 on Two Benchmarks

DeepReinforce's new Ornith-1.0 family of open-source coding models is built for autonomous agents rather than chat, and it beats Claude Opus 4.7 on two coding benchmarks, though it is aimed strictly at developers already running agent infrastructure.

Amit Patel, Business Correspondent · 5 min read · AI

An AI research lab called DeepReinforce, the team behind the earlier CUDA-L1 project and the IterX code-agent optimization loop, quietly shipped Ornith-1.0 late last week. It is not a single model but a whole family of open-source coding models, now live on Hugging Face in four different sizes measured by parameter count: a 9 billion version, a 31 billion version, a 35 billion mixture-of-experts version, and a 397 billion mixture-of-experts flagship. Every one of them ships under an MIT license with no regional restrictions attached.

Parameters are the adjustable numerical weights a model tunes as it learns. As a rule, the more parameters a model carries, the more capable it tends to be. A 9 billion parameter model counts as small. In compressed form it is light enough to run on modest hardware, possibly even a high-end smartphone, yet it cannot be trusted with genuinely heavy reasoning. The 397 billion flagship is far more powerful, but it demands serious computing muscle, the sort of hardware you will not find in a consumer laptop.
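To put rough numbers on that hardware gap, here is a back-of-the-envelope sketch in Python. It assumes weight storage is simply parameter count times bytes per parameter. The storage formats (fp16, int8, int4) are illustrative assumptions, not formats DeepReinforce has confirmed, and the estimate ignores activations and runtime overhead, so real requirements would be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Weights only; activations, caches and runtime overhead are ignored.

SIZES_BILLIONS = {"9B": 9, "31B": 31, "35B MoE": 35, "397B MoE": 397}  # sizes named in the article
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # assumed storage formats


def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Map each model size to its estimated footprint per storage format."""
    return {
        name: {fmt: weight_gb(size, bpp) for fmt, bpp in BYTES_PER_PARAM.items()}
        for name, size in SIZES_BILLIONS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        print(name, row)
```

Under these assumptions the 9B model needs about 18 GB at fp16 and roughly 4.5 GB at 4-bit, while the 397B flagship needs close to 800 GB at fp16. A mixture-of-experts model activates only part of its weights per token, but all of them generally still have to be held in memory.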

What "agentic" actually means here

The lab calls Ornith "a self-improving family of open-source models specially for agentic coding tasks." That single word, agentic, carries most of the weight. The launch note put it plainly: Ornith-1.0 covers the full span of sizes, from 9B Dense and 31B Dense to 35B MoE and the 397B MoE, and claims state-of-the-art results among open-source models of similar size.

Most of the AI people deal with day to day is conversational. You type something, it answers, and the exchange is over. Agentic AI works differently. It receives a task and then takes its own actions to finish it, without a person steering every step. In a coding setting, that looks like an AI that opens files, runs the tests, works out what broke, rewrites the code, and goes around the loop again until the job is actually done.
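The loop described above can be sketched in a few lines of Python. This is a toy illustration, not Ornith's actual agent: the repository is a plain dictionary, the test runner checks a single function, and stub_model is a hypothetical stand-in for the model that always proposes the same fix.

```python
from typing import Callable, Dict, List, Tuple

Repo = Dict[str, str]  # file path -> source text


def run_tests(repo: Repo) -> List[str]:
    """Toy test runner: returns failure messages (empty list means all green)."""
    namespace: dict = {}
    try:
        exec(repo["calc.py"], namespace)
        if namespace["add"](2, 3) != 5:
            return ["add(2, 3) should be 5"]
    except Exception as exc:
        return [f"error: {exc!r}"]
    return []


def stub_model(repo: Repo, failures: List[str]) -> Repo:
    """Hypothetical stand-in for the model: returns an edited copy of the repo."""
    patched = dict(repo)
    patched["calc.py"] = repo["calc.py"].replace("a - b", "a + b")
    return patched


def agent_loop(
    repo: Repo,
    propose: Callable[[Repo, List[str]], Repo],
    max_edits: int = 5,
) -> Tuple[Repo, int, bool]:
    """Run tests, ask for a rewrite on failure, and repeat until green or out of budget."""
    edits = 0
    failures = run_tests(repo)
    while failures and edits < max_edits:
        repo = propose(repo, failures)
        edits += 1
        failures = run_tests(repo)
    return repo, edits, not failures


if __name__ == "__main__":
    buggy = {"calc.py": "def add(a, b):\n    return a - b\n"}
    fixed, edits, passed = agent_loop(buggy, stub_model)
    print(edits, passed)
```

The point of the structure is that the stopping condition is the test result, not a human saying "good enough". A real agent would replace stub_model with model calls and the toy runner with an actual test suite.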

In other words, nobody has to sit at the keyboard for most of the process, and that is the entire point. It is also where the most commercially meaningful progress is landing in 2026. A model that can grind unsupervised through a 20-step development workflow is simply worth more than one that writes a tidy function when you ask.

Letting the model build its own playbook

The catch is that most large language models are still designed around human feedback, and most AI coding agents come bolted to a human-designed harness: a fixed rulebook that dictates how the agent should organize its work, including when to reach for a tool, how to react to an error, and how to break a multi-step problem into pieces. Ornith takes a different route. It "treats the scaffold as a learnable object that co-evolves with the policy." Put simply, instead of borrowing someone else's playbook, it writes its own.

This happens during reinforcement learning, where every training step splits into two stages. First the model reads the task and drafts a sharpened strategy for tackling it. Then it follows that strategy to produce an actual solution. Crucially, the reward from the final outcome feeds back into both stages, so the model learns to write better strategies, not just better code. Repeat that loop thousands and then millions of times, and task-specific approaches start to surface on their own, with no engineer hand-crafting them.
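A toy version of that two-stage idea can be written with a basic policy-gradient update (REINFORCE). This is a minimal sketch under heavy assumptions, not DeepReinforce's training code: the "strategy" and the "solution" are each a choice between two options, the verifier is a hypothetical rule that rewards only one combination, and the same outcome reward updates both stages.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits, probs, chosen, advantage, lr):
    """In-place step; d log softmax / d logit_k = 1[k == chosen] - probs[k]."""
    for k in range(len(logits)):
        indicator = 1.0 if k == chosen else 0.0
        logits[k] += lr * advantage * (indicator - probs[k])


def train(steps=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    strategy_logits = [0.0, 0.0]                # stage 1: which strategy to draft
    solution_logits = [[0.0, 0.0], [0.0, 0.0]]  # stage 2: which solution, given the strategy
    baseline = 0.0                              # running average reward
    for _ in range(steps):
        p_s = softmax(strategy_logits)
        s = sample(p_s, rng)
        p_a = softmax(solution_logits[s])
        a = sample(p_a, rng)
        # Hypothetical verifier: only strategy 1 followed by solution 1 passes.
        reward = 1.0 if (s == 1 and a == 1) else 0.0
        advantage = reward - baseline
        # The same outcome signal updates both stages.
        reinforce_update(strategy_logits, p_s, s, advantage, lr)
        reinforce_update(solution_logits[s], p_a, a, advantage, lr)
        baseline += 0.05 * (reward - baseline)
    return softmax(strategy_logits), softmax(solution_logits[1])


if __name__ == "__main__":
    print(train())
```

Because the reward reaches the strategy logits as well as the solution logits, the toy policy learns to prefer the strategy that leads to passing solutions, which is the behaviour the paragraph above describes at a much larger scale.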

Guarding against reward hacking

DeepReinforce treats reward hacking as a real danger. If a model is allowed to write its own training scaffold, it could in theory build one that cheats the verifier, say by touching a file so a task looks finished when no real work happened. Three layers stand in the way. The environment and the test suite are locked and kept out of the model's reach. A deterministic monitor raises a flag the moment anything tries to reach restricted paths or tamper with the verification scripts. And a frozen judge model sits above the automated verifier with veto power.
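The three layers can be pictured as a chain of checks, sketched below in Python. Everything here is illustrative: the restricted paths, the Episode fields and the rule that a flag or veto zeroes the reward are assumptions made for the example, since the article does not say how DeepReinforce wires these pieces together.

```python
import posixpath
from dataclasses import dataclass
from typing import Iterable, List

RESTRICTED = ("/tests", "/verifier")  # hypothetical locked locations


def is_restricted(path: str, restricted: Iterable[str] = RESTRICTED) -> bool:
    """Deterministic check: does the normalized path fall under a locked location?"""
    norm = posixpath.normpath(path)  # collapses "a/../b" style tricks
    return any(norm == r or norm.startswith(r + "/") for r in restricted)


def monitor(accessed_paths: Iterable[str]) -> List[str]:
    """Return every accessed path that should raise a flag."""
    return [p for p in accessed_paths if is_restricted(p)]


@dataclass
class Episode:
    accessed_paths: List[str]  # paths the agent touched during the run
    verifier_passed: bool      # result of the locked automated test suite
    judge_veto: bool           # decision of the frozen judge model


def final_reward(ep: Episode) -> float:
    """Assumed policy: any monitor flag or judge veto cancels the reward."""
    if monitor(ep.accessed_paths):
        return 0.0
    if ep.judge_veto:
        return 0.0
    return 1.0 if ep.verifier_passed else 0.0


if __name__ == "__main__":
    honest = Episode(["/workspace/src/app.py"], True, False)
    sneaky = Episode(["/workspace/../tests/run.sh"], True, False)
    print(final_reward(honest), final_reward(sneaky))
```

The useful property of the monitor in this sketch is that it is deterministic: the same access log always produces the same flags, so the model cannot argue its way past it the way it might with a learned judge.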

The benchmark numbers

The 397 billion parameter flagship scores 82.4 on SWE-bench Verified. That test hands an AI a real bug pulled from an open-source GitHub repository and asks it to fix the problem without ever seeing the test suite, then scores it on the share of issues it actually resolves. That 82.4 edges past Claude Opus 4.7 at 80.8 and DeepSeek-V4-Pro at 80.6 on the very same test. On Terminal Bench 2.1, which runs 89 tasks inside containerized terminal environments spanning everything from debugging async code to closing security vulnerabilities and grades on completion rate, Ornith posts 77.5 against Claude Opus 4.7's 70.3.
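For readers who want the gaps spelled out, the arithmetic behind those comparisons is simple enough to script. The scores below are the ones quoted in this article. The pass_rate helper shows how a percentage score is derived from task counts in general and is not tied to any reported raw counts, which the article does not give.

```python
# Headline scores quoted in the article (percent of tasks resolved or completed).
SCORES = {
    "SWE-bench Verified": {
        "Ornith-1.0-397B": 82.4,
        "Claude Opus 4.7": 80.8,
        "DeepSeek-V4-Pro": 80.6,
    },
    "Terminal Bench 2.1": {
        "Ornith-1.0-397B": 77.5,
        "Claude Opus 4.7": 70.3,
    },
}


def pass_rate(solved: int, total: int) -> float:
    """Share of tasks solved, as a percentage rounded to one decimal place."""
    return round(100.0 * solved / total, 1)


def margins(benchmark: str, model: str = "Ornith-1.0-397B") -> dict:
    """Point gap between `model` and every other entry on the benchmark."""
    table = SCORES[benchmark]
    return {
        rival: round(table[model] - score, 1)
        for rival, score in table.items()
        if rival != model
    }


if __name__ == "__main__":
    for name in SCORES:
        print(name, margins(name))
```

On these figures the lead is 1.6 points over Claude Opus 4.7 and 1.8 over DeepSeek-V4-Pro on SWE-bench Verified, and 7.2 points on Terminal Bench 2.1.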

There is a contamination worry hanging over SWE-bench. Earlier this year OpenAI argued that some models were padding their scores by memorizing benchmark answers they had seen during training. To address that, Ornith also publishes results on SWE-bench Pro, a tougher variant built on more varied, less-leaked codebases and scored the same way. There the 397 billion model lands at 62.2. That is noticeably lower, but still competitive with the field and still ahead of DeepSeek-V4-Pro.

The 9 billion model may be the more striking result. It puts up 69.4 on SWE-bench Verified, beating Gemma 4-31B's 52 and running close to Qwen 3.5-35B's 70, even though it is three to four times smaller than those rivals.

Who this is actually for

Ornith-1.0 is deliberately not a general-purpose AI, and the model's own documentation admits it may stumble on anything outside agentic coding. If you want help summarizing a document, writing a doctoral thesis, or drafting an email, this is the wrong tool. It is tuned for a narrow job: developer pipelines where an AI agent takes a task description, works inside a code repository or terminal session, and finishes multi-step work on its own. It was built for people already running agent infrastructure, not for someone still deciding whether AI is worth the trouble.

The "beats Claude" angle is genuine, but it needs framing. Every lab is now racing to win on agentic coding evals, because that is where the useful performance gaps actually show up. Ornith-1.0-397B does clear Claude Opus 4.7 on both coding benchmarks, yet Anthropic's current flagship, Claude Opus 4.8, scores higher. The comparison that really holds up is within the open-source category, at comparable parameter counts, on coding-specific agent tasks. For developers building self-hosted coding pipelines, agentic infrastructure, or similar work, the small and medium models running on edge hardware could prove genuinely useful. For the average user, though, the answer probably lies somewhere else.

What this means for you

  • For developers: If you already run agentic coding pipelines, the free MIT-licensed 9B and 31B models can run on edge hardware and may genuinely speed up self-hosted development work.
  • For everyday AI users: Ornith is not tuned for writing emails, summaries or essays, so for general tasks you are better off sticking with a conversational assistant.

Questions & Answers

What is Ornith-1.0?
It is a family of open-source coding models built by DeepReinforce, made specifically for agentic coding tasks and available on Hugging Face.

How many models are in the family and what sizes?
There are four sizes: 9 billion, 31 billion, 35 billion MoE and a 397 billion MoE flagship. All ship under an MIT license with no regional restrictions.

Is it better than Claude?
The 397 billion flagship scores 82.4 on SWE-bench Verified and 77.5 on Terminal Bench 2.1, beating Claude Opus 4.7, but Anthropic's current flagship Claude Opus 4.8 scores higher.

How good is the 9 billion model?
It scores 69.4 on SWE-bench Verified, higher than Gemma 4-31B's 52 and close to Qwen 3.5-35B's 70, despite being three to four times smaller.

Can I use it for writing emails or summarizing documents?
No, the model's own documentation says it may underperform on tasks outside agentic coding, so it is the wrong pick for those jobs.

Who is this model built for?
It is built for developers running self-hosted coding pipelines and agentic infrastructure, not for the average user.

About the author
Amit Patel, Business Correspondent, Delhi
Expertise: Business News, Financial Markets, Stock Market Analysis, Corporate Affairs, Startups, Entrepreneurship, Economic Trends, Technology Business, Investments, Global Economy

Amit Patel is a Business Correspondent covering global markets, finance, entrepreneurship, technology, and economic developments. He reports on breaking business news, corporate strategies, stock market trends, startup ecosystems, and industry innovations that shape the global economy. With a focus on accuracy, clarity, and in-depth analysis, Amit helps readers understand complex business topics and their real-world impact. His coverage spans financial markets, multinational corporations, emerging industries, economic policy, investment trends, and digital transformation. Through data-driven reporting and insightful analysis, Amit delivers timely business news and expert perspectives for professionals, investors, entrepreneurs, and general readers alike.

#AI #Ornith #DeepReinforce #Open-SourceAI #AgenticCoding #HuggingFace #ClaudeOpus #SWE-Bench
