BridgeMind Team · 6 min read

NVIDIA Nemotron 3 Nano Omni Tops BridgeBench Speed at 376 t/s — Free

NVIDIA's free 30B-A3B reasoning model just took the #1 spot on BridgeBench Speed by a 70% margin over the next contender. We ran the full v2 suite to see what builders actually get when the price tag is $0.

Tags: NVIDIA · Nemotron · OpenRouter · benchmarks · BridgeBench v2 · free models

TL;DR

We pushed `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` through the full BridgeBench v2 suite via OpenRouter on 2026-04-28. It is a 30B-parameter mixture-of-experts model with 3B active parameters, free on OpenRouter, and built on NVIDIA's reasoning-tuned Nemotron 3 family.

  • Speed: 376.2 tok/s — rank #1. That is +70% over the next-fastest model we have measured (GLM 5V Turbo at 221 t/s) and ~3.2× faster than Claude Opus 4.7. TTFT lands at 670 ms.
  • Cost: $0.00 across the entire suite. Free tier, no fallback to paid endpoints.
  • Quality is mid-tier on the easy stuff, weak on hard reasoning. It clears small algorithm and debugging tasks but craters on multi-step reasoning, security audits, and BS-pushback.
  • Best use cases right now: ultra-cheap, ultra-fast first-pass drafting, autocomplete-style agentic loops, and any pipeline where you can verify output cheaply downstream.

The Headline Table

All scores are 0–100 (higher is better) except Speed which is tokens-per-second (higher is better). Ranks are within the published BridgeBench v2 cohort.

| Benchmark | Score | Rank | Notes |
| --- | --- | --- | --- |
| Speed | 376.2 t/s | #1 | TTFT 670 ms, $0.00 |
| Hallucination | 54.3 | #36 / 48 | accuracy 50.1%, fabrication 51.0% |
| Algorithms | 49.1 | n/a | public 47.4%, hidden 47.4% |
| Debugging | 41.8 | #25 / 27 | repro 50%, regression 50%, diagnosis 4.5% |
| Refactoring | 38.5 | #14 / 17 | visible 46.7%, hidden 46.1%, intent 51.3% |
| BS-Bench | 36.0 | #17 / 20 | 28% pushback, 56% accepted |
| UI Bench | 33.0 | #22 / 24 | completeness 40.4, visual 27.9, interactive 28.3 |
| Reasoning | 30.1 | #14 / 17 | accuracy 6.7%, evidence 69.4 |
| Security | 10.0 | #17 / 22 | visible 7.8%, hidden 8.0% |

Source: `speed-snapshot.json`, `hallucination-snapshot.json`, `debugging-snapshot.json`, `refactoring-snapshot.json`, `bs-bench-snapshot.json`, `ui-bench-snapshot.json`, `reasoning-snapshot.json`, `security-snapshot.json` in the published BridgeBench v2 dataset (2026-04-28).

Where It Wins: Raw Speed

Nemotron 3 Nano Omni does not just lead the speed leaderboard — it dominates it.

| Rank | Model | Throughput | TTFT | Cost (15 runs) |
| --- | --- | --- | --- | --- |
| 1 | Nemotron 3 Nano Omni 30B-A3B Reasoning (Free) | 376.2 t/s | 670 ms | $0.00 |
| 2 | GLM 5V Turbo | 221.2 t/s | 5,444 ms | $0.15 |
| 3 | Elephant Alpha | 221.1 t/s | 508 ms | $0.00 |
| 4 | MiMo-V2.5 | 167.0 t/s | 1,960 ms | $0.06 |
| 6 | Claude Opus 4.7 | 116.4 t/s | 852 ms | $0.93 |
| 7 | Qwen 3.6 Max Preview | 99.9 t/s | 20,501 ms | $0.52 |

Why this matters for builders. In agentic loops where the model burns thousands of tokens per turn, throughput is the bottleneck. At 376 t/s, a 5,000-token plan or implementation streams back in ~13 seconds — fast enough to keep a vibe coding session in flow, and cheap enough that you can fan out parallel attempts without watching the meter.
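The arithmetic is easy to reproduce. A minimal sketch (the 5,000-token payload is illustrative; throughput and TTFT numbers come from the table above):

```typescript
// Back-of-envelope wall-clock time for a streamed completion:
// time-to-first-token plus (tokens / throughput).
function streamSeconds(tokens: number, tokensPerSec: number, ttftMs: number): number {
  return ttftMs / 1000 + tokens / tokensPerSec;
}

console.log(streamSeconds(5_000, 376.2, 670).toFixed(1)); // "14.0" (Nemotron 3 Nano Omni)
console.log(streamSeconds(5_000, 116.4, 852).toFixed(1)); // "43.8" (Claude Opus 4.7)
```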

The MoE architecture (30B params, 3B active per token) is doing exactly what it was designed to do: giving you the inference latency of a small model with a slightly larger knowledge base on tap.

Where It Holds Up: First-Pass Code

On the categories where most of the work is "produce a reasonable first attempt," Nemotron 3 Nano Omni is competitive with paid mid-tier models:

  • Algorithms (49.1) — Cleared 9 of 19 BridgeBench algorithm tasks at 100, including graph-bfs-shortest-path, longest-common-subsequence, max-profit-stock, merge-intervals, search-rotated-array, topological-sort, and union-find. The classics are solid.
  • Debugging (41.8) — Reproduced and patched the bug correctly on closure loops, deep clones, graph cycles, object mutation, and promise chains. Diagnosis quality (4.5%) is the weak link — it fixes things without fully explaining why, which matters if you want it to teach as it works.
  • Hallucination (54.3) — A genuinely surprising score for a small model. It cleanly handled doc-http-handler, doc-validation-pipe, and state-machine-claims with zero fabricated APIs. When it does hallucinate, it tends to invent 3–5 fake methods at a time rather than one — so verify before you ship.
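That "verify before you ship" step is cheap to automate. A minimal sketch of one such check, assuming you have already extracted the identifiers a draft calls on a known module (the `draftedCalls` list here is hypothetical):

```typescript
// Cheap fabrication check: confirm that methods a model draft calls on a
// real module actually exist. draftedCalls is a hypothetical example of
// identifiers extracted from generated code.
import * as fs from "node:fs";

const draftedCalls = ["readFileSync", "statSync", "readJsonSync"];
const fabricated = draftedCalls.filter((name) => !(name in fs));

console.log(fabricated); // ["readJsonSync"]: node:fs has no such method (it's from fs-extra)
```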

Where It Cracks: Long-Horizon Reasoning and Security

The cracks show up exactly where you would expect for a 3B-active reasoning model: tasks that demand long chains of dependent inference and tasks where being *almost* right is the same as being wrong.

  • Reasoning (30.1) — Accuracy is just 6.7%. The model produces well-structured evidence trails (evidence score 69.4) but rarely lands the final answer. The reasoning trace eats most of the token budget before the model can commit to a conclusion.
  • Security (10.0) — This is the one to watch. On the v2 security suite, the model scored under 10% on visible and hidden tests. Do not ship Nemotron 3 Nano Omni's output into security-sensitive paths without a paid model or human review on top.
  • BS-Bench (36.0) — Accepted 56% of clearly-flawed premises without pushback. If you ask it for "JWT tokens with molecular weight," there is a coin-flip chance it will play along.
  • UI Bench (33.0) — It nailed Space Invaders (86.8) and the music visualizer (86.8) but produced empty / non-interactive artifacts on Lava Lamp, Underwater Coral Reef, Neon Sign, Breakout, Flappy Bird, Snake, and Analog Clock. Reasoning models with tight token budgets struggle with the "produce a single big HTML payload" UI bench format.

You can browse the actual artifacts on BridgeBench UI Bench — every cell that shows a score is backed by the raw HTML the model emitted.

How to Use It

Nemotron 3 Nano Omni is not a "swap in for Claude Opus" model. It is a first-pass, throughput-bound, cost-zero option that earns its spot in three workflows:

1. Drafting + verification pipelines. Have Nemotron generate, then route the output through a stricter verifier (rules, tests, or a paid model on the hard cases); a minimal sketch follows this list.
2. High-volume agentic loops. Anywhere you would want 10 parallel attempts at the same problem, this model lets you actually run all 10 for $0.
3. Latency-critical inline assist. Sub-second TTFT and 376 t/s throughput are the right shape for autocomplete, command palette intent parsing, or in-editor chat.
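A minimal sketch of the first pattern, assuming Node 18+ (global `fetch`) and an `OPENROUTER_API_KEY` in the environment. The endpoint and model ID are the ones used in this post; `verify` is a stand-in for whatever cheap checker your pipeline already has (unit tests, lint rules, schema validation):

```typescript
// Draft-then-verify: fan out free Nemotron attempts, keep the first one that
// passes a cheap local verifier, escalate only when everything fails.
const MODEL = "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free";

async function draft(prompt: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL,
      temperature: 0.8, // nonzero so parallel attempts actually diverge
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return json.choices[0].message.content;
}

async function verifiedDraft(
  prompt: string,
  verify: (candidate: string) => Promise<boolean>, // your tests / rules / paid reviewer
): Promise<string | null> {
  // Five parallel attempts cost $0.00 on the free tier; only wall-clock time is spent.
  const attempts = await Promise.all(Array.from({ length: 5 }, () => draft(prompt)));
  for (const candidate of attempts) {
    if (await verify(candidate)) return candidate; // first verified draft wins
  }
  return null; // nothing passed: escalate to a paid model or a human
}
```

The fan-out count is arbitrary; the point is that at $0.00 per attempt, the verifier rather than the generator sets your unit economics.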

It is not the right pick for: security review, multi-file refactors that span subtle invariants, anything where you cannot cheaply verify the output, or BS-detection in user-facing assistants.

Methodology

  • Model ID: openrouter/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
  • Date: 2026-04-28
  • Routing: Direct via OpenRouter (https://openrouter.ai/api/v1)
  • Suite: BridgeBench v2 — Speed, BS, Reasoning, Hallucination, Algorithms, Debugging, Refactoring, Security, UI Bench
  • Settings: Default per-bench (max_tokens 8k–32k, temperature 0). No reasoning-budget overrides.
  • Cost: $0.00 (free tier)

A small number of tasks recorded `RUNNER_ERROR` due to free-tier rate limiting from running 8 benches in parallel. Re-runs with `--resume` will likely tick scores up modestly on the code-execution benches; the directional story does not change.

Try It Yourself

The full BridgeBench v2 leaderboard is live on bridgebench.bridgemind.ai. Every snapshot, every artifact, every score — open and reproducible.

To run BridgeBench against your own model:

```bash
git clone https://github.com/bridgemind-ai/bridgebench.git
cd bridgebench
npm install && npm run build:v2
export OPENROUTER_API_KEY=sk-or-...
node v2/dist/cli.js speed -m openrouter/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
```

Follow @bridgebench and @bridgemindai on X, or join the BridgeMind Discord to compare runs with other builders.