MedicalBenchmark

Blog

Articles, news and analysis about AI in medicine

188 Net Points: Bianca Ciobanu Breaks the MIR Record — But AI Already Reached 200
March 2, 2026 · 11 min read

Bianca Ciobanu Selaru enters history with 188 net points, the best human result ever recorded in the MIR exam. At 41 and of Romanian origin, she is proof that perseverance breaks moulds. But the human record arrives at a singular moment: three AI models have already answered the complete exam correctly (200 out of 200), and fifteen surpass 194 net points. We analyse what this double milestone means with data, charts and context.

MIR 2026, Results, AI vs Humans

Two Weeks Later: 22 New Models and a Triple 200/200 in MIR 2026
February 20, 2026 · 10 min read

From February 5 to 20, 2026, we added 22 new models to the benchmark. In just 15 days the top accuracy went from 99.5% to 100%: Gemini 3.1 Pro Preview arrives with 200/200, Qwen3.5 397B A17B breaks the open-weights ceiling in the global ranking, and MedGemma leaves an uncomfortable lesson about what "health specialization" really means. A technical narrative with new charts on the perfect tie, the time-based tiebreaker, and what happens to a benchmark once it hits the ceiling.

MIR 2026, Benchmark, Gemini 3.1

ALMA and MIRI achieve the highest possible score on the MIR 2026 exam with 100% accuracy
February 11, 2026 · 26 min read

Two medical AI models developed in Spain achieve unprecedented results. ALMA answers all 600 questions from the last three MIR exams without a single error — an absolute 100% that no other model has achieved. MIRI reaches 99.3% at 13 times lower cost and much faster response times. They are not general-purpose models: they are Agentic RAG architectures with specialized experts, built by BinPar and Editorial Médica Panamericana, proving that the future of medical AI lies not in bigger models, but in smarter ones.

MIR 2026, ALMA, MIRI

The Cathedral and the Bazaar: Open Source vs Proprietary in MIR 2026
February 9, 2026 · 18 min read

The top 33 positions in the MIR 2026 ranking are all proprietary models. The best open model lands at position 34. We analyze the gap between open and closed models, the real taxonomy of open source in AI — where many self-proclaimed open models are cathedrals with half-open doors — and why RAG outperforms fine-tuning for customizing medical AI without losing control of your data.

MIR 2026, Open Source, Open Weights

The Swiss Army Knife and the Scalpel: Why the Best Coding Models Fail the MIR
February 6, 2026 · 15 min read

Claude Opus 4.6 and GPT-5.2-Codex are the most advanced AI coding models, capable of coordinating agent teams and partially building themselves. But in MIR 2026, a Flash model costing just 0.34 euros humiliates them. The Swiss Army knife of programming cannot compete with the scalpel designed to cut. An analysis of the agentic paradox, with data from 290 models, showing why specialization outperforms raw power in the medical domain.

MIR 2026, Agentic Models, Claude Opus 4.6

199 out of 200: AI Only Fails Once in MIR 2026
February 5, 2026 · 18 min read

Final results of the largest medical AI benchmark in Spanish. Three models tie with 199 correct answers out of 200 valid questions, a 99.5% accuracy that no human has ever achieved in MIR history. A 'Flash' model leads for the third consecutive year, proving that more expensive does not mean better. An exhaustive analysis of the 290 models evaluated, with data on cost, speed, tokens, and accuracy, reveals the trends transforming medical artificial intelligence.

MIR 2026, Benchmark, Gemini Flash

MIR 2026: The Perfect Storm
January 26, 2026 · 11 min read

Forensic anatomy of a high-voltage examination and the silent danger of the ceiling effect. A comprehensive technical analysis of how complex administrative management and a technically accessible exam have created the most volatile edition of the decade. We dissect the exam booklets, official answer keys, and psychometric models of MIR 2026 to reveal a dangerous paradox: inflated scores where the margin for error is virtually nonexistent.

MIR 2026, Analysis, Psychometrics