MedicalBenchmark

Evaluating the future of Medical AI

The definitive evaluation platform for language models on Spain's MIR exams (2024-2026). Trusted by clinicians and researchers.

Our Methodology

How we evaluate artificial intelligence models in the medical field using the MIR exam as a reference.

Official MIR Questions

We use real questions from Spain's MIR exam, the standard for evaluating medical knowledge at a professional level. Each question is verified and categorized by specialty.

Rigorous Evaluation

Each model is evaluated under the same controlled conditions, without access to external information. We measure accuracy, clinical reasoning, and response consistency.
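
As a rough illustration of two of the measures named above, here is a minimal Python sketch. It assumes each run is a mapping from a question ID to the chosen option, and it treats "consistency" as the share of questions on which every repeated run gives the same answer. That definition, along with the function names and sample data, is an assumption made for illustration and not the platform's published method.

```python
def accuracy(answers, key):
    """Fraction of questions in `key` that `answers` gets right."""
    correct = sum(1 for qid, right in key.items() if answers.get(qid) == right)
    return correct / len(key)


def consistency(runs):
    """Fraction of questions on which all runs chose the same option.

    Assumed definition: a question counts as consistent only if every
    run produced an identical answer for it.
    """
    question_ids = list(runs[0].keys())
    stable = sum(
        1 for qid in question_ids if len({run.get(qid) for run in runs}) == 1
    )
    return stable / len(question_ids)


# Hypothetical answer key and two repeated runs of the same model.
key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
runs = [
    {"q1": "A", "q2": "C", "q3": "B", "q4": "A"},
    {"q1": "A", "q2": "C", "q3": "D", "q4": "A"},
]

first_run_accuracy = accuracy(runs[0], key)
second_run_accuracy = accuracy(runs[1], key)
run_consistency = consistency(runs)
```

Note that a model can be consistent and still wrong: both runs agree on "q4" here, yet neither matches the key, which is why the two measures are reported separately.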

Detailed Analysis

We provide granular metrics by medical specialty, question type, and difficulty level. This makes it possible to identify each model's strengths and areas for improvement.
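
A per-group breakdown of this kind can be sketched in a few lines of Python. The record layout (`subject`, `type`, `correct`) and the helper name `accuracy_by` are hypothetical, chosen only to show the grouping step, and the sample rows are invented.

```python
from collections import defaultdict


def accuracy_by(results, field):
    """Group graded answers by `field` and return accuracy per group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        totals[row[field]] += 1
        hits[row[field]] += int(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical graded answers for one model.
results = [
    {"subject": "Cardiology", "type": "Diagnosis", "correct": True},
    {"subject": "Cardiology", "type": "Treatment", "correct": False},
    {"subject": "Neurology", "type": "Diagnosis", "correct": True},
    {"subject": "Neurology", "type": "Diagnosis", "correct": True},
]

by_subject = accuracy_by(results, "subject")
by_type = accuracy_by(results, "type")
```

The same function serves both views because only the grouping field changes. With real data, groups that contain very few questions would give noisy percentages and are best read alongside their question counts.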

Questions catalogued by specialists

Distribution of MIR exam questions by subject and type in each edition.

Comprehensive Analysis

Our benchmark provides an exhaustive evaluation of AI model performance in the medical field.

Continuous Evaluation

Performance tracking over time to identify improvements and regressions.

Detailed Metrics

Granular analysis by subject and clinical question type.

Clear Objectives

Standardized benchmarks based on Spain's official MIR exam.

Full Transparency

Open and reproducible methodology with complete access to evaluation criteria.

Constant Updates

Periodic incorporation of new models and MIR exam editions.

Direct Comparison

Rankings and statistics that allow easy performance comparison between models.

Verified Data

Official questions from the Ministry of Health with validated answers.

Questions by Type

Question distribution by type

Anatomy: 3 questions
Biostatistics: 3 questions
Diagnosis: 86 questions
Epidemiology: 10 questions
Ethics: 6 questions
Interpretation: 41 questions
Legal: 9 questions
Pathophysiology: 26 questions
Pharmacology: 16 questions
Prevention: 17 questions
Prognosis: 5 questions
Risk: 17 questions
Tests: 36 questions
Treatment: 74 questions

Questions by Subject

Question distribution by subject

Allergology: 1 question
Anesthesiology and Resuscitation: 7 questions
Cardiology: 25 questions
Dermatology: 11 questions
Endocrinology and Nutrition: 16 questions
ENT: 8 questions
Epidemiology: 8 questions
Gastroenterology: 32 questions
Genetics: 11 questions
Geriatrics: 14 questions
Gynecology and Obstetrics: 13 questions
Health Planning and Management: 10 questions
Hematology: 11 questions
Immunology: 6 questions
Infectious Diseases: 14 questions
Legal Medicine and Bioethics: 11 questions
Medical Oncology: 25 questions
Nephrology: 10 questions
Neurology: 15 questions
Ophthalmology: 6 questions
Palliative Care: 6 questions
Pediatrics: 22 questions
Pharmacology: 12 questions
Psychiatry: 8 questions
Pulmonology: 17 questions
Radiology-Emergency: 13 questions
Rheumatology: 12 questions
Statistics: 3 questions
Traumatology: 11 questions
Urology: 8 questions

Latest articles

Articles, news and analysis about AI in medicine

188 Net Points: Bianca Ciobanu Breaks the MIR Record — But AI Already Reached 200
Mar 2, 2026 · 11 min read

Bianca Ciobanu Selaru makes history with 188 net points, the best human result ever recorded in the MIR exam. At 41 and of Romanian origin, she is proof that perseverance breaks moulds. But the human record arrives at a singular moment: three AI models have already answered the complete exam correctly (200 out of 200), and fifteen surpass 194 net points. We analyse what this double milestone means with data, charts and context.

Two Weeks Later: 22 New Models and a Triple 200/200 in MIR 2026
Feb 20, 2026 · 10 min read

From February 5 to 20, 2026, we added 22 new models to the benchmark. In just 15 days we went from 99.5% to 100%: Gemini 3.1 Pro Preview arrives with 200/200, Qwen3.5 397B A17B breaks the open-weights ceiling in the global ranking, and MedGemma leaves an uncomfortable lesson about what "health specialization" really means. Technical storytelling with new charts about the perfect tie, the time-based tiebreaker, and what happens to a benchmark once it hits the ceiling.

ALMA and MIRI achieve the highest possible score on the MIR 2026 exam with 100% accuracy
Feb 11, 2026 · 26 min read

Two medical AI models developed in Spain achieve unprecedented results. ALMA answers all 600 questions from the last three MIR exams without a single error — an absolute 100% that no other model has achieved. MIRI reaches 99.3% at 13 times lower cost and much faster response times. They are not general-purpose models: they are Agentic RAG architectures with specialized experts, built by BinPar and Editorial Médica Panamericana, proving that the future of medical AI lies not in bigger models, but in smarter ones.

The Swiss Army Knife and the Scalpel: Why the Best Coding Models Fail the MIR
Feb 6, 2026 · 15 min read

Claude Opus 4.6 and GPT-5.2-Codex are the most advanced AI coding models, capable of coordinating agent teams and partially building themselves. But in MIR 2026, a Flash model costing just 0.34 euros humiliates them. The Swiss Army knife of programming cannot compete with the scalpel designed to cut. Analysis of the agentic paradox with data from 290 models that demonstrates why specialization outperforms raw power in the medical domain.

199 out of 200: AI Only Fails Once in MIR 2026
Feb 5, 2026 · 18 min read

Final results of the largest medical AI benchmark in Spanish. Three models tie with 199 correct answers out of 200 valid questions — a 99.5% accuracy that no human has ever achieved in MIR history. A 'Flash' model leads for the third consecutive year, proving that more expensive does not mean better. Exhaustive analysis of 290 models evaluated with data on cost, speed, tokens, and accuracy that reveals the trends transforming medical artificial intelligence.
