Evaluating the future of Medical AI

The definitive evaluation platform for language models on Spain's MIR (Médico Interno Residente) exams, 2024-2026. Trusted by clinicians and researchers.

Exams and Questions

Explore official MIR exam questions organized by year. Each question includes detailed analysis of AI model performance.


MIR 2026

24 January 2026

210 questions

7 invalidated

Best AI: 200.00 net (Miri)

Best human: 188.00 net

MIR 2025

25 January 2025

210 questions

6 invalidated

Best AI: 200.00 net (ALMA)

Best human: 165.67 net

MIR 2024

20 January 2024

210 questions

5 invalidated

Best AI: 200.00 net (ALMA)

Best human: 186.67 net
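Net scores like the ones above depend on the MIR's penalty for wrong answers. The sketch below assumes the standard MIR rule that each wrong answer subtracts one third of a correct one (blank answers count zero), and that invalidated questions are replaced in order by the reserve questions at the end of the paper, so that 200 of the 210 questions count. The function names are hypothetical; this illustrates the arithmetic and is not the benchmark's own scoring code.

```python
from fractions import Fraction


def mir_net_score(correct: int, wrong: int) -> Fraction:
    """Net score under the assumed MIR rule: each wrong answer
    cancels one third of a correct one; blank answers count zero."""
    if correct < 0 or wrong < 0:
        raise ValueError("answer counts must be non-negative")
    # Fractions keep thirds exact, so ties are never lost to float rounding.
    return Fraction(correct) - Fraction(wrong, 3)


def scored_question_ids(invalidated: set[int], total: int = 210,
                        scored: int = 200) -> list[int]:
    """IDs that count toward the score, assuming invalidated questions
    are replaced in order by the reserve questions at the end."""
    remaining = [q for q in range(1, total + 1) if q not in invalidated]
    if len(remaining) < scored:
        raise ValueError("not enough reserve questions to cover invalidations")
    return remaining[:scored]


def format_net(score: Fraction) -> str:
    # Two decimals, matching how net scores are displayed on this page.
    return f"{float(score):.2f}"


if __name__ == "__main__":
    print(format_net(mir_net_score(200, 0)))   # a perfect exam
    print(format_net(mir_net_score(190, 10)))  # a hypothetical split
```

For instance, a hypothetical split of 190 correct and 10 wrong answers yields 186.67 net; the split is only an illustration of the arithmetic, not the published answer breakdown of any candidate.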

Our Methodology

How we evaluate artificial intelligence models in the medical field using the MIR exam as a reference.

Official MIR Questions

We use real questions from Spain's MIR exam, the standard for evaluating medical knowledge at a professional level. Each question is verified and categorized by specialty.

Rigorous Evaluation

Each model is evaluated under the same controlled conditions, without access to external information. We measure accuracy, clinical reasoning, and response consistency.

Detailed Analysis

We provide granular metrics by medical specialty, question type, and difficulty level. This makes it possible to identify each model's strengths and areas for improvement.
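A per-specialty or per-type breakdown of this kind can be sketched as a group-by over per-question results. The record type, its field names, and the choice to let one question carry several subject tags are assumptions made for illustration; the page does not describe its data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QuestionResult:
    # Hypothetical record for one model's answer to one question.
    question_id: int
    subjects: tuple[str, ...]  # assumption: a question may have several subject tags
    qtype: str                 # e.g. "Diagnosis", "Treatment"
    correct: bool


def accuracy_by(results: Iterable[QuestionResult],
                key: Callable[[QuestionResult], Iterable[str]]) -> dict[str, float]:
    """Accuracy per group; `key` returns the group labels of each result."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for r in results:
        for label in key(r):
            totals[label] += 1
            hits[label] += int(r.correct)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    demo = [
        QuestionResult(1, ("Cardiology",), "Diagnosis", True),
        QuestionResult(2, ("Cardiology", "Pharmacology"), "Treatment", False),
        QuestionResult(3, ("Pharmacology",), "Treatment", True),
    ]
    print(accuracy_by(demo, lambda r: r.subjects))   # grouped by subject
    print(accuracy_by(demo, lambda r: (r.qtype,)))   # grouped by question type
```

Passing the grouping as a `key` function keeps one routine for subject, type, or any other tag, such as a difficulty level, if that were stored on the record.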

Questions catalogued by specialists

Distribution of MIR exam questions by subject and type in each edition.

Comprehensive Analysis

Our benchmark provides an exhaustive evaluation of AI model performance in the medical field.

Continuous Evaluation

Performance tracking over time to identify improvements and regressions.

Detailed Metrics

Granular analysis by subject and clinical question type.

Clear Objectives

Standardized benchmarks based on Spain's official MIR exam.

Full Transparency

Open and reproducible methodology with complete access to evaluation criteria.

Constant Updates

Periodic incorporation of new models and MIR exam editions.

Direct Comparison

Rankings and statistics that make it easy to compare performance across models.

Verified Data

Official questions from the Ministry of Health with validated answers.
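A ranking with the time-based tiebreaker mentioned in the February 20 article can be sketched as a two-key sort. This assumes that a higher net score ranks first and that, on equal scores, the model with the lower mean response time ranks higher; the exact tiebreak metric, the field names, and the model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    net: float           # net score on one MIR edition
    mean_seconds: float  # hypothetical: mean response time per question


def rank(results: list[ModelResult]) -> list[tuple[int, ModelResult]]:
    """Sort by net score (descending), breaking ties by speed (ascending).
    Returns 1-based positions alongside each result."""
    ordered = sorted(results, key=lambda r: (-r.net, r.mean_seconds))
    return [(pos, r) for pos, r in enumerate(ordered, start=1)]


if __name__ == "__main__":
    demo = [
        ModelResult("model-a", 199.67, 12.0),
        ModelResult("model-b", 200.00, 30.0),
        ModelResult("model-c", 200.00, 8.0),
    ]
    for pos, r in rank(demo):
        print(pos, r.name, f"{r.net:.2f}", f"{r.mean_seconds:.1f}s")
```

Net scores are stored here as rounded floats for brevity; keeping them as exact fractions of thirds would guarantee that genuinely tied scores compare as equal before the tiebreaker applies.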

Questions by Type

Question distribution by type

Anatomy: 3 questions

Biostatistics: 3 questions

Diagnosis: 86 questions

Epidemiology: 10 questions

Ethics: 6 questions

Interpretation: 41 questions

Legal: 9 questions

Pathophysiology: 26 questions

Pharmacology: 16 questions

Prevention: 17 questions

Prognosis: 5 questions

Risk: 17 questions

Tests: 36 questions

Treatment: 74 questions

Questions by Subject

Question distribution by subject

Allergology: 1 question

Anesthesiology and Resuscitation: 7 questions

Cardiology: 25 questions

Dermatology: 11 questions

Endocrinology and Nutrition: 16 questions

ENT: 8 questions

Epidemiology: 8 questions

Gastroenterology: 32 questions

Genetics: 11 questions

Geriatrics: 14 questions

Gynecology and Obstetrics: 13 questions

Health Planning and Management: 10 questions

Hematology: 11 questions

Immunology: 6 questions

Infectious Diseases: 14 questions

Legal Medicine and Bioethics: 11 questions

Medical Oncology: 25 questions

Nephrology: 10 questions

Neurology: 15 questions

Ophthalmology: 6 questions

Palliative Care: 6 questions

Pediatrics: 22 questions

Pharmacology: 12 questions

Psychiatry: 8 questions

Pulmonology: 17 questions

Radiology-Emergency: 13 questions

Rheumatology: 12 questions

Statistics: 3 questions

Traumatology: 11 questions

Urology: 8 questions

Latest articles

Articles, news and analysis about AI in medicine

Mar 2, 2026 · 11 min read

188 Net Points: Bianca Ciobanu Breaks the MIR Record — But AI Already Reached 200

Bianca Ciobanu Selaru makes history with 188 net points, the best human result ever recorded in the MIR exam. At 41 and of Romanian origin, she is proof that perseverance breaks moulds. But the human record arrives at a singular moment: three AI models have already solved the complete exam (200 out of 200), and fifteen surpass 194 net points. We analyse what this double milestone means with data, charts and context.

Feb 20, 2026 · 10 min read

Two Weeks Later: 22 New Models and a Triple 200/200 in MIR 2026

From February 5 to 20, 2026, we added 22 new models to the benchmark. In just 15 days we went from 99.5% to 100%: Gemini 3.1 Pro Preview arrives with 200/200, Qwen3.5 397B A17B breaks the open-weights ceiling in the global ranking, and MedGemma leaves an uncomfortable lesson about what "health specialization" really means. Technical storytelling with new charts about the perfect tie, the time-based tiebreaker, and what happens to a benchmark once it hits the ceiling.

Feb 11, 2026 · 26 min read

ALMA and MIRI achieve the highest possible score on the MIR 2026 exam with 100% accuracy

Two medical AI models developed in Spain achieve unprecedented results. ALMA answers all 600 questions from the last three MIR exams without a single error, an absolute 100% that no other model has achieved. MIRI reaches 99.3% at one-thirteenth of the cost and with much faster response times. They are not general-purpose models but Agentic RAG architectures with specialized experts, built by BinPar and Editorial Médica Panamericana, proving that the future of medical AI lies not in bigger models but in smarter ones.

Feb 9, 2026 · 18 min read

The Cathedral and the Bazaar: Open Source vs Proprietary in MIR 2026

The top 33 positions in the MIR 2026 ranking are all held by proprietary models; the best open model lands at position 34. We analyze the gap between open and closed models, the real taxonomy of open source in AI (where many self-proclaimed open models are cathedrals with half-open doors), and why RAG outperforms fine-tuning for customizing medical AI without losing control of your data.

Feb 6, 2026 · 15 min read

The Swiss Army Knife and the Scalpel: Why the Best Coding Models Fail the MIR

Claude Opus 4.6 and GPT-5.2-Codex are the most advanced AI coding models, capable of coordinating agent teams and partially building themselves. But in MIR 2026, a Flash model costing just 0.34 euros humiliates them. The Swiss Army knife of programming cannot compete with the scalpel designed to cut. Analysis of the agentic paradox with data from 290 models that demonstrates why specialization outperforms raw power in the medical domain.

Feb 5, 2026 · 18 min read

199 out of 200: AI Only Fails Once in MIR 2026

Final results of the largest medical AI benchmark in Spanish. Three models tie with 199 correct answers out of 200 valid questions — a 99.5% accuracy that no human has ever achieved in MIR history. A 'Flash' model leads for the third consecutive year, proving that more expensive does not mean better. Exhaustive analysis of 290 models evaluated with data on cost, speed, tokens, and accuracy that reveals the trends transforming medical artificial intelligence.


Top Results (MIR 2026)

ALMA

Miri

Gemini 3.1 Pro Preview

Gemini 3 Flash Preview

o3

GPT-5
