Welcome to MedBench: The Largest Medical Benchmark in Spanish

Introduction

We are pleased to present MedBench, the largest medical benchmark platform for evaluating artificial intelligence models on real questions from Spain's MIR (Médico Interno Residente) exam, the national entrance exam for specialist medical training.

Why MedBench?

Evaluating language models in the medical field presents unique challenges:

Critical precision: In medicine, errors can have serious consequences
Specialized knowledge: Deep understanding of multiple specialties is required
Clinical reasoning: Memorization is not enough; a model must be able to apply its knowledge to clinical cases

Key Features

MIR Questions

We use official MIR exam questions, which guarantees:

Clinical quality and relevance
Coverage of all medical specialties
Different difficulty levels
Regular updates as new exam editions are released

Detailed Metrics

We evaluate each model across multiple dimensions:

Overall accuracy: Percentage of correct answers
Net score: Correct answers minus a penalty for each incorrect answer
Specialty breakdown: Performance in each medical area
Confidence level: Model certainty in its responses
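
As a rough illustration of how the first three metrics relate to each other, here is a minimal Python sketch. It is not MedBench's actual scoring code: the `Answer` record, the `score` function, and the default `penalty` of one third of a point per wrong answer are all assumptions made for this example, and the real penalty should be taken from the methodology. Unanswered questions are assumed to score zero without a penalty.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    """One model response (hypothetical record layout for this sketch)."""
    specialty: str          # e.g. "Cardiology"
    chosen: Optional[str]   # option the model picked, or None if it abstained
    correct: str            # option given by the answer key


def score(answers, penalty=1 / 3):
    """Return overall accuracy, net score, and per-specialty accuracy.

    `penalty` is the fraction of a point deducted per wrong answer.
    The default is an assumed value, not one stated by MedBench.
    """
    if not answers:
        raise ValueError("no answers to score")

    right = sum(a.chosen == a.correct for a in answers)
    # Abstentions (chosen is None) are neither right nor wrong.
    wrong = sum(a.chosen is not None and a.chosen != a.correct for a in answers)

    per_specialty = defaultdict(lambda: [0, 0])  # specialty -> [right, total]
    for a in answers:
        per_specialty[a.specialty][0] += a.chosen == a.correct
        per_specialty[a.specialty][1] += 1

    return {
        "accuracy": right / len(answers),
        "net_score": right - penalty * wrong,
        "by_specialty": {s: r / t for s, (r, t) in per_specialty.items()},
    }
```

Keeping the penalty as a parameter makes it easy to see how much a ranking depends on it: a model that guesses on every question and one that abstains when unsure can have the same accuracy but different net scores.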

Next Steps

We are working on:

Expanding the question set
Adding more models to the ranking
Implementing comparative analyses
Developing tools for researchers

Join the Community

If you are a researcher, developer, or medical professional interested in AI applied to health, we invite you to:

Explore our rankings
Check out the methodology
Contact us for collaborations

Thank you for your interest in MedBench!