
Welcome to MedBench: The Largest Medical Benchmark in Spanish
Introducing MedBench, a platform to evaluate language models in the medical field using questions from the MIR exam.
MedBench Team | January 23, 2024 | 2 min read
Tags: announcement, benchmark, MIR, medical AI
Introduction
We are pleased to present MedBench, the largest Spanish-language medical benchmark: a platform for evaluating artificial intelligence models on real questions from Spain's MIR (Médico Interno Residente) exam.
Why MedBench?
Evaluating language models in the medical field presents unique challenges:
- Critical precision: In medicine, errors can have serious consequences
- Specialized knowledge: Deep understanding of multiple specialties is required
- Clinical reasoning: Memorization is not enough; a model must know how to apply its knowledge
Key Features
MIR Questions
We use official MIR exam questions, which guarantees:
- Clinical quality and relevance
- Coverage of all medical specialties
- Different difficulty levels
- Constant updates with new exam editions
Detailed Metrics
We evaluate each model across multiple dimensions:
- Overall accuracy: Percentage of correct answers
- Net score: Correct answers minus a penalty for each incorrect answer
- Specialty breakdown: Performance in each medical area
- Confidence level: Model certainty in its responses
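As a rough illustration of how the first three metrics relate to each other, here is a minimal Python sketch. It is not MedBench's actual scoring code: the `Answer` record, the `summarize` function, and the default `wrong_penalty` of one third of a point per wrong answer are all assumptions made for this example, and unanswered questions are assumed to carry no penalty.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    """One graded question (hypothetical record layout)."""
    specialty: str                # e.g. "Cardiology"
    correct_option: str           # answer key, e.g. "B"
    model_option: Optional[str]   # None if the model gave no answer


def summarize(answers, wrong_penalty=1 / 3):
    """Return overall accuracy, net score, and per-specialty accuracy.

    wrong_penalty is an assumed value; set it to whatever penalty
    the evaluation actually applies.
    """
    if not answers:
        raise ValueError("at least one answer is required")

    correct = 0
    wrong = 0
    per_specialty = defaultdict(lambda: [0, 0])  # specialty -> [correct, total]

    for a in answers:
        per_specialty[a.specialty][1] += 1
        if a.model_option == a.correct_option:
            correct += 1
            per_specialty[a.specialty][0] += 1
        elif a.model_option is not None:
            # Only answered-but-wrong questions are penalized.
            wrong += 1

    return {
        "accuracy": correct / len(answers),
        "net_score": correct - wrong_penalty * wrong,
        "by_specialty": {s: c / t for s, (c, t) in per_specialty.items()},
    }


sample = [
    Answer("Cardiology", "A", "A"),
    Answer("Cardiology", "B", "C"),
    Answer("Neurology", "D", "D"),
    Answer("Neurology", "A", None),
]
result = summarize(sample)
print(result)
```

In this sample the model answers two of four questions correctly, gets one wrong, and skips one, so the net score falls below the raw count of correct answers while the skipped question only lowers accuracy.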
Next Steps
We are working on:
- Expanding the question set
- Adding more models to the ranking
- Implementing comparative analyses
- Developing tools for researchers
Join the Community
If you are a researcher, developer, or medical professional interested in AI applied to health, we invite you to:
- Explore our rankings
- Check out the methodology
- Contact us for collaborations
Thank you for your interest in MedBench!