MedicalBenchmark

Precision Medical AI

ALMA

ALMA is a medical AI system developed by BinPar with content from Editorial Medica Panamericana and Spanish Clinical Guidelines. It combines Agentic RAG with a reference medical corpus to achieve perfect accuracy on the MIR exam, Spain's national entrance exam for medical residency.

Verified Results

ALMA has been evaluated across three consecutive MIR exam editions with perfect results verified by MedicalBenchmark.

600/600

Correct answers

Out of all valid questions in MIR 2024, 2025 and 2026

100%

Total accuracy

Zero errors across three consecutive editions

3 years

Consecutive MIR exams

Sustained perfect performance in 2024, 2025 and 2026

~$10.50

Cost per exam

Average processing cost per full exam edition

~53s

Per question

Average response time including full reasoning

~32

Specialized experts

Medical domain agents in the Agentic RAG system

99.8%

Confidence interval

Statistical reliability of the evaluation system
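
The headline figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the 600 valid questions split evenly across the three editions (200 per exam) and that questions are processed one after another, neither of which is stated in the results.

```python
# Figures quoted in the results above. The even split across editions
# and sequential processing are assumptions made for this sketch.
TOTAL_CORRECT = 600
TOTAL_QUESTIONS = 600
EDITIONS = 3
COST_PER_EXAM_USD = 10.50   # approximate, as quoted
SECONDS_PER_QUESTION = 53   # approximate, as quoted


def summarize() -> dict:
    """Derive per-question figures from the quoted per-exam figures."""
    questions_per_exam = TOTAL_QUESTIONS / EDITIONS  # assumed even split
    return {
        "accuracy": TOTAL_CORRECT / TOTAL_QUESTIONS,
        "questions_per_exam": questions_per_exam,
        "cost_per_question_usd": COST_PER_EXAM_USD / questions_per_exam,
        # Wall-clock time if questions were answered strictly in sequence.
        "sequential_hours_per_exam": questions_per_exam * SECONDS_PER_QUESTION / 3600,
    }


for name, value in summarize().items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the cost works out to roughly $0.05 per question, and a full exam takes just under three hours if run sequentially.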

Agentic RAG Architecture

ALMA uses an orchestrator that coordinates multiple specialized agents to answer medical questions. Unlike conventional single-pass RAG, the system iterates over retrieval and validates the evidence before responding.

Iterative querying

The orchestrator performs multiple query rounds against the corpus, refining the search until it finds the most relevant evidence.

Specialized experts

Approximately 32 domain agents cover all MIR medical specialties, from cardiology to psychiatry.

Synthetic corpus

Knowledge base built from Editorial Medica Panamericana's reference bibliography, processed and optimized for RAG.

English reasoning

The system reasons internally in English to maximize base model performance and responds in the question's language.

Intelligent sub-delegation

Experts can delegate sub-queries to other specialists when a question crosses specialty boundaries, creating dynamic knowledge networks.

Multimodal support

Processing of clinical images (X-rays, ECGs, dermatological photographs) within each expert agent's specialized context.

The central orchestrator is Claude Sonnet 4.5 with extended thinking, running on Amazon Bedrock in the Aragon region (Spain).
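
As a rough picture of the sub-delegation described above, the sketch below lets each expert hand a query to another specialty when a trigger topic appears in it. The `Expert` class, the `consult` function, the keyword triggers and the depth limit are all hypothetical; ALMA's actual routing logic is not described here. A visited set keeps experts from delegating to each other in a loop.

```python
from dataclasses import dataclass, field


@dataclass
class Expert:
    specialty: str
    # Hypothetical routing table: topic keyword -> specialty to delegate to.
    delegates: dict[str, str] = field(default_factory=dict)


def consult(specialty, query, registry, visited=frozenset(), max_depth=3):
    """Return the specialties consulted for a query, in order.

    Keyword matching stands in for whatever decision the real experts make.
    """
    trace = [specialty]
    visited = visited | {specialty}
    if max_depth == 0:
        return trace
    for topic, target in registry[specialty].delegates.items():
        # Delegate only to specialists not already consulted (no cycles).
        if topic in query.lower() and target not in visited:
            sub_trace = consult(target, query, registry, visited, max_depth - 1)
            trace += sub_trace
            visited = visited | set(sub_trace)
    return trace


registry = {
    "cardiology": Expert("cardiology", {"diabetes": "endocrinology"}),
    "endocrinology": Expert("endocrinology", {"kidney": "nephrology"}),
    "nephrology": Expert("nephrology", {"heart": "cardiology"}),
}
query = "Heart failure in a patient with diabetes and kidney disease"
print(consult("cardiology", query, registry))
# ['cardiology', 'endocrinology', 'nephrology']
```

Nephrology would delegate back to cardiology on the word "heart", but cardiology is already in the visited set, so the chain stops there.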

Processing Flow

MIR Question: MIR
Orchestrator: Claude Sonnet 4.5
Experts: ~32 specialists
Medical Corpus: Panamericana
Validation: Iterative
Response: Verified

Multilingual Reasoning Pipeline

Current LLMs tend to have richer internal representations in English. ALMA therefore forces internal reasoning in English to maximize accuracy, while always responding in the question's original language.

ES: Question in Spanish
ES → EN: Internal translation
EN: Reasoning in English
EN: Synthesis in English
ES: Response in Spanish
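
The pipeline above amounts to wrapping an English-only reasoning step between two translation steps. The sketch below shows that shape with injected callables. `translate` and `reason` are hypothetical stand-ins, and in a real system both might be handled by the same model within a single prompt rather than as separate calls.

```python
from typing import Callable


def answer_in_original_language(
    question: str,
    source_lang: str,
    translate: Callable[[str, str, str], str],  # (text, src, dst) -> text
    reason: Callable[[str], str],               # English in, English out
) -> str:
    """Translate to English, reason in English, translate the answer back."""
    if source_lang == "en":
        return reason(question)
    english_question = translate(question, source_lang, "en")
    english_answer = reason(english_question)
    return translate(english_answer, "en", source_lang)


# Toy stand-ins so the sketch runs without any model behind it.
TABLE = {
    ("es", "en"): {"¿Cuál es el tratamiento?": "What is the treatment?"},
    ("en", "es"): {"Option B": "Opción B"},
}


def toy_translate(text: str, src: str, dst: str) -> str:
    return TABLE[(src, dst)][text]


def toy_reason(english_question: str) -> str:
    return "Option B"


print(answer_in_original_language("¿Cuál es el tratamiento?", "es", toy_translate, toy_reason))
# Opción B
```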

How It Works

ALMA's process for answering a medical question follows a structured five-step flow.

1. Question reception

The orchestrator receives the MIR question with its answer options and analyzes the clinical context.

2. Analysis and planning

Relevant medical specialties are identified and appropriate expert agents are selected.

3. Corpus querying

Selected agents query Panamericana's synthetic medical corpus to obtain clinical evidence.

4. Iteration and validation

The orchestrator evaluates collected evidence and, if insufficient, launches additional query rounds.

5. Synthesis and response

Evidence is synthesized into structured reasoning and the answer with the strongest clinical support is selected.
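
The five steps can be sketched as a retrieve-and-check loop. Everything specific in the code below is an assumption made for illustration: the `Evidence` type, the numeric support score, the `min_support` threshold and the `max_rounds` cap. The description above only states that the orchestrator launches further rounds when the evidence is insufficient.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    option: str   # answer option the retrieved passage supports
    score: float  # hypothetical support score in [0, 1]


def answer_question(
    question: str,
    options: list[str],
    select_experts: Callable[[str], list[str]],
    query_corpus: Callable[[str, str, int], list[Evidence]],  # (expert, question, round)
    min_support: float = 1.0,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return the best-supported option and the number of query rounds used."""
    experts = select_experts(question)               # step 2: planning
    support = {option: 0.0 for option in options}
    rounds_used = 0
    for round_no in range(1, max_rounds + 1):        # steps 3-4: query, validate
        rounds_used = round_no
        for expert in experts:
            for evidence in query_corpus(expert, question, round_no):
                if evidence.option in support:
                    support[evidence.option] += evidence.score
        if max(support.values()) >= min_support:     # enough evidence: stop
            break
    best = max(support, key=support.get)             # step 5: strongest support
    return best, rounds_used


# Toy stand-ins: one expert, and each round adds 0.6 support for option B.
result = answer_question(
    "Which drug is first-line?",
    ["A", "B", "C"],
    select_experts=lambda q: ["cardiology"],
    query_corpus=lambda expert, q, round_no: [Evidence("B", 0.6)],
)
print(result)  # ('B', 2)
```

With the toy corpus, one round yields 0.6 support for option B, which is below the threshold, so a second round runs before the loop stops.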

Technical Innovations

Beyond the general architecture, ALMA incorporates key innovations that contribute to its exceptional performance.

Optimized synthetic corpus

Original medical documents are processed through a pipeline that extracts relevant information, eliminates redundancy, restructures for LLM efficiency, and enriches with cross-specialty relationships.
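
To make the four stages concrete, here is a deliberately simple sketch of such a pipeline. It is not the actual implementation: whitespace cleanup stands in for restructuring, exact-match hashing for redundancy elimination, and keyword matching for cross-specialty enrichment. The `Fragment` type and `build_corpus` function are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Fragment:
    specialty: str
    text: str
    related: set[str] = field(default_factory=set)  # cross-specialty links


def build_corpus(raw: list[tuple[str, str]], specialties: set[str]) -> list[Fragment]:
    """Turn (specialty, text) pairs into a deduplicated, cross-linked corpus."""
    seen: set[str] = set()
    corpus: list[Fragment] = []
    for specialty, text in raw:
        clean = " ".join(text.split())     # restructure: collapse whitespace
        if not clean:                      # extract: drop empty passages
            continue
        digest = hashlib.sha256(clean.lower().encode("utf-8")).hexdigest()
        if digest in seen:                 # eliminate redundancy
            continue
        seen.add(digest)
        # Enrich: link to other specialties mentioned in the passage.
        related = {s for s in specialties if s != specialty and s in clean.lower()}
        corpus.append(Fragment(specialty, clean, related))
    return corpus


raw = [
    ("cardiology", "Heart failure  is common in nephrology patients."),
    ("cardiology", "heart failure is common in nephrology patients."),
    ("nephrology", "   "),
]
corpus = build_corpus(raw, {"cardiology", "nephrology"})
print(len(corpus), corpus[0].related)  # 1 {'nephrology'}
```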

Incremental updates

System based on Recursive Language Models (RLM) that updates the corpus without rebuilding it, detecting obsolete fragments and integrating new information while maintaining coherence.
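
The core idea of updating without rebuilding can be illustrated with a much simpler technique than the RLM-based system described above: comparing content fingerprints. The sketch below is a plain hash diff, not ALMA's method, and the names `fingerprint` and `plan_update` are hypothetical. It only decides which fragments to add, refresh or retire, so that the rest of the corpus is left untouched.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect changed fragments."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(indexed: dict[str, str], source: dict[str, str]) -> dict[str, list[str]]:
    """Compare the corpus index with the current source documents.

    indexed: fragment id -> fingerprint of the text already in the corpus
    source:  fragment id -> current text
    """
    plan: dict[str, list[str]] = {"add": [], "refresh": [], "retire": []}
    for frag_id, text in source.items():
        if frag_id not in indexed:
            plan["add"].append(frag_id)          # new information
        elif indexed[frag_id] != fingerprint(text):
            plan["refresh"].append(frag_id)      # obsolete fragment
    # Fragments no longer present in the source are retired.
    plan["retire"] = [frag_id for frag_id in indexed if frag_id not in source]
    return plan


indexed = {"a": fingerprint("old a"), "b": fingerprint("b"), "c": fingerprint("c")}
source = {"a": "new a", "b": "b", "d": "d"}
print(plan_update(indexed, source))
# {'add': ['d'], 'refresh': ['a'], 'retire': ['c']}
```

Only fragments `a`, `c` and `d` need work here; `b` is unchanged and is never reprocessed.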

Memory tree with sub-delegation

The orchestrator maintains a context tree where each branch corresponds to an expert. Sub-queries inherit relevant context without duplicating tokens, optimizing cost and speed.
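
One way to picture a context tree with inheritance is a node that stores only its own notes and reaches shared context through a parent pointer. The `ContextNode` class below is a hypothetical sketch of that idea, not the actual data structure. It inherits all ancestor notes, whereas the description above mentions inheriting only the relevant context.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ContextNode:
    label: str
    notes: list[str] = field(default_factory=list)
    parent: Optional["ContextNode"] = field(default=None, repr=False)
    children: list["ContextNode"] = field(default_factory=list)

    def branch(self, label: str) -> "ContextNode":
        """Open a sub-branch, for example for a delegated expert."""
        child = ContextNode(label, parent=self)
        self.children.append(child)
        return child

    def visible_context(self) -> list[str]:
        """Notes from the root down to this node.

        Ancestor notes are stored once and assembled on demand, so each
        branch holds only what it added itself.
        """
        inherited = self.parent.visible_context() if self.parent else []
        return inherited + self.notes


root = ContextNode("orchestrator", ["Q: 65-year-old with chest pain"])
cardio = root.branch("cardiology")
cardio.notes.append("ECG shows ST elevation")
pharm = cardio.branch("pharmacology")
pharm.notes.append("Check anticoagulant interactions")
neuro = root.branch("neurology")

print(pharm.visible_context())
print(neuro.visible_context())
```

The pharmacology sub-query sees the question and the cardiology finding, while the neurology branch sees only the question, since sibling branches do not share notes.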

Agentic RAG vs Fine-tuning

Unlike fine-tuning, which statically modifies model weights, Agentic RAG dynamically queries updated information, enabling continuous improvement without retraining.

Data Sovereignty

ALMA is designed to meet the highest privacy and data sovereignty standards in the European healthcare sector.

EU processing

All processing runs on Amazon Bedrock in the Aragon region (Spain), ensuring data never leaves the EU.

No provider access

Anthropic has no access to processed data. Amazon Bedrock does not share prompts or responses with model providers.

GDPR compliance

Designed to comply with the General Data Protection Regulation and European healthcare regulations.

AI Act ready

Architecture aligned with European AI Act requirements for high-risk systems.

ALMA is currently in production at CATSalut (Catalan Health Service), helping healthcare professionals in real clinical environments.
