MedicalBenchmark
Microsoft: Phi 4

Rank #244 of 291 models, MIR 2024

Net score: 100.66 pts
Accuracy: 62.5%
Correct / Incorrect: 125 / 73
Total cost: $0.02

Overall Performance (vs. average)

Metric                  Phi 4       Average
Accuracy                62.5%       80.5%
Net score               100.66 pts  150.85 pts
Correct                 125         161
Incorrect               73          30
Total cost              $0.02       $3.32
Average response time   13.2s       16.4s
Output tokens           130K        427K
Reasoning tokens        0           310K
Average confidence      97.0%       95.4%
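
The page does not say how Net score and Accuracy are derived. The figures are consistent with the MIR scoring rule, under which each wrong answer deducts one third of a point and a blank answer scores zero, and with Accuracy being computed over 200 scored questions (125 / 200 = 62.5%; the average row gives 161 / 200 = 80.5%). The Python sketch below is a minimal illustration under those assumptions; the function names are hypothetical.

```python
import math


def net_score(correct: int, incorrect: int) -> float:
    # Assumed MIR rule: +1 per correct answer, -1/3 per incorrect answer, 0 if blank.
    return correct - incorrect / 3


def accuracy(correct: int, scored: int = 200) -> float:
    # Assumed definition: correct answers divided by all scored questions.
    return correct / scored


raw = net_score(125, 73)
# Truncating (not rounding) to two decimals reproduces the 100.66 shown on the page.
shown = math.floor(raw * 100) / 100
print(f"{raw:.2f} {shown:.2f} {accuracy(125):.1%}")  # prints: 100.67 100.66 62.5%
```

That the page truncates rather than rounds is only an inference from this one value.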

Subject Breakdown

Subject                           Correct  Incorrect  Unanswered  Accuracy  Average
Allergology                       2        1          0           66.7%     90.5%
Anesthesiology and Resuscitation  0        3          1           0.0%      87.1%
Cardiology                        16       5          0           76.2%     79.7%
Dermatology                       9        5          0           64.3%     80.2%
Endocrinology and Nutrition       12       7          0           63.2%     84.2%
ENT                               4        2          1           57.1%     74.4%
Epidemiology                      8        0          0           100.0%    89.3%
Gastroenterology                  14       8          0           63.6%     70.5%
Genetics                          5        2          0           71.4%     86.5%
Geriatrics                        8        2          0           80.0%     86.9%
Gynecology and Obstetrics         9        5          0           64.3%     81.2%
Health Planning and Management    1        1          0           50.0%     73.2%
Hematology                        10       3          0           76.9%     81.5%
Immunology                        7        1          0           87.5%     89.1%
Infectious Diseases               13       10         0           56.5%     81.8%
Legal Medicine and Bioethics      2        0          0           100.0%    91.7%
Medical Oncology                  14       6          1           66.7%     80.2%
Nephrology                        8        5          0           61.5%     80.8%
Neurology                         14       8          0           63.6%     83.7%
Ophthalmology                     2        3          0           40.0%     80.0%
Palliative Care                   2        2          0           50.0%     88.2%
Pediatrics                        12       5          0           70.6%     82.0%
Pharmacology                      14       8          1           60.9%     85.4%
Psychiatry                        6        3          1           60.0%     89.5%
Pulmonology                       9        10         0           47.4%     80.6%
Radiology-Emergency               9        5          0           64.3%     64.9%
Rheumatology                      6        8          0           42.9%     81.4%
Statistics                        3        0          0           100.0%    91.1%
Traumatology                      8        7          0           53.3%     74.5%
Urology                           4        2          0           66.7%     78.2%
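
Each Accuracy value in the subject table equals Correct / (Correct + Incorrect + Unanswered), so unanswered questions count against it; ENT, for example, is 4 / 7 = 57.1%. The Python sketch below recomputes accuracy from the counts and ranks subjects by how far Phi 4 falls below the Average column. The rows are copied from the table; the helper name `gaps` is hypothetical.

```python
# (subject, correct, incorrect, unanswered, average accuracy %) from the subject table
SUBJECTS = [
    ("Allergology", 2, 1, 0, 90.5),
    ("Anesthesiology and Resuscitation", 0, 3, 1, 87.1),
    ("Cardiology", 16, 5, 0, 79.7),
    ("Dermatology", 9, 5, 0, 80.2),
    ("Endocrinology and Nutrition", 12, 7, 0, 84.2),
    ("ENT", 4, 2, 1, 74.4),
    ("Epidemiology", 8, 0, 0, 89.3),
    ("Gastroenterology", 14, 8, 0, 70.5),
    ("Genetics", 5, 2, 0, 86.5),
    ("Geriatrics", 8, 2, 0, 86.9),
    ("Gynecology and Obstetrics", 9, 5, 0, 81.2),
    ("Health Planning and Management", 1, 1, 0, 73.2),
    ("Hematology", 10, 3, 0, 81.5),
    ("Immunology", 7, 1, 0, 89.1),
    ("Infectious Diseases", 13, 10, 0, 81.8),
    ("Legal Medicine and Bioethics", 2, 0, 0, 91.7),
    ("Medical Oncology", 14, 6, 1, 80.2),
    ("Nephrology", 8, 5, 0, 80.8),
    ("Neurology", 14, 8, 0, 83.7),
    ("Ophthalmology", 2, 3, 0, 80.0),
    ("Palliative Care", 2, 2, 0, 88.2),
    ("Pediatrics", 12, 5, 0, 82.0),
    ("Pharmacology", 14, 8, 1, 85.4),
    ("Psychiatry", 6, 3, 1, 89.5),
    ("Pulmonology", 9, 10, 0, 80.6),
    ("Radiology-Emergency", 9, 5, 0, 64.9),
    ("Rheumatology", 6, 8, 0, 81.4),
    ("Statistics", 3, 0, 0, 91.1),
    ("Traumatology", 8, 7, 0, 74.5),
    ("Urology", 4, 2, 0, 78.2),
]


def gaps(rows):
    """Return (subject, accuracy %, average minus accuracy) sorted by largest shortfall."""
    out = []
    for name, correct, incorrect, unanswered, avg in rows:
        # Unanswered questions stay in the denominator, as on the page.
        acc = 100 * correct / (correct + incorrect + unanswered)
        out.append((name, acc, avg - acc))
    return sorted(out, key=lambda row: row[2], reverse=True)


ranked = gaps(SUBJECTS)
for name, acc, gap in ranked[:5]:
    print(f"{name:<34}{acc:6.1f}%  -{gap:.1f} pts")
print("At or above average:", [n for n, _, g in ranked if g <= 0])
```

Many subjects have fewer than ten questions, so a single answer moves their accuracy by ten points or more; the ranking is a rough guide, not a measured effect.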

Question Type Breakdown

Question type    Correct  Incorrect  Unanswered  Accuracy  Average
Anatomy          3        3          0           50.0%     79.8%
Biostatistics    5        0          0           100.0%    90.7%
Diagnosis        45       28         0           61.6%     79.2%
Epidemiology     10       2          0           83.3%     81.2%
Ethics           1        0          0           100.0%    94.5%
Interpretation   22       15         0           59.5%     69.6%
Pathophysiology  24       8          1           72.7%     85.4%
Pharmacology     14       10         1           56.0%     84.0%
Prevention       9        3          0           75.0%     89.8%
Prognosis        7        0          0           100.0%    83.9%
Risk             10       3          0           76.9%     83.6%
Tests            11       10         0           52.4%     73.9%
Treatment        44       26         1           62.0%     81.3%

Answer Sheet

#    Answer  Correct  Status
1    B       B
2    A       D
3    A       B
4    B       C
5    A       C
6    B       B
7    D       D
8    C       C
9    C       A
10   B       D
11   D       D
12   A       A
13   D       C
14   B       A
15   B       B
16   A       A
17   C       C
18   A       A
19   B       B
20   C       C
21   C       D
22   B       B
23   A       A
24   C       A
25   A       C
26   B       B
27   D       C
28   D       A
29   A       B
30   C       C
31   B       D
32   A       A
33   C       C
34   D       B
35   D       D
36   D       D
37   B       A
38   A       A
39   C       C
40   B       B
41   D       C
42   A       D
43   A       A
44   D       D
45   D       D
46   B       B
47   C       C
48   C       C
49   C       B
50   C       C
51   A       A
52   C       D
53   A       C
54   C       B
55   -       C        Unanswered
56   D       D
57   A       A
58   B       A
59   B       A
60   C       A
61   A       A
62   A       D
63   D       D
64   B       -        Annulled
65   D       D
66   C       C
67   D       B
68   B       -        Annulled
69   A       A
70   C       B
71   B       B
72   C       D
73   C       B
74   C       C
75   B       B
76   A       A
77   C       D
78   D       C
79   D       B
80   A       A
81   C       C
82   C       C
83   B       B
84   C       C
85   A       A
86   A       A
87   D       B
88   C       D
89   C       B
90   C       A
91   D       D
92   D       A
93   C       C
94   B       B
95   B       D
96   -       B        Unanswered
97   B       B
98   D       B
99   A       A
100  A       B
101  A       A
102  B       D
103  B       B
104  D       D
105  D       B
106  C       C
107  C       C
108  B       B
109  A       D
110  C       D
111  B       B
112  C       C
113  D       -        Annulled
114  B       D
115  D       D
116  A       A
117  D       D
118  D       D
119  A       A
120  C       C
121  A       A
122  D       B
123  D       D
124  D       D
125  C       B
126  C       D
127  C       A
128  D       B
129  D       D
130  C       C
131  A       C
132  D       D
133  B       A
134  A       C
135  A       A
136  D       D
137  C       A
138  C       C
139  A       A
140  B       C
141  B       B
142  C       C
143  D       A
144  D       D
145  D       C
146  C       C
147  C       C
148  A       A
149  C       C
150  D       D
151  C       A
152  A       A
153  A       C
154  B       B
155  D       D
156  C       C
157  C       C
158  D       D
159  D       D
160  B       B
161  B       B
162  B       B
163  B       B
164  B       B
165  C       A
166  C       C
167  A       A
168  B       B
169  C       C
170  A       A
171  D       D
172  B       B
173  B       A
174  B       B
175  A       A
176  C       C
177  C       C
178  B       B
179  B       C
180  D       -        Annulled
181  C       B
182  D       D
183  A       C
184  C       A
185  C       C
186  D       D
187  A       A
188  B       C
189  B       D
190  B       D
191  B       B
192  B       B
193  C       C
194  D       C
195  C       C
196  B       B
197  A       A
198  B       B
199  D       D
200  A       A
201  A       B
202  D       D
203  C       B
204  D       D
205  A       D
206  B       -        Annulled
207  C       A
208  A       A
209  D       B
210  A       D
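
The answer sheet can be tallied to cross-check the summary counts. The Python sketch below is a minimal illustration that rests on three assumptions not stated on the page: 200 questions are scored; each annulled question among the first 200 is replaced by the next non-annulled reserve question (201 to 210); and rows 55 and 96, which have no model answer, are unanswered. Under those assumptions the sheet gives 125 correct, 73 incorrect and 2 unanswered, matching the summary. `MODEL`, `KEY` and `tally` are hypothetical names; the two strings transcribe the table ten questions per chunk, with `.` marking a blank answer or an annulled question.

```python
# Transcribed from the answer sheet (questions 1-210), ten questions per chunk.
# In MODEL "." = no answer given; in KEY "." = annulled question.
MODEL = (
    "BAABABDCCB" "DADBBACABC" "CBACABDDAC" "BACDDDBACB" "DAADDBCCCC"
    "ACAC.DABBC" "AADBDCDBAC" "BCCCBACDDA" "CCBCAADCCC" "DDCBB.BDAA"
    "ABBDDCCBAC" "BCDBDADDAC" "ADDDCCCDDC" "ADBAADCCAB" "BCDDDCCACD"
    "CAABDCCDDB" "BBBBCCABCA" "DBBBACCBBD" "CDACCDABBB" "BBCDCBABDA"
    "ADCDABCADA"
)
KEY = (
    "BDBCCBDCAD" "DACABACABC" "DBAACBCABC" "DACBDDAACB" "CDADDBCCBC"
    "ADCBCDAAAA" "ADD.DCB.AB" "BDBCBADCBA" "CCBCAABDBA" "DACBDBBBAB"
    "ADBDBCCBDD" "BC.DDADDAC" "ABDDBDABDC" "CDACADACAC" "BCADCCCACD"
    "AACBDCCDDB" "BBBBACABCA" "DBABACCBC." "BDCACDACDD" "BBCCCBABDA"
    "BDBDD.AABD"
)


def tally(model: str, key: str, scored: int = 200) -> dict:
    """Score the sheet under the assumed reserve-question rule."""
    valid_main = [q for q in range(scored) if key[q] != "."]
    valid_reserve = [q for q in range(scored, len(key)) if key[q] != "."]
    # Assumption: each annulled main question is replaced by the next valid reserve.
    counted = valid_main + valid_reserve[: scored - len(valid_main)]
    correct = sum(model[q] == key[q] for q in counted)
    unanswered = sum(model[q] == "." for q in counted)
    incorrect = len(counted) - correct - unanswered
    return {
        "correct": correct,
        "incorrect": incorrect,
        "unanswered": unanswered,
        "annulled": [q + 1 for q in range(len(key)) if key[q] == "."],
    }


print(tally(MODEL, KEY))
# {'correct': 125, 'incorrect': 73, 'unanswered': 2, 'annulled': [64, 68, 113, 180, 206]}
```

Under this rule, questions 201 to 204 stand in for the four annulled main questions, and 205 to 210 do not count.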