MedicalBenchmark

Goliath 120B

#266 of 291 models · MIR 2024

Net score: 58.66 pts
Accuracy: 45.5%
Correct / Incorrect: 91 / 97
Total Cost: $1.28

Overall Performance (vs. average)

Metric                 Goliath 120B  Average
Accuracy               45.5%         80.5%
Net score              58.66 pts     150.85 pts
Correct                91            161
Incorrect              97            30
Total Cost             $1.28         $3.32
Average response time  24.0s         16.4s
Output Tokens          103K          427K
Reasoning Tokens       0             310K
Average confidence     87.7%         95.4%
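
The page does not say how the net score is derived from the answer counts. One rule that is consistent with the reported figures is "one point per correct answer, minus one third of a point per incorrect answer": with 91 correct and 97 incorrect this gives 176/3 ≈ 58.67, which agrees with the displayed 58.66 pts if the page truncates rather than rounds. The sketch below is an illustration under that assumption only; the function name `net_score` and the 1/3 penalty are not taken from the source.

```python
from fractions import Fraction


def net_score(correct: int, incorrect: int,
              penalty: Fraction = Fraction(1, 3)) -> Fraction:
    """Net score under an ASSUMED rule: +1 per correct answer,
    minus `penalty` points per incorrect answer.
    Unanswered questions neither add nor subtract points."""
    return correct - penalty * incorrect


# Counts reported on this page for Goliath 120B.
score = net_score(correct=91, incorrect=97)
print(score)                  # exact value as a fraction
print(f"{float(score):.2f}")  # rounded to two decimals
```

Exact fractions are used so that the comparison with the displayed value is not blurred by floating-point error.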

Subject Breakdown

Subject                           Correct  Incorrect  Unanswered  Accuracy  Average
Allergology                             1          1           1     33.3%    90.5%
Anesthesiology and Resuscitation        2          2           0     50.0%    87.1%
Cardiology                             13          7           1     61.9%    79.7%
Dermatology                             4          8           2     28.6%    80.2%
Endocrinology and Nutrition             9          9           1     47.4%    84.2%
ENT                                     3          4           0     42.9%    74.4%
Epidemiology                            4          3           1     50.0%    89.3%
Gastroenterology                       10         10           2     45.5%    70.5%
Genetics                                2          5           0     28.6%    86.5%
Geriatrics                              5          4           1     50.0%    86.9%
Gynecology and Obstetrics               7          5           2     50.0%    81.2%
Health Planning and Management          0          2           0      0.0%    73.2%
Hematology                              6          6           1     46.2%    81.5%
Immunology                              5          2           1     62.5%    89.1%
Infectious Diseases                     8         11           4     34.8%    81.8%
Legal Medicine and Bioethics            2          0           0    100.0%    91.7%
Medical Oncology                        8         10           3     38.1%    80.2%
Nephrology                              3         10           0     23.1%    80.8%
Neurology                               9         11           2     40.9%    83.7%
Ophthalmology                           2          3           0     40.0%    80.0%
Palliative Care                         3          1           0     75.0%    88.2%
Pediatrics                              5         10           2     29.4%    82.0%
Pharmacology                            8         14           1     34.8%    85.4%
Psychiatry                              7          2           1     70.0%    89.5%
Pulmonology                            10          6           3     52.6%    80.6%
Radiology-Emergency                     7          7           0     50.0%    64.9%
Rheumatology                            6          8           0     42.9%    81.4%
Statistics                              2          1           0     66.7%    91.1%
Traumatology                            6          9           0     40.0%    74.5%
Urology                                 2          4           0     33.3%    78.2%
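
The per-subject accuracy figures appear to count unanswered questions in the denominator: for example, Cardiology's 13 correct out of 13 + 7 + 1 = 21 questions gives 61.9%, and Allergology's 1 out of 3 gives 33.3%. The following minimal sketch reproduces that arithmetic; the helper name `accuracy_pct` is hypothetical, and the formula is inferred from the numbers on this page rather than documented by it.

```python
def accuracy_pct(correct: int, incorrect: int, unanswered: int) -> float:
    """Accuracy as a percentage of ALL questions in the group,
    including unanswered ones (formula inferred, not documented)."""
    total = correct + incorrect + unanswered
    if total == 0:
        return 0.0
    return round(100 * correct / total, 1)


# (correct, incorrect, unanswered) as listed in the subject breakdown.
rows = {
    "Allergology": (1, 1, 1),
    "Cardiology": (13, 7, 1),
    "Dermatology": (4, 8, 2),
}
for subject, counts in rows.items():
    print(f"{subject}: {accuracy_pct(*counts)}%")
```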

Question Type Breakdown

Question type    Correct  Incorrect  Unanswered  Accuracy  Average
Anatomy                2          4           0     33.3%    79.8%
Biostatistics          3          2           0     60.0%    90.7%
Diagnosis             31         37           5     42.5%    79.2%
Epidemiology           4          8           0     33.3%    81.2%
Ethics                 1          0           0    100.0%    94.5%
Interpretation        19         17           1     51.4%    69.6%
Pathophysiology       14         18           1     42.4%    85.4%
Pharmacology           8         15           2     32.0%    84.0%
Prevention             6          4           2     50.0%    89.8%
Prognosis              3          3           1     42.9%    83.9%
Risk                   9          4           0     69.2%    83.6%
Tests                  8         13           0     38.1%    73.9%
Treatment             36         29           6     50.7%    81.3%
#  Answer  Correct  Status
1 C B
2 C D
3 B B
4 D C
5 B C
6 B B
7 C D
8 D C
9 A A
10 D
11 D D
12 B A
13 D C
14 D A
15 B B
16 A A
17 A C
18 B A
19 B B
20 C C
21 D D
22 B B
23 D A
24 B A
25 B C
26 B B
27 C C
28 D A
29 A B
30 B C
31 D D
32 B A
33 B C
34 C B
35 D D
36 D D
37 A A
38 A A
39 B C
40 B B
41 C C
42 D D
43 D A
44 B D
45 D D
46 B B
47 C C
48 C C
49 B B
50 B C
51 C A
52 C D
53 B C
54 B B
55 A C
56 D D
57 D A
58 B A
59 B A
60 A A
61 A A
62 D D
63 C D
64 D Annulled
65 D D
66 D C
67 B B
68 Annulled
69 A A
70 B B
71 D B
72 D D
73 A B
74 C C
75 B
76 A A
77 B D
78 C
79 B B
80 C A
81 D C
82 C C
83 B B
84 C C
85 B A
86 A
87 B B
88 D D
89 C B
90 A A
91 D D
92 C A
93 D C
94 B B
95 D D
96 B B
97 B B
98 D B
99 B A
100 B B
101 B A
102 D D
103 B B
104 C D
105 A B
106 B C
107 C C
108 B B
109 A D
110 C D
111 B B
112 C C
113 D Annulled
114 D D
115 D D
116 A
117 D D
118 D D
119 C A
120 C C
121 A A
122 B B
123 D D
124 D D
125 B B
126 D D
127 C A
128 D B
129 D D
130 C C
131 B C
132 B D
133 B A
134 A C
135 D A
136 D D
137 A A
138 D C
139 A
140 A C
141 D B
142 A C
143 D A
144 B D
145 D C
146 B C
147 D C
148 B A
149 B C
150 D
151 B A
152 A
153 D C
154 B B
155 D D
156 B C
157 D C
158 D D
159 D D
160 C B
161 B B
162 B B
163 D B
164 C B
165 C A
166 C C
167 D A
168 D B
169 D C
170 C A
171 D
172 B B
173 D A
174 B B
175 A A
176 C C
177 A C
178 D B
179 C C
180 D Annulled
181 B
182 D D
183 C C
184 D A
185 C C
186 D D
187 C A
188 B C
189 C D
190 D D
191 B
192 D B
193 A C
194 D C
195 B C
196 B B
197 C A
198 B
199 C D
200 B A
201 A B
202 D D
203 B B
204 C D
205 B D
206 C Annulled
207 B A
208 A A
209 B B
210 D D