MedicalBenchmark
Provider: Meta

Llama 3.1 8B Instruct

Rank: #275 of 291 models (MIR 2024)

Net score: 44.00 pts
Accuracy: 37.0%
Correct / Incorrect: 74 / 90
Total Cost: $0.02

Overall Performance (vs. average)

| Metric | Llama 3.1 8B Instruct | Average |
|---|---|---|
| Accuracy | 37.0% | 80.5% |
| Net score | 44.00 pts | 150.85 pts |
| Correct | 74 | 161 |
| Incorrect | 90 | 30 |
| Total Cost | $0.02 | $3.32 |
| Average response time | 20.8s | 16.4s |
| Output Tokens | 258K | 427K |
| Reasoning Tokens | 0 | 310K |
| Average confidence | 81.6% | 95.4% |
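
The headline figures can be cross-checked from the raw counts. The sketch below rests on two assumptions that this page does not state: that 200 questions were scored (inferred from 74 correct at 37.0% accuracy) and that each incorrect answer costs one third of a point (inferred from 74 − 90/3 = 44.00). The function and parameter names are illustrative only.

```python
from fractions import Fraction


def net_score(correct: int, incorrect: int, penalty: Fraction = Fraction(1, 3)) -> float:
    """Net score under the ASSUMED rule: +1 per correct answer, -penalty per incorrect one."""
    return float(correct - incorrect * penalty)


def accuracy(correct: int, scored_questions: int) -> float:
    """Accuracy as a percentage of all scored questions (unanswered ones included)."""
    return 100 * correct / scored_questions


correct, incorrect = 74, 90   # counts reported on this page
scored_questions = 200        # ASSUMPTION: inferred from 74 / 0.37, not stated on the page
unanswered = scored_questions - correct - incorrect

print(f"net score:  {net_score(correct, incorrect):.2f} pts")
print(f"accuracy:   {accuracy(correct, scored_questions):.1f}%")
print(f"unanswered: {unanswered}")
```

Under these assumptions the computed values match the reported 44.00 pts and 37.0%, and leave 36 questions unanswered.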

Subject Breakdown

| Subject | Correct | Incorrect | Unanswered | Accuracy | Average |
|---|---|---|---|---|---|
| Allergology | 3 | 0 | 0 | 100.0% | 90.5% |
| Anesthesiology and Resuscitation | 4 | 0 | 0 | 100.0% | 87.1% |
| Cardiology | 9 | 8 | 4 | 42.9% | 79.7% |
| Dermatology | 8 | 6 | 0 | 57.1% | 80.2% |
| Endocrinology and Nutrition | 4 | 7 | 8 | 21.1% | 84.2% |
| ENT | 3 | 4 | 0 | 42.9% | 74.4% |
| Epidemiology | 5 | 3 | 0 | 62.5% | 89.3% |
| Gastroenterology | 10 | 10 | 2 | 45.5% | 70.5% |
| Genetics | 3 | 4 | 0 | 42.9% | 86.5% |
| Geriatrics | 3 | 3 | 4 | 30.0% | 86.9% |
| Gynecology and Obstetrics | 4 | 8 | 2 | 28.6% | 81.2% |
| Health Planning and Management | 0 | 1 | 1 | 0.0% | 73.2% |
| Hematology | 3 | 6 | 4 | 23.1% | 81.5% |
| Immunology | 4 | 3 | 1 | 50.0% | 89.1% |
| Infectious Diseases | 11 | 7 | 5 | 47.8% | 81.8% |
| Legal Medicine and Bioethics | 1 | 1 | 0 | 50.0% | 91.7% |
| Medical Oncology | 6 | 13 | 2 | 28.6% | 80.2% |
| Nephrology | 2 | 9 | 2 | 15.4% | 80.8% |
| Neurology | 6 | 10 | 6 | 27.3% | 83.7% |
| Ophthalmology | 1 | 3 | 1 | 20.0% | 80.0% |
| Palliative Care | 2 | 1 | 1 | 50.0% | 88.2% |
| Pediatrics | 1 | 11 | 5 | 5.9% | 82.0% |
| Pharmacology | 11 | 8 | 4 | 47.8% | 85.4% |
| Psychiatry | 7 | 1 | 2 | 70.0% | 89.5% |
| Pulmonology | 8 | 5 | 6 | 42.1% | 80.6% |
| Radiology-Emergency | 6 | 7 | 1 | 42.9% | 64.9% |
| Rheumatology | 8 | 6 | 0 | 57.1% | 81.4% |
| Statistics | 1 | 2 | 0 | 33.3% | 91.1% |
| Traumatology | 1 | 11 | 3 | 6.7% | 74.5% |
| Urology | 1 | 4 | 1 | 16.7% | 78.2% |

Question Type Breakdown

| Question type | Correct | Incorrect | Unanswered | Accuracy | Average |
|---|---|---|---|---|---|
| Anatomy | 2 | 4 | 0 | 33.3% | 79.8% |
| Biostatistics | 2 | 2 | 1 | 40.0% | 90.7% |
| Diagnosis | 26 | 30 | 17 | 35.6% | 79.2% |
| Epidemiology | 4 | 6 | 2 | 33.3% | 81.2% |
| Ethics | 1 | 0 | 0 | 100.0% | 94.5% |
| Interpretation | 10 | 18 | 9 | 27.0% | 69.6% |
| Pathophysiology | 13 | 16 | 4 | 39.4% | 85.4% |
| Pharmacology | 12 | 9 | 4 | 48.0% | 84.0% |
| Prevention | 6 | 2 | 4 | 50.0% | 89.8% |
| Prognosis | 5 | 2 | 0 | 71.4% | 83.9% |
| Risk | 8 | 4 | 1 | 61.5% | 83.6% |
| Tests | 8 | 10 | 3 | 38.1% | 73.9% |
| Treatment | 22 | 36 | 13 | 31.0% | 81.3% |
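
| # | Answer | Correct | Status |
|---|---|---|---|
| 1 | B | B |  |
| 2 | C | D |  |
| 3 | B | B |  |
| 4 | C | C |  |
| 5 | D | C |  |
| 6 |  | B |  |
| 7 | D | D |  |
| 8 |  | C |  |
| 9 | C | A |  |
| 10 |  | D |  |
| 11 | D | D |  |
| 12 |  | A |  |
| 13 | D | C |  |
| 14 | D | A |  |
| 15 | B | B |  |
| 16 | D | A |  |
| 17 | C | C |  |
| 18 |  | A |  |
| 19 | D | B |  |
| 20 |  | C |  |
| 21 | C | D |  |
| 22 |  | B |  |
| 23 | A | A |  |
| 24 | C | A |  |
| 25 | C | C |  |
| 26 | B | B |  |
| 27 | A | C |  |
| 28 | B | A |  |
| 29 | D | B |  |
| 30 | C | C |  |
| 31 | D | D |  |
| 32 | C | A |  |
| 33 | A | C |  |
| 34 | C | B |  |
| 35 | D | D |  |
| 36 | A | D |  |
| 37 | A | A |  |
| 38 | B | A |  |
| 39 | C | C |  |
| 40 | C | B |  |
| 41 | B | C |  |
| 42 | D | D |  |
| 43 | C | A |  |
| 44 | A | D |  |
| 45 | A | D |  |
| 46 | B | B |  |
| 47 | C | C |  |
| 48 | C | C |  |
| 49 | B | B |  |
| 50 |  | C |  |
| 51 | A | A |  |
| 52 | A | D |  |
| 53 | C | C |  |
| 54 | B | B |  |
| 55 | C | C |  |
| 56 | D | D |  |
| 57 | B | A |  |
| 58 | C | A |  |
| 59 | B | A |  |
| 60 | D | A |  |
| 61 | C | A |  |
| 62 | A | D |  |
| 63 | C | D |  |
| 64 | B |  | Annulled |
| 65 |  | D |  |
| 66 | C | C |  |
| 67 | C | B |  |
| 68 | B |  | Annulled |
| 69 | A | A |  |
| 70 | B | B |  |
| 71 | C | B |  |
| 72 | B | D |  |
| 73 | B | B |  |
| 74 |  | C |  |
| 75 |  | B |  |
| 76 | D | A |  |
| 77 |  | D |  |
| 78 |  | C |  |
| 79 | C | B |  |
| 80 | C | A |  |
| 81 | B | C |  |
| 82 | B | C |  |
| 83 | B | B |  |
| 84 | C | C |  |
| 85 | A | A |  |
| 86 | A | A |  |
| 87 |  | B |  |
| 88 | D | D |  |
| 89 | B | B |  |
| 90 | A | A |  |
| 91 | C | D |  |
| 92 | B | A |  |
| 93 | B | C |  |
| 94 | B | B |  |
| 95 | B | D |  |
| 96 | B | B |  |
| 97 | B | B |  |
| 98 | C | B |  |
| 99 |  | A |  |
| 100 | A | B |  |
| 101 | B | A |  |
| 102 | D | D |  |
| 103 |  | B |  |
| 104 | C | D |  |
| 105 | D | B |  |
| 106 | B | C |  |
| 107 | B | C |  |
| 108 | B | B |  |
| 109 | D | D |  |
| 110 | D | D |  |
| 111 | A | B |  |
| 112 | C | C |  |
| 113 |  |  | Annulled |
| 114 | D | D |  |
| 115 | A | D |  |
| 116 | C | A |  |
| 117 | D | D |  |
| 118 |  | D |  |
| 119 | C | A |  |
| 120 | C | C |  |
| 121 | A | A |  |
| 122 |  | B |  |
| 123 | C | D |  |
| 124 | D | D |  |
| 125 | C | B |  |
| 126 | D | D |  |
| 127 | D | A |  |
| 128 |  | B |  |
| 129 | B | D |  |
| 130 |  | C |  |
| 131 | C | C |  |
| 132 | C | D |  |
| 133 | A | A |  |
| 134 | B | C |  |
| 135 | D | A |  |
| 136 | D | D |  |
| 137 | D | A |  |
| 138 | C | C |  |
| 139 | B | A |  |
| 140 | C | C |  |
| 141 | B | B |  |
| 142 | D | C |  |
| 143 |  | A |  |
| 144 | C | D |  |
| 145 | D | C |  |
| 146 | B | C |  |
| 147 | C | C |  |
| 148 | B | A |  |
| 149 | C | C |  |
| 150 | D | D |  |
| 151 | C | A |  |
| 152 |  | A |  |
| 153 | D | C |  |
| 154 |  | B |  |
| 155 | B | D |  |
| 156 | B | C |  |
| 157 | D | C |  |
| 158 | B | D |  |
| 159 | D | D |  |
| 160 | C | B |  |
| 161 |  | B |  |
| 162 |  | B |  |
| 163 | D | B |  |
| 164 | A | B |  |
| 165 |  | A |  |
| 166 |  | C |  |
| 167 | C | A |  |
| 168 | C | B |  |
| 169 | C | C |  |
| 170 | C | A |  |
| 171 | D | D |  |
| 172 | B | B |  |
| 173 | B | A |  |
| 174 | B | B |  |
| 175 |  | A |  |
| 176 | C | C |  |
| 177 |  | C |  |
| 178 | A | B |  |
| 179 | C | C |  |
| 180 | A |  | Annulled |
| 181 |  | B |  |
| 182 | D | D |  |
| 183 |  | C |  |
| 184 | A | A |  |
| 185 | A | C |  |
| 186 | B | D |  |
| 187 |  | A |  |
| 188 |  | C |  |
| 189 | B | D |  |
| 190 |  | D |  |
| 191 | B | B |  |
| 192 |  | B |  |
| 193 |  | C |  |
| 194 | D | C |  |
| 195 | C | C |  |
| 196 | D | B |  |
| 197 | D | A |  |
| 198 | B | B |  |
| 199 | C | D |  |
| 200 | D | A |  |
| 201 | B | B |  |
| 202 | D | D |  |
| 203 | C | B |  |
| 204 | D | D |  |
| 205 | B | D |  |
| 206 | B |  | Annulled |
| 207 | A | A |  |
| 208 |  | A |  |
| 209 |  | B |  |
| 210 | D | D |  |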
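
The per-question rows above can be tallied into the same correct / incorrect / unanswered buckets used elsewhere on this page. The following is a minimal sketch, assuming that a row with no Answer means the model did not respond, that annulled questions are excluded from scoring, and that the single letter on an annulled row is the model's answer; none of these rules is stated on the page. The `classify` helper and the tuple layout are hypothetical.

```python
from collections import Counter


def classify(answer: str, correct: str, status: str = "") -> str:
    """Bucket one question row under the assumptions named in the lead-in."""
    if status == "Annulled":
        return "annulled"      # ASSUMPTION: annulled questions are not scored
    if not answer:
        return "unanswered"    # ASSUMPTION: empty Answer means the model gave no response
    return "correct" if answer == correct else "incorrect"


# A handful of rows from the list, as (number, answer, correct, status).
# Row 64 places its single letter in the answer slot, which is an assumption.
rows = [
    (1, "B", "B", ""),
    (2, "C", "D", ""),
    (3, "B", "B", ""),
    (6, "", "B", ""),
    (64, "B", "", "Annulled"),
]

tally = Counter(classify(answer, correct, status) for _, answer, correct, status in rows)
print(dict(tally))
```

Extending `rows` to the full list would let the totals be compared against the reported 74 correct and 90 incorrect.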