How Well Does an AI Chatbot (ChatGPT) Perform on the USMLE?
ChatGPT scores 64% on USMLE exam, shows promise as AI study aid.


As Artificial Intelligence (AI) continues to shape various sectors, its potential to revolutionize healthcare and medical education is becoming increasingly evident.
One of the most intriguing questions in this domain is how well an AI chatbot, specifically ChatGPT, can perform on the United States Medical Licensing Examination (USMLE)—a rigorous series of tests that assess the knowledge and decision-making skills required to become a licensed physician.
In this article, we will delve into ChatGPT's performance on the USMLE, compare it with other AI models, and explore what this means for the future of medical education and clinical decision-making.
Can AI truly be a reliable learning tool for medical students, or is it still a work in progress? Let’s find out.
What is the USMLE?
The USMLE is a multi-part examination that consists of three major steps:
- Step 1: Focuses on basic science knowledge such as anatomy, biochemistry, microbiology, and pathology.
- Step 2: Focuses on how well students can apply their medical knowledge to real-life patient scenarios.
- Step 3: Assesses whether students can independently manage patient care as practicing doctors.
These exams are designed to challenge medical students' ability to apply their knowledge in real-world clinical settings.
The performance of an AI chatbot on these exams is a significant benchmark for understanding its potential in medical education and decision-making.

How Did ChatGPT Perform on the USMLE?
Recent studies evaluated ChatGPT's performance on both Step 1 and Step 2 of the USMLE.
These studies used multiple question sets to simulate real exam conditions.
Here are the key findings:
- Step 1 (AMBOSS question set): ChatGPT scored 44% accuracy (44 correct answers out of 100 questions). Although this score is below the typical passing level, it shows that ChatGPT has a solid grasp of basic science concepts, though there's room for improvement on more difficult questions.
- Step 2 (AMBOSS question set): ChatGPT scored 42% (42 correct answers out of 100 questions). Again, this reflects some challenges with more complex clinical cases, but it still demonstrates that an AI chatbot can answer clinical knowledge questions with a reasonable degree of accuracy.
- Step 1 (National Board of Medical Examiners (NBME) free questions): ChatGPT scored 64.4% accuracy (56 correct answers out of 87 questions). This is closer to the passing score for medical students, indicating that ChatGPT is capable of performing at a third-year medical student level.
- Step 2 (NBME free questions): ChatGPT achieved 57.8% accuracy (59 correct answers out of 102 questions), which is again a strong showing but indicates that the model still faces challenges with more advanced clinical reasoning.
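As a sanity check, the percentages above follow directly from the reported correct/total counts. A minimal sketch in plain Python:

```python
# Reported correct/total counts from the four question sets above
results = {
    "AMBOSS Step 1": (44, 100),
    "AMBOSS Step 2": (42, 100),
    "NBME Free Step 1": (56, 87),
    "NBME Free Step 2": (59, 102),
}

for name, (correct, total) in results.items():
    accuracy = 100 * correct / total  # percentage of questions answered correctly
    print(f"{name}: {accuracy:.1f}%")
```

Running this reproduces the figures quoted in the studies (44.0%, 42.0%, 64.4%, and 57.8%).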

Key Differences Between AMBOSS and NBME
AMBOSS and NBME are two key resources for medical students preparing for the USMLE, but they serve different purposes:
Purpose:
- AMBOSS: All-in-one platform for learning and exam prep, covering all USMLE Steps with a comprehensive question bank, reference library, and clinical tools.
- NBME: Official provider of USMLE practice exams and test simulations (e.g., Free 120) focused on assessing exam readiness.
Pricing:
- AMBOSS: Flexible pricing with a 5-day free trial, ranging from $358 (6 months) to $1,199 for Student Life Access.
- NBME: Offers practice exams for a fee, exam-focused, without the broad learning tools AMBOSS provides.
Learning Tools:
- AMBOSS: Features like hover definitions, clinical imaging, study plans, and performance tracking for active learning.
- NBME: Focuses on practice exams and test simulations, without interactive learning tools or content resources.
Success Rate:
- AMBOSS: Reported to help students score an average of 10.4 points higher on Step 2 CK. Popular among international medical graduates (IMGs) for comprehensive prep.
- NBME: Essential for exam simulations but doesn't provide the same level of interactive learning or success-driven track record as AMBOSS.

What Does This Mean for AI Chatbots in Medicine?
The results provide valuable insights into the potential role of AI chatbots in medical education and decision-making:
ChatGPT Can Be a Useful Learning Tool
With an accuracy of 64.4% on the Step 1 NBME questions, ChatGPT performed at a level similar to third-year medical students.
This suggests that it could serve as a supportive tool for medical students, offering explanations and helping to reinforce their learning.
Logical Justification and Internal Data Use
One of ChatGPT’s strengths is its ability to justify answers logically: it provided a logical explanation for 100% of its answers on the NBME Free 120 questions.
Additionally, it drew on internal information (data stated directly in the question) in 96.8% of its answers, which suggests it is capable of paying attention to critical details in each question.
External Information Helps AI Performance
When ChatGPT used external information (i.e., relevant knowledge not directly provided in the question), it performed better.
For example, correct answers on the NBME Free 120 set were 44.5% more likely to include external information than incorrect answers, highlighting that an AI chatbot’s ability to incorporate additional knowledge can improve its accuracy.
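The "44.5% more likely" figure is a relative increase between two proportions. A minimal sketch of that calculation (the proportions below are illustrative placeholders, not the study's raw data):

```python
# Share of answers that cited external information, split by correctness.
# These two values are hypothetical placeholders chosen for illustration.
p_external_in_correct = 0.578    # placeholder: correct answers citing external info
p_external_in_incorrect = 0.400  # placeholder: incorrect answers citing external info

# Relative increase of one proportion over the other
relative_increase = (p_external_in_correct - p_external_in_incorrect) / p_external_in_incorrect
print(f"Correct answers were {relative_increase:.1%} more likely to include external information")
```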
How Does ChatGPT Compare to Other AI Models?
In the study, ChatGPT outperformed InstructGPT and GPT-3 on all tested question sets.
- ChatGPT vs. InstructGPT:
- On average, ChatGPT outperformed InstructGPT by 8.15% across all data sets.
- This suggests that the fine-tuning and conversational training ChatGPT received make it better suited for answering complex questions compared to InstructGPT, which was primarily designed to generate human-like text.
- ChatGPT vs. GPT-3:
- GPT-3 struggled, showing no significant improvement over random chance (i.e., about a 50/50 correct/incorrect answer rate).
- This highlights the substantial advancements ChatGPT has made over its predecessor.
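"No significant improvement over random chance" is the kind of claim an exact binomial test checks. A minimal stdlib-only sketch, using a hypothetical score of 46 correct out of 87 against the 50/50 baseline described above (not the study's actual counts):

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every
    outcome that is at most as likely as the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    return sum(q for q in probs if q <= observed + 1e-12)

# Hypothetical example: 46 correct out of 87 against a 50/50 baseline
p_value = binomial_two_sided_p(46, 87)
print(f"p = {p_value:.3f}")  # well above 0.05 -> consistent with guessing
```

A large p-value here means the score cannot be distinguished from coin-flipping, which is the pattern the study reported for GPT-3.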
What’s Next for AI in Medical Education?
The study offers several implications for the future of medical education:
AI Chatbot as an Educational Aid
ChatGPT’s ability to provide logical justifications and accurate information makes it a promising tool for medical education.
Medical students could use it for practice exams, explanations of complex topics, and even simulated patient interactions.
Room for Improvement
While ChatGPT shows promise, it needs further development to handle more advanced medical scenarios that require complex reasoning.
For example, it still struggles with clinical decision-making and difficult case analysis.
Beyond Testing Knowledge
As AI chatbot models continue to improve, they could be used for more than just testing.
ChatGPT, with its ability to explain answers and incorporate external knowledge, could one day assist medical professionals in clinical decision-making, making the healthcare process more efficient.
Moreover, as AI technology continues to evolve, its integration into medical education offers promising potential, though it’s clear that it is not yet a perfect solution.
ChatGPT’s performance on the USMLE, while not passing, highlights the significant strides AI has made in understanding and applying medical knowledge.
It serves as a solid starting point for medical students seeking supplemental learning tools, offering insights into clinical knowledge and basic science concepts.
While AI chatbots like ChatGPT still face challenges in more complex clinical reasoning, their ability to process vast amounts of information and provide quick responses makes them valuable for reinforcing foundational knowledge and practicing decision-making.
With further advancements and fine-tuning, AI could become an indispensable resource for medical education, helping students refine their skills, boost their confidence, and ultimately improve patient care in the future.
The future of AI in healthcare education is bright, and tools like ChatGPT are only the beginning.
As these technologies evolve, we can expect them to play an even more critical role in shaping the next generation of medical professionals.
Ready to Transform Your Industry with AI Chatbots?
Join the leaders revolutionizing healthcare and beyond with Makebot. From advanced healthcare AI chatbots to tailored, cost-efficient LLM solutions, we deliver cutting-edge technology optimized for your needs.
Contact us today to discover how Makebot can drive innovation in your business with trusted, industry-specific chatbot solutions.
📧 Email us at: b2b@makebot.ai