AI as a doctor: ChatGPT could pass a notoriously difficult US medical exam, a study shows. The AI system answered between 52 and 75 percent of the questions correctly in this three-part exam for medical students and new residents. In most runs, ChatGPT scored above 60 percent, roughly the threshold at which the exam counts as passed. This held both for multiple-choice questions and for freely formulated answers. The researchers describe the result as impressive and surprising.
ChatGPT is causing a stir worldwide because this machine-learning system generates answers and texts of unprecedented quality. Even experts and detection algorithms built for the purpose can barely tell its texts apart from human-written ones. Behind ChatGPT is a neural network that has been trained on millions of texts from the Internet and other sources. Using statistical probabilities, the language model predicts which word is most likely to come next.
The AI system therefore “knows” nothing about the content; it only reproduces language patterns. Yet it produces surprisingly accurate and logical texts. ChatGPT can even write and edit convincing scientific abstracts.
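The next-word principle described above can be illustrated with a deliberately tiny sketch. It is a toy built on a stated assumption, not how ChatGPT works internally: ChatGPT relies on a large neural network, while the hypothetical helpers `train_bigrams`, `next_word_probabilities` and `most_likely_next` below simply count which word follows which in a made-up example sentence and then pick the most frequent successor.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

# Toy illustration only: the helper names below are hypothetical and this is
# NOT how ChatGPT is implemented (ChatGPT uses a large neural network).


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for every word, which words directly follow it in the text."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows: Dict[str, Counter], word: str) -> Dict[str, float]:
    """Turn successor counts into relative frequencies (empty if word unseen)."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def most_likely_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent successor of `word`, or None if it was never seen."""
    probs = next_word_probabilities(follows, word)
    if not probs:
        return None
    return max(probs, key=probs.get)


# Made-up training text for the example.
corpus = "the patient has a fever the patient has a cough the doctor has a plan"
model = train_bigrams(corpus)

print(next_word_probabilities(model, "the"))  # patient about 0.67, doctor about 0.33
print(most_likely_next(model, "patient"))
print(most_likely_next(model, "has"))
```

The toy model predicts "has" after "patient" and "a" after "has" without any notion of what a patient or a fever is; it only reproduces patterns from its training text, which is the point made above on a miniature scale.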
ChatGPT takes the medical licensing exam
Tiffany Kung from Massachusetts General Hospital in Boston and her colleagues have now examined how well ChatGPT handles medical expertise. For their study, they had the chatbot take the US Medical Licensing Exam (USMLE), a three-part exam that US medical students must take in their second year, in their fourth year, and after graduation. ChatGPT was given 376 questions dating from June 2022. Because the AI system has no internet access and received its last training texts in January 2022, it could not have known these questions beforehand.
Mirroring the real exam, ChatGPT received the tasks in three different formats. The first consists of open-ended questions, for example asking for the diagnosis of a clinical picture described in the question or for the right treatment of a condition. The second is a multiple-choice test with five possible answers. The third is a multiple-choice test in which the chosen answer must also be justified in free text, explaining why it was picked and the others rejected. As a rule, the exam is passed with around 60 percent correct answers.
AI would have mostly passed
The result for ChatGPT: if the AI system were a human, it would have a real chance of passing the medical exam, because the proportion of correct answers was between 52 and 75 percent. “This is the first AI experiment to reach this threshold, which is a surprising and impressive result,” write Kung and her colleagues. “To perform so well on this notoriously difficult test, and do so without dedicated training or human assistance, marks a milestone in the maturity of clinical AI systems.”
The language model performed best on the open-ended questions, scoring between 68 and 75 percent. On the simple multiple-choice format, its hit rate was lowest, at 55 to 61 percent. Interestingly, although the third part of the exam, intended for doctors who have already graduated, is the most difficult, ChatGPT averaged between 61 and 68.8 percent on it depending on the task format, so it would have passed.
Better than a specialized medical AI
About 95 percent of the answers ChatGPT wrote itself were coherent and medically correct, as blinded expert reviewers determined. Nearly 90 percent of the responses also contained at least one significant finding or conclusion that was non-obvious and clinically relevant. “Paradoxically, ChatGPT’s results even surpass PubMedGPT, a language model with a very similar neural architecture that was trained exclusively on biomedical literature,” the researchers report.
According to Kung and her colleagues, AI systems such as ChatGPT have now reached a level of performance at which they could be genuinely useful in medicine. The first machine-learning algorithms are already being used to help evaluate medical images and make diagnoses. In the future, medical students could also use systems like ChatGPT as a learning aid.
“We believe that language models like ChatGPT have reached a level of maturity that will soon influence medicine as a whole and could facilitate individualized, compassionate and scalable healthcare,” state Kung and her team. Her clinic is already experimenting with having ChatGPT edit patient letters to make them more understandable for laypeople. (PLOS Digital Health, 2023; doi:10.1371/journal.pdig.0000198)