Researchers say the chatbot needs selective training because it struggles to spot known links between conditions.
While the chatty AI bot's success rate at diagnosing challenging medical cases was a mere 39 percent in an analysis last year, a report published this week in JAMA Pediatrics finds that the fourth version of the large language model does especially poorly with children: it accurately diagnosed just 17 percent of pediatric medical cases. In case anyone was worried, the low success rate suggests human pediatricians won't be out of a job anytime soon. As the authors put it, "[T]his study underscores the invaluable role that clinical experience holds." But the report also identifies the critical weaknesses behind ChatGPT's high error rate and points to ways it might be turned into a useful tool for clinical care. Given the level of interest in and experimentation with AI chatbots, many physicians, pediatricians included, expect the technology to find its way into clinical practice eventually.
Broadly speaking, medicine has been an early adopter of AI-driven technologies, producing both notable successes, such as automating administrative tasks and helping interpret chest and retinal scans, and notable failures, including algorithmic racial bias. There's also a lot in between. But AI's potential for problem-solving has raised considerable interest in making it a useful tool for tricky diagnoses; no quirky, prickly, pill-popping medical genius required. As the new study by researchers at Cohen Children's Medical Center in New York makes clear, though, ChatGPT-4 isn't currently ready for pediatric patients. Pediatric cases, the researchers note, require more consideration of the patient's age than adult cases do. And, as any parent knows, diagnosing conditions in infants and small children is especially hard when they can't pinpoint or articulate all of their symptoms.
For the study, the researchers put the chatbot up against 100 pediatric case challenges published in NEJM and JAMA Pediatrics between 2013 and 2023. These are real medical cases published as challenges or quizzes: physicians reading along are invited to try to come up with the correct diagnosis of a complex or unusual case based on the information the attending physicians had at the time. Sometimes the publications also explain how the attending physicians arrived at the correct diagnosis.
Missed connections
For ChatGPT's test, the researchers pasted the relevant text from the medical cases into the prompt. Two qualified physician-researchers then scored the AI-generated answers as correct, incorrect, or "did not fully capture the diagnosis." In the last category, ChatGPT came up with a clinically related condition that was too general or unspecific to count as the correct diagnosis. For instance, in one child's case, ChatGPT diagnosed a branchial cleft cyst, a lump in the neck or below the collarbone, when the correct diagnosis was Branchio-oto-renal syndrome, a genetic condition that causes the abnormal development of tissue in the neck along with malformations of the ears and kidneys. One of the signs of the syndrome is the formation of branchial cleft cysts. Out of the 100 cases, ChatGPT got the right answer in just 17. It was plainly wrong in 72 cases, and in the remaining 11 it did not fully capture the diagnosis. Among the 83 wrong or incomplete diagnoses, 47 (57 percent) were at least in the same organ system as the correct diagnosis.
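For readers who want to check the arithmetic, the short Python sketch below reproduces the paper's top-line numbers. The per-case grade list is reconstructed from the reported totals for illustration, not taken from the study's actual data.

```python
# Illustrative tally of the study's published grading counts (not the authors' code).
from collections import Counter

# Hypothetical per-case grades reconstructed from the reported totals:
# 17 correct, 72 incorrect, 11 "did not fully capture the diagnosis."
grades = ["correct"] * 17 + ["incorrect"] * 72 + ["incomplete"] * 11
counts = Counter(grades)

wrong_or_incomplete = counts["incorrect"] + counts["incomplete"]  # 83 cases
same_organ_system = 47  # reported by the authors for those 83 cases

print(counts)  # Counter({'incorrect': 72, 'correct': 17, 'incomplete': 11})
print(f"Accuracy: {counts['correct'] / len(grades):.0%}")  # 17%
print(f"Same organ system among misses: {same_organ_system / wrong_or_incomplete:.0%}")  # 57%
```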
Among the failures, the researchers found, ChatGPT appears to struggle with spotting known relationships between conditions, the kind of connections an experienced physician would hopefully pick up on. In one medical case, for example, it failed to make the link between autism and scurvy (vitamin C deficiency). Neuropsychiatric conditions such as autism can lead to restricted diets, and those, in turn, can cause vitamin deficiencies; in high-income countries, neuropsychiatric conditions are among the significant risk factors for vitamin deficiencies in children, and physicians should be on the lookout for them. ChatGPT, meanwhile, diagnosed a rare autoimmune condition. Though the chatbot flopped in this test, the researchers suggest it could improve if it were specifically and selectively trained on accurate and trustworthy medical literature, rather than on material from the Internet, which can include inaccurate and misleading information. They also suggest chatbots could do better with more real-time access to medical data, allowing the models to be "tuned" toward greater accuracy.
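The authors don't publish code, but their "more immediate access to medical data" idea maps naturally onto retrieval-augmented prompting, where vetted reference text is retrieved and fed to the model alongside the case. Below is a minimal sketch of that approach, assuming the open-source sentence-transformers library; the corpus snippets and the build_prompt helper are hypothetical illustrations, not the study's method.

```python
# A minimal retrieval-augmented prompting sketch (our illustration, not the
# study's method): ground the model's answer in vetted reference text instead
# of relying on whatever it absorbed from the open Internet.
from sentence_transformers import SentenceTransformer, util

# Hypothetical vetted snippets; a real system would index full guidelines and literature.
corpus = [
    "Restrictive eating in children with autism spectrum disorder can cause "
    "vitamin C deficiency (scurvy), presenting with gum disease and leg pain.",
    "Branchio-oto-renal syndrome combines branchial cleft anomalies with "
    "ear malformations and renal abnormalities.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def build_prompt(case_text: str, k: int = 1) -> str:
    """Prepend the k most relevant vetted snippets to the case description."""
    query_emb = model.encode(case_text, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    evidence = "\n".join(corpus[hit["corpus_id"]] for hit in hits)
    return f"Reference material:\n{evidence}\n\nCase:\n{case_text}\n\nDiagnosis:"

print(build_prompt("Child with autism, restricted diet, bleeding gums, and limp."))
```

The design point is the one the authors gesture at: rather than retraining the whole model, curated evidence is surfaced at query time, so the diagnosis is anchored to trustworthy sources that can be updated independently of the model.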
“This presents an opportunity for researchers to investigate if specific medical data training and tuning can improve the diagnostic accuracy of LLM-based chatbots,” the researchers conclude.