Have you ever searched the internet for "am I sick if I feel pain"? The answers you find may not be quite right. But with the rise of large language models (LLMs) like ChatGPT, people are starting to experiment with using them to answer medical questions and look up medical knowledge.

But are these models worth trusting?

Taken on their own, the answers an AI gives may well be accurate. But James Davenport, a professor at the University of Bath in the UK, points out the difference between answering medical questions and actually practicing medicine, arguing that "the practice of medicine is not just about answering medical questions; if it were purely about answering medical questions, we wouldn't need teaching hospitals, and doctors wouldn't need to train for years after their academic programs."

Given these doubts, a new paper published in Nature by leading AI researchers presents a benchmark for assessing how well large language models can answer people's medical questions.

Existing models are not yet perfect

This latest assessment comes from Google Research and DeepMind. The experts concluded that AI models have a lot of potential in the medical field, including knowledge retrieval and support for clinical decision-making. However, existing models are not yet perfect and may, for example, fabricate convincing medical misinformation or incorporate biases that exacerbate health inequalities. This is why their clinical knowledge needs to be assessed.

Such assessments are not new. In the past, however, they have typically relied on automated evaluation against limited benchmarks, such as scores on individual medical exams, which says little about how reliable or valuable a model is in the real world.

Moreover, when people turn to the internet for medical information, they experience "information overload" and then suffer a lot of unnecessary stress by fixating on the worst of 10 possible diagnoses.

The team hoped that a language model could provide brief expert opinions that are unbiased, cite their sources, and reasonably express uncertainty.

How a 540-billion-parameter LLM performs

To assess how well LLMs encode clinical knowledge, Google Research's Shekoofeh Azizi and colleagues examined their ability to answer medical questions. The team built a benchmark called "MultiMedQA", which combines six existing question-answering datasets covering professional medicine, research, and consumer queries with "HealthSearchQA", a new dataset of 3,173 medical questions commonly searched online.

The team then evaluated PaLM (a 540-billion-parameter LLM) and its variant, Flan-PaLM, and found that Flan-PaLM achieved state-of-the-art results on some of the datasets. On the MedQA dataset, which consists of questions in the style of the United States Medical Licensing Examination, Flan-PaLM outperformed the previous state-of-the-art LLM by 17%.

However, while Flan-PaLM scored well on multiple-choice questions, further evaluation revealed gaps in answering consumers' medical questions.
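The multiple-choice scoring described in this section comes down to comparing the option a model picks with the answer key, dataset by dataset. A minimal sketch of that idea follows. The record fields (`dataset`, `predicted`, `answer`), the function name, and the toy records are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_dataset(records):
    """Return {dataset_name: fraction of records where predicted == answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["dataset"]] += 1
        if rec["predicted"] == rec["answer"]:
            correct[rec["dataset"]] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical toy records, not real benchmark data.
sample = [
    {"dataset": "MedQA", "predicted": "B", "answer": "B"},
    {"dataset": "MedQA", "predicted": "A", "answer": "C"},
    {"dataset": "example_dataset", "predicted": "D", "answer": "D"},
]
scores = accuracy_by_dataset(sample)
```

Exact-match scoring of this kind only works when every question has a single keyed answer. Free-text answers to consumer questions cannot be graded this way, which is presumably why the longer responses were rated by physicians instead.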

A medicine-specialized LLM shows encouraging results

To address this issue, the researchers further adapted Flan-PaLM to the medical domain using a method called instruction prompt tuning. The result is Med-PaLM, an LLM that specializes in the medical field.

Instruction prompt tuning is an efficient way to adapt a general-purpose LLM to a new area of specialization. The resulting model, Med-PaLM, performed encouragingly in the pilot evaluation. For example, a panel of physicians rated only 61.9% of Flan-PaLM's long-form answers as consistent with the scientific consensus, compared with 92.6% of Med-PaLM's answers, which is on par with answers written by physicians (92.9%). Similarly, 29.7% of Flan-PaLM's answers were rated as likely to lead to harmful outcomes, compared with only 5.8% for Med-PaLM, which is comparable to answers written by physicians (6.5%).

The research team noted that the results, while promising, warrant further evaluation, especially with respect to safety, fairness, and bias. In other words, there are still many limitations to overcome before the clinical application of LLMs is feasible.
