05/10/2023

In the vast world of healthcare, the amount of unstructured data (medical records, clinical research papers, scientific publications, clinical trials, and more) can be overwhelming.

Extracting valuable information and knowledge from this unstructured data has long been a challenge, hindering the progress of medical research, diagnosis, and patient care.

However, with the advent of large language models (LLMs), a breakthrough has occurred. These powerful artificial intelligence models have broken down barriers, paving the way for unprecedented advances in the analysis of unstructured healthcare data.

These models have incredible potential and are already transforming the data analytics landscape.

What are Large Language Models?

In recent years, large language models have emerged as an innovative development in artificial intelligence (AI) technology and natural language processing (NLP), transforming several fields.

They are designed to process and understand human language by exploiting large amounts of textual data. By learning patterns, relationships, and contextual information from this data, these models gain the ability to generate coherent and contextually appropriate responses and perform various language-related tasks.

LLMs are built from networks of interconnected artificial neurons loosely inspired by the human brain. They undergo extensive training on enormous datasets containing billions of sentences from diverse sources such as books, articles, and websites.

Another vital aspect of these models is their immense number of parameters, which can range from millions to billions. These parameters enable the models to grasp the intricacies of language, resulting in the generation of contextually relevant and high-quality text.
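
To make this concrete, here is a minimal sketch of querying a pre-trained model; the Hugging Face transformers library and the small public gpt2 checkpoint are our assumptions for illustration, not tools named in this article:

```python
# Minimal sketch: generating text with a pre-trained language model.
# Assumes the Hugging Face transformers library is installed (pip install transformers).
from transformers import pipeline

# "gpt2" is a small, publicly available model chosen purely for illustration.
generator = pipeline("text-generation", model="gpt2")

prompt = "Unstructured clinical notes can be analyzed by"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```

The same pipeline call works with far larger checkpoints; only the model identifier changes.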

Real-world Examples of Large Language Models in Healthcare Analytics

Disease diagnosis and treatment recommendations
In one study, researchers trained a language model on a large corpus of medical literature and patient records. The model was then used to analyze complex patient cases, accurately diagnosing rare diseases and recommending tailored treatment strategies based on the latest research findings.

Literature review and evidence synthesis
Researchers have used these models to analyze large volumes of scientific literature, enabling comprehensive reviews and evidence-based assessments. By automating the extraction and synthesis of information, language models accelerate the identification of relevant studies, summarize key findings, and support evidence-based decision making.
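
As a hedged sketch of what such automation can look like (the article does not describe the studies' actual pipelines), a pre-trained summarization model can condense a passage of scientific text into a few sentences; the bart-large-cnn checkpoint below is an illustrative choice:

```python
# Sketch: condensing scientific text with a general-purpose summarization model.
# facebook/bart-large-cnn is an illustrative public checkpoint, not one used in
# the studies referenced above.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

abstract = (
    "Large language models were evaluated on their ability to extract "
    "population, intervention, and outcome statements from randomized "
    "controlled trial reports, with results compared against manual review."
)
summary = summarizer(abstract, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```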

Medical image analysis and radiology
In many scenarios, models can interpret radiology reports and extract key findings, aiding radiologists in diagnosis. They can also help with automatic report generation, reducing reporting time and improving workflow efficiency in radiology.
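
Extracting key findings from report text is typically framed as named-entity recognition. A hedged sketch follows; the model identifier is a placeholder (no specific radiology NER checkpoint is named in this article), so substitute a fine-tuned clinical model of your choice:

```python
# Sketch: pulling findings out of a radiology report via token classification.
# MODEL_ID is a hypothetical placeholder; swap in a real clinical/biomedical
# NER checkpoint before running.
from transformers import pipeline

MODEL_ID = "your-clinical-ner-model"  # placeholder, not a real model name

ner = pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")
report = "Chest X-ray shows a 2 cm nodule in the right upper lobe; no pleural effusion."
for entity in ner(report):
    print(entity["entity_group"], "->", entity["word"])
```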

Mental health support and chatbots
These models have been integrated into mental health support systems and chatbots, providing personalized assistance and resources to people. They are also able to initiate natural language conversations, understand emotional nuances, and provide support, information, and referrals for mental health issues.

Integrating Large Language Models in Life Sciences

LLMs are not easily replicable, nor affordable for all organizations. The cost of training GPT-4 has been estimated at close to $100 million, and it rises in proportion to the complexity of the model itself. Thus, large IT companies, including Google, Amazon, and OpenAI (backed by Microsoft, among others), have been the only players to enter this space.

Users are therefore forced to work with these pre-trained models, limited to simple fine-tuning for their own needs. However, for very specific domains, it is crucial to recognize that results and performance may differ substantially from expectations.

Healthcare is a knowledge domain where many documents (scientific publications, etc.) are publicly available; large language models are therefore already trained on them and seem to work well. When, however, we submit private and highly specialized documents, performance may change, and the LLM may fail to recognize concepts such as active ingredients, names of molecules, or development processes that are internal knowledge.

Some LLMs, such as Google's BERT, have been specialized with additional training on particular areas, often by universities and research centers, and released to the open-source community: BioBERT, MedBERT, and SciBERT. More recently, BioGPT, a version of the well-known GPT verticalized on biomedical concepts, has been released as well.
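
Several of these specialized models are distributed through the Hugging Face Hub. As a brief sketch (assuming the transformers library and the publicly available microsoft/biogpt checkpoint), BioGPT can be loaded and queried like any other causal language model:

```python
# Sketch: loading BioGPT, a GPT variant trained on biomedical literature.
# Assumes a recent version of the transformers library with BioGPT support.
from transformers import pipeline

biogpt = pipeline("text-generation", model="microsoft/biogpt")
out = biogpt("Aspirin is an active ingredient that", max_new_tokens=30)
print(out[0]["generated_text"])
```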

Therefore, it is important to be aware of and understand the scope of the intended use cases in order to choose the most suitable model, rather than defaulting to the mainstream ChatGPT.

The right process of development can thus be summarized as:

  • Identify the right use case: Assess business operations to identify areas where an LLM can add value.
  • Select the appropriate model: Choose an LLM that fits your needs, considering the complexity of the task, model capabilities and resource requirements.
  • Prepare and fine-tune data: Collect and, if necessary, pre-process relevant data to fine-tune the chosen model, ensuring that it is aligned with the business context and produces accurate, domain-specific results (see the sketch after this list).
  • Plan integration with existing systems: Perform the integration of an LLM into existing business processes and technology infrastructure.
  • Monitor and evaluate performance: Continuously monitor the performance of the implemented LLM, using metrics such as accuracy, response time, and user satisfaction to identify areas for improvement.
  • Ethical and privacy considerations: Take into account potential ethical and privacy issues related to AI implementation, while ensuring compliance with data protection regulations and responsible use of AI technologies.
  • Promote a culture of AI adoption: Encourage understanding and acceptance of AI technologies throughout the company by providing training and resources for employees to embrace and leverage LLMs.
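
As a hedged sketch of the fine-tuning step above, the outline below adapts a pre-trained model to labeled, domain-specific text with the Hugging Face Trainer API; the CSV file, its columns, the label count, and the base model are all assumptions made for illustration:

```python
# Fine-tuning sketch. Assumptions: a local domain_data.csv with a "text" column
# and an integer "label" column, a binary classification task, and a generic
# BERT base model (swap in BioBERT or similar for biomedical text).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bert-base-uncased"
dataset = load_dataset("csv", data_files="domain_data.csv")  # hypothetical file

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)
args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8)

Trainer(model=model, args=args, train_dataset=dataset["train"]).train()
```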

Encouraging further exploration and experimentation

Ongoing research, development, and testing of language models are essential to fully unlock their potential in health data analytics, to ensure that data privacy and security standards are met, and to promote responsible use of AI technologies. Meanwhile, the seamless integration of language models with existing healthcare systems and workflows is critical for widespread adoption. By developing interoperable platforms and APIs that allow easy access to the models and facilitate integration with electronic health records, clinical decision support systems, and other healthcare applications, the potential impact and usability of large language models can be maximized.
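
As one hedged illustration of such an interoperable API (the article prescribes no particular stack; FastAPI and the summarization checkpoint are our choices), a model can be exposed as an HTTP service that EHR or decision-support systems call:

```python
# Sketch: serving a summarization model over HTTP so other healthcare
# applications can call it. FastAPI, uvicorn, and the model choice are
# illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

class Note(BaseModel):
    text: str

@app.post("/summarize")
def summarize(note: Note):
    result = summarizer(note.text, max_length=60, min_length=10, do_sample=False)
    return {"summary": result[0]["summary_text"]}

# Run with: uvicorn service:app --reload  (assuming this file is saved as service.py)
```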

It’s clear that these technologies have disrupted the landscape of healthcare data analytics, providing healthcare providers with advanced capabilities to extract information, thus improving care, and driving medical research.

The way forward with Healthware

For years, we at Healthware have been following the evolution of artificial intelligence and have utilized our machine learning and data science expertise to help our customers.

The new LLM-based tools offer us and our customers new ways to accelerate, enhance, and develop processes, products, and projects. They won't make professionals obsolete; instead, they will empower them to work faster and more efficiently.

Our senior developers are already utilizing ChatGPT to speed up development work. Instead of researching documentation, the developer can ask the chatbot to help create a new component, which they can then review and integrate into the codebase. Chatbots are especially useful for more senior developers, who can adequately review the code and ensure it is suitable, working, and secure.

A similar approach has allowed our designers to focus entirely on core design work: generative tools can produce ideas or sketches of visual elements, expediting the discovery process and helping designers find the right style and refine it.

And the number of opportunities just keeps growing. We can generate audio, video, text, images, code, and more with the current tools. The output usually cannot be used as-is at the moment, but it makes for great drafts that our experts can finalize. And as the technology evolves, more and more final production content can be generated with these tools. They have already opened up a new skill of growing importance: prompt hacking, i.e., the ability to ask the right questions, with the proper context, in the right way, of the right chatbot to get the best possible results.
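
As a small sketch of what prompt hacking means in code (using the OpenAI Python SDK as it existed at the time of writing; the model choice and prompt wording are our own), note how the system message supplies role, context, and constraints before the actual question is asked:

```python
# Prompting sketch: the system message sets the role and constraints, the user
# message asks the actual question. Uses the pre-1.0 openai SDK interface.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": (
            "You are a senior front-end developer. Answer with concise, "
            "review-ready TypeScript and note any security caveats.")},
        {"role": "user", "content": "Create a reusable input component with validation."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```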