Machine Learning and Natural Language Processing for Biomedical Applications
National Library of Medicine
Abstract
Over the last decade, online search for biological information has progressed rapidly and has become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NLM. Indeed, each day millions of users search for biological information via NLM's updated online PubMed system. However, finding data relevant to a user's information need is not always easy. Improving our understanding of the growing population of Entrez users, their information needs, and the way in which they meet these needs opens opportunities to improve the information services and information access provided by NLM.

In late 2024, we introduced TrialGPT, an innovative, end-to-end framework that uses large language models (LLMs) for zero-shot patient-to-trial matching, aiming to streamline the notoriously time-consuming process of clinical trial recruitment. TrialGPT consists of three integrated modules: Retrieval, which filters candidate trials; Matching, which assesses eligibility at the level of trial inclusion/exclusion criteria with approximately 87.3% accuracy, comparable to that of human clinicians; and Ranking, which produces trial-level relevance scores that correlate strongly with human judgments and outperform previous methods by about 43.8% in ranking precision. Evaluated using synthetic and real-world patient data, TrialGPT demonstrated over 90% recall of relevant trials using under 6% of the initial trial set. Importantly, a pilot user study showed that clinicians using TrialGPT reduced screening time by around 42 to 43% while maintaining the same decision accuracy, highlighting its potential to enhance efficiency and equity in clinical research enrollment.

We further developed GeneAgent, an AI agent built on GPT-4 that enhances gene set analysis by reducing hallucinations through autonomous self-verification.
GeneAgent processes user-provided gene sets by generating candidate functional descriptions via an LLM, then systematically verifies these against information retrieved from multiple expert-curated biomedical databases. Claims are categorized as supported, partially supported, or refuted, enabling the model to refine its outputs iteratively. Evaluations using datasets from Gene Ontology, tumor proteomics, and molecular function databases demonstrate that GeneAgent significantly outperforms standard GPT-4 approaches, particularly on novel or cross-species gene sets, by yielding more accurate and reliable annotations. Human expert assessment of a random sample of claims confirmed a 92% correctness rate for its self-verification module. Overall, GeneAgent is an advance toward more trustworthy, evidence-grounded functional insights from gene set analysis.

In NLP research, we published a study that systematically benchmarks LLMs, including GPT-4 and open-source LLaMA variants, across 12 biomedical NLP tasks spanning six application areas. We compared zero-shot, few-shot, and fine-tuned LLM performance against established BERT- and BART-based models. We found that traditional fine-tuned models still outperform zero- or few-shot LLMs in most tasks, while GPT-4 excels in reasoning-heavy applications such as medical question answering. Open-source LLMs show promise but require fine-tuning to approach closed-source performance. The study also highlights persistent challenges, including hallucinations and missing information in LLM outputs, and provides cost analyses alongside practical recommendations. Overall, the work underscores that while LLMs hold transformative potential for biomedical text processing, careful model selection, fine-tuning, and error awareness remain essential for reliable deployment.
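The verify-and-refine idea behind GeneAgent, as described above, can be illustrated with a small toy sketch. This is not GeneAgent's implementation: the `TOY_DATABASE` dictionary is a hypothetical stand-in for the expert-curated databases, the LLM step that proposes claims is omitted, and the rules used here (label a claim by how many of its genes have matching annotations, then narrow partially supported claims to the genes with evidence) are assumptions made only to show the three-way categorization.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially supported"
    REFUTED = "refuted"


@dataclass(frozen=True)
class Claim:
    genes: frozenset  # gene symbols the claim is about
    function: str     # proposed functional description


# Hypothetical stand-in for curated databases: gene symbol -> annotated functions.
TOY_DATABASE = {
    "TP53": {"apoptosis", "DNA repair"},
    "BRCA1": {"DNA repair"},
    "ATM": {"DNA repair"},
    "MYC": {"cell proliferation"},
}


def has_evidence(gene, function, database):
    """True if the toy database annotates this gene with this function."""
    return function in database.get(gene, set())


def verify(claim, database):
    """Label a claim by how many of its genes have a matching annotation."""
    if not claim.genes:
        return Verdict.REFUTED
    hits = sum(has_evidence(g, claim.function, database) for g in claim.genes)
    if hits == len(claim.genes):
        return Verdict.SUPPORTED
    if hits > 0:
        return Verdict.PARTIALLY_SUPPORTED
    return Verdict.REFUTED


def refine(claims, database):
    """Keep supported claims, narrow partially supported ones, drop refuted ones."""
    refined = []
    for claim in claims:
        verdict = verify(claim, database)
        if verdict is Verdict.SUPPORTED:
            refined.append(claim)
        elif verdict is Verdict.PARTIALLY_SUPPORTED:
            backed = frozenset(
                g for g in claim.genes if has_evidence(g, claim.function, database)
            )
            refined.append(Claim(backed, claim.function))
    return refined
```

In this sketch, a claim that all of TP53, BRCA1, and ATM take part in DNA repair is kept unchanged, a claim that TP53 and MYC both drive apoptosis is narrowed to TP53, and a claim that MYC takes part in DNA repair is dropped.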
We further addressed a key limitation in biomedical relation extraction (RE) by enhancing the BioRED dataset with explicit directionality annotations, which define whether entities act as subjects or objects in their relationships and are crucial for properly understanding and reconstructing biological networks. Specifically, we enriched BioRED with 10,864 such annotations and developed a multi-task language model that jointly identifies relation types, novelty, and entity roles. The model combines soft prompt learning, context segmentation to handle long documents, and multi-task training, thereby overcoming typical BERT input-length limitations. Our method not only leverages these enriched annotations to improve relational understanding but also outperforms state-of-the-art models, including GPT-4 and LLaMA 3 (with parameter-efficient fine-tuning), on benchmark document-level RE tasks. The dataset and accompanying code (BioREDirect) are publicly available.

Finally, we launched LitSense 2.0, an advanced biomedical literature search system that enables granular, AI-powered retrieval at both the sentence and paragraph levels. It covers around 38 million PubMed abstracts and 6.6 million open-access full-text articles from PMC, totaling approximately 1.4 billion sentences and about 300 million paragraphs, and is updated weekly. It builds on the original LitSense platform by introducing paragraph-level search in addition to sentence-level search, improving retrieval accuracy with a state-of-the-art biomedical text encoder, and offering a unified, cross-platform interface for seamless searching across PubMed and PMC. These enhancements significantly boost the precision and usability of biomedical information extraction for researchers.
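The context segmentation mentioned for the relation extraction model is not specified in detail here. One common way to fit a long document into a fixed-length encoder is to split its token sequence into overlapping windows, sketched below. The function name `segment_tokens`, the default window size of 512, and the overlap of 64 are illustrative assumptions, not the BioREDirect settings.

```python
def segment_tokens(tokens, max_len=512, overlap=64):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that an entity pair
    near a boundary still appears together in at least one window.
    Assumed scheme for illustration; the actual method may differ.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not tokens:
        return []

    step = max_len - overlap  # how far each new window advances
    segments = []
    start = 0
    while True:
        segments.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return segments
```

Each window can then be encoded separately and the per-window predictions combined at the document level; how that combination is done is a separate design choice not covered by this sketch.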