NA-MIC Project Weeks

Extraction of Orofacial Pain Comorbidities from Clinical Notes Using Large Language Models

Key Investigators

Project Description

Temporomandibular Disorders (TMDs) are often linked with complex comorbidities that are difficult to extract from long free-text clinical notes. This project leverages Large Language Models (LLMs) to identify and summarize these comorbidities, enabling structured analysis and visualization across patient cohorts.

Objective

  1. Fine-tune open-source LLMs to extract a curated list of TMD-related comorbidities from clinical notes.
  2. Generate structured patient-level outputs from model predictions.
  3. Visualize comorbidity data using an interactive dashboard.
  4. Compare model performance to determine the most clinically effective approach.

Approach and Plan

  1. Annotate clinical notes with summaries across 56 comorbidity criteria.
  2. Fine-tune LLMs such as facebook/bart-large-cnn and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using chunked note inputs.
  3. Generate structured outputs and compile them into a CSV.
  4. Visualize cohort-level trends using a Python-based dashboard.
  5. Evaluate model performance and make the tool available in 3D Slicer.
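
The "chunked note inputs" in step 2 can be illustrated with a minimal sketch. It assumes chunks are overlapping windows of whitespace-separated words; the function name `chunk_note` and the `max_words` and `overlap` values are hypothetical, and a real pipeline would more likely count tokens with the model's own tokenizer.

```python
from typing import List


def chunk_note(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split a note into overlapping word windows (hypothetical helper)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the note
    return chunks


if __name__ == "__main__":
    note = " ".join(f"w{i}" for i in range(450))  # stand-in for a long note
    pieces = chunk_note(note, max_words=200, overlap=20)
    print([len(p.split()) for p in pieces])
```

The overlap keeps a finding that straddles a chunk boundary visible in at least one chunk, at the cost of some repeated text.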

Progress and Next Steps

  1. Deidentified clinical notes were obtained and manually summarized for 500 cases.
  2. Fine-tuned facebook/bart-large-cnn and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on these summaries to generate structured outputs across 56 comorbidity fields.
  3. Generated CSV outputs from model summaries and created a dashboard to visualize cohort-level patterns.
  4. Currently working on fine-tuning larger models and expanding the dataset.
  5. Next steps include completing 500 patient summaries, comparing model performance, and deploying the tool for use in 3D Slicer.
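
One way the step from model summaries to CSV could look is sketched below. It assumes each summary is plain text with one `field: value` line per criterion. The three field names are hypothetical placeholders for the project's 56 criteria, and `parse_summary` and `summaries_to_csv` are illustrative helpers, not the project's code.

```python
import csv
import io
from typing import Dict, List

# Hypothetical field names; the project's real list has 56 criteria.
FIELDS = ["headache", "sleep_disorder", "anxiety"]


def parse_summary(summary: str, fields: List[str] = FIELDS) -> Dict[str, str]:
    """Turn 'field: value' lines into one row; missing fields stay empty."""
    row = {name: "" for name in fields}
    for line in summary.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip().lower().replace(" ", "_")
        if sep and key in row:
            row[key] = value.strip()
    return row


def summaries_to_csv(summaries: Dict[str, str], fields: List[str] = FIELDS) -> str:
    """Write one CSV row per patient ID and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["patient_id"] + fields, lineterminator="\n"
    )
    writer.writeheader()
    for patient_id, summary in summaries.items():
        writer.writerow({"patient_id": patient_id, **parse_summary(summary, fields)})
    return buffer.getvalue()


if __name__ == "__main__":
    demo = {
        "P001": "Headache: daily tension-type\nSleep disorder: none reported",
        "P002": "Anxiety: yes\nUnrelated: ignored",
    }
    print(summaries_to_csv(demo))
```

Fixing the column order up front means rows from different models line up field by field, which simplifies later comparison.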

Illustrations

Table 1. Metrics from BART training

Fold     ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-Lsum
Fold1    83.68    71.99    83.50    83.49
Fold2    83.48    73.40    83.11    83.14
Fold3    84.93    74.23    84.38    84.57
Fold4    85.50    74.73    85.11    85.21
Fold5    85.47    74.64    84.98    85.01
Average  84.61    73.80    84.22    84.29

Table 2. Metrics from DeepSeek training

Fold     ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-Lsum
Fold1    86.55    86.49    86.53    86.54
Fold2    84.90    84.79    84.86    84.82
Fold3    86.08    86.09    86.10    86.11
Fold4    85.96    85.91    85.95    85.92
Fold5    85.21    85.70    85.17    85.21
Average  85.74    85.70    85.72    85.72
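
For readers unfamiliar with the metrics in Tables 1 and 2, the sketch below shows what ROUGE-N and ROUGE-L measure: the F-measure of n-gram overlap and of the longest common subsequence between a generated summary and a reference. It is a simplified illustration that assumes lowercase whitespace tokenization. Standard ROUGE packages add their own tokenization and optional stemming, so their values can differ, and the tables appear to report scores scaled to 0-100.

```python
from collections import Counter
from typing import List


def _f1(overlap: int, n_pred: int, n_ref: int) -> float:
    """Harmonic mean of precision (overlap/n_pred) and recall (overlap/n_ref)."""
    if overlap == 0 or n_pred == 0 or n_ref == 0:
        return 0.0
    precision = overlap / n_pred
    recall = overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int = 1) -> float:
    """F-measure of clipped n-gram overlap (simplified ROUGE-N)."""
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred = ngrams(prediction.lower().split())
    ref = ngrams(reference.lower().split())
    overlap = sum((pred & ref).values())  # multiset intersection clips counts
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction: str, reference: str) -> float:
    """F-measure based on the longest common subsequence (simplified ROUGE-L)."""
    p, r = prediction.lower().split(), reference.lower().split()
    prev = [0] * (len(r) + 1)  # LCS lengths for the previous prediction token
    for tok in p:
        curr = [0]
        for j, ref_tok in enumerate(r, start=1):
            if tok == ref_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(p), len(r))


if __name__ == "__main__":
    pred = "patient reports chronic migraine"
    ref = "patient reports migraine and insomnia"
    print(round(rouge_n(pred, ref, 1), 4), round(rouge_n(pred, ref, 2), 4),
          round(rouge_l(pred, ref), 4))
```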

Figure 1. Dashboard summary from 500 cases extracted manually

Figure 2. Dashboard summary from 500 cases extracted automatically by fine-tuned BART

Figure 3. Dashboard summary from 500 cases extracted automatically by fine-tuned DeepSeek
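
A cohort-level summary of the kind a dashboard might display can be derived from the patient-level rows by counting how often each field is reported. The sketch below is an assumption about that aggregation, not the dashboard's actual code. The `NEGATIVE` value set and the field names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Assumed values that mean "comorbidity not reported" (hypothetical).
NEGATIVE = {"", "no", "none", "none reported", "absent"}


def cohort_prevalence(rows: Iterable[Dict[str, str]], fields: List[str]) -> Dict[str, float]:
    """Return, per field, the percentage of patients with a non-negative value."""
    counts: Counter = Counter()
    total = 0
    for row in rows:
        total += 1
        for name in fields:
            if row.get(name, "").strip().lower() not in NEGATIVE:
                counts[name] += 1
    return {name: (100.0 * counts[name] / total if total else 0.0) for name in fields}


if __name__ == "__main__":
    cohort = [
        {"headache": "daily", "anxiety": "no"},
        {"headache": "migraine", "anxiety": "yes"},
        {"headache": "occasional", "anxiety": ""},
        {"headache": "None reported", "anxiety": "absent"},
    ]
    print(cohort_prevalence(cohort, ["headache", "anxiety"]))
```

Running the same aggregation on the manual, BART, and DeepSeek rows gives directly comparable per-field percentages.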

Background and References