Extraction of Orofacial Pain Comorbidities from Clinical Notes Using Large Language Models
Key Investigators
- Alban Gaydamour (University of Michigan, USA)
- Lucia Cevidanes (University of Michigan, USA)
- Steve Pieper (Isomics, USA)
- David Hanauer (University of Michigan, USA)
- Juan Prieto (University of North Carolina, USA)
- Lucie Dole (University of North Carolina, USA)
Project Description
Temporomandibular Disorders (TMDs) are often linked with complex comorbidities that are difficult to extract from long free-text clinical notes. This project leverages Large Language Models (LLMs) to identify and summarize these comorbidities, enabling structured analysis and visualization across patient cohorts.
Objective
- Fine-tune open-source LLMs to extract a curated list of TMD-related comorbidities from clinical notes.
- Generate structured patient-level outputs from model predictions.
- Visualize comorbidity data using an interactive dashboard.
- Compare model performance to determine the most clinically effective approach.
Approach and Plan
- Annotate clinical notes with summaries across 56 comorbidity criteria.
- Fine-tune LLMs such as facebook/bart-large-cnn and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using chunked note inputs.
- Generate structured outputs and compile them into a CSV.
- Visualize cohort-level trends using a Python-based dashboard.
- Evaluate model performance and deploy the tool so it is accessible in 3D Slicer.
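Because clinical notes are often longer than a model's input window, the plan above feeds the models chunked note inputs. A minimal sketch of one way to chunk a note, assuming a fixed token budget with overlap between consecutive chunks (whitespace tokens stand in for model tokens here, and the `max_tokens` and `overlap` values are illustrative; the actual pipeline presumably chunks with the model's own tokenizer):

```python
def chunk_note(text, max_tokens=512, overlap=50):
    """Split a long clinical note into overlapping chunks.

    Whitespace-separated words stand in for model tokens in this sketch.
    Overlap keeps context that straddles a chunk boundary visible to the
    model in both chunks.
    """
    tokens = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last chunk already covers the end of the note
    return chunks
```

Each chunk can then be summarized independently and the per-chunk outputs merged into one patient-level record.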
Progress and Next Steps
- Deidentified clinical notes were obtained and manually summarized for 500 patients.
- Fine-tuned facebook/bart-large-cnn and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on these summaries to generate structured outputs across 56 comorbidity fields.
- Generated CSV outputs from model summaries and created a dashboard to visualize cohort-level patterns.
- Currently working on fine-tuning larger models and expanding the dataset.
- Next steps include completing 500 patient summaries, comparing model performance, and deploying the tool for use in 3D Slicer.
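The CSV compilation step above can be sketched as follows; the field names are hypothetical stand-ins for a few of the 56 comorbidity fields, since the actual curated list is not reproduced here:

```python
import csv

# Illustrative subset of the 56 comorbidity fields (names are assumptions).
FIELDS = ["patient_id", "headache", "sleep_disorder", "anxiety"]

def write_structured_outputs(rows, path):
    """Compile per-patient model predictions into one CSV.

    `rows` is a list of dicts mapping field names to the model's extracted
    summary for that comorbidity; missing fields are written as empty cells
    so every row has the same columns for the dashboard to consume.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow({k: row.get(k, "") for k in FIELDS})
```

A dashboard can then load this CSV to plot comorbidity prevalence across the cohort.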
Illustrations
Table 1. Metrics from BART training

| Fold | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|
| Fold 1 | 83.68 | 71.99 | 83.50 | 83.49 |
| Fold 2 | 83.48 | 73.40 | 83.11 | 83.14 |
| Fold 3 | 84.93 | 74.23 | 84.38 | 84.57 |
| Fold 4 | 85.50 | 74.73 | 85.11 | 85.21 |
| Fold 5 | 85.47 | 74.64 | 84.98 | 85.01 |
| Average | 84.61 | 73.80 | 84.22 | 84.29 |
Table 2. Metrics from DeepSeek training

| Fold | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|
| Fold 1 | 86.55 | 86.49 | 86.53 | 86.54 |
| Fold 2 | 84.90 | 84.79 | 84.86 | 84.82 |
| Fold 3 | 86.08 | 86.09 | 86.10 | 86.11 |
| Fold 4 | 85.96 | 85.91 | 85.95 | 85.92 |
| Fold 5 | 85.21 | 85.70 | 85.17 | 85.21 |
| Average | 85.74 | 85.70 | 85.72 | 85.72 |
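The scores above were presumably computed with a standard ROUGE implementation (e.g. the rouge_score package); as a rough illustration of what ROUGE-N F1 measures, a minimal pure-Python sketch:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Toy ROUGE-N F1: n-gram overlap between candidate and reference.

    Tokenization is plain whitespace splitting; real implementations also
    normalize case and punctuation, so numbers will differ slightly.
    """
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

ROUGE-L and ROUGE-Lsum additionally use longest-common-subsequence matching rather than fixed n-grams, which this sketch does not cover.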



Background and References
- GitHub page: https://github.com/DCBIA-OrthoLab/MedEx
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880.
- DeepSeek-AI, Guo D, Yang D, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Preprint at arXiv, 2025. Available from: https://arxiv.org/pdf/2501.12948.