TNM Tumor Classification from Unstructured Breast Cancer Pathology Reports using LoRA Finetuning of Mistral 7B

Published in AAAI 2024 Spring Symposium on Clinical Foundation Models, 2024

Abstract:

Over the past year, large language models have seen an explosion in usage, with researchers and companies rushing to discover new applications. This explosion was kick-started by OpenAI’s release of GPT-3.5 and GPT-4 to the general public. These foundation models have proven extraordinarily capable on a wide range of tasks, but their cost and reliability present problems for sensitive or resource-limited applications. Over the same period, however, smaller foundation models such as Mistral’s 7B model have developed rapidly, as have methods for fine-tuning those models for specific tasks.

In this paper, we explore the application of Low-Rank Adaptation (LoRA) fine-tuning of small language models to TNM staging on unstructured pathology reports for triple-negative breast cancer cases. We also attempt to develop a more generalized approach, so that our work can be applied to other NLP tasks within the medical field.

We found that a small foundation model can perform TNM staging with reliable accuracy after fine-tuning, allowing fast and reliable automation of critical language processing tasks within medicine.

Recommended citation: McCleary, K., Ghawaly, J., & Miele, L. (2024). TNM Tumor Classification from Unstructured Breast Cancer Pathology Reports using LoRA Finetuning of Mistral 7B. In AAAI 2024 Spring Symposium on Clinical Foundation Models.

James Ghawaly Jr. PhD
