TNM Tumor Classification from Unstructured Breast Cancer Pathology Reports using LoRA Finetuning of Mistral 7B

Published in AAAI 2024 Spring Symposium on Clinical Foundation Models, 2024

Abstract:

Over the past year, large language models have seen an explosion in usage, with researchers and companies rushing to discover new applications. This explosion was kick-started by OpenAI's release of GPT-3.5 and GPT-4 to the general public. These foundation models have proven extraordinarily capable on a wide range of tasks, but their cost and reliability present problems for sensitive or resource-limited applications. Over the same period, however, we have also seen rapid development in smaller foundation models, such as Mistral’s 7B model, as well as in fine-tuning those models for specific tasks.

In this paper, we explore Low-Rank Adaptation (LoRA) fine-tuning of small language models to perform TNM staging on unstructured pathology reports for triple-negative breast cancer cases. We also attempt to develop a more generalized approach, so that our work can be applied to other NLP tasks within the medical field.

We found that a small foundation model can perform TNM staging with reliable accuracy after fine-tuning, allowing fast and reliable automation of critical language processing tasks within medicine.

Recommended citation: McCleary, K., Ghawaly, J., & Miele, L. (2024). TNM Tumor Classification from Unstructured Breast Cancer Pathology Reports using LoRA Finetuning of Mistral 7B. In AAAI 2024 Spring Symposium on Clinical Foundation Models.