Page 195 - Kaleidoscope Academic Conference Proceedings 2024
ENHANCING ONCOLOGY CARE WITH FEDERATED LEARNING AND
FOUNDATION MODELS
Gagan, N¹ and Sanand, Sasidharan²
¹GE HealthCare
ABSTRACT

Millions of people worldwide are battling cancer, and personalised care plans are essential for effective diagnosis, treatment, and monitoring of this disease. Recently, Large Language Models (LLMs) have proven valuable in cancer treatment, for instance in extracting key information from Electronic Medical Records (EMRs). This study presents a transformer encoder based LLM that is domain adapted for oncology and outperforms generic models in recognising critical oncology related elements from clinical text. We observe that the development of such domain specific LLMs demands a huge amount of data and computational resources, which is a deterrent to the Sustainable Development Goal of equitable health. To address this problem, we propose a federated learning approach for model development that eliminates data sharing and centralised computational resource costs. Our evaluations show that the federated approach outperforms the generic base model, highlighting the advantages of collaborative learning in capturing domain specific knowledge and enhancing performance in oncology related NLP tasks. Our work is in line with the United Nations Sustainable Development Goals (SDGs), which aim to promote equitable health and narrow the differences in access to advanced cancer treatment.

Keywords - Pre-tuning, Domain Adaptation, Federated Learning, NER, Fine-tuning, BERT, SDGs, Embedding

1. INTRODUCTION

Cancer is a global health issue affecting millions of people worldwide. Recent progress in artificial intelligence (AI) and machine learning (ML) has proved very effective for oncology care by means of data-driven insights and decision support systems [1]. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), is a very useful tool in oncology care. It identifies and extracts vital elements such as cancer types, treatments, and drugs from unstructured medical texts such as clinical notes and pathology reports. NER assists oncologists in these tasks so that they can access a patient's key information in a very short time, which leads to better diagnosis, treatment planning, and overall patient management.

Transformer based language models such as BERT have proven to be very effective in different NLP tasks [2]. However, the performance of these systems in specialised fields like oncology is not optimal because of the domain specific nature of medical terminologies and concepts. Domain adaptation methods address this issue. BioBERT [3] is an example of a biomedical domain adapted BERT model that has demonstrated better performance in NLP tasks within this domain compared to the original BERT model. Nevertheless, acquiring good quality medical data is quite difficult due to privacy, data governance, and the difficulties in handling sensitive patient data.

Federated learning, a privacy-preserving ML technique, has emerged as a possible solution that enables collaborative model training across decentralised data sources without actually sharing the data [4]. Particularly for sensitive data such as healthcare data, federated learning has many benefits over traditional centralised ML methods. It allows model training to be shared collaboratively among decentralised data sources without compromising data privacy, an important concern that spans data governance, computational resource limitations, and the models themselves. Our method relies on the combined knowledge of many healthcare facilities; thus, direct data sharing is not needed. Federated methods are also essential to eliminating biases in AI models towards developed demographics, because they address the two major barriers to model development: data exchange and computational facilities.

This study develops a method that unites transformer based language models, domain adaptation, and federated learning to improve oncology care. We introduce a language model specifically designed for the oncology domain, which outperforms generic models in NLP tasks related to oncology. Our evaluation was mainly based on the NER task as a primary metric, which showed the model's capacity to recognise and extract significant entities in the oncology field. Nevertheless, the scope of this oncology-specific foundational model is not limited to NER; it can be adapted for other downstream tasks such as relation extraction, text classification, and text generation, thus paving the way for further development in oncology text mining and analysis. To overcome the issues related to data sharing and the lack of computational resources, we propose a federated learning based model to perform collaborative model building without compromising data privacy.

Our work is in line with the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-being), SDG 9 (Industry, Innovation, and Infrastructure), and SDG 10 (Reduced Inequalities), as we promote balanced health and limit the disparity in the availability of advanced oncology care. The enhanced
978-92-61-39091-4/CFP2268P @ITU 2024 – 151 – Kaleidoscope