performance of our domain-adaptable and federated models in capturing oncology-specific language and knowledge will, in turn, help in various aspects of oncology care, advancing the delivery of personalised, equitable, and high-quality oncology care to all patients across demographics.

Even though the strategies presented in this paper are designed for encoder-based transformers, they can also be applied to decoder-based transformer architectures, laying a foundation for further exploration and application in various natural language processing tasks. The rest of the paper is organised as follows: Section 2 discusses related work, Section 3 describes the methodology, Section 4 presents the results, and Section 5 concludes the paper.
2. RELATED WORK

This section provides an overview of relevant literature and research advancements in two key areas: domain adaptation of transformer-based language models and federated learning. We underline the progress made in these areas and, at the same time, point out the gaps and opportunities that motivated our work.
2.1 Domain Adaptation for Healthcare Applications

The strong performance of transformer-based language models such as BERT [2] in natural language processing has motivated researchers to consider their application in many fields, including healthcare. Nevertheless, the complexity of medical terminology and concepts makes adapting these models to the medical domain difficult. Many studies have investigated domain adaptation techniques to enhance the performance of pre-trained language models in the biomedical and clinical fields. This can be seen in ClinicalBERT [5], which was fine-tuned on clinical notes from the MIMIC-III dataset [6] and performed better at clinical natural language inference and relation extraction. BlueBERT [7] was fine-tuned on electronic health records (EHRs) and outperformed BioBERT [3] and other baselines on clinical named entity recognition and relation extraction tasks. Other domain-adapted models include PubMedBERT [8], fine-tuned on PubMed abstracts and full-text articles, and SciBERT [9], fine-tuned on a large corpus of scientific literature. In [10], Zhang et al. trained BERT on Chinese medical diagnostic and treatment texts, and Liu et al. proposed Med-BERT [11], a medical-dictionary-enhanced BERT model. These models performed better on biomedical information extraction, text classification, and question answering tasks. Although these domain-adapted models have demonstrated potential in their respective medical domains, their suitability for specialised areas such as oncology is quite restricted. Terms, concepts, and contexts specific to oncology can be extremely subtle and often require dedicated domain adaptation to capture the specifics of cancer language and knowledge.
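A common recipe across these works is continued pre-training: an existing checkpoint is further trained with a masked language modelling (MLM) objective on in-domain text so the model absorbs domain vocabulary in context. The following is a minimal sketch of that recipe for BioBERT using the Hugging Face transformers and datasets libraries; the corpus file, checkpoint, and hyperparameters are illustrative assumptions, not the setup of any cited work.

```python
# Minimal continued pre-training sketch with an MLM objective.
# `oncology_corpus.txt` is an assumed file of raw in-domain text;
# hyperparameters are placeholders, not any cited paper's setup.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

checkpoint = "dmis-lab/biobert-base-cased-v1.1"  # public BioBERT weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Tokenise the raw oncology text into fixed-length examples.
dataset = load_dataset("text", data_files={"train": "oncology_corpus.txt"})
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)
tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

# Randomly mask 15% of tokens; the model relearns domain terms in context.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="biobert-onco",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```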
2.2 Federated Learning in Healthcare

Federated learning is a technique that addresses privacy and data governance problems in AI applications. It permits collaborative model training across decentralised data sources without sharing any sensitive information, making it a privacy-preserving alternative to traditional centralised machine learning [4]. For instance, researchers have proposed a federated learning framework that allows multiple medical institutions to collaborate on medical image analysis tasks such as detecting COVID-19 in chest X-ray images [12]. These studies show that federated learning can be used to build models collaboratively while maintaining data privacy.
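The core mechanism behind most such systems is federated averaging (FedAvg): each site trains a copy of the model on its own data, and only the resulting weights travel to a server, which combines them into a new global model. Below is a minimal PyTorch sketch of one communication round; the `local_train` helper and the size-weighted averaging are illustrative assumptions, not the protocol of any cited system.

```python
# Minimal FedAvg sketch; raw patient data never leaves a site,
# only model weights are exchanged and averaged.
import copy
import torch

def fedavg(client_states, client_sizes):
    """Average client state_dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        stacked = torch.stack([
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        ])
        global_state[key] = stacked.sum(dim=0).to(client_states[0][key].dtype)
    return global_state

def communication_round(global_model, client_loaders):
    """One round: broadcast global weights, train locally, aggregate."""
    states, sizes = [], []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)      # broadcast current weights
        local_train(local, loader)               # hypothetical site-local loop
        states.append(local.state_dict())
        sizes.append(len(loader.dataset))
    global_model.load_state_dict(fedavg(states, sizes))
```

Weighting each site's update by its dataset size keeps larger cohorts from being drowned out by smaller ones, while the sensitive records themselves remain behind each institution's firewall.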
In the context of natural language processing, federated learning has been applied to tasks such as clinical entity recognition using EHR data from multiple healthcare institutions. These methods demonstrate the effectiveness of federated learning in breaking down data silos and improving model accuracy through collaborative training. Researchers have also examined federated transfer learning for medical relation extraction, in which pre-trained models are fine-tuned on distributed data sources and achieve better performance than centralised training [13]. Medical relation extraction has likewise been addressed with federated learning, demonstrating the feasibility of privacy-preserving collaborative learning [14]. Even though these studies show positive outcomes, applying federated learning to domain-adapt transformer-based language models to a specific healthcare area such as oncology remains unexplored [15].
2.3 Research Contributions

This research integrates domain adaptation and federated learning to improve oncology practice through the development of a stable, privacy-preserving base model specific to oncology. Specifically, our contributions are as follows:

1. Domain Adaptation: We utilise a set of oncology-related datasets that encompass cancer-specific language nuances and semantics to adapt the BioBERT model to the oncology domain.

2. Federated Learning: We employ federated learning to address data collection and computation challenges, training models at source sites and aggregating weights to distribute costs and maintain privacy.

3. Extensive Evaluation: To demonstrate the effectiveness of our approach in capturing domain-specific semantics and improving oncology-based NLP tasks, we perform evaluations including embedding visualisation, clustering analysis, and named entity recognition (NER) tasks; a sketch of the embedding-based analysis follows this list.
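As an illustration of the embedding visualisation and clustering analysis named in contribution 3, the sketch below mean-pools encoder outputs into sentence embeddings and clusters them with k-means; the checkpoint and example sentences are placeholders, not our actual evaluation data.

```python
# Illustrative embedding + clustering sketch; checkpoint and sentences
# are placeholders, not the paper's evaluation data.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans

checkpoint = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint).eval()

sentences = [
    "The patient received adjuvant chemotherapy after lumpectomy.",
    "HER2-positive breast carcinoma with nodal involvement.",
    "No evidence of metastatic disease on follow-up imaging.",
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)       # mean-pooled embeddings

labels = KMeans(n_clusters=2, n_init=10).fit_predict(emb.numpy())
print(labels)  # cluster assignment per sentence
```

The same pooled embeddings can be projected to two dimensions (e.g., with t-SNE or UMAP) for the visualisation step.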
This study intends to improve AI in oncology care through the use of transformer-based language models, domain