Page 198 - Kaleidoscope Academic Conference Proceedings 2024

2024 ITU Kaleidoscope Academic Conference

employed a federated learning approach, as illustrated in Figure 2. Earlier, machine learning models for the healthcare domain were usually built by gathering data from various hospitals into a central repository, and this centralised data set was then used to train the models. As shown in Figure 2a, healthcare facilities have to open up their patients' confidential data to an external model development repository. Storing and processing data outside the control of the individual hospitals raises concerns about privacy, possible breaches, and legal as well as ethical issues. These concerns make hospitals reluctant to share data, which reduces the volume and diversity of the data available for model training and could result in biased or less generalizable models.

Figure 2 – Model Training Approaches: (a) Conventional Approach, (b) Federated Approach

Federated learning allows collaborative model training without hospitals having to reveal their patients' data. As demonstrated in Figure 2b, every hospital keeps its own data set and trains a local model on site. Only model updates, such as weights or gradients, are exchanged with a central server for aggregation. Federated learning addresses privacy and security issues by decentralising data and granting each hospital autonomy over its own data. Because raw patient records never leave the hospital, this approach limits the exposure of confidential patient data to unauthorised parties and supports compliance with stringent regulations and standards such as HIPAA, which governs the management of health information in the healthcare industry. In addition, federated learning encourages data governance and ownership, as hospitals keep their own data and choose when to participate in collaborative model training. This in turn encourages more hospitals to contribute their trained updates, so the aggregated model draws on more diverse data. The increased diversity of data enables the development of stronger and more generalizable models that can reflect the differences in patient populations and clinical practices among different hospitals.

Moreover, federated learning makes more efficient use of computational resources than the standard approach. Centralised model training typically requires high computational power and storage, which are not always available in individual hospitals. Federated learning enables every hospital to use its local computational resources for local model training, eliminating the need for expensive infrastructure and allowing even hospitals with fewer resources to participate in the process.

3.4 Evaluation Strategy

To evaluate the effectiveness of our approach and demonstrate the benefits of domain adaptation and federated learning, we conducted evaluations including embedding visualisation, clustering analysis, and named entity recognition (NER) tasks. We assessed the domain adaptation of our pre-tuned models and BioBERT by visualising the semantic relations among the oncology-related terms that the models could comprehend. We took the embeddings of keywords related to cancer treatment, diagnosis, and general terms from each model. To visualise the high-dimensional word embeddings in a two-dimensional space, we used t-SNE. To keep the t-SNE projections comparable, we used the same projection matrix for all three models. We applied K-means clustering after dimensionality reduction to group similar words together based on their proximity in the reduced space. The embedding visualisations and clustering analyses gave qualitative insights into each model's capacity to represent semantic similarities. Furthermore, we tested the models on the NER task using a manually annotated dataset of clinical reports. The dataset was preprocessed and tokenized, and the NER labels were aligned with the token sequences. The models were fine-tuned to predict the label of each token. To measure the contribution of domain adaptation and federated learning to the accurate identification and classification of cancer-related entities in the clinical texts, we compared the precision, recall, and F1 scores of our pre-tuned models and BioBERT on the NER task.

4. RESULTS AND DISCUSSION

In this section, we discuss the outcomes of our evaluations, focusing on two major aspects: (1) visualisation and clustering of embeddings, and (2) the named entity recognition task. We first check how well domain adaptation captures the semantics of oncology-related terms by means of embedding visualisation and clustering. After that, we analyse the performance of all three models on the NER task, presenting quantitative results and discussing the improvements brought by domain adaptation and federated learning.

4.1 Embedding Visualisation and Clustering

To assess how well domain adaptation captures semantic relations and similarities between oncology-related terms, we performed embedding visualisation and clustering analysis on BioBERT and our two domain-adapted models (one developed without federation and the other with federation). t-SNE (t-distributed Stochastic Neighbor Embedding) is a
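
The K-means step described in Section 3.4, which groups terms by their proximity in the reduced two-dimensional space, can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function name `kmeans_2d`, the six terms, their coordinates, and the choice of three clusters with fixed starting centroids are all assumptions made for illustration, and the coordinates merely stand in for the output of a t-SNE projection. In practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def kmeans_2d(points: List[Point], centroids: List[Point],
              max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D coordinates standing in for t-SNE output (not real results).
reduced: Dict[str, Point] = {
    "chemotherapy": (1.0, 1.2),
    "radiotherapy": (1.3, 0.9),
    "biopsy": (5.0, 5.1),
    "mammogram": (5.2, 4.8),
    "patient": (9.0, 1.0),
    "hospital": (9.3, 1.4),
}

terms = list(reduced)
points = [reduced[t] for t in terms]
# Deterministic initialisation (an assumption): one seed point per expected group.
init = [points[0], points[2], points[4]]
labels, centroids = kmeans_2d(points, init)

clusters: Dict[int, List[str]] = {}
for term, lab in zip(terms, labels):
    clusters.setdefault(lab, []).append(term)
print(clusters)
```

With these made-up coordinates the treatment, diagnosis, and general terms fall into three separate clusters. On real embeddings the quality of the grouping depends on the model, which is what the qualitative comparison between BioBERT and the domain-adapted models examines.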