
2024 ITU Kaleidoscope Academic Conference




employed a federated learning approach, as illustrated in Figure 2. Previously, machine learning models for the healthcare domain were usually built by gathering data from various hospitals into a central repository and training on the pooled data set. As shown in Figure 2a, healthcare facilities must then disclose their patients' confidential data to an external model development repository. Storing and processing data outside the control of the individual hospitals raises concerns about privacy, possible breaches, and legal and ethical issues. These concerns can make hospitals reluctant to share data, which reduces the volume and diversity of the data available for model training and could result in biased or less generalizable models.

Figure 2 – Model Training Approaches: (a) Conventional Approach; (b) Federated Approach

Federated learning allows collaborative model training without hospitals having to reveal their patients' data. As shown in Figure 2b, every hospital keeps its own data set and trains a local model on site. Only model updates, such as weights or gradients, are exchanged with a central server for aggregation. Federated learning addresses privacy and security concerns by decentralising data and granting each hospital autonomy over its own data. This approach keeps confidential patient data on site, out of reach of unauthorised parties, and supports compliance with stringent regulations and standards such as HIPAA, which governs the management of health information in the healthcare industry. In addition, federated learning supports data governance and ownership, since hospitals keep their own data and choose when to participate in collaborative model training. This encourages more hospitals to contribute their trained updates, which makes the contributions to the shared model more diverse. The increased diversity of data enables the development of stronger and more generalizable models that reflect the differences in patient populations and clinical practices among hospitals.

Moreover, federated learning makes more efficient use of computational resources than the conventional approach. Centralised model training typically requires high computational power and storage, which are not always available in individual hospitals. Federated learning enables every hospital to use its local computational resources for local model training, eliminating the need for expensive infrastructure and allowing even hospitals with fewer resources to participate in the process.

3.4  Evaluation Strategy

To evaluate the effectiveness of our approach and demonstrate the benefits of domain adaptation and federated learning, we conducted three evaluations: embedding visualisation, clustering analysis, and a named entity recognition (NER) task. We assessed the domain adaptation of our pre-trained models and BioBERT by visualising the semantic relations among the oncology-related terms that the models capture. From each model, we extracted the embeddings of key words related to cancer treatment, diagnosis, and general terms. We used t-SNE to project the high-dimensional word embeddings into a two-dimensional space. To keep the t-SNE projections comparable, we used the same projection matrix for all three models. After dimensionality reduction, we applied K-means clustering to group similar words based on their proximity in the reduced space. The embedding visualisations and clustering analyses gave qualitative insights into each model's capacity to represent semantic similarities. We also tested the models on the NER task using a manually annotated dataset of clinical reports. The dataset was preprocessed and tokenized, and the NER labels were aligned with the token sequences. The models were fine-tuned to predict a label for each token. To measure the contribution of domain adaptation and federated learning to the accurate identification and classification of cancer-related entities in clinical texts, we compared the precision, recall, and F1 scores of our pre-trained models and BioBERT on the NER task.

4.  RESULTS AND DISCUSSION

In this section, we discuss the outcomes of our evaluations, focusing on two aspects: (1) visualisation and clustering of embeddings, and (2) the named entity recognition task. We first examine, through embedding visualisation and clustering, how well domain adaptation captures the semantics of oncology-related terms. We then analyse the performance of all three models on the NER task, presenting quantitative results and discussing the improvements brought by domain adaptation and federated learning.

4.1 Embedding Visualisation and Clustering

To assess how well domain adaptation captures semantic relations and similarities between oncology-related terms, we performed embedding visualisation and clustering analysis on BioBERT and our two domain-adapted models (one developed without federation and the other with federation). t-SNE (t-distributed Stochastic Neighbor Embedding) is a