Page 195 - Kaleidoscope Academic Conference Proceedings 2024

ENHANCING ONCOLOGY CARE WITH FEDERATED LEARNING AND FOUNDATION MODELS

Gagan, N¹ and Sanand, Sasidharan²

¹GE HealthCare



ABSTRACT

Millions of people worldwide are battling cancer, and personalised care plans are essential for effective diagnosis, treatment, and monitoring of this disease. Recently, Large Language Models (LLMs) have proven valuable in cancer treatment, for instance in extracting key information from Electronic Medical Records (EMRs). This study presents a transformer-encoder-based LLM that is domain adapted for oncology and outperforms generic models in recognising critical oncology-related elements from clinical text. We observe that the development of such domain-specific LLMs demands a huge amount of data and computational resources, which is a deterrent to the sustainable development goal of equitable health. To address this problem, we propose a federated learning approach to model development that eliminates data sharing and centralised computational resource costs. Our evaluations show that the federated approach outperforms the generic base model, highlighting the advantages of collaborative learning in capturing domain-specific knowledge and enhancing performance in oncology-related NLP tasks. Our work is in line with the United Nations Sustainable Development Goals (SDGs), which aim to promote equitable health and narrow the differences in access to advanced cancer treatment.

Keywords - Pre-tuning, Domain Adaptation, Federated Learning, NER, Fine-tuning, BERT, SDGs, Embedding

1. INTRODUCTION

Cancer is a global health issue affecting millions of people worldwide. Recent progress in artificial intelligence (AI) and machine learning (ML) has proved very effective for oncology care by means of data-driven insights and decision support systems [1]. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), is a very useful tool in oncology care. It identifies and extracts vital elements such as cancer types, treatments, and drugs from unstructured medical texts such as clinical notes and pathology reports. NER gives oncologists quick and efficient access to a patient's key information, which leads to better diagnosis, treatment planning, and overall patient management.

Transformer-based language models such as BERT have proven very effective in a range of NLP tasks [2]. However, the performance of these models in specialised fields like oncology is not optimal because of the domain-specific nature of medical terminologies and concepts. Domain adaptation methods address this issue. BioBERT [3] is an example of a BERT model adapted to the biomedical domain that has demonstrated better performance in NLP tasks within this domain than the original BERT model. Nevertheless, acquiring good-quality medical data is quite difficult because of privacy, data governance, and the difficulties of handling sensitive patient data.

Federated learning, a privacy-preserving ML technique, has been proposed as a possible solution that enables collaborative model training across decentralised data sources without actually sharing the data [4]. Particularly in areas with sensitive data, such as healthcare, federated learning has many benefits over traditional centralised ML methods. It lets decentralised data sources train a model collaboratively without compromising data privacy, an important consideration given the constraints of data governance, computational resources, and model development. Our method relies on the combined knowledge of many healthcare facilities, so direct data sharing is not needed. Federated methods are also essential to eliminating biases in AI models towards developed demographics, because they address the two major barriers to model development: data exchange and computational facilities.

This study develops a method that unites transformer-based language models, domain adaptation, and federated learning to improve oncology care. We introduce a language model specifically designed for the oncology domain, which outperforms generic models in NLP tasks related to oncology. Our evaluation was based mainly on the NER task, which showed the model's capacity to recognise and extract significant entities in the oncology field. Nevertheless, the scope of this oncology-specific foundational model is not limited to NER; it can be adapted for other downstream tasks such as relation extraction, text classification, and text generation, paving the way for further development in oncology text mining and analysis. To overcome the issues of data sharing and the lack of computational resources, we propose a federated-learning-based approach that performs collaborative model building without compromising data privacy.

Our work is in line with the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-being), SDG 9 (Industry, Innovation, and Infrastructure), and SDG 10 (Reduced Inequalities), as we promote equitable health and limit the disparity in the availability of advanced oncology care. The enhanced



978-92-61-39091-4/CFP2268P © ITU 2024