Model            Epoch   Precision   Recall     F1         Accuracy
BioBERT          1       0.579761    0.561087   0.570271   0.562193
                 2       0.613650    0.602766   0.608159   0.609599
                 3       0.621776    0.606939   0.614268   0.619746
Fine-tuned BERT  1       0.606918    0.584869   0.595690   0.592137
                 2       0.623128    0.626762   0.624940   0.629791
                 3       0.625666    0.624729   0.625197   0.637718
Federated BERT   1       0.611512    0.580348   0.595523   0.590274
                 2       0.616274    0.621599   0.618925   0.626179
                 3       0.623578    0.623311   0.623445   0.634450

Table 1 – Performance metrics of BERT models over three epochs.

identify entities while minimizing false positives and false negatives.
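For reference, the sketch below shows one way the token-level precision, recall, F1-score, and accuracy reported in Table 1 can be computed from aligned gold and predicted tag sequences. It is a minimal illustration, not the exact evaluation pipeline used here, and the tag names are hypothetical.

```python
def token_level_metrics(gold, pred, outside_tag="O"):
    """Micro-averaged precision/recall/F1 plus plain accuracy over
    aligned token tag sequences; non-entity tokens carry `outside_tag`."""
    assert len(gold) == len(pred)
    tp = sum(1 for g, p in zip(gold, pred) if g == p != outside_tag)
    fp = sum(1 for g, p in zip(gold, pred) if p != outside_tag and p != g)
    fn = sum(1 for g, p in zip(gold, pred) if g != outside_tag and p != g)
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1, accuracy

# Toy example with hypothetical BIO tags:
gold = ["O", "B-drug_regimen", "I-drug_regimen", "O", "B-cancer_treatment"]
pred = ["O", "B-drug_regimen", "O", "O", "B-cancer_treatment"]
print(token_level_metrics(gold, pred))  # ≈ (1.0, 0.667, 0.8, 0.8)
```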
The training outcomes indicate a modest improvement across all models, with our domain-adapted models leading in precision, recall, F1-score, and accuracy. Nevertheless, the real influence of domain adaptation and federated learning is most visible when we assess the NER-annotated data with a focus on cancer-related entities. Table 2 presents the recognition frequency of specific cancer-related named entities, comparing BioBERT and our domain-adapted models on the clinical reports dataset. It highlights the improvements our domain-adapted models achieve in recognising critical cancer-related entities compared to the generic BioBERT model. For instance, our models demonstrate a notable increase in the recognition frequency of cancer treatment, prosthetic, and drug regimen entities. These improvements result from the domain-specific fine-tuning of our models on oncology-related datasets, which enables them to capture the nuances and terminology particular to the oncology field.
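To make that fine-tuning step concrete, here is a minimal sketch of adapting a public BioBERT checkpoint for token classification with the Hugging Face transformers library. The label list mirrors the entity tags in Table 2 but is otherwise hypothetical, as are the hyperparameters; this outlines the general recipe rather than our exact training configuration.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          TrainingArguments)

# Hypothetical BIO label set built from the entity tags in Table 2.
tags = ["cancer_treatment", "prosthetic", "drug_regimen",
        "pathological_findings"]
labels = ["O"] + [f"{p}-{t}" for t in tags for p in ("B", "I")]

checkpoint = "dmis-lab/biobert-base-cased-v1.1"  # public BioBERT weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

args = TrainingArguments(
    output_dir="oncology-ner",
    num_train_epochs=3,              # matches the three epochs in Table 1
    per_device_train_batch_size=16,  # illustrative values only
    learning_rate=5e-5,
)

# A transformers Trainer would then drive training on a tokenized,
# label-aligned NER corpus (the clinical reports used here are not public):
# Trainer(model=model, args=args, train_dataset=..., eval_dataset=...).train()
```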
Tag                     BioBERT   Fine-tuned BERT   Federated BERT
cancer_treatment              7               269              307
prosthetic                  504               601              655
drug_regimen               1885              2095             2239
pathological_findings       118               254              356

Table 2 – Number of tagged instances identified.
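Counts like those in Table 2 can be produced by running each model over the corpus and tallying one hit per predicted entity span. A minimal sketch, assuming BIO-tagged model output, follows.

```python
from collections import Counter

def count_entities(tag_sequences):
    """Tally predicted entities per tag; under the BIO scheme each
    span is counted once, at its opening B- tag."""
    counts = Counter()
    for tags in tag_sequences:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts

# Hypothetical predictions for two documents:
predictions = [
    ["O", "B-drug_regimen", "I-drug_regimen", "B-prosthetic"],
    ["B-cancer_treatment", "O", "B-drug_regimen"],
]
print(count_entities(predictions))
# Counter({'drug_regimen': 2, 'prosthetic': 1, 'cancer_treatment': 1})
```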
5. CONCLUSION
This study shows the potential of using federated learning and LLM domain adaptation techniques to improve cancer treatment. We addressed the problems of data privacy, governance, and resource limitations by building on the transformer-based BioBERT network, pre-training it on oncology-specific datasets, and introducing federated learning techniques. Our domain-adapted models outperformed the generic ones in understanding oncology texts and recognising oncology-related entities, which led to improved semantic understanding and accuracy. Even though the strategies presented in this paper are designed for encoder-based transformers, they can also be applied to decoder-based transformer architectures. This lays the foundation for further exploration and application in various natural language processing tasks, broadening the impact and utility of our approach. Additionally, our research aligns with and supports several United Nations Sustainable Development Goals, specifically SDG 3 (Good Health and Well-being), SDG 9 (Industry, Innovation, and Infrastructure), and SDG 10 (Reduced Inequalities). By developing advanced AI models for cancer care, applying cutting-edge technologies, and employing federated learning to create AI models using data from different regions, our study contributes to improving global oncology care and healthcare accessibility while promoting a more equitable and sustainable future.
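As a closing illustration of the federated setup, the sketch below shows federated averaging (FedAvg), the canonical aggregation rule for this kind of cross-region training; the exact aggregation scheme is not spelled out in this section, so treat this as an assumed baseline. Each site trains locally and shares only model weights, never patient records.

```python
def federated_average(client_states, client_sizes):
    """FedAvg over PyTorch state_dicts: average client model weights,
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    return {
        key: sum(state[key].float() * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Each hospital fine-tunes its own copy of the model locally, then the
# server aggregates the resulting state_dicts (sizes are hypothetical):
# global_state = federated_average([sd_site_a, sd_site_b], [1200, 800])
# model.load_state_dict(global_state)
```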


