identify entities while minimizing false positives and false negatives.

Model            Ep.   Prec.      Rec.       F1         Acc.
BioBERT          1     0.579761   0.561087   0.570271   0.562193
                 2     0.613650   0.602766   0.608159   0.609599
                 3     0.621776   0.606939   0.614268   0.619746
Fine-tuned BERT  1     0.606918   0.584869   0.595690   0.592137
                 2     0.623128   0.626762   0.624940   0.629791
                 3     0.625666   0.624729   0.625197   0.637718
Federated BERT   1     0.611512   0.580348   0.595523   0.590274
                 2     0.616274   0.621599   0.618925   0.626179
                 3     0.623578   0.623311   0.623445   0.634450

Table 1 – Performance metrics of BERT models over three epochs.
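The F1 values in Table 1 are consistent with F1 being the harmonic mean of precision and recall: for Fine-tuned BERT at epoch 3, 2 × 0.625666 × 0.624729 / (0.625666 + 0.624729) ≈ 0.625197, matching the table. As a minimal sketch of how such scores can be computed, assuming BIO-tagged model output and the seqeval library, with toy sequences standing in for our data:

    from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

    # Toy gold and predicted BIO tag sequences, one inner list per sentence;
    # real input would be the evaluation split of the clinical reports dataset.
    y_true = [["B-cancer_treatment", "I-cancer_treatment", "O", "B-drug_regimen"]]
    y_pred = [["B-cancer_treatment", "I-cancer_treatment", "O", "O"]]

    print(precision_score(y_true, y_pred))  # 1.0   matched / predicted entities
    print(recall_score(y_true, y_pred))     # 0.5   matched / gold entities
    print(f1_score(y_true, y_pred))         # 0.667 harmonic mean of the two
    print(accuracy_score(y_true, y_pred))   # 0.75  token-level accuracy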
The training outcomes indicate a modest improvement across all the models, with our domain-adapted models achieving the best precision, recall, F1-score, and accuracy. Nevertheless, the real influence of domain adaptation and federated learning is most visible when we assess the NER-annotated data with a focus on cancer-related entities. Table 2 presents the recognition frequency of specific cancer-related named entities, comparing the performance of BioBERT and our domain-adapted models on the clinical reports dataset. It highlights the improvements our domain-adapted models achieve in recognising critical cancer-related entities compared to the generic BioBERT model. For instance, our models demonstrate a notable increase in the recognition frequency of cancer treatment, prosthetic, and drug regimen entities. These improvements are due to the domain-specific fine-tuning of our models on oncology-related datasets, which lets them better capture the nuances and terminology particular to the oncology field.
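To make the adaptation step concrete, the sketch below sets up token classification over a tag set like the one in Table 2; the public checkpoint name, label list, and overall configuration are illustrative assumptions rather than our exact training setup.

    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # BIO tag set mirroring the entity types reported in Table 2.
    labels = ["O",
              "B-cancer_treatment", "I-cancer_treatment",
              "B-prosthetic", "I-prosthetic",
              "B-drug_regimen", "I-drug_regimen",
              "B-pathological_findings", "I-pathological_findings"]

    # Tokenizer is needed to align BIO labels to subword tokens during training.
    tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
    model = AutoModelForTokenClassification.from_pretrained(
        "dmis-lab/biobert-base-cased-v1.1",
        num_labels=len(labels),
        id2label=dict(enumerate(labels)),
        label2id={label: i for i, label in enumerate(labels)},
    )
    # From here, a standard token-classification training loop (e.g. the
    # transformers Trainer) over BIO-labelled clinical sentences fine-tunes
    # the encoder on the oncology tag set.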
Tag                    BioBERT   Fine-tuned BERT   Federated BERT
cancer_treatment       7         269               307
prosthetic             504       601               655
drug_regimen           1885      2095              2239
pathological_findings  118       254               356

Table 2 – Number of tagged instances identified.
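Counts of the kind shown in Table 2 can be obtained by tallying, per model, how many entity spans of each type are recognised; a minimal sketch, assuming BIO-tagged output where each "B-" tag opens one instance, with toy sequences in place of real predictions:

    from collections import Counter

    # Toy predicted tag sequences; real input would be each model's per-token
    # output over the clinical reports dataset.
    predictions = [
        ["B-drug_regimen", "I-drug_regimen", "O", "B-prosthetic"],
        ["O", "B-drug_regimen", "B-pathological_findings"],
    ]

    counts = Counter(tag[2:]                   # strip "B-" to get the entity type
                     for sentence in predictions
                     for tag in sentence
                     if tag.startswith("B-"))  # each "B-" tag opens one instance
    print(counts)  # Counter({'drug_regimen': 2, 'prosthetic': 1, 'pathological_findings': 1})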
5. CONCLUSION

This study shows the potential of using federated learning and LLM domain adaptation techniques to improve cancer treatment. We addressed the problems of data privacy, governance, and resource limitations by using the transformer-based BioBERT network, pre-training it on oncology-specific datasets, and introducing federated learning techniques.
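As a minimal sketch of the aggregation step such a federated setup relies on, the function below performs FedAvg-style weighted averaging of client model parameters; the size-proportional weighting and the toy two-site example are illustrative assumptions, not our exact protocol.

    import torch

    def federated_average(client_states, client_sizes):
        """FedAvg-style aggregation over PyTorch state_dicts: weight each
        client's parameters by its share of the total training examples,
        so raw patient records never leave the client site."""
        total = sum(client_sizes)
        return {
            name: sum(state[name].float() * (size / total)
                      for state, size in zip(client_states, client_sizes))
            for name in client_states[0]
        }

    # Toy demonstration with one-parameter "models" from two sites:
    site_a = {"w": torch.tensor([1.0, 1.0])}  # trained on 100 local reports
    site_b = {"w": torch.tensor([3.0, 3.0])}  # trained on 300 local reports
    print(federated_average([site_a, site_b], [100, 300]))  # {'w': tensor([2.5000, 2.5000])}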
Our domain-adapted models were better than the generic ones at understanding oncology texts and recognising oncology-related entities, which led to improved semantic understanding and accuracy. Even though the strategies presented in this paper are designed for encoder-based transformers, they can also be applied to decoder-based transformer architectures. This lays the foundation for further exploration and application in various natural language processing tasks, broadening the impact and utility of our approach. Additionally, our research aligns with and supports several United Nations Sustainable Development Goals, specifically SDG 3 (Good Health and Well-being), SDG 9 (Industry, Innovation, and Infrastructure), and SDG 10 (Reduced Inequalities). By developing advanced AI models for cancer care, applying cutting-edge technologies, and employing federated learning to create AI models using data from different regions, our study contributes to improving global oncology care and healthcare accessibility while promoting a more equitable and sustainable future.