2024 ITU Kaleidoscope Academic Conference
necessitating improvements in model interpretability,
trustworthiness, and cost-effectiveness. The transformer-based
multiple instance learning model (TMIL) of Yang et al. [2]
segments images into 224×224 patches for feature
extraction. This approach preserves valuable information
and leverages pre-trained weights without performance
loss. Experimental results show TMIL outperforms existing
methods, improving classification accuracy and reducing
inference time by 62% on the APTOS and Messidor-1
datasets. Lahmar et al. [3] conducted a comparative study of
seven pre-trained CNN models for the binary classification
of DR. Evaluating these models on several performance metrics
across DR datasets, they found that MobileNetV2 achieved the
highest accuracy, scoring 93.09% on the APTOS dataset.
Lahmar et al. [4] conducted an extensive study assessing
the performance of 28 hybrid deep learning architectures and
7 standalone deep learning models for binary classification
of DR. Their comprehensive evaluation aimed to identify
the most effective methods for distinguishing between the
presence and absence of DR, providing valuable insights into
automated DR detection.
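The 224×224 patch tiling that TMIL [2] applies before feature extraction can be sketched as follows. The function name and the choice to drop partial border tiles are illustrative assumptions, not details taken from [2]:

```python
import numpy as np

def extract_patches(image, patch=224):
    """Split an image (H, W, C) into non-overlapping patch x patch tiles.

    Tiles that would fall past the border are dropped; multiple-instance
    methods then treat the tiles as the instances of one bag per image.
    """
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tiles.append(image[y:y + patch, x:x + patch])
    return np.stack(tiles)

# A 448x672 fundus image yields a bag of 2x3 = 6 patch instances.
bag = extract_patches(np.zeros((448, 672, 3), dtype=np.uint8))
```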
Kassani et al. [5] introduced a novel feature extraction
technique for DR diagnosis using a customized Xception
architecture. Leveraging the deep layer accumulation
characteristic of Xception, this approach efficiently extracts
intricate features from retinal images. The extracted features
are subsequently fed into a multi-layer perceptron, offering
a robust classification method. Evaluation on the APTOS
dataset demonstrated an accuracy of 83.09%, highlighting its
promise for dependable DR detection in clinical settings. In
[6], a machine learning-based method is proposed for early
DR detection employing the Inception V3 model. Trained and
tested on the EyePACS and APTOS 2019 datasets, their model
attained an accuracy of 81.61% and an F1 score of 80.21%
on the APTOS 2019 dataset, demonstrating its effectiveness
in DR detection.
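Several of the works above (e.g. [5], [6]) share the same pattern: embeddings from a pre-trained backbone are fed to a small classifier. The sketch below illustrates only that second stage, with random stand-in features in place of real Xception embeddings; the dimensionality and layer sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in for backbone embeddings: in [5] these would come from the
# penultimate layer of a customized Xception. Dimensions are reduced
# here purely to keep the sketch fast and self-contained.
X_train = rng.normal(size=(200, 512)).astype(np.float32)
y_train = rng.integers(0, 2, size=200)      # dummy binary DR labels

# Multi-layer perceptron head trained on top of the extracted features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=100, random_state=0)
clf.fit(X_train, y_train)

X_test = rng.normal(size=(10, 512)).astype(np.float32)
pred = clf.predict(X_test)                  # one 0/1 label per image
```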
In [7], intermediate layers of the DenseNet-121 model are
utilized for feature extraction. In [8], a CNN-based model
is introduced for detecting and categorizing DR. Utilizing
the APTOS dataset, the model attained accuracies of
97% and 93% with CNN and AlexNet models, respectively,
following training and validation on distinct datasets. Farag
et al. [9] proposed a novel method utilizing DenseNet169
to automatically assess the severity of DR. They leveraged
DenseNet169's encoder to generate visual embeddings and
integrated an attention module to improve discrimination
capability. Dhir et al. [10] modified neural networks and
evaluated their performance on the APTOS dataset. The
study evaluates five deep learning architectures using 1228
fundus images from six datasets, including DR, glaucoma,
and cataract. EfficientNetB0 outperforms the other models,
significantly surpassing previous studies such as Fast-RCNN
and InceptionResNet [11].

3. METHODOLOGY

This study employs straightforward Convolutional Neural
Network (CNN) models, VGG16 and ResNet18, for transfer
learning due to their optimal number of learning layers, which
accelerates training speed. The schematic representation
of our proposed method is illustrated in Fig. 1. Initially,
the dataset is loaded and divided into training, testing, and
validation sets. The pre-trained VGG16 network is then
loaded, and its input and classification layers are adapted to
suit the task. Subsequently, appropriate hyper-parameters
are chosen, and the network undergoes training. Following
transfer learning, features are extracted from fully
connected layer 7 (FC7). To optimize feature selection, a KW
test is applied, and machine learning classifiers are employed
for the classification task using the selected features. Detailed
explanations of each step are provided below:

Figure 1 – Block diagram of the proposed method.

3.1 Dataset

This study utilized the APTOS dataset [12] for binary
classification purposes. This dataset comprises retinal images
captured by a fundus camera operated by Aravind Eye
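The feature-selection and classification stages of the pipeline described in Section 3 can be sketched as below. This is a minimal illustration under two assumptions: that "KW" denotes the Kruskal-Wallis H-test applied per feature, and that the FC7 embeddings (4096-dimensional in the standard VGG16) are replaced by low-dimensional synthetic features so the sketch runs standalone:

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in for FC7 embeddings exported from the fine-tuned
# VGG16 (4096-d in the real network; 64-d here for illustration).
X = rng.normal(size=(120, 64))
y = rng.integers(0, 2, size=120)            # binary DR labels (dummy)
X[y == 1, :8] += 2.0                        # make features 0-7 informative

# Kruskal-Wallis H-test per feature: a small p-value indicates the
# feature's distribution depends on the class. Keep the k features
# with the smallest p-values.
pvals = np.array([kruskal(X[y == 0, j], X[y == 1, j]).pvalue
                  for j in range(X.shape[1])])
top_k = np.argsort(pvals)[:8]

# Any off-the-shelf classifier can consume the selected features;
# an SVM is used here as one representative choice.
clf = SVC().fit(X[:, top_k], y)
```

With the informative features recovered by the test, the classifier operates on a much smaller input than the raw FC7 vector, which is the stated motivation for the selection step.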