Hospital, situated in a rural setting. It is a very popular dataset among researchers for DR detection. Of the images, 2930 were allocated to the training set, while 366 were designated for both the validation and test sets. Within the training set, 1434 images belong to the false (no DR) class, while 1496 images belong to the true (DR) class. Similarly, the validation set comprises 172 false and 194 true class images. In the test set, there are 199 false and 167 true class images. Sample images representing both class labels are illustrated in Fig. 2 and Fig. 3. All images in the dataset possess a resolution of 3216x2136.

Figure 2 – Sample images from the DR class.

Figure 3 – Sample images from the Normal class.

3.2 Pre-processing
To accommodate the requirements of CNN networks, all images are resized to a standard size of 224x224x3. Additionally, a preprocessing step involves subtracting the mean across all images from each image in the training dataset. This normalization centers the data around zero, which not only aids in training convergence but also enhances performance. Furthermore, the color space is converted to the BGR color space, and all pixel values are normalized to a range of either [0, 1] or [-1, 1]. This normalization enhances the stability and convergence of the training process.
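A minimal sketch of these steps, assuming NumPy and Pillow, is shown below. The per-channel mean values and the [-1, 1] scaling constant are placeholders; in practice, the mean would be computed once over the training images.

```python
import numpy as np
from PIL import Image

# Placeholder per-channel (B, G, R) mean; computed over the training set in practice.
TRAIN_MEAN_BGR = np.array([103.9, 116.8, 123.7], dtype=np.float32)

def preprocess(path: str) -> np.ndarray:
    """Resize to 224x224x3, convert RGB to BGR, subtract the training-set mean,
    and scale pixel values into an approximately [-1, 1] range."""
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32)   # H x W x 3 in RGB order, values in [0, 255]
    x = x[:, :, ::-1]                       # reorder channels to BGR
    x = x - TRAIN_MEAN_BGR                  # center the data around zero
    return x / 127.5                        # roughly map the values into [-1, 1]
```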
3.3 Convolutional neural networks (CNN)
Deep learning models designed for image processing leverage CNNs to autonomously acquire hierarchical representations from raw pixel data. These models are usually composed of multiple layers, encompassing convolutional, pooling, and fully connected layers, enabling them to discern and comprehend intricate patterns and features within images. Pre-trained CNN architectures such as VGG, ResNet, and AlexNet serve as popular frameworks, which can be fine-tuned to suit specific tasks. These models have demonstrated remarkable performance across various image-related tasks, spanning classification, object detection, and segmentation.
Transfer learning is a strategic method wherein knowledge acquired from solving one task is leveraged to enhance performance on another related task. In practical terms, this often entails utilizing a pre-trained model, which has been trained on a sizable dataset for a specific task, such as image classification. The information ingrained in the pre-trained model's parameters (weights) is then transferred to a new model configured for a different task. In this paper, transfer learning is employed to fine-tune two distinct CNN networks. Since both networks are pre-trained on the ImageNet dataset, customization of the classification layer is necessary to adapt them to our specific application. The original output layer is substituted with a new one tailored for binary classification. Consequently, a single output neuron with a sigmoid activation function is employed. Subsequently, the model is compiled with an appropriate loss function. Following this, the model undergoes training, during which the weights are fine-tuned according to our dataset and task. Finally, average accuracy and loss values are computed to assess the model's performance. The hyperparameters employed during training are detailed in Table 1. The following CNN models are used in this research work.
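This procedure can be summarized in a short PyTorch-style sketch, assuming a backbone whose final layer has already been replaced by a single output neuron. The learning rate and epoch count shown here are illustrative placeholders (the values actually used are those in Table 1), and the sigmoid is folded into BCEWithLogitsLoss rather than applied as a separate layer.

```python
import torch
import torch.nn as nn

def fine_tune(model, train_loader, val_loader, epochs=10, lr=1e-4, device="cpu"):
    """Fine-tune a pre-trained backbone for binary DR / no-DR classification."""
    model = model.to(device)
    criterion = nn.BCEWithLogitsLoss()                      # sigmoid + binary cross-entropy
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:                 # labels: 0 = no DR, 1 = DR
            images, labels = images.to(device), labels.float().to(device)
            optimizer.zero_grad()
            loss = criterion(model(images).squeeze(1), labels)
            loss.backward()
            optimizer.step()

        # Average loss and accuracy on the validation split.
        model.eval()
        total_loss, correct, seen = 0.0, 0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.float().to(device)
                logits = model(images).squeeze(1)
                total_loss += criterion(logits, labels).item() * labels.size(0)
                correct += ((torch.sigmoid(logits) > 0.5) == labels.bool()).sum().item()
                seen += labels.size(0)
        print(f"epoch {epoch + 1}: val loss {total_loss / seen:.4f}, "
              f"val accuracy {correct / seen:.4f}")
```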
3.3.1 VGG16

VGG16 stands as a prominent CNN architecture [13], pioneered by the Visual Geometry Group (VGG) at the University of Oxford. Its popularity for transfer learning stems from its profound depth and robust feature extraction capabilities. Comprising 16 weight layers, VGG16 includes 13 convolutional layers, 3 fully connected layers, and a softmax layer for classification. The convolutional layers are organized into blocks, with each block housing multiple layers dedicated to feature extraction, followed by a max-pooling layer for spatial dimension reduction and down-sampling. Notably, all convolutional layers share the same filter size and stride. The fully connected layers undertake the task of learning high-level features and generating predictions based on the extracted features. Leveraging pre-trained weights from extensive datasets like ImageNet, this study fine-tunes VGG16 on the APTOS dataset for binary classification, rendering it a potent choice for DR image classification.
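A minimal sketch of this adaptation, assuming the torchvision implementation of VGG16 (torchvision 0.13 or later API), is given below; freezing the convolutional features is shown only as a common option, not necessarily the configuration used in this study.

```python
import torch.nn as nn
from torchvision import models

# Load VGG16 with ImageNet pre-trained weights.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Swap the original 1000-way output layer for a single neuron (DR vs. no DR).
in_features = vgg.classifier[6].in_features       # 4096 in the stock model
vgg.classifier[6] = nn.Linear(in_features, 1)

# Optionally freeze the convolutional feature extractor and train only the head.
for p in vgg.features.parameters():
    p.requires_grad = False
```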
3.3.2 ResNet18

ResNet18 emerges as another widely adopted CNN architecture for transfer learning, renowned for its depth and incorporation of skip connections [14]. These skip connections effectively address the vanishing gradient problem. Comprising 18 layers, ResNet18 demonstrates robust performance in image classification tasks. Central to its architecture are residual connections, which enable the learning of residual functions instead of directly mapping inputs to outputs. The ResNet18 architecture features a series of convolutional layers and residual blocks, with each block housing two convolutional layers, alongside batch normalization and ReLU activation functions. These residual blocks facilitate the learning of residual representations, which are then merged with the original input via shortcut connections.
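As a rough illustration of this structure, the sketch below shows a simplified residual block (identity shortcut only, omitting the downsampling variants where the spatial size or channel count changes) and the head replacement on the pre-trained torchvision ResNet18.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class BasicBlock(nn.Module):
    """Simplified ResNet18-style block: two 3x3 convolutions with batch
    normalization, merged with the input through an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                 # shortcut: add the original input back

# For the experiments, the pre-trained model is reused with a binary head.
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet.fc = nn.Linear(resnet.fc.in_features, 1)   # single DR / no-DR output
```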