Step 4: Label Encoding

Encode character labels into a format compatible with the CNN and Sequential models. Convert character labels into one-hot encoded vectors, representing each character as a binary vector based on its position in the character set.
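As a minimal sketch of this step (the label values and the 47-class size are illustrative, e.g. matching the EMNIST "balanced" split, and are not prescribed here), the encoding can be performed with Keras utilities:

import numpy as np
from tensorflow.keras.utils import to_categorical

# Illustrative integer labels indexing positions in the character set
char_labels = np.array([0, 12, 33, 7])
num_classes = 47  # e.g., the EMNIST "balanced" character set

# Each label becomes a binary vector with a single 1 at its class position
one_hot_labels = to_categorical(char_labels, num_classes=num_classes)
print(one_hot_labels.shape)  # (4, 47)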
Step 5: Feature Extraction with CNN

Utilize the CNN for feature extraction from the pre-processed images. Convolutional layers learn hierarchical features crucial for recognizing handwritten characters. The convolutional operation is expressed as:

\text{Output} = \sigma\Big( \sum_{i,j} \text{Input}_{i,j} \times \text{Filter}_{i,j} + \text{Bias} \Big)    (2)
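The following sketch evaluates Equation (2) at a single filter position with NumPy, assuming a grayscale patch, a 3x3 filter, and a sigmoid activation (all array values are placeholders):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder 3x3 image patch (Input), learned weights (Filter), and bias
patch = np.random.rand(3, 3)
kernel = np.random.rand(3, 3)
bias = 0.1

# Equation (2): sum the element-wise products over i, j, add the bias,
# and pass the result through the activation sigma
output = sigmoid(np.sum(patch * kernel) + bias)
print(output)

In a full CNN this computation is slid across the whole image to produce a feature map; frameworks such as Keras provide it as the Conv2D layer.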
Step 6: Sequential Modeling

Use a Sequential model, possibly incorporating Long Short-Term Memory (LSTM) layers, for sequence modeling. This is relevant for recognizing the sequential nature of handwritten characters. The LSTM equations involve calculations for the input, forget, and output gates, which determine the cell-state and hidden-state transitions, as summarized below.
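For completeness, these gate computations can be written in their standard form (this is the commonly used LSTM formulation rather than notation introduced in this paper):

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

where x_t is the input at time step t, h_t the hidden state, c_t the cell state, and \odot denotes element-wise multiplication.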
Fig. 2. Pre-processing Handwritten Data for OCR

Step 7: Data Splitting

Split the pre-processed data into training, validation, and test sets to assess model performance effectively.
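A minimal sketch of such a split, assuming the pre-processed images and labels are NumPy arrays (the 80/10/10 proportions and the scikit-learn utility are illustrative choices, not specified here):

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders standing in for the pre-processed EMNIST-style data
images = np.random.rand(1000, 28, 28, 1)
labels = np.random.randint(0, 47, size=1000)

# Hold out 10% for testing, then take ~10% of the original data for validation
x_rest, x_test, y_rest, y_test = train_test_split(images, labels, test_size=0.10, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=1/9, random_state=42)

print(x_train.shape, x_val.shape, x_test.shape)  # (800, ...), (100, ...), (100, ...)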
V. IMPLEMENTATION FRAMEWORK

The Implementation Framework section delves into the technical aspects of the Alpha-Bit research, providing a comprehensive overview of the tools and methodologies employed to bring the vision of the application to life. It lays out a detailed roadmap of the technical foundations that empower Alpha-Bit’s innovative approach to democratizing education through cutting-edge OCR technologies.
A. Convolutional Neural Network (CNN)

Convolutional Neural Networks (CNNs) lie at the heart of Alpha-Bit, a groundbreaking Android application designed to revolutionize the landscape of education through Optical Character Recognition (OCR). In the realm of Alpha-Bit, CNNs serve as the backbone for extracting intricate features and patterns vital for recognizing and interpreting alphabets and numbers. As the engine driving the system’s visual understanding capabilities, CNNs play a pivotal role in enhancing the accuracy and efficiency of OCR, contributing to a transformative learning experience. By harnessing the power of CNNs within Alpha-Bit, the application transcends traditional OCR functionalities, offering a dynamic platform that combines cutting-edge technology with guided instruction and personalized progress reporting, ultimately working towards the democratization of education on a global scale.
B. EMNIST Dataset in Conjunction with OpenCV

The integration of the Extended Modified National Institute of Standards and Technology (EMNIST) dataset with the OpenCV library emerges as a formidable synergy in Optical Character Recognition (OCR) research. EMNIST, an extension of the NIST Special Database 19, provides a diverse collection of handwritten characters, facilitating comprehensive investigations into handwritten character recognition. OpenCV, on the other hand, is a powerful computer vision library known for its versatility in image processing tasks. The seamless amalgamation of EMNIST and OpenCV in OCR research enables researchers to harness sophisticated image processing and machine learning techniques for character recognition tasks. By leveraging OpenCV’s functionalities alongside the nuanced diversity of the EMNIST dataset, researchers can develop and evaluate OCR models that are robust across various handwriting styles and linguistic intricacies. This combined approach not only enhances the accuracy of character recognition but also contributes to the advancement of OCR systems in real-world applications.

Fig. 3. EMNIST Dataset Test using OpenCV
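The kind of inspection shown in Fig. 3 can be sketched as follows, assuming an EMNIST image has already been loaded into a NumPy array (for instance via the emnist package or tensorflow_datasets; the loading route is not prescribed here):

import cv2
import numpy as np

# Placeholder standing in for one loaded EMNIST image (28x28 grayscale, 0-255)
sample = (np.random.rand(28, 28) * 255).astype(np.uint8)

# Typical OpenCV steps: upscale for visibility, binarize, then display
resized = cv2.resize(sample, (280, 280), interpolation=cv2.INTER_NEAREST)
_, binary = cv2.threshold(resized, 127, 255, cv2.THRESH_BINARY)

cv2.imshow("EMNIST sample", binary)
cv2.waitKey(0)
cv2.destroyAllWindows()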
C. Building and Compiling CNN using Sparse Cross-Entropy

Building and compiling the Convolutional Neural Network (CNN) for Alpha-Bit involves a meticulous process, particularly when employing sparse cross-entropy as the loss function. The architecture typically comprises convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification. The CNN is designed to process image inputs, extracting hierarchical features crucial for Optical Character Recognition (OCR). The sparse cross-entropy loss function is preferred when dealing with datasets with sparse labels, as is often the case in OCR tasks where characters are sparsely distributed across classes. This loss function can be expressed as follows:

L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} w_j \cdot y_{ij} \cdot \log(\hat{y}_{ij})    (3)

Here, N represents the number of samples, C is the number of classes, y_{ij} is an indicator variable (1 if sample i belongs to class j, 0 otherwise), \hat{y}_{ij} is the predicted probability of sample i belonging to class j, and w_j is an optional weight for class j. The inclusion of this loss function in the compilation of the CNN for Alpha-Bit ensures efficient training, especially when dealing with the recognition of sparse characters in educational materials.

In the compilation step, the model is configured with an optimizer, such as Adam or RMSprop, and the chosen loss function (sparse cross-entropy in this case). Hyperparameters such as the learning rate and batch size are fine-tuned to optimize model performance. The CNN is then compiled, ready for training on the Alpha-Bit dataset.
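A minimal Keras sketch of this build-and-compile step, assuming 28x28 grayscale inputs and a 47-class character set (layer sizes and hyperparameters are illustrative, not the exact Alpha-Bit configuration):

from tensorflow.keras import layers, models

num_classes = 47  # e.g., EMNIST "balanced" character set

# Convolutional layers for feature extraction, pooling for dimensionality
# reduction, and fully connected layers for classification
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

# Compile with the Adam optimizer and sparse categorical cross-entropy,
# which accepts integer class labels directly
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()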
This comprehensive approach to building and compiling