Step 4: Label Encoding

Encode character labels into a format compatible with the CNN and Sequential models. Convert character labels into one-hot encoded vectors, representing each character as a binary vector based on its position in the character set.
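As a minimal sketch of this step (an illustration under assumptions, not the authors' published code), the conversion can be done with Keras' to_categorical utility; the class count of 47 corresponds to the EMNIST "balanced" split and should be adjusted to the character set actually used:

import numpy as np
from tensorflow.keras.utils import to_categorical

num_classes = 47                    # assumed: EMNIST "balanced" character set; adjust as needed
labels = np.array([0, 12, 33, 46])  # hypothetical integer indices of characters
one_hot = to_categorical(labels, num_classes=num_classes)
print(one_hot.shape)                # (4, 47): one binary vector per character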
Step 5: Feature Extraction with CNN

Utilize the CNN for feature extraction from the pre-processed images. Convolutional layers learn hierarchical features crucial for recognizing handwritten characters. The convolutional operation is expressed as:

\[
\text{Output} = \sigma\left(\sum_{i,j}\left(\text{Input} \times \text{Filter}_{i,j}\right) + \text{Bias}\right) \tag{2}
\]
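To make equation (2) concrete, the following sketch (an illustrative assumption, not the authors' implementation) computes a single output activation for one 3x3 image patch, taking the activation σ to be the logistic sigmoid:

import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))            # activation sigma, here the logistic sigmoid

patch = np.random.rand(3, 3)                   # Input: a 3x3 region of the pre-processed image
kernel = np.random.rand(3, 3)                  # Filter_{i,j}: learned convolutional weights
bias = 0.1                                     # Bias term
output = sigma(np.sum(patch * kernel) + bias)  # eq. (2): sigma(sum_{i,j} Input x Filter + Bias)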
Step 6: Sequential Modeling

Use a Sequential model, possibly incorporating Long Short-Term Memory (LSTM) layers, for sequence modeling. This is relevant for capturing the sequential nature of handwritten characters. The LSTM equations involve calculations for the input, forget, and output gates, which determine the cell-state and hidden-state transitions.
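For reference, a standard textbook formulation of these gate equations (not reproduced from the original text) is:

\begin{align*}
i_t &= \sigma(W_i[h_{t-1}, x_t] + b_i) \\
f_t &= \sigma(W_f[h_{t-1}, x_t] + b_f) \\
o_t &= \sigma(W_o[h_{t-1}, x_t] + b_o) \\
\tilde{c}_t &= \tanh(W_c[h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align*}

where $x_t$ is the input at time step $t$, $h_t$ the hidden state, and $c_t$ the cell state.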












Fig. 2. Pre-processing Handwritten Data for OCR

Step 7: Data Splitting

Split the pre-processed data into training, validation, and test sets to assess model performance effectively.
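A minimal sketch of this split (the ratios, variable names, and use of scikit-learn are illustrative assumptions) could look like:

import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.rand(1000, 28, 28)         # stand-in for the pre-processed character images
labels = np.random.randint(0, 47, size=1000)  # stand-in for the encoded labels

X_train, X_tmp, y_train, y_tmp = train_test_split(images, labels, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
# Result: 70% training, 15% validation, 15% test.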
V. IMPLEMENTATION FRAMEWORK

The Implementation Framework section delves into the technical aspects of the Alpha-Bit research, providing a comprehensive overview of the tools and methodologies employed to bring the vision of the application to life. It offers a detailed roadmap of the technical foundations that empower Alpha-Bit's innovative approach to democratizing education through cutting-edge OCR technologies.
A. Convolutional Neural Network (CNN)

Convolutional Neural Networks (CNNs) lie at the heart of Alpha-Bit, a groundbreaking Android application designed to revolutionize the landscape of education through Optical Character Recognition (OCR). In the realm of Alpha-Bit, CNNs serve as the backbone for extracting intricate features and patterns vital for recognizing and interpreting alphabets and numbers. As the engine driving the system's visual understanding capabilities, CNNs play a pivotal role in enhancing the accuracy and efficiency of OCR, contributing to a transformative learning experience. By harnessing the power of CNNs within Alpha-Bit, the application transcends traditional OCR functionalities, offering a dynamic platform that combines cutting-edge technology with guided instruction and personalized progress reporting, ultimately working towards the democratization of education on a global scale.

B. EMNIST Dataset in Conjunction with OpenCV

The integration of the Extended Modified National Institute of Standards and Technology (EMNIST) dataset with the OpenCV library emerges as a formidable synergy in Optical Character Recognition (OCR) research. EMNIST, an extension of the NIST Special Database 19, provides a diverse collection of handwritten and machine-printed characters, facilitating comprehensive investigations into handwritten character recognition. OpenCV, on the other hand, is a powerful computer vision library, known for its versatility in image processing tasks. The seamless amalgamation of EMNIST and OpenCV in OCR research enables researchers to harness sophisticated image processing and machine learning techniques for character recognition tasks. By leveraging OpenCV's functionalities alongside the nuanced diversity of the EMNIST dataset, researchers can develop and evaluate OCR models that are robust across various handwriting styles and linguistic intricacies. This combined approach not only enhances the accuracy of character recognition but also contributes to the advancement of OCR systems in real-world applications.

Fig. 3. EMNIST Dataset Test using OpenCV
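As an illustrative sketch of this pairing (the loader and pre-processing choices below are assumptions, not the authors' published pipeline), EMNIST characters can be pulled in through the third-party emnist package and cleaned up with standard OpenCV calls:

import cv2
import numpy as np
from emnist import extract_training_samples   # third-party "emnist" package (assumed loader)

images, labels = extract_training_samples("balanced")  # 28x28 grayscale character images
img = images[0]
img = cv2.GaussianBlur(img, (3, 3), 0)                                    # light denoising
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
img = cv2.resize(img, (28, 28), interpolation=cv2.INTER_AREA)             # normalize size
img = img.astype(np.float32) / 255.0                                      # scale to [0, 1]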
C. Building and Compiling CNN using Sparse Cross-Entropy

Building and compiling the Convolutional Neural Network (CNN) for Alpha-Bit involves a meticulous process, particularly when employing sparse cross-entropy as the loss function. The architecture typically comprises convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification. The CNN is designed to process image inputs, extracting hierarchical features crucial for Optical Character Recognition (OCR). The sparse cross-entropy loss function is preferred when dealing with datasets with sparse labels, as is often the case in OCR tasks where characters are sparsely distributed across classes. The sparse cross-entropy loss function can be expressed as follows:

\[
L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} w_j \cdot y_{ij} \cdot \log(\hat{y}_{ij}) \tag{3}
\]

Here, $N$ represents the number of samples, $C$ is the number of classes, $y_{ij}$ is an indicator variable (1 if sample $i$ belongs to class $j$, 0 otherwise), $\hat{y}_{ij}$ is the predicted probability of sample $i$ belonging to class $j$, and $w_j$ is an optional weight for class $j$. The inclusion of this loss function in the compilation of the CNN for Alpha-Bit ensures efficient training, especially when dealing with the recognition of sparse characters in educational materials. In the compilation step, the model is configured with an optimizer, such as Adam or RMSprop, and the chosen loss function (sparse cross-entropy in this case). Hyperparameters like learning rate and batch size are fine-tuned to optimize model performance. The CNN is then compiled, ready for training on the Alpha-Bit dataset.
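A minimal sketch of this build-and-compile step in Keras (the layer sizes, the 47-class output, and the Adam learning rate are illustrative assumptions) might be:

from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # feature extraction
    layers.MaxPooling2D((2, 2)),                                            # dimensionality reduction
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                                   # fully connected layer
    layers.Dense(47, activation="softmax"),                                 # one unit per character class (assumed 47)
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3),   # assumed starting learning rate
    loss="sparse_categorical_crossentropy",          # integer labels; cf. eq. (3) with w_j = 1
    metrics=["accuracy"],
)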
This comprehensive approach to building and compiling