Page 102 - AI for Good Innovate for Impact



                      Technological Approach

                      The proposed approach builds a pipeline consisting of data acquisition, data pre-processing,
                      data exploration, feature selection, model building, and data visualization. The pipeline is
                      designed to be extensible, so that stages can be added or replaced.
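
                      One way such an extensible pipeline might be organised is sketched below in Python. It is
                      a minimal sketch, assuming each stage is a function that takes and returns a shared state
                      dictionary. The `Pipeline` class, its `add`/`replace`/`run` methods, and the toy stages are
                      hypothetical and are not part of the described system.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage signature: each stage receives the shared state and returns it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


class Pipeline:
    """An ordered list of named stages that can be extended or swapped out."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add(self, name: str, fn: Stage) -> "Pipeline":
        """Append a stage; returns self so calls can be chained."""
        self.stages.append((name, fn))
        return self

    def replace(self, name: str, fn: Stage) -> None:
        """Swap the implementation of an existing stage, keeping its position."""
        for i, (existing, _) in enumerate(self.stages):
            if existing == name:
                self.stages[i] = (name, fn)
                return
        raise KeyError(name)

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Run every stage in order, recording the name of each executed stage."""
        for name, fn in self.stages:
            state = fn(state)
            state.setdefault("log", []).append(name)
        return state


# Example with toy stages standing in for acquisition and pre-processing.
pipe = (
    Pipeline()
    .add("acquisition", lambda s: {**s, "tiles": [0.02, 0.40, 0.90]})
    .add("preprocessing", lambda s: {**s, "tiles": [t for t in s["tiles"] if t >= 0.05]})
)
print(pipe.run({}))
```

                      Keeping stages as plain functions means a new step, such as stain normalization, can be
                      inserted without touching the others.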

                      During pre-processing, the patch-based framework takes as input each whole slide image (WSI)
                      together with its ground-truth annotations, which mark the locations of the regions of
                      interest. Small positive and negative patches are extracted from the set of training WSIs:
                      local mini-patches (tiles) of 256 × 256 pixels are sampled from each large WSI around the
                      annotated regions. Each tile is then labelled by its biological feature, such as necrosis or
                      mitosis, according to its position relative to the annotated polygons of the WSI.
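
                      The tiling and labelling step might look like the sketch below. It assumes annotations are
                      polygons of (x, y) vertices in level-0 pixel coordinates, grouped by class name. A tile takes
                      the class of a polygon containing its centre, tested with the even-odd ray-casting rule. For
                      brevity it tiles the whole slide on a non-overlapping grid rather than sampling only around
                      annotated regions. The centre rule and all function names are illustrative assumptions; the
                      source does not say exactly how tiles are matched to polygons.

```python
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]


def tile_origins(width: int, height: int, tile: int = 256) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) of each non-overlapping tile lying fully inside the slide."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y


def point_in_polygon(px: float, py: float, poly: List[Point]) -> bool:
    """Even-odd rule: toggle on each polygon edge crossed by a ray cast to the right."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the line y = py, so y1 != y2
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def label_tiles(width: int, height: int, annotations: Dict[str, List[List[Point]]],
                tile: int = 256) -> Dict[Tuple[int, int], str]:
    """Give each tile the class of the first annotated polygon containing its centre."""
    labels = {}
    for x, y in tile_origins(width, height, tile):
        cx, cy = x + tile / 2, y + tile / 2
        label = "negative"
        for cls, polygons in annotations.items():
            if any(point_in_polygon(cx, cy, poly) for poly in polygons):
                label = cls
                break
        labels[(x, y)] = label
    return labels


# Example: a 1024 x 512 slide with one square necrosis annotation.
square = [(0, 0), (512, 0), (512, 512), (0, 512)]
labels = label_tiles(1024, 512, {"necrosis": [square]})
```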

                      Each retrieved tile is converted into a 256 × 256 × 3 NumPy array. Tiles consisting
                      predominantly of whitespace, as well as tiles containing small artifacts, are removed.

                      Slides are read with OpenSlide, which resolves the properties and characteristics of each
                      vendor's file format at run time, so that slides from different scanner vendors are accessed
                      through the single OpenSlide API.
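
                      The sketch below shows a minimal way to read one tile into a NumPy array. It assumes the
                      openslide-python bindings, whose `OpenSlide.read_region(location, level, size)` returns an
                      RGBA PIL image with `location` in level-0 coordinates. Compositing onto white is an
                      assumption not stated in the source. It makes transparent, unscanned pixels count as
                      whitespace rather than black. The `read_tile` helper is hypothetical.

```python
import numpy as np


def read_tile(slide, x: int, y: int, size: int = 256, level: int = 0) -> np.ndarray:
    """Return the tile at level-0 position (x, y) as a (size, size, 3) uint8 RGB array.

    `slide` is assumed to behave like openslide.OpenSlide, whose read_region()
    returns an RGBA image that np.asarray() turns into a (size, size, 4) array.
    """
    region = slide.read_region((x, y), level, (size, size))
    rgba = np.asarray(region, dtype=np.float32)
    if rgba.shape != (size, size, 4):
        raise ValueError(f"expected an RGBA region of shape {(size, size, 4)}, got {rgba.shape}")
    alpha = rgba[..., 3:4] / 255.0
    # Composite onto white so transparent (unscanned) pixels read as whitespace.
    rgb = rgba[..., :3] * alpha + 255.0 * (1.0 - alpha)
    return np.rint(rgb).astype(np.uint8)
```

                      With a real slide this would be called as `read_tile(openslide.OpenSlide(path), x, y)`,
                      where `path` is the slide file.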

                      We use thresholding to separate the tissue from the background whitespace present in the
                      whole slide images. Since white tiles are known artifacts that carry no useful features,
                      they can be eliminated.
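
                      A tissue filter of this kind might be sketched as follows. It assumes a pixel counts as
                      tissue when its mean RGB intensity falls below a fixed white level. The level of 220 and the
                      helper names are assumptions; Otsu's method is a common alternative for choosing the level.
                      The 5% minimum tissue fraction follows the threshold used in this work.

```python
import numpy as np


def tissue_fraction(rgb: np.ndarray, white_level: float = 220.0) -> float:
    """Fraction of pixels treated as tissue: mean RGB intensity below `white_level`."""
    gray = rgb.astype(np.float32).mean(axis=-1)
    return float((gray < white_level).mean())


def keep_tile(rgb: np.ndarray, min_tissue: float = 0.05, white_level: float = 220.0) -> bool:
    """Keep a tile only if at least `min_tissue` of its area is tissue (default 5%)."""
    return tissue_fraction(rgb, white_level) >= min_tissue


# Example: a blank tile is dropped; one with about 10% tissue is kept.
blank = np.full((256, 256, 3), 255, dtype=np.uint8)
partial = blank.copy()
partial[:26] = 120  # 26 of 256 rows (about 10.2%) darkened to simulate tissue
print(keep_tile(blank), keep_tile(partial))  # prints: False True
```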

                      Based on a literature survey, the threshold is set at 5% tissue by area: a tile with less than
                      5% tissue, that is, more than 95% whitespace, is eliminated.

                      Handcrafted features relating to colour distribution, texture, and morphology are extracted
                      using the Gray-Level Co-occurrence Matrix (GLCM), histograms, and region-based feature measures.
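
                      The histogram and GLCM features can be sketched in plain NumPy as below. Several settings are
                      illustrative assumptions, since the source does not list the exact GLCM configuration: 16
                      histogram bins, 8 grey levels, a single horizontal offset, and the four statistics chosen
                      (contrast, homogeneity, energy, correlation). Libraries such as scikit-image provide fuller
                      implementations of the same idea.

```python
import numpy as np


def colour_histogram(rgb: np.ndarray, bins: int = 16) -> np.ndarray:
    """Concatenated per-channel intensity histograms, each normalised to sum to 1."""
    feats = []
    for c in range(rgb.shape[-1]):
        hist, _ = np.histogram(rgb[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def glcm(gray: np.ndarray, levels: int = 8, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Symmetric, normalised grey-level co-occurrence matrix for offset (dy, dx), dx, dy >= 0."""
    q = gray.astype(np.int64) * levels // 256  # quantise 0..255 into `levels` bins
    h, w = q.shape
    a = q[: h - dy, : w - dx]  # reference pixels
    b = q[dy:, dx:]            # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m += m.T  # count each pair in both directions
    return m / m.sum()


def glcm_features(p: np.ndarray) -> dict:
    """Haralick-style texture statistics from a normalised GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    corr = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0
    return {
        "contrast": float(((i - j) ** 2 * p).sum()),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
        "energy": float(np.sqrt((p ** 2).sum())),
        "correlation": float(corr),
    }


# Example on a random 256 x 256 RGB tile (placeholder data).
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
gray = tile.mean(axis=-1).astype(np.uint8)
features = np.concatenate([colour_histogram(tile), list(glcm_features(glcm(gray)).values())])
```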

                      The corresponding features are fused in different combinations, and a number of machine
                      learning and boosting algorithms are applied to obtain the final model. The effects of stain
                      normalization, augmentation, and Principal Component Analysis (PCA) are also studied.
                      The following selected features were used to build the ML and ensemble pipeline: histogram
                      features, GLCM features, and region features such as eccentricity, maximum area, extent, and
                      perimeter. Hyperparameters were tuned by grid search (Grid search CV) with 5-fold
                      cross-validation.
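
                      Region features of this kind can be computed from a binary mask of the structure of interest.
                      The sketch below assumes such a mask is available and finds 4-connected regions. For the
                      largest region, taken as one reading of "maximum area", it reports several measures. These
                      are the area and the extent (area divided by bounding-box area). They also include the
                      eccentricity of the ellipse with the same second moments and an approximate perimeter
                      (a count of boundary pixels). How the mask is obtained and these exact definitions are
                      assumptions.

```python
from collections import deque
from typing import Dict, Iterator, Optional, Tuple

import numpy as np


def connected_regions(mask: np.ndarray) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (rows, cols) pixel indices of each 4-connected foreground region."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            seen[r0, c0] = True
            queue, rows, cols = deque([(r0, c0)]), [], []
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                rows.append(r)
                cols.append(c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            yield np.array(rows), np.array(cols)


def region_features(rows: np.ndarray, cols: np.ndarray, shape: Tuple[int, int]) -> Dict[str, float]:
    """Area, extent, eccentricity and approximate perimeter of one region."""
    area = rows.size
    bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    # Eigenvalues of the coordinate covariance give the axes of the equivalent ellipse.
    cov = np.cov(np.vstack([rows, cols]).astype(float), bias=True)
    major, minor = sorted(np.linalg.eigvalsh(cov), reverse=True)
    eccentricity = float(np.sqrt(max(0.0, 1.0 - minor / major))) if major > 0 else 0.0
    # Perimeter approximated as the number of region pixels with a 4-neighbour outside it.
    region = np.zeros(shape, dtype=bool)
    region[rows, cols] = True
    p = np.pad(region, 1)
    interior = region & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return {
        "area": int(area),
        "extent": float(area / bbox_area),
        "eccentricity": eccentricity,
        "perimeter": int(region.sum() - interior.sum()),
    }


def largest_region_features(mask: np.ndarray) -> Optional[Dict[str, float]]:
    """Features of the largest connected region ("maximum area"), or None if empty."""
    regions = list(connected_regions(mask.astype(bool)))
    if not regions:
        return None
    rows, cols = max(regions, key=lambda rc: rc[0].size)
    return region_features(rows, cols, mask.shape)
```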

                      A combination of qualitative and quantitative statistical measures is used to compare the
                      classification performance of these algorithms and to identify the models best able to
                      augment the clinical decision-making of pathologists. Performance is also studied across a
                      set of whole slide images, and across tiles containing varying amounts of necrosis.
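
                      One way to study performance per whole slide image and per necrosis level is to group
                      tile-level predictions and score each group separately. The sketch below assumes binary
                      tile labels (1 = necrosis) and uses accuracy and F1 as example quantitative measures. It bins
                      tiles by a hypothetical per-tile necrotic-area fraction. None of these choices is taken from
                      the source.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (1); 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def grouped_scores(y_true: Sequence[int], y_pred: Sequence[int],
                   groups: Sequence[Hashable]) -> Dict[Hashable, Dict[str, float]]:
    """Accuracy and F1 computed separately for each group (e.g. slide ID)."""
    buckets: Dict[Hashable, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    scores = {}
    for g, (ts, ps) in buckets.items():
        correct = sum(1 for t, p in zip(ts, ps) if t == p)
        scores[g] = {"n": len(ts), "accuracy": correct / len(ts), "f1": f1_binary(ts, ps)}
    return scores


def necrosis_bin(fraction: float, edges: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)) -> str:
    """Map a tile's necrotic-area fraction to a label such as '0.25-0.50'."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if fraction < hi or hi == edges[-1]:
            return f"{lo:.2f}-{hi:.2f}"
    raise ValueError("edges must contain at least two values")


# Example: the same predictions scored per slide and per necrosis level.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0]
slides = ["A", "A", "A", "B", "B", "B"]
necrosis = [0.9, 0.0, 0.3, 0.6, 0.1, 0.0]  # hypothetical per-tile necrotic fractions
per_slide = grouped_scores(y_true, y_pred, slides)
per_level = grouped_scores(y_true, y_pred, [necrosis_bin(f) for f in necrosis])
```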

                      AdaBoost gave the best performance among the boosting algorithms, and Logistic Regression gave
                      the best performance among the classical algorithms.
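
                      A comparison of this kind can be sketched with scikit-learn as below. It assumes a binary
                      target and uses synthetic data from `make_classification` as a stand-in for the fused feature
                      matrix. Logistic Regression and AdaBoost are each tuned by grid search over 5-fold stratified
                      cross-validation. Skipping PCA versus applying it is treated as one more grid parameter. The
                      parameter grids and F1 scoring are illustrative assumptions, and any printed scores reflect
                      only the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fused histogram + GLCM + region feature matrix.
X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # 5-fold CV

# Each candidate: a classifier plus a hypothetical grid for its own hyperparameters.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "adaboost": (AdaBoostClassifier(random_state=0),
                 {"clf__n_estimators": [25, 50], "clf__learning_rate": [0.5, 1.0]}),
}

results = {}
for name, (clf, clf_grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("pca", "passthrough"), ("clf", clf)])
    # Treat PCA as a tunable step: either skipped or reducing to 10 components.
    param_grid = {**clf_grid, "pca": ["passthrough", PCA(n_components=10)]}
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(f"{name}: mean 5-fold F1 = {score:.3f}, best params = {params}")
```

                      Wrapping scaling, PCA and the classifier in one `Pipeline` ensures each cross-validation fold
                      fits the scaler and PCA on its own training split only.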

                      Partners

                      Neuropathology, a relatively niche specialization of pathology, involves the analysis of biopsy
                      tissues of the brain, spine, and nerves. Working with the Neuropathology Lab at the National
                      Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, we explore the use of
                      computational
