Page 102 - AI for Good Innovate for Impact
Technological Approach
The proposed approach builds a pipeline consisting of data acquisition, data pre-processing,
data exploration, feature selection, model building, and data visualization. The pipeline is
designed to be easily extensible.
During the pre-processing stage, the patch-based framework takes each whole slide image (WSI)
together with the ground truth annotation indicating the locations of the annotated regions.
Small positive and negative patches are extracted from the set of training WSIs: local
mini-patches (tiles) of approximately 256 × 256 pixels are sampled from the large WSI around
the annotated regions. Tiles are classified according to their biological feature, such as
necrosis or mitosis, relative to the annotated polygons of the WSI.
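The text does not say how a tile is matched to an annotated polygon. As an illustration only, the sketch below assumes a tile is labelled positive when its centre falls inside the polygon, using a standard ray-casting test. The function names, the centre-point rule, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_tile(x0: int, y0: int, polygon: List[Point], tile_size: int = 256) -> int:
    """Return 1 if the tile centre is inside the annotation, else 0 (assumed rule)."""
    centre = (x0 + tile_size / 2, y0 + tile_size / 2)
    return 1 if point_in_polygon(centre, polygon) else 0


# Hypothetical annotation: a 1000 x 1000 square in slide coordinates.
annotation = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]

# Tiles whose top-left corners are at x = 0, 512 and 1024 on the top row.
labels = [label_tile(x, 0, annotation) for x in (0, 512, 1024)]
```

Other rules, such as requiring a minimum overlap area between tile and polygon, would be equally plausible; the centre-point rule is simply the shortest to write down.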
Each retrieved tile is converted into a 256 × 256 × 3 NumPy array. Tiles that consist
predominantly of whitespace or small artifacts are removed.
OpenSlide handles the properties and characteristics of distinct vendor formats at run time,
acting as a layer between the OpenSlide API and each vendor's data format.
We use a thresholding process to separate the tissue from the background whitespace
present in the whole slide images. Since white tiles are known artifacts that carry no
useful features, they can be eliminated.
Based on a literature survey, we set the threshold at 5% tissue area: a tile with less than
5% tissue by area, implying more than 95% whitespace, is eliminated.
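The 5% rule can be sketched as follows. The text does not specify how tissue pixels are told apart from background, so this example assumes a simple intensity cutoff on the channel average; the cutoff value of 220 and the function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def tissue_fraction(tile: np.ndarray, white_cutoff: int = 220) -> float:
    """Fraction of pixels darker than white_cutoff in an RGB uint8 tile.

    white_cutoff is an assumed value; the source does not state one.
    """
    gray = tile.mean(axis=2)  # channel average as a simple intensity measure
    return float((gray < white_cutoff).mean())


def keep_tile(tile: np.ndarray, min_tissue: float = 0.05) -> bool:
    """Keep tiles with at least 5% tissue by area; drop mostly-white ones."""
    return tissue_fraction(tile) >= min_tissue


# Synthetic 256 x 256 x 3 tiles: white background with a darker "tissue" band.
mostly_white = np.full((256, 256, 3), 255, dtype=np.uint8)
mostly_white[:10, :] = 120   # 10 of 256 rows, about 3.9% tissue

with_tissue = np.full((256, 256, 3), 255, dtype=np.uint8)
with_tissue[:64, :] = 120    # 64 of 256 rows, 25% tissue

kept = [keep_tile(mostly_white), keep_tile(with_tissue)]
```

An adaptive method such as Otsu thresholding could replace the fixed cutoff without changing the 5% decision rule.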
Handcrafted features relating to colour distribution, texture, and morphology are extracted
using the Gray-Level Co-occurrence Matrix (GLCM), histogram, and region-based feature measures.
The corresponding features are fused together in different combinations, and a number of
machine learning and boosting algorithms are applied to obtain the desired model. The effects of
stain normalization, augmentation, and Principal Component Analysis (PCA) are also studied.
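To make the GLCM texture features concrete, here is a minimal NumPy sketch that builds a normalised, symmetric co-occurrence matrix for one pixel offset and derives three common statistics from it. The number of grey levels, the offset, and the choice of contrast, homogeneity, and energy are assumptions for illustration; the text does not list which GLCM statistics were used.

```python
import numpy as np


def glcm(gray: np.ndarray, levels: int = 8, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Normalised, symmetric co-occurrence matrix for one offset (dx, dy >= 0)."""
    # Quantise 0..255 intensities into `levels` bins.
    q = (gray.astype(np.int64) * levels) // 256
    h, w = q.shape
    ref = q[: h - dy, : w - dx]   # reference pixels
    nbr = q[dy:, dx:]             # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    counts += counts.T            # count each pair in both directions
    return counts / counts.sum()


def glcm_features(p: np.ndarray) -> dict:
    """Contrast, homogeneity and energy of a normalised GLCM."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
        "energy": float(np.sqrt((p ** 2).sum())),
    }


# Two synthetic grayscale patches: a flat one and one with alternating columns.
flat = np.full((16, 16), 128, dtype=np.uint8)
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (16, 8))

flat_feats = glcm_features(glcm(flat))
stripe_feats = glcm_features(glcm(stripes))
```

The flat patch has zero contrast, while the striped patch has high contrast along the horizontal offset, which is the kind of difference these texture features are meant to capture. In practice a library routine such as scikit-image's `graycomatrix` and `graycoprops` would normally be used instead of a hand-written version.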
The following selected features were used to build the ML and ensemble pipeline:
histogram features, GLCM features, and region features such as eccentricity, maximum area,
extent, and perimeter. Grid search with 5-fold cross-validation was used to tune the
hyperparameters.
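Hyperparameter tuning of this kind can be sketched with scikit-learn's `GridSearchCV`. The data here is synthetic, standing in for the fused feature vectors, and the estimator, scaling step, and parameter grid are illustrative assumptions rather than the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fused histogram / GLCM / region feature vectors.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)

# Scale the features, then fit a logistic regression classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid over the regularisation strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# cv=5 gives 5-fold cross-validation for every grid point.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
best_score = search.best_score_  # mean cross-validated accuracy of the best setting
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the held-out fold leaks into the preprocessing. The same pattern applies to a boosting model by swapping the estimator and the grid.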
A combination of qualitative and quantitative statistical measures is used to compare the
classification performance of a number of these algorithms to determine the best models that
can augment the clinical decision-making process of the pathologists. The performance of
the algorithms across a set of whole slide images, as well as the performance across tiles with
varying amounts of necrosis regions, is also studied.
AdaBoost gave the best performance among the boosting algorithms, and Logistic Regression
gave the best performance among the classical algorithms.
Partners
Neuropathology, a relatively niche specialization of pathology, involves analysis of biopsy tissues
of the brain, spine and nerves. Working with the Neuropathology Lab at the National Institute of
Mental Health and Neurosciences (NIMHANS), Bangalore, we explore the use of computational