Page 116 - AI for Good Innovate for Impact
Data Type, Structure, and Format:
• Primary Data: The core dataset consists of millions of digital pathology slide images
(Whole Slide Images - WSIs). Initially, these images are largely unlabeled.
• Supporting Data: Tens of thousands of image-text captions, meticulously curated by
medical experts (pathologists), linking specific visual features or regions within the
pathology images to descriptive text. Thousands of image-based Chain-of-Thought (CoT)
data instances. This structured data pairs pathology images with step-by-step reasoning
text, designed to train the model in diagnostic thought processes.
• Structure & Format: The system leverages self-supervised learning algorithms on the vast
corpus of pathology images to extract robust visual representations without requiring
extensive initial manual annotation. The curated captions and CoT data provide structured,
multimodal input crucial for aligning visual and textual features and enabling complex
reasoning capabilities. Data processing, including WSI tiling/patching and anonymization,
was managed by ModelEngine, a one-stop AI toolchain provided by DCS AI Solution. This
implies handling of standard WSI formats and subsequent generation of structured data
formats suitable for model training (e.g., image patches linked to text via identifiers in
formats such as JSON or CSV).
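The structured training data described above can be pictured as identifier-linked records. The following is a minimal sketch of what such records might look like; all field names and values here are illustrative assumptions, not the actual ModelEngine schema.

```python
import json

# Hypothetical image-text caption record: a WSI patch linked to expert text
# via identifiers (field names are assumptions for illustration).
caption_record = {
    "patch_id": "WSI_0001_patch_0042",
    "slide_id": "WSI_0001",
    "region": {"x": 10240, "y": 5120, "width": 512, "height": 512},
    "caption": "Irregular glandular structures with nuclear atypia.",
}

# Hypothetical image-based Chain-of-Thought record: a patch paired with
# step-by-step diagnostic reasoning and a final conclusion.
cot_record = {
    "patch_id": "WSI_0002_patch_0007",
    "reasoning_steps": [
        "Step 1: Assess overall tissue architecture.",
        "Step 2: Examine nuclear morphology for atypia.",
        "Step 3: Correlate findings with candidate subtypes.",
    ],
    "conclusion": "Findings consistent with invasive ductal carcinoma.",
}

# Serialise to JSON Lines (one record per line), a common training-ready format.
jsonl = "\n".join(json.dumps(r) for r in (caption_record, cot_record))
print(jsonl.count("\n") + 1)  # 2 records
```

A CSV variant would carry the same identifiers (`patch_id`, `slide_id`) as join keys between image patches and their text annotations.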
Image Labelling System: A semi-automated image labelling system was employed, facilitated
by ModelEngine. This system assists in data engineering tasks, including efficient image
annotation. While self-supervised learning reduced the dependency on exhaustive pixel-level
labelling for initial feature extraction, expert-driven annotation was critical for creating the high-
quality image-text captions and the image-based CoT data used for fine-tuning and aligning
modalities.
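A semi-automated labelling workflow of this kind is often built as a triage loop: the model proposes labels, and only low-confidence proposals are routed to a pathologist for review. The sketch below illustrates that pattern under stated assumptions (the threshold, function names, and data are hypothetical, not the actual ModelEngine implementation).

```python
# Assumed confidence cutoff for auto-accepting model-proposed labels.
CONFIDENCE_THRESHOLD = 0.9

def triage(proposals):
    """Split model label proposals into auto-accepted labels and an
    expert-review queue, based on model confidence."""
    auto_accepted, expert_queue = [], []
    for patch_id, label, confidence in proposals:
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_accepted.append((patch_id, label))
        else:
            expert_queue.append((patch_id, label, confidence))
    return auto_accepted, expert_queue

# Illustrative proposals: (patch identifier, proposed label, confidence).
proposals = [
    ("patch_001", "tumor", 0.97),
    ("patch_002", "stroma", 0.62),   # routed to a pathologist for review
    ("patch_003", "necrosis", 0.91),
]
accepted, queue = triage(proposals)
print(len(accepted), len(queue))  # 2 1
```

This keeps expert effort focused where it matters most, which is consistent with the report's point that expert-driven annotation remained critical even with self-supervised pre-training.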
Knowledge Transfer and Update Format: The system features a modular multimodal
architecture comprising distinct components: a Visual Projector, an Image-Language Projector,
and a Deep Reasoning Language Model module. Each module has well-defined input and
output data format standards, allowing them to be trained independently (decoupled training).
This modularity provides a clear format for knowledge transfer and updates. When new
research findings, treatment mechanisms, or diagnostic criteria emerge, the knowledge base
can be updated flexibly in two ways: 1) Incremental Training: Retraining specific modules
with supplementary data incorporating the new knowledge (e.g., updating the language
model with new medical texts, or the image-language projector with new image-caption
pairs reflecting new findings). 2) Plugin Replacement: Replacing existing modules with newer
versions incorporating updated algorithms or knowledge.
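The decoupled, modular design described above can be sketched as follows. The module names come from the text, but the interfaces, wiring, and version fields are assumptions for illustration: each module exposes a fixed input/output contract, so a newer version can be swapped in ("plugin replacement") without retraining the others.

```python
class VisualProjector:
    version = "1.0"
    def __call__(self, wsi_patch):
        # Stub: extract visual features from a WSI patch.
        return {"visual_features": f"features({wsi_patch})"}

class ImageLanguageProjector:
    version = "1.0"
    def __call__(self, visual):
        # Stub: align visual features with the language model's token space.
        return {"aligned_tokens": f"align({visual['visual_features']})"}

class ReasoningLanguageModel:
    version = "1.0"
    def __call__(self, tokens):
        # Stub: generate a diagnostic conclusion from aligned tokens.
        return f"diagnosis from {tokens['aligned_tokens']}"

class Pipeline:
    """Modules communicate only through their defined I/O contracts."""
    def __init__(self, visual, projector, language_model):
        self.modules = {"visual": visual, "projector": projector, "lm": language_model}

    def replace(self, name, module):
        # Plugin replacement: swap one module; the others are untouched.
        self.modules[name] = module

    def run(self, patch):
        v = self.modules["visual"](patch)
        t = self.modules["projector"](v)
        return self.modules["lm"](t)

pipe = Pipeline(VisualProjector(), ImageLanguageProjector(), ReasoningLanguageModel())
out_v1 = pipe.run("patch_A")

# Incremental update: a language model retrained on new medical texts can be
# dropped in without touching the visual modules.
class ReasoningLanguageModelV2(ReasoningLanguageModel):
    version = "2.0"

pipe.replace("lm", ReasoningLanguageModelV2())
print(pipe.modules["lm"].version)  # 2.0
```

The same `replace` call models both update paths: incremental training produces a new version of one module, and plugin replacement installs it behind the unchanged interface.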
Explanation of Prognosis/Diagnosis to Experts: The integration of Chain-of-Thought (CoT)
technology is key to providing explainability. The model explicitly generates and presents
the reasoning steps leading to its diagnostic conclusions, mirroring a pathologist's thought
process. This transparent reasoning pathway was presented to the expert pathologist.
Furthermore, the system supports multi-turn, in-depth conversational interaction, allowing
pathologists to query the model and ask follow-up questions about specific reasoning steps
or image features, thereby significantly enhancing the interpretability and trustworthiness of
the AI's suggestions. (Note: the source primarily discusses diagnosis and reasoning;
explanation of prognosis would follow the same principles if the model were trained on
relevant data.)
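The multi-turn, CoT-grounded interaction described above can be sketched as a session that keeps the generated reasoning steps in memory, so that follow-up questions can reference a specific step. Everything here is a hypothetical illustration; the model call is stubbed and the class names are assumptions.

```python
class DiagnosticSession:
    """Keeps CoT reasoning steps in session state so follow-up questions
    can be grounded in a specific step."""
    def __init__(self):
        self.steps = []
        self.history = []

    def diagnose(self, patch_id):
        # Stub for the model's chain-of-thought output for one patch.
        self.steps = [
            "Observed irregular glandular architecture.",
            "Noted nuclear pleomorphism and frequent mitoses.",
            "Conclusion: consistent with adenocarcinoma.",
        ]
        self.history.append(("diagnose", patch_id))
        return self.steps

    def ask(self, step_index, question):
        # Follow-up turn anchored to a specific reasoning step.
        step = self.steps[step_index]
        self.history.append(("ask", question))
        return f"Regarding '{step}': {question}"

session = DiagnosticSession()
session.diagnose("WSI_0003_patch_0019")
reply = session.ask(1, "Which regions show mitotic figures?")
print(len(session.history))  # 2
```

Exposing the step index to the pathologist is what makes the exchange auditable: every follow-up answer is tied to a named step in the model's stated reasoning rather than to an opaque overall verdict.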
Feedback and Learnings from Ruijin Hospital Deployment: The RuiPath model is currently
deployed for pilot testing at Ruijin Hospital across 11 subspecialties, including breast, prostate,
and thyroid pathology. Initial feedback based on performance metrics indicates promising
results, with the system achieving over 90% accuracy in common tasks such as cancer subtype