Page 32 - Shaping ethics, regulation and standardization in AI for health

rather than complex relationships between different datasets, highlighting the need for more
comprehensive data collection to improve accuracy and robustness.

The current version (V3) of the benchmarking system for AI-based malaria detection is a
standalone system that handles prediction, task definition, and metric calculation. It uses a
challenge-based implementation to evaluate different machine learning solutions. The input
data consist of 1 182 annotated images of thick blood smear slides in JPEG format. The
benchmarking process includes defining the test data labels, computing metrics such as ROC,
accuracy, and F1 score, and acquiring undisclosed test datasets from various health facilities.
Data sharing policies ensure legal compliance and patient privacy. Baseline acquisition involves
comparing AI model performance with existing alternatives, typically doctors, on the same
benchmarking data. The benchmarking platform is still under development and will be made
public so that its outcomes can be discussed further.
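The scoring step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the platform's implementation. It assumes the following: each image carries a binary label (1 = parasites present, 0 = absent); AI submissions return a probability per image, while the doctor baseline returns only yes/no calls; and "ROC" is read as the area under the ROC curve. The 0.5 threshold, the F1-based ranking and all function names (`score_submission`, `leaderboard`, etc.) are hypothetical. The organiser holds the test labels, and only the submitted predictions are scored against them.

```python
from typing import Dict, List, Optional, Tuple


def accuracy(labels: List[int], calls: List[int]) -> float:
    """Share of images whose binary call matches the reference label."""
    return sum(int(y == c) for y, c in zip(labels, calls)) / len(labels)


def f1_score(labels: List[int], calls: List[int]) -> float:
    """F1 for the positive class (1 = parasites present)."""
    tp = sum(1 for y, c in zip(labels, calls) if y == 1 and c == 1)
    fp = sum(1 for y, c in zip(labels, calls) if y == 0 and c == 1)
    fn = sum(1 for y, c in zip(labels, calls) if y == 1 and c == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count 0.5).

    Quadratic in the number of images, which is acceptable for ~1 000 images.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in the test labels")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_submission(labels: List[int],
                     scores: Optional[List[float]] = None,
                     calls: Optional[List[int]] = None,
                     threshold: float = 0.5) -> Dict[str, float]:
    """Score one submission against the organiser-held (undisclosed) labels."""
    if scores is not None:
        calls = [int(s >= threshold) for s in scores]  # hypothetical cut-off
    if calls is None:
        raise ValueError("a submission needs probability scores or binary calls")
    result = {"accuracy": accuracy(labels, calls), "f1": f1_score(labels, calls)}
    if scores is not None:  # AUC is computed only when graded scores are submitted
        result["roc_auc"] = roc_auc(labels, scores)
    return result


def leaderboard(labels: List[int], submissions: Dict[str, dict],
                key: str = "f1") -> List[Tuple[str, Dict[str, float]]]:
    """Score every submission (AI models and the doctor baseline), rank by `key`."""
    results = {name: score_submission(labels, **sub) for name, sub in submissions.items()}
    return sorted(results.items(), key=lambda kv: kv[1][key], reverse=True)


# Toy example: the numbers are invented and are not benchmark results.
hidden_labels = [1, 0, 1, 1, 0, 0]
submissions = {
    "model_a": {"scores": [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]},
    "doctor_baseline": {"calls": [1, 0, 1, 0, 0, 0]},
}
board = leaderboard(hidden_labels, submissions)
for name, metrics in board:
    print(name, metrics)
```

The AI model and the doctor baseline are scored by the same function on the same labels. This mirrors the requirement that baselines use the same benchmarking data. Ranking by F1 is an arbitrary choice here, because the text does not state the platform's ranking rule.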

A.4.6 DEL 10.7: FG-AI4H Topic Description Document for the Topic Group
on maternal and child health (TG-MCH)

Summary: This topic description document (TDD) specifies a standardized benchmarking process
for AI-based solutions in maternal and child health. It covers all scientific, technical, and
administrative aspects relevant to setting up this benchmarking.

The AI task is to use AI to improve maternal and child health, particularly in low-resource
settings where rates of mortality and morbidity are high. AI applications include predictions
during pregnancy, hospital warning systems, patient-centric health screenings, and post-natal
predictions. These tools can help close the expertise gap among frontline health workers,
enhance monitoring, and ensure accountability. However, AI applications for maternal and child
health lack consistent standardization, which is a significant challenge. Standardized
benchmarking is therefore crucial if global health organizations are to adopt these solutions
effectively.

The document reviews existing benchmarking processes for AI systems in maternal and child
health, with a focus on quality assessment. The review covers scientific publications,
benchmarking frameworks, scores, metrics, and clinical evaluation attempts. Its goal is to draw
on previous benchmarking efforts to help implement a benchmarking process for this topic group.
It includes summaries of relevant publications, internal benchmarking by AI developers, and
existing benchmarking frameworks, with an emphasis on using established platforms such as the
FG-AI4H assessment platform for evaluating AI in health.

The document outlines the first iteration of benchmarking for neonatal mortality prediction in
low- and middle-income countries. These countries face significant challenges such as limited
prenatal care, inadequate healthcare infrastructure, and a shortage of specialized professionals.
Machine learning algorithms can help predict and prevent neonatal mortality by analysing large
datasets to identify risk factors and patterns. The study used data from the Global Network's
Maternal Newborn Health Registry, which includes information on 500 000 pregnancies across eight
countries, to evaluate different training strategies for machine learning models. Preliminary
results suggest that a general algorithm trained on data from all countries is preferable, owing
to its larger and more diverse training sample.
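The contrast between a general model trained on all countries and models trained separately per country can be illustrated with a small sketch. Everything in it is hypothetical: the record layout, the two standardised features (birth weight, gestational age), the toy gradient-descent logistic regression used as a stand-in learner, and the synthetic data. It does not reproduce the registry, the algorithms that were evaluated, or their results. The sketch trains one model on the pooled data and one model per country, then reports each strategy's held-out log loss per country.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (country code, standardised features, outcome),
# where outcome 1 = neonatal death and 0 = survival.
Record = Tuple[str, List[float], int]


def predict(w: List[float], x: List[float]) -> float:
    """Logistic model; w[-1] is the bias, z is clipped to avoid overflow."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows: List[Record], epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Toy stand-in learner: full-batch gradient descent on mean log loss."""
    n_feat = len(rows[0][1])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for _, x, y in rows:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def log_loss(w: List[float], rows: List[Record]) -> float:
    eps = 1e-12
    total = 0.0
    for _, x, y in rows:
        p = min(max(predict(w, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def group_by_country(rows: List[Record]) -> Dict[str, List[Record]]:
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in rows:
        groups[r[0]].append(r)
    return groups


def compare_strategies(train: List[Record], test: List[Record]) -> Dict[str, Dict[str, float]]:
    """Held-out log loss per country for the general and per-country strategies."""
    general = train_logreg(train)  # one model on data from all countries
    per_country = {c: train_logreg(rs) for c, rs in group_by_country(train).items()}
    report = {}
    for c, rows in group_by_country(test).items():
        if c not in per_country:  # no local training data: skip the comparison
            continue
        report[c] = {"general": log_loss(general, rows),
                     "country": log_loss(per_country[c], rows)}
    return report


def synthetic_records(n: int, countries: List[str], seed: int = 0) -> List[Record]:
    """Synthetic toy data only; NOT derived from the Global Network registry."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        bw, ga = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # birth weight, gestational age
        risk = 1.0 / (1.0 + math.exp(3.0 + 1.0 * bw + 0.8 * ga))  # lower values, higher risk
        rows.append((rng.choice(countries), [bw, ga], int(rng.random() < risk)))
    return rows


data = synthetic_records(900, ["A", "B", "C"], seed=1)
train, test = data[:600], data[600:]
report = compare_strategies(train, test)
for country, losses in sorted(report.items()):
    print(country, losses)
```

Each per-country model sees only a fraction of the records, and this is the sample-size effect the preliminary results refer to. Any conclusion about the strategies should come from the actual benchmark, not from this toy setup.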

The benchmarking process evaluated various algorithms to predict neonatal mortality risk,
identifying key predictors such as birth weight, gestational age, and maternal age. The study
highlighted the importance of WHO's recommended indicators and found that the general
