Page 32 - Shaping ethics, regulation and standardization in AI for health




                  rather than complex relationships between different datasets, highlighting the need for more
                  comprehensive data collection to improve accuracy and robustness.

                  The current version of the benchmarking system (V3) for AI-based malaria detection is a
                  standalone system that handles prediction, task definition, and metric calculation. It uses a
                  challenge-based implementation to evaluate different machine learning solutions. The input
                  data consists of 1 182 annotated images of thick blood smear slides in JPEG format. The
                  benchmarking process includes defining test data labels, using metrics such as ROC analysis,
                  accuracy, and F1 score, and acquiring undisclosed test datasets from various health facilities. Data
                  sharing policies ensure legal compliance and patient privacy. Baseline acquisition involves
                  comparing AI model performance with existing alternatives, typically involving doctors, using
                  the same benchmarking data. The benchmarking platform is still under development and will
                  be publicized for further discussion of outcomes.
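
As a rough illustration of the metric calculation step described above, the following minimal sketch computes accuracy, F1 score, and the area under the ROC curve for a binary detection task. It is not the V3 benchmarking code: the function names, the 0.5 decision threshold, and the reading of "ROC" as area under the ROC curve are assumptions made for this example only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Score one submission; the threshold is an assumed, hypothetical default."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc(y_true, scores),
    }


if __name__ == "__main__":
    # Made-up labels and scores, purely to show the call pattern.
    print(evaluate([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]))
```

In a challenge-based setting, a function like `evaluate` would be run by the platform on the undisclosed test labels, so that participants submit only scores and never see the ground truth.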


                  A.4.6  DEL 10.7: FG-AI4H Topic Description Document for the Topic Group
                          on maternal and child health (TG-MCH)

                  Summary: This topic description document (TDD) aims to specify a standardized benchmarking
                  approach for AI-based maternal and child health applications. It covers all scientific, technical,
                  and administrative aspects relevant for setting up this benchmarking.

                  The AI task aims to leverage AI to improve maternal and child health, particularly in low-resource
                  settings, by addressing high rates of mortality and morbidity. AI applications include predictions
                  during pregnancy, hospital warning systems, patient-centric health screenings, and post-natal
                  predictions. These tools can help close the expertise gap among frontline health workers,
                  enhance monitoring, and ensure accountability. However, the lack of consistent standardization
                  in AI applications for maternal and child health is a significant challenge, making standardized
                  benchmarking crucial for global health organizations to adopt these solutions effectively.

                  Existing benchmarking processes for AI systems in maternal and child health were reviewed,
                  with a focus on quality assessment. The review covers scientific publications, benchmarking frameworks,
                  scores, metrics, and clinical evaluation attempts. The goal is to gather insights from previous
                  benchmarking efforts to aid in implementing a benchmarking process for this topic group.
                  This includes summaries of relevant publications, internal benchmarking by AI developers, and
                  existing benchmarking frameworks, with an emphasis on using established platforms like the
                  FG-AI4H assessment platform for evaluating AI in health.

                  The first iteration of benchmarking for neonatal mortality prediction in low- and middle-income
                  countries is outlined. These countries face significant challenges such as limited prenatal care,
                  inadequate healthcare infrastructure, and a shortage of specialized professionals. Machine
                  learning algorithms can help predict and prevent neonatal mortality by analysing large datasets
                  to identify risk factors and patterns. Using data from the Global Network's Maternal Newborn
                  Health Registry, which includes information from 500 000 pregnancies across eight countries,
                  the study evaluated different training strategies for machine learning models. Preliminary results
                  suggest that using a general algorithm trained on data from all countries is preferable due to
                  the larger sample size and diversity.
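
The comparison of training strategies described above can be sketched as a small harness that scores a "general" model (trained on pooled data from all countries) against country-specific models on the same per-country test sets. This is a hedged sketch, not the study's method: the record layout, the function names, and the majority-class toy model are hypothetical stand-ins, and the source does not specify which algorithms or evaluation metric were used.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (country, feature vector, binary outcome label).
Record = Tuple[str, List[float], int]


def split_by_country(records: Sequence[Record]) -> Dict[str, List[Record]]:
    """Group records by their country field."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def compare_strategies(
    train: Sequence[Record],
    test: Sequence[Record],
    fit: Callable[[Sequence[Record]], Any],
    evaluate: Callable[[Any, Sequence[Record]], float],
) -> Dict[str, Dict[str, float]]:
    """Score a pooled model and per-country models on each country's test rows."""
    general_model = fit(train)  # trained once on data from all countries
    train_by_country = split_by_country(train)
    results: Dict[str, Dict[str, float]] = {}
    for country, test_rows in split_by_country(test).items():
        entry = {"general": evaluate(general_model, test_rows)}
        if country in train_by_country:
            local_model = fit(train_by_country[country])
            entry["country_specific"] = evaluate(local_model, test_rows)
        results[country] = entry
    return results


def fit_majority(rows: Sequence[Record]) -> int:
    """Toy stand-in for a real learner: predict the majority label (ties -> 0)."""
    ones = sum(label for _, _, label in rows)
    return 1 if ones * 2 > len(rows) else 0


def evaluate_accuracy(model: int, rows: Sequence[Record]) -> float:
    """Accuracy of the constant prediction `model` on the given rows."""
    return sum(1 for _, _, label in rows if label == model) / len(rows)


if __name__ == "__main__":
    # Made-up records, purely to show the call pattern.
    train = [("A", [], 1), ("A", [], 1), ("B", [], 0), ("B", [], 0), ("B", [], 0)]
    test = [("A", [], 1), ("B", [], 0)]
    print(compare_strategies(train, test, fit_majority, evaluate_accuracy))
```

The toy data and model only demonstrate the mechanics and say nothing about which strategy performs better in practice. In a real comparison, `fit` and `evaluate` would be replaced by the actual learning algorithm and the agreed benchmarking metric.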

                  The benchmarking process evaluated various algorithms to predict neonatal mortality risk,
                  identifying key predictors such as birth weight, gestational age, and maternal age. The study
                  highlighted the importance of WHO's recommended indicators and found that the general



