– documentation of interoperability and security
– user testing and user engagement reports.
Algorithmic validation (used here to refer to the evaluation of the AI system 'in silico') requires:
– a description of the data used for development, internal and external testing, and of the
model type used
– reporting of performance metrics in the internal and independent external testing data
– benchmarking of system performance against the standard of care and, where relevant, other
AI systems (a minimal metric-reporting sketch follows this list).
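The metric-reporting and benchmarking items above can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming a binary classifier whose scores are evaluated on both an internal and an independent external test set; the toy arrays, the 0.5 decision threshold, and the choice of AUC, sensitivity, and specificity are assumptions for illustration, not requirements of the Deliverable.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def report_metrics(y_true, y_score, threshold=0.5):
    """AUC, sensitivity, and specificity for one labelled test set."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"auc": roc_auc_score(y_true, y_score),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Illustrative labels and scores standing in for the internal and the
# independent external test sets required above.
internal = (np.array([0, 0, 1, 1, 1, 0]), np.array([.2, .4, .8, .7, .6, .3]))
external = (np.array([0, 1, 1, 0, 1, 0]), np.array([.3, .6, .9, .5, .4, .1]))

# Report the same metrics on both sets so that performance can be
# benchmarked against the standard of care and, where relevant, other systems.
for name, (y, s) in {"internal": internal, "external": external}.items():
    print(name, report_metrics(y, s))
```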
Clinical validation (used in this Technical Report to refer to the evaluation of the AI system
through interventional or clinical studies) requires:
– a clinical study with a relevant comparator and a meaningful endpoint, and the steps taken
to minimise bias.
Finally, deployment and ongoing evaluation requires:
– monitoring of performance and impact (including safety and effectiveness) to understand
the anticipated and unanticipated outcomes (a minimal monitoring sketch follows this list)
– algorithmic audits [5] to understand how adverse events or algorithmic errors occur.
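As a minimal sketch of what such post-deployment monitoring could look like in practice, the class below tracks rolling sensitivity over recently labelled cases and flags degradation relative to the sensitivity observed at validation time, as one simple trigger for an algorithmic audit. The class name, window size, and tolerance are hypothetical choices, not values prescribed by the Deliverable.

```python
from collections import deque

class PerformanceMonitor:
    """Hypothetical rolling-sensitivity check for a deployed binary system."""

    def __init__(self, baseline_sensitivity, window=200, tolerance=0.05):
        self.baseline = baseline_sensitivity   # sensitivity at validation time
        self.tolerance = tolerance             # acceptable drop before alerting
        self.cases = deque(maxlen=window)      # (prediction, ground_truth) pairs

    def record(self, prediction, ground_truth):
        self.cases.append((prediction, ground_truth))

    def degraded(self):
        """True if rolling sensitivity falls below baseline - tolerance."""
        positives = [(p, t) for p, t in self.cases if t == 1]
        if not positives:
            return False  # no labelled positive cases in the window yet
        sensitivity = sum(p for p, _ in positives) / len(positives)
        return sensitivity < self.baseline - self.tolerance

monitor = PerformanceMonitor(baseline_sensitivity=0.90)
monitor.record(prediction=0, ground_truth=1)  # a missed positive case
if monitor.degraded():
    print("Performance drift detected: trigger an algorithmic audit.")
```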
Annex A summarizes the key findings as a checklist to facilitate the application of this
Deliverable.
A.4.2 DEL 10: AI4H use cases: Topic description documents
Summary: This document provides an overview of the ITU/WHO Focus Group on AI for Health
(FG-AI4H) "AI4H use cases: Topic Description Documents". Each use case is represented by
a topic group dedicated to a specific health topic in the context of AI. The topic group
proposes a procedure to benchmark AI models developed for a specific task within this health
topic. The members of each topic group create a topic description document (TDD) that
contains information about the structure,
operations, features, and considerations of the specific health topic. This document serves as
an introduction to the topic groups and their topic description documents.
A.4.3 DEL 10.2: FG-AI4H Topic Description Document for the Topic Group
on AI-based dermatology (TG-Derma)
Summary: This TDD specifies a standardized benchmarking for AI-based dermatology. It covers
all scientific, technical, and administrative aspects relevant to setting up this benchmarking.
The group defines specific AI tasks such as skin disease classification, lesion segmentation,
disease severity assessment, and treatment recommendation, detailing target conditions,
datasets, and evaluation metrics to guide model development and evaluation (an illustrative
segmentation metric is sketched below). It also discusses gold standards, including expert
consensus and standardized image datasets, against which AI performance is benchmarked,
ensuring consistent, reproducible, and collaborative advances in dermatological AI applications.
The group's work on defining AI tasks and gold standards provides a clear roadmap for developing
and evaluating AI models that address critical challenges in dermatological diagnosis and patient
care. These definitions are continuously refined to stay aligned with the latest advancements
in AI for dermatology.
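As one concrete illustration of an evaluation metric that segmentation benchmarks of this kind commonly use, the sketch below computes the Dice similarity coefficient between a predicted lesion mask and an expert-annotated gold-standard mask. The toy masks are invented for illustration; the TDD itself should be consulted for the metrics it actually specifies.

```python
import numpy as np

def dice_coefficient(pred, gold):
    """Dice similarity 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred, gold = np.asarray(pred, bool), np.asarray(gold, bool)
    denom = pred.sum() + gold.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, gold).sum() / denom

# Toy 4x4 lesion masks: model prediction vs. expert-annotated gold standard.
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gold = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
print(f"Dice = {dice_coefficient(pred, gold):.3f}")  # 2*3/(4+3) ≈ 0.857
```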