Page 91 - ITU KALEIDOSCOPE, ATLANTA 2019
ICT for Health: Networks, standards and innovation
health application. The deliverables of the WGs are planned to be a number of documents that cover topics including:
• AI ethical considerations,
• AI legal considerations,
• AI software life cycle,
• reference data annotation specification,
• training and test data specification,
• AI training process specification,
• AI test process specification,
• AI test metric specification, and
• AI post-market adaptation and surveillance specification.

An overview of the technical output of the WGs is given in Figure 2.

Figure 2 − Overview of the technical output of the WGs

The WG data and AI solution assessment methods reviews the topic description documents (see below) in collaboration with independent experts with substantial records of accomplishment in the respective health topic, proficient knowledge in ML/AI, and transversal competences from areas such as ethics and statistics. During a repeated review cycle, the working group and the experts check that the topic description documents are accurate, complete, sound, understandable and objective, and give corresponding feedback for improvement to the respective topic group and the entire focus group. The WG is in charge of providing a number of technical deliverables, given above.

The working group data and AI solution handling takes charge of a range of tasks related to conducting the tests, which requires bringing the test data and the to-be-tested AI solutions together. Relevant aspects include, e.g., transfer agreements, secure data and solution transfer, data checks, IT infrastructure, access rights, traceability, IT security, test implementation and report generation.

The working group for regulatory considerations is involved in the entire process, with representatives of the FDA (USA), CMDE/NMPA (China), CDSCO (India), EMA (Europe) and BfArM (Germany) so far. In close collaboration with the WHO, the working group facilitates subsequent steps (e.g. AI testing process specification, clinical evaluation, certification, etc.) towards deployment of the health AI solution in practice.

The topic groups (TGs) take charge of specific health domains with corresponding ML/AI tasks. They provide the connection between the WGs and actual health topics and the specific problems involved with a number of AI for health tasks and data modalities. At present, the topic groups address AI-based cardiovascular disease risk prediction, dermatology, histopathology, outbreak detection, ophthalmology, radiotherapy, symptom assessment, tuberculosis prognostics/diagnostics and several further domains. In each topic group, different stakeholders with a common interest in the topic, including competing companies, work together. "Calls for topic group participation" are published on the website (https://www.itu.int/go/fgai4h); they introduce the respective topic group and invite participation. The creation of many other topic groups in response to the open "call for proposals: use cases, benchmarking, and data" is expected. Selection criteria include the prospect of a widespread and, ideally, global impact, a clear concept described in sufficient detail, and preliminary evidence of feasibility.

Every topic group defines its scope, the specific ML/AI tasks, and the evaluation procedures with corresponding test data and metrics in full detail in a topic description document. Statistical metrics for assessing model performance include, e.g., precision, specificity, F1 score and area under the curve; multiple or combined metrics can also be used [61]. In particular, it should be ensured that the (e.g. clinical) endpoints are meaningful in practice. Further criteria should be considered, e.g. robustness to noise and to other variations in the input data [62], or to manipulations [65]. Humans prefer transparent decision-making: can the model adequately quantify the uncertainty [63] and plausibly explain the decision [66, 67]? These criteria beyond mere performance should also be considered.

The topic description document must capture a range of aspects related to the test data, because these largely determine whether the evaluation procedure is appropriate and meaningful. The procedure can return conclusive results if, and only if, the test data are realistic, i.e. close to the actual application, of representative coverage, and of traceable provenance from different sources. Data acquisition must be transparently documented in full detail [cf. 68], including annotation guidelines, for reproducibility, replicability, and scalability. All ethical and legal questions related to the acquisition, storage and processing of health data must be taken into careful consideration. Bias must be controlled and documented clearly. The document shall specify quality and quantity criteria for the test data, including corresponding references. The annotation needs to be conducted by experts with a defined level of expertise, with potentially several independent annotations per sample (if applicable). Technical matters, e.g. data formats [cf. 69, 70] and data management [71], need to be specified. A reference model can potentially be defined (e.g. "average human performance for this task", "best in class"). Limiting factors for data availability, such as finances or time, should be referred to.

The plan detailed in the topic description document must be implemented in practice. The test data must be provided or acquired, and measures for quality assurance taken. The evaluation routine must be implemented, and the code published together with at least a few example data with
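To make the statistical metrics named above concrete, the following sketch computes precision, sensitivity, specificity and F1 score from a binary confusion matrix. The function name and the test-set counts are invented for illustration; they are not part of the FG-AI4H deliverables.

```python
# Illustrative only: a few of the statistical evaluation metrics mentioned
# in the text, derived from the cells of a binary confusion matrix.

def binary_metrics(tp, fp, tn, fn):
    """Return precision, recall (sensitivity), specificity and F1 score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1

# Hypothetical counts for a screening model evaluated on 200 test samples:
p, r, s, f1 = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(f"precision={p:.2f} recall={r:.2f} specificity={s:.2f} F1={f1:.2f}")
```

In a real benchmark these scalar metrics would be complemented by threshold-free measures such as the area under the curve, and, as the text notes, by robustness and transparency criteria.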
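Where several independent annotations per sample are collected, the level of agreement between annotators can be quantified before the labels are used as test-data ground truth. A minimal sketch, using Cohen's kappa for two annotators; the labels and annotation sequences are invented for illustration.

```python
# Illustrative only: inter-annotator agreement via Cohen's kappa,
# correcting raw agreement for agreement expected by chance.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical expert annotators labelling the same six samples:
ann1 = ["benign", "malignant", "benign", "benign", "malignant", "benign"]
ann2 = ["benign", "malignant", "malignant", "benign", "malignant", "benign"]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")  # → kappa = 0.67
```

A low kappa would signal that the annotation guidelines or the annotators' defined level of expertise need revisiting before the data enter the benchmark.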