Deutsches Institut für Normung (DIN) began drafting an “AI roadmap” in May 2019 “to create a framework for action for standardization” [42]. DIN has also founded an interdisciplinary AI Working Committee [43] and is working on two DIN SPECs related to AI [44, 45].

Large companies lead the field in AI and have started joint activities on safe AI, which can potentially establish de facto standards quickly. The “Partnership on Artificial Intelligence to Benefit People and Society” is led by representatives from large technology firms and several other member organizations, also from academia and civil society. The first goal of this initiative is “to develop and share best-practice methods and approaches in the research, development, testing, and fielding of AI technologies”. This includes addressing “the trustworthiness, reliability, containment, safety, and robustness of the technology”. They are particularly interested in “safety-critical application areas” and mention healthcare as an example [46].

The “OpenAI” research center, which is well known in the ML/AI research community and backed by large investors, has recently published a policy paper on “the role of cooperation in responsible AI development”, “across organizational and national borders”, discussing “joint research into the formal verification of AI systems’ capabilities and other aspects of AI safety”. In particular, they mention “various applied ‘AI for good’ projects whose results might have wide ranging and largely positive applications (e.g. in domains like [...] health); coordinating on the use of particular benchmarks; joint creation and sharing of datasets that aid in safety research”. Moreover, they raise the question of the role of “standardization bodies in resolving collective action problems between companies”, in particular internationally [47]. OpenAI claims that “AI companies can work to develop industry norms and standards that ensure systems are developed and released only if they are safe, and can agree to invest resources in safety during development and meet appropriate standards prior to release”. They “anticipate that identifying similar mechanisms to improve cooperation on AI safety between states and with other non-industry actors will be of increasing importance in the years to come” [48].

3.  VALIDATING DIGITAL HEALTH TECHNOLOGIES

Previous work can provide orientation for future international standards for the validation of novel ML/AI-based health technologies. Physicians, regulators, scientists and engineers have long-standing experience in dealing with complex, safety-critical health interventions and technologies that require careful validation prior to usage. These technologies include, for instance, clinical interventions, surgical procedures, pharmaceutics, medical devices and software. Randomized controlled clinical trials, peer review of scientific literature and standard tests in accredited testing laboratories are examples of well-established methods for assessing these interventions, substances or devices.

Typically, AI serves as a multivariable prediction model that maps multidimensional input variables to one- or multidimensional output variables, e.g. pictures to disease classification codes. Accordingly, the TRIPOD statement for the “transparent reporting of a multivariable prediction model for individual prognosis or diagnosis” can serve as a landmark for AI methods, too. These guidelines have been published by the EQUATOR Network, an organization aiming to enhance the quality and transparency of health research [49, 50, 51]. Cf. [52] for a discussion of how the TRIPOD statement relates to AI.
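To make this input-output view concrete, the following Python sketch (our own illustration, not part of the cited guidelines) trains a minimal multivariable prediction model that maps flattened pixel vectors to a binary classification code; the scikit-learn model choice, the synthetic data and the class codes are all assumptions made purely for illustration:

```python
# Minimal sketch of a multivariable prediction model: a learned mapping
# from multidimensional inputs (here: flattened 64x64 "images") to an
# output variable (here: a hypothetical binary disease code).
# All data are synthetic stand-ins used purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)
X = rng.random((200, 64 * 64))        # 200 images as pixel vectors
y = rng.integers(0, 2, size=200)      # 0 = "no finding", 1 = "disease" (assumed codes)

model = LogisticRegression(max_iter=1000).fit(X, y)

new_image = rng.random((1, 64 * 64))  # a previously unseen input
print(model.predict(new_image))       # -> predicted classification code
```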
ML/AI models are implemented as pieces of software and hence belong to digital technologies in almost all cases (in principle, they can be analogue hardware, too [53]). The International Medical Device Regulators Forum has outlined principles for the clinical evaluation of software as a medical device in a draft from 2017 [54]. Three main topics structure this clinical evaluation process: (a) assuring that there is a “valid clinical association” between the software output and the “targeted clinical condition”; (b) correctly processing the “input data to generate accurate, reliable, and precise output data”; (c) achieving the “intended purpose in your target population in the context of clinical care” using the software output data. The English National Institute for Health and Care Excellence (NICE) published an “evidence standards framework for digital health technologies” in March 2019 [55]. This document “describes standards for the evidence (…) of effectiveness relevant to the intended use(s) of the technology”. Moreover, the document states that the framework is applicable to digital health technologies “that incorporate artificial intelligence using fixed algorithms”, excluding adaptive AI algorithms.

4.  ML/AI PERFORMANCE EVALUATION

ML/AI models are expected to return meaningful results that are accurate, plausible and reliable when processing completely novel data points, i.e. data the model has never seen before, during actual usage in the “real world”. Out-of-sample tests make it possible to assess this capability to some degree, if the tests are conducted appropriately. These tests can largely be conducted in silico, at least as a first step, without posing the potential hazards of clinical trials: the model is confronted with previously recorded test samples, and its output is compared with the “ground truth” for the respective task. This characteristic allows conducting systematic tests at large scale (e.g. using databases with thousands of MRI images) that are replicable and fast (e.g. in the case of software updates or adaptive algorithms).
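A minimal sketch of such an in-silico test follows, assuming a trained model of the kind sketched above; the data, the choice of metrics and the “ground truth” labels are synthetic assumptions for illustration only:

```python
# Sketch of an in-silico out-of-sample test: confront a trained model with
# previously recorded test samples and compare its output against the
# "ground truth" annotations for the task. Synthetic stand-in data only;
# no real patient data or clinical exposure is involved.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(seed=1)
model = LogisticRegression(max_iter=1000).fit(
    rng.random((200, 64 * 64)), rng.integers(0, 2, size=200))

X_recorded = rng.random((50, 64 * 64))    # previously recorded test samples
y_truth = rng.integers(0, 2, size=50)     # expert annotations ("ground truth")

y_pred = model.predict(X_recorded)
print(accuracy_score(y_truth, y_pred))    # summary score
print(confusion_matrix(y_truth, y_pred))  # per-class errors, e.g. missed cases
```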
The machine learning community usually evaluates the performance of ML/AI models as follows: First, the model is tested out-of-sample, but in-house, by splitting the available data into a training set and a test set, often in a cross-validation scheme. The trained model computes labels or other output variables from the input data of the test set, and these are statistically compared with the “true” labels or annotations (the comparison is summarized in a score).
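The split-and-score procedure just described can be sketched as follows, again with synthetic stand-in data; the train/test proportions and the 5-fold scheme are common but assumed choices, not prescribed by any of the cited documents:

```python
# Sketch of the usual community evaluation: split the available data into
# a training and a test set, compare predicted with "true" labels, and
# summarize the comparison in a score; k-fold cross-validation repeats
# the split systematically (k = 5 is an assumed, common choice).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(seed=2)
X = rng.random((200, 64 * 64))            # synthetic stand-in inputs
y = rng.integers(0, 2, size=200)          # synthetic stand-in labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
score = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(score, cv_scores.mean())            # single-split and cross-validated scores
```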
Then, method




