Page 92 - ITU KALEIDOSCOPE, ATLANTA 2019




references (e.g. annotated images) to enable the developers to carry out a trial run of their code.

For a clean and fair evaluation, a trusted third party should receive the trained model as an independent arbiter and conduct the tests on data that have never been published before. This cautious procedure prevents unfair conduct, e.g. tuning the model for optimal performance on this particular test set ("overfitting") without actually generalizing well to the real-world data that can be expected in practice. Therefore, widely available public data sets cannot be used for the evaluation, and the entire test data set must remain secret, i.e. neither labeled nor unlabeled test data should be made available. The model performance should be evaluated in a closed computing environment without Internet access. Otherwise, test data could be leaked, against the rules, and the model tweaked on the test data. Besides, leaderboard probing and other potential pitfalls known from ML challenges must be kept in mind [72, 73]. The trusted third party is responsible for protecting both the test data and the ML/AI model: the test data have to remain secret to allow meaningful subsequent testing, and the AI models may contain business-relevant trade secrets of the developer.

In this spirit, focus group members have conducted a first proof-of-concept benchmark for digital pathology, where an ML/AI model can provide diagnostic support by quantifying tumor-infiltrating lymphocytes in breast cancer from whole-slide histopathology images, which is relevant for prognosis and therapy selection [cf. 74, 75]. The topic group had defined the evaluation task and procedure, and had acquired and annotated test data. The developer had trained a model on its own training data to predict the annotations that a pathologist would give from the images. A focus group member, acting as arbiter, provided the computing infrastructure according to the specifications of the developer (here a desktop computer with a certain graphics processing unit, operating system, package manager, and ML framework installed) and granted the developer access via the Internet to install the prediction routine. A few annotated example data enabled the developer to test the prediction routine. After disconnecting the computer from the Internet, the arbiter uploaded undisclosed test data, received directly from the topic group, to the machine and executed the prediction routine, which processed the data and predicted the annotations. Finally, scores (true positive rate and true negative rate) were computed by comparison with the reference annotations and reported back to the topic group and the developer. Naturally, this manual procedure can be automated and scaled, e.g. with one of the ML challenge frameworks mentioned in section 4, potentially installed on a server on ITU or UN premises.

Interaction with further health institutions will potentially be strengthened, e.g. with the International Association of National Public Health Institutes, the InterAcademy Partnership and the World Health Summit. Further information about the scope and general process of the focus group can be found in a commentary in The Lancet [29] and in a white paper on the website, where the full documentation of all previous meetings is also published.

6. OUTLOOK

In summary, the ITU/WHO focus group on "AI for Health" has taken the first exploratory steps towards international health ML/AI evaluation standards. For the future, we expect that a wide spectrum of health ML/AI topics will be addressed and that insights from the evaluation will be brought back to research and development. The evaluation procedure will be continuously refined in a repeated cycle, considering further quality criteria beyond mere performance and including high-quality test data with increasing geographic coverage. For the years to come, we also anticipate a further deepening of cooperation on ML/AI between standard-setting organizations. While the standardization activities on ML/AI differ in their thematic scope and particular objectives (see section 2), they can profit from collaboration, because different application areas of ML/AI often share problems and data modalities. For instance, assuring robust automatic image interpretation is relevant for a range of safety-critical application domains and is not limited to healthcare. At the same time, a generic approach is often not possible, because cross-sectional ML/AI technologies require cooperation with the respective domain experts. A good example of this multidisciplinary cooperation is the joint focus group of ITU and WHO, which brings together expertise from information technology and health standardization bodies. In particular, this initiative shows that global collaboration can leverage synergy effects, since many relevant issues are common across the world.

REFERENCES

[1] World Health Organization (2019). Global Strategy on Digital Health 2020-2024. Retrieved from https://www.who.int/DHStrategy

[2] U.S. Food and Drug Administration (2018). FDA News Release. Retrieved from https://www.fda.gov/news-events/press-announcements/fda-permits-marketing-artificial-intelligence-based-device-detect-certain-diabetes-related-eye

[3] Mesko, B. (2019). FDA Approvals For Smart Algorithms In Medicine In One Giant Infographic. The Medical Futurist. Retrieved from https://medicalfuturist.com/fda-approvals-for-algorithms-in-medicine

[4] U.S. Food and Drug Administration (2019). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback. Retrieved from https://www.regulations.gov/document?D=FDA-2019-N-1185-0001
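As a minimal illustration of the scoring step in the proof-of-concept benchmark described above, the following sketch computes the true positive rate (sensitivity) and true negative rate (specificity) from reference and predicted annotations. It assumes the annotations have already been reduced to one binary label per case; the actual benchmark operates on whole-slide image annotations, and the function name and data layout here are illustrative, not taken from the focus group's tooling.

```python
def tpr_tnr(reference, predicted):
    """Compute true positive rate and true negative rate from
    binary reference and predicted annotations (1 = positive)."""
    if len(reference) != len(predicted):
        raise ValueError("annotation lists must have equal length")
    # Tally the four cells of the confusion matrix.
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    # TPR = TP / (TP + FN); TNR = TN / (TN + FP); undefined if a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Example: 5 cases, 3 positive in the reference; the model misses one
# positive and raises one false alarm, giving TPR = 2/3 and TNR = 1/2.
tpr, tnr = tpr_tnr([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a real deployment of the procedure, the arbiter would run such a comparison inside the closed computing environment and report only the aggregate scores back to the topic group and the developer, never the test data themselves.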



