Figure 1 presents an updated overview of our proposed system architecture for AI-driven personalized health services. The system consists of three main components: (1) a user interaction layer, (2) a generative AI model, and (3) a knowledge retrieval engine [28]. The user interaction layer provides natural language interfaces, such as chatbots, voice assistants, or mobile apps, for users to input their health queries, symptoms, or goals. These inputs are translated into structured prompts that specify the desired output format and any relevant patient context. The prompts are then augmented with relevant medical knowledge retrieved from the knowledge base. The augmented prompts are fed into the generative AI model, which is a large language model pre-trained on general-purpose text data and fine-tuned on domain-specific health corpora [29]. The model generates personalized health information or recommendations as output, tailored to the user's specific prompt and retrieved context [30]. Techniques for safe and controllable generation, such as domain-adaptive pretraining, content filtering, and human feedback, are applied to ensure outputs align with verified health guidelines. The knowledge retrieval engine consists of a knowledge base that stores structured health data (e.g., ontologies, clinical guidelines, drug databases) and a retrieval module that finds relevant information based on the user prompt and generated output. The retriever uses semantic search techniques (e.g., entity linking, embedding similarity) to map natural language to knowledge base entries. Retrieved context is passed back to the generative model to inform and ground its outputs [31][32].
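
To make the data flow concrete, the following sketch shows how the three components could fit together in code. It is a minimal illustration under our own simplifying assumptions: the StructuredPrompt fields, the retriever.search and generator.generate calls, and all other names are placeholders rather than the actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical data structures illustrating the three-component flow described
# above; names and fields are illustrative, not the system's actual code.

@dataclass
class StructuredPrompt:
    query: str                      # the user's natural-language health question
    output_format: str              # desired response style, e.g. "plain-language summary"
    patient_context: dict = field(default_factory=dict)  # age, conditions, medications, ...

def build_prompt(user_input: str, patient_context: dict) -> StructuredPrompt:
    """User interaction layer: translate free-text input into a structured prompt."""
    return StructuredPrompt(query=user_input,
                            output_format="plain-language summary",
                            patient_context=patient_context)

def augment_with_knowledge(prompt: StructuredPrompt, retriever) -> str:
    """Attach retrieved medical knowledge before the prompt reaches the generator."""
    context_entries = retriever.search(prompt.query)          # knowledge retrieval engine
    context_text = "\n".join(context_entries)
    return (f"Context:\n{context_text}\n\n"
            f"Patient: {prompt.patient_context}\n"
            f"Question: {prompt.query}\n"
            f"Answer as a {prompt.output_format}.")

def answer(user_input: str, patient_context: dict, retriever, generator) -> str:
    """End-to-end flow: interaction layer -> retrieval -> generative model."""
    prompt = build_prompt(user_input, patient_context)
    augmented = augment_with_knowledge(prompt, retriever)
    return generator.generate(augmented)                      # generative AI model
```

Passing the retriever and generator in as objects keeps the interaction layer independent of any particular model or knowledge-base backend.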

3.2   Data and Knowledge Sources

Our system leverages a combination of large-scale unstructured text corpora and structured knowledge bases to train the generative model and retrieval engine. For pre-training the base language model, we use general-purpose text datasets containing billions of tokens, such as Common Crawl [33] and The Pile. For fine-tuning, we curate a health-specific corpus containing millions of documents from authoritative sources such as PubMed [34], UpToDate, Merck Manuals, and MedlinePlus. We apply data cleaning, deduplication, and quality control techniques to ensure the fine-tuning data is relevant, reliable, and representative of the target health domains. To build the knowledge base for retrieval, we integrate existing health ontologies and knowledge graphs, such as ICD-11 [35], SNOMED-CT, DrugBank, and UMLS. We also create custom knowledge bases by extracting structured information from semi-structured health content, such as clinical practice guidelines, drug package inserts, and patient FAQs. Knowledge entries are stored as subject-relation-object triples and indexed using efficient retrieval algorithms.
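
The sketch below illustrates one way such triples could be represented and indexed for embedding-similarity retrieval. It is a schematic example, not the system's actual index: the hashed bag-of-words embed function stands in for whatever encoder is used in practice, and the sample triples are invented for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Triple:
    subject: str    # e.g. "metformin"
    relation: str   # e.g. "treats"
    obj: str        # e.g. "type 2 diabetes"

    def as_text(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Placeholder encoder: hashed bag-of-words, normalized to unit length."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class TripleIndex:
    """Stores triples alongside their embeddings for cosine-similarity lookup."""
    def __init__(self, triples):
        self.triples = list(triples)
        self.matrix = np.stack([embed(t.as_text()) for t in self.triples])

    def search(self, query: str, k: int = 3):
        scores = self.matrix @ embed(query)        # cosine similarity on unit vectors
        top = np.argsort(scores)[::-1][:k]
        return [self.triples[i] for i in top]

# Example: index a few entries and retrieve the closest ones for a user query.
index = TripleIndex([
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("metformin", "has_side_effect", "gastrointestinal upset"),
    Triple("hypertension", "is_risk_factor_for", "stroke"),
])
print(index.search("side effects of metformin"))
```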

3.3   Model Training and Inference

The base language model is pre-trained on the general text corpus using self-supervised objectives, such as masked language modeling [36] or permutation language modeling [37]. Pre-training allows the model to learn generalizable language patterns and representations that can be transferred to downstream health tasks.
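
To make the pre-training objective concrete, the toy example below masks a fraction of tokens in a sentence and keeps the originals as prediction targets, which is the shape of the masked language modeling task. The 15% mask rate, whitespace tokenization, and function names are illustrative assumptions, not details from our training setup.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens: list[str], mask_prob: float = 0.15, seed: int = 0):
    """Return (masked_input, targets) where targets hold the original tokens."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            targets[i] = tok          # the model must predict this token
        else:
            masked.append(tok)
    return masked, targets

sentence = "patients with type 2 diabetes are often prescribed metformin".split()
masked_input, targets = mask_tokens(sentence)
print(masked_input)   # sentence with some tokens replaced by [MASK]
print(targets)        # masked positions mapped to the tokens to reconstruct
```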

The pre-trained model is then fine-tuned on the curated health corpus using supervised training objectives, such as next-token prediction or sequence-to-sequence translation. We experiment with various fine-tuning approaches, including continued pre-training on in-domain data, multi-task learning on related health tasks, and instruction-based fine-tuning using prompt templates. Fine-tuning adapts the model to the target health domain and improves its ability to generate relevant, accurate health content.
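
As one example of the instruction-based fine-tuning mentioned above, the snippet below wraps a curated health record in a prompt template to produce the (input, target) pair a supervised objective would train on. The template layout, field names, and the sample record are our own illustrative choices.

```python
# Hypothetical prompt template for instruction-based fine-tuning; each raw
# example is wrapped in the template and paired with its reference answer.
INSTRUCTION_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def to_training_pair(example: dict) -> tuple[str, str]:
    """Turn one curated health record into a (prompt, target) pair for fine-tuning."""
    prompt = INSTRUCTION_TEMPLATE.format(
        instruction=example["instruction"],   # e.g. "Summarize for a patient."
        input=example["source_text"],         # e.g. a guideline excerpt
    )
    return prompt, example["reference_answer"]

# Example usage with a single (invented) record:
pair = to_training_pair({
    "instruction": "Explain this guideline recommendation in plain language.",
    "source_text": "Adults with stage 1 hypertension should begin lifestyle modification.",
    "reference_answer": "If your blood pressure is mildly elevated, doctors first "
                        "recommend changes such as diet and exercise.",
})
print(pair[0])
```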

We also explore techniques for safe, controllable generation, such as:

•   Controlled decoding methods that constrain model outputs to align with specified attributes or styles

•   Safety classifiers that filter or mask potentially unsafe or offensive content (see the sketch after this list)

•   Reinforcement learning from human feedback to reward desirable behaviors and outputs
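
Of these, the safety-classifier idea is the most straightforward to sketch: generated text is scored before it reaches the user, and anything above an unsafe threshold is replaced with a fallback message. The classifier interface, threshold, and fallback wording below are assumptions for illustration only.

```python
# Minimal post-generation safety filter; safety_classifier is a stand-in for
# whatever model or API actually scores the text.
UNSAFE_THRESHOLD = 0.5
FALLBACK = ("I can't provide that information. Please consult a qualified "
            "healthcare professional.")

def filter_output(generated_text: str, safety_classifier) -> str:
    """Return the generated text only if the classifier deems it safe."""
    unsafe_score = safety_classifier.score(generated_text)   # probability of unsafe content
    if unsafe_score >= UNSAFE_THRESHOLD:
        return FALLBACK
    return generated_text
```

In practice the same hook could instead mask only the offending spans or trigger regeneration under a more constrained decoding strategy.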

For model serving, we use a retrieval-augmented generation (RAG) approach that combines the strengths of the generative model and knowledge retrieval. Given a user prompt, the retriever first searches the knowledge base for relevant context, such as definitions of medical terms, clinical guidelines for mentioned conditions, or drug information for queried medications. The retrieved context is appended to the user prompt to create an augmented input for the generator. The generative model then produces a contextually appropriate response that is both personalized to the user's specific query and grounded in the retrieved medical knowledge [39]. The generated output can optionally be fed back into the retriever for additional fact-checking and refinement.
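
A schematic version of this serving loop, including the optional fact-checking pass, is sketched below. The retriever, generator, and fact_checker objects and their method names are placeholders; only the retrieve-augment-generate-verify structure reflects the description above.

```python
MAX_REFINEMENTS = 2

def serve(user_prompt: str, retriever, generator, fact_checker) -> str:
    context = retriever.search(user_prompt)                 # terms, guidelines, drug info
    augmented = "\n".join(context) + "\n\n" + user_prompt   # augmented input for the generator
    response = generator.generate(augmented)

    # Optional refinement: feed the output back through retrieval and fact-checking.
    for _ in range(MAX_REFINEMENTS):
        unsupported = fact_checker.unsupported_claims(response, context)
        if not unsupported:
            break
        context = context + retriever.search(" ".join(unsupported))
        augmented = "\n".join(context) + "\n\n" + user_prompt
        response = generator.generate(augmented)
    return response
```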

3.4   Evaluation Framework

We conduct extensive evaluations of our system using both automated metrics and human judgments. For automated evaluation, we measure the quality of generated outputs using standard language modeling metrics such as perplexity, BLEU [40], and ROUGE. We also assess the factual accuracy of outputs by cross-referencing them against ground-truth health information using textual entailment models or medical fact-checking APIs [41].
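
For instance, the reference-based overlap metrics can be computed with off-the-shelf packages; the snippet below uses nltk for BLEU and rouge_score for ROUGE, which is one possible tooling choice rather than something fixed by the framework. Perplexity comes directly from the language model and is omitted here, and the example texts are invented.

```python
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

reference = "Metformin is usually the first medication prescribed for type 2 diabetes."
generated = "Metformin is typically the first drug given for type 2 diabetes."

# BLEU: n-gram overlap between the generated output and the reference answer.
bleu = sentence_bleu([reference.split()], generated.split())

# ROUGE-1 / ROUGE-L: unigram and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, generated)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```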

To understand our system's practical utility and usability, we carry out user studies with target stakeholders, including patients, caregivers, and healthcare providers. Study designs include controlled experiments comparing our system to existing baselines, longitudinal field studies examining user engagement and behavior change, and qualitative interviews probing user attitudes, needs, and concerns. Participants perform representative health-related tasks using our system, such as seeking information about specific conditions, interpreting lab results, or managing chronic illnesses. We collect both objective usage metrics (e.g., task completion time, error rate, interaction logs) and subjective user feedback through surveys and interviews. Experienced medical professionals also review a sample of generated outputs to rate their




