patients with missing primary cause of ESRD. We addressed the two variables with significantly missing values (peak PRA and primary cause of ESRD) using multiple imputation methods in sensitivity analysis (see Sensitivity Analysis). Reference-based characteristics were male; age 71-80; Caucasian; from LHIN A; comorbidity-free; not sensitized (peak PRA = 0%); ESRD caused by glomerulonephritis / autoimmune; blood type A; and having received a first graft pre-emptively.

3. DATA ANALYSIS
We first summarized patient characteristics at baseline (transplantation). Continuous variables were represented by means and standard deviations (SD), as well as by medians and inter-quartile ranges (IQR). Categorical variables were summarized by counts and percentages.
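A minimal sketch of this descriptive step in Python with pandas; the variable names and values below are illustrative stand-ins, not the study data:

    import pandas as pd

    # Toy stand-in data; the real study variables (age group, blood
    # type, pre-transplant costs, etc.) are not reproduced here.
    df = pd.DataFrame({
        "pre_transplant_cost": [12000.0, 8500.0, 15200.0, 9900.0],
        "blood_type": ["A", "AB", "O", "A"],
    })

    # Continuous variables: mean, SD, median, and inter-quartile range.
    cost = df["pre_transplant_cost"]
    print(cost.mean(), cost.std(), cost.median())
    print(cost.quantile(0.75) - cost.quantile(0.25))  # IQR

    # Categorical variables: counts and percentages.
    print(df["blood_type"].value_counts())
    print(df["blood_type"].value_counts(normalize=True) * 100)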
Similar to the methods employed by Haddad et al. [10], we included patients transplanted between 2012 and 2014 in the testing set (N=294, 22.1%), while the remainder were in the training set (N=1034, 77.9%), achieving a size ratio (testing:training) of roughly 2:8 [21]. This ratio reflects ideal practice in machine learning, whereby earlier data are used to construct models and more recent data (in our case, 2012-2014) are used to validate such models [10]. The optimal model was selected by ten-fold cross-validation (CV) on the testing set based on the averaged test root mean square error (RMSE) and $R^2$ value [22]. Since we had both categorical (e.g., age groups) and continuous (i.e., pre-transplant healthcare use) variables as candidate predictors, they were standardized prior to model training and testing. Following Haddad et al., we log-transformed healthcare costs incurred during pre-workup, workup, and the first post-transplant year in model training and testing. Results were then exponentiated to aid interpretation [10].
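A minimal sketch of this preparation step, assuming hypothetical column names (`transplant_year`, `first_year_cost`) since the study dataset is not public:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    def time_based_split(df: pd.DataFrame):
        # Later transplants (2012-2014) form the testing set; earlier
        # ones form the training set, mirroring the roughly 2:8 split.
        test = df[df["transplant_year"].between(2012, 2014)]
        train = df[df["transplant_year"] < 2012]
        return train, test

    def preprocess(train, test, predictors):
        # Standardize predictors using training-set statistics only.
        scaler = StandardScaler().fit(train[predictors])
        X_train = scaler.transform(train[predictors])
        X_test = scaler.transform(test[predictors])
        # Log-transform the cost outcome; predictions are exponentiated
        # afterwards to aid interpretation.
        y_train = np.log(train["first_year_cost"].to_numpy())
        y_test = np.log(test["first_year_cost"].to_numpy())
        return X_train, X_test, y_train, y_test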
3.1 Regularized linear regression

The goal of our analysis is to estimate a regression equation on the natural log of total healthcare costs over the first post-transplant year:

    \ln(y) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

where $\ln(y)$ is the log of one-year healthcare costs, $\beta_0$ the intercept of the equation (i.e., the mean log cost for patients with reference-level characteristics), and $\beta_j$ the weight associated with predictor $x_j$. Conventional ordinary least squares (OLS) methods search for estimates of $\beta$ that minimize a loss function equal to the total sum of squared errors:

    L_0 = \sum_{i=1}^{n} \Big( \ln(y_i) - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
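For concreteness, a short sketch of this OLS minimization on synthetic data (the dimensions and values are invented for illustration):

    import numpy as np

    # Synthetic stand-in data: 200 patients, 5 standardized predictors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) + rng.normal(scale=0.5, size=200)

    # Prepend a column of ones for the intercept beta_0, then solve the
    # least-squares problem, i.e. minimize the sum of squared errors L_0.
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta)  # [beta_0, beta_1, ..., beta_p]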
To overcome potential model overfitting due to multicollinearity among candidate predictors (e.g., having blood type AB and shorter dialysis durations [23]), we applied three regularized linear regression methods.

3.1.1 Ridge regression

Ridge regression aims to minimize the sum of the original loss function $L_0$ and a regularized term, known as the L2 norm [24], which we describe below. As the parameter $\lambda$ increases, $\beta$ is shrunk towards 0. The optimal $\lambda$ is determined by ten-fold CV within the training set such that the averaged test RMSE is minimized.

    \|\beta\|_2 = \lambda \sum_{j=1}^{p} \beta_j^2
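One way to realize this selection with scikit-learn, whose `alpha` parameter plays the role of $\lambda$; the data are synthetic stand-ins, not the study data:

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))
    y_train = X_train @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) \
        + rng.normal(scale=0.5, size=200)

    # Ten-fold CV over a grid of candidate lambdas, minimizing RMSE.
    ridge = RidgeCV(
        alphas=np.logspace(-3, 3, 50),
        cv=10,
        scoring="neg_root_mean_squared_error",
    ).fit(X_train, y_train)
    print(ridge.alpha_, ridge.coef_)  # selected lambda and shrunken weights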
3.1.2 Lasso regression

Lasso, or the least absolute shrinkage and selection operator [25], searches for estimates of $\beta$ that minimize the sum of the original loss function $L_0$ and the L1 norm presented below. Compared with ridge regression, lasso forces the weights of some predictors to be exactly zero, yielding a sparse model. As with ridge regression, the optimal $\lambda$ value is selected by ten-fold CV such that the averaged test RMSE is minimized within the training set.

    \|\beta\|_1 = \lambda \sum_{j=1}^{p} |\beta_j|
3.1.3 Elastic net regression

Elastic net regression is a compromise between ridge regression and lasso regression, in the sense that it excludes irrelevant predictors but keeps groups of correlated predictors together [26]. Mathematically, the regularized term of elastic net regression, $\|\beta\|_E$, is a linear combination of the L1 norm ($\|\beta\|_1$ of lasso) and the L2 norm ($\|\beta\|_2$ of ridge), where $\alpha$ is between 0 and 1. The optimal $\alpha$ and $\lambda$ are selected by ten-fold CV.

    \|\beta\|_E = \frac{1 - \alpha}{2} \|\beta\|_1 + \alpha \|\beta\|_2
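In scikit-learn's `ElasticNetCV` the mixing weight corresponding to $\alpha$ is called `l1_ratio`, and `alpha` again stands in for $\lambda$; a sketch on the same synthetic data:

    import numpy as np
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))
    y_train = X_train @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) \
        + rng.normal(scale=0.5, size=200)

    # Ten-fold CV jointly over the mixing parameter and lambda.
    enet = ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9], cv=10)
    enet.fit(X_train, y_train)
    print(enet.l1_ratio_, enet.alpha_)  # selected mixing weight and lambda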
3.2 Regression tree

The regression tree partitions the feature space recursively to create a tree-like structure [27]. At each split in the tree, a node is created to ensure maximum homogeneity of the data being partitioned into the two regions. To train a full tree, we selected features and the corresponding thresholds at each node such that the squared loss function is minimized:

    \sum_{x_i \in R_1} (y_i - \hat{y}_{R_1})^2 + \sum_{x_i \in R_2} (y_i - \hat{y}_{R_2})^2

where $R_1$ and $R_2$ denote the two regions separated by the node.
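As a sketch, the following grows a full tree and prunes it via scikit-learn's cost-complexity pruning path, one standard realization of the pruning step described next; the data are again synthetic:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))
    y_train = X_train @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) \
        + rng.normal(scale=0.5, size=200)

    # Grow a full tree: each split minimizes within-region squared error.
    full_tree = DecisionTreeRegressor(criterion="squared_error",
                                      random_state=0).fit(X_train, y_train)

    # Candidate complexity penalties for pruning back to a smaller tree.
    path = full_tree.cost_complexity_pruning_path(X_train, y_train)
    mid_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
    pruned = DecisionTreeRegressor(ccp_alpha=mid_alpha,
                                   random_state=0).fit(X_train, y_train)
    print(full_tree.get_n_leaves(), "->", pruned.get_n_leaves())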
To avoid overfitting, tree pruning was performed on the full tree to obtain a parsimonious tree ($T$) such that the following loss function is minimized:
