patients with missing primary cause of ESRD. We addressed the two variables with significantly missing values (peak PRA and primary cause of ESRD) using multiple imputation methods in sensitivity analysis (see Sensitivity Analysis). Reference-based characteristics were: male; age 71-80; Caucasian; from LHIN A; comorbidity-free; not sensitized (peak PRA = 0%); ESRD caused by glomerulonephritis / autoimmune; blood type A; and having received a first graft pre-emptively.

3. DATA ANALYSIS
We first summarized patient characteristics at baseline (transplantation). Continuous variables were represented by means and standard deviations (SD), as well as by medians and interquartile ranges (IQR). Categorical variables were summarized by counts and percentages.
Similar to the methods employed by Haddad et al. [10], we included patients transplanted between 2012 and 2014 in the testing set (N=294, 22.1%), while the remainder formed the training set (N=1034, 77.9%), achieving a size ratio (testing:training) of roughly 2:8 [21]. This ratio reflects ideal practice in machine learning, whereby earlier data are used to construct models and more recent data (in our case, 2012-2014) are used to validate those models [10]. The optimal model was selected by ten-fold cross-validation (CV) on the testing set, based on the averaged test root mean square error (RMSE) and the $R^2$ value [22]. Since we had both categorical (e.g., age groups) and continuous (i.e., pre-transplant healthcare use) variables as candidate predictors, they were standardized prior to model training and testing. Following Haddad et al., we log-transformed healthcare costs incurred during pre-workup, workup, and the first post-transplant year in model training and testing. Results were then exponentiated to aid interpretation [10].
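For illustration, a minimal Python sketch of this preprocessing with pandas and scikit-learn; the file name and column names (cohort.csv, transplant_year, cost_1yr) are hypothetical placeholders rather than fields from the study dataset, and categorical predictors are assumed to be already dummy-coded.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical cohort table: one row per transplant recipient, with the
# transplant year, dummy-coded candidate predictors, and first-year cost.
df = pd.read_csv("cohort.csv")  # placeholder file name

# Temporal split: 2012-2014 transplants form the testing set (~20%);
# earlier transplants form the training set (~80%).
test = df[df["transplant_year"].between(2012, 2014)]
train = df[df["transplant_year"] < 2012]

predictors = [c for c in df.columns if c not in ("transplant_year", "cost_1yr")]

# Log-transform the outcome; predictions are exponentiated (np.exp) later
# to return results to the original dollar scale.
y_train = np.log(train["cost_1yr"])
y_test = np.log(test["cost_1yr"])

# Standardize predictors using training-set means and SDs only.
scaler = StandardScaler().fit(train[predictors])
X_train = scaler.transform(train[predictors])
X_test = scaler.transform(test[predictors])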
3.1 Regularized linear regression

The goal of our analysis is to estimate a regression equation on the natural log of total healthcare costs over the first post-transplant year:

$$E(y) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

where $y$ is the log of one-year healthcare costs, $\beta_0$ the intercept of the equation (i.e., the mean log cost for patients with reference-level characteristics), and $\beta_j$ the weight associated with predictor $x_j$. Conventional ordinary least squares (OLS) methods search for estimates of $\beta$ that minimize a loss function equal to the total sum of squared errors:

$$\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2$$
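As a worked illustration of this loss, the OLS estimates can be obtained in closed form with NumPy; X_train and y_train refer to the hypothetical arrays from the preprocessing sketch above.

import numpy as np

# Closed-form least-squares fit: minimizes sum_i (y_i - b0 - sum_j bj * x_ij)^2.
X_design = np.column_stack([np.ones(len(X_train)), X_train])
beta_hat, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)
intercept, weights = beta_hat[0], beta_hat[1:]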
To overcome potential model overfitting due to multicollinearity among candidate predictors (e.g., having blood type AB and shorter dialysis durations [23]), we applied three regularized linear regression methods.
3.1.1 Ridge regression

Ridge regression aims to minimize the sum of the original loss function and a regularization term, known as the L2 Norm [24], described below. As the parameter $\lambda$ increases, $\beta$ is shrunk towards 0. The optimal $\lambda$ is determined by ten-fold CV within the training set such that the averaged test RMSE is minimized.

$$\|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2$$
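A minimal sketch of this step with scikit-learn's RidgeCV, where the penalty strength is named alpha rather than $\lambda$; the candidate grid is an assumption.

import numpy as np
from sklearn.linear_model import RidgeCV

# Ten-fold CV over an assumed grid of penalty strengths (scikit-learn's
# `alpha` plays the role of lambda), scored by negative RMSE.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50), cv=10,
                scoring="neg_root_mean_squared_error")
ridge.fit(X_train, y_train)
print("selected lambda:", ridge.alpha_)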
3.1.2 Lasso regression

Lasso, or the least absolute shrinkage and selection operator [25], searches for estimates of $\beta$ that minimize the sum of the original loss function and the L1 Norm presented below. Compared with ridge regression, lasso forces the weights of some predictors to be exactly zero to achieve a sparse model. As with ridge regression, the optimal $\lambda$ is selected by ten-fold CV such that the averaged test RMSE is minimized within the training set.

$$\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$$
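A corresponding sketch with scikit-learn's LassoCV; the path settings are assumptions, and the count of non-zero coefficients illustrates the sparsity described above.

from sklearn.linear_model import LassoCV

# Ten-fold CV over an automatically generated lambda path (called `alpha`
# in scikit-learn); CV minimizes mean squared error, equivalent to RMSE.
lasso = LassoCV(cv=10, n_alphas=100, random_state=0).fit(X_train, y_train)
print("selected lambda:", lasso.alpha_)
print("non-zero coefficients:", int((lasso.coef_ != 0).sum()))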
3.1.3 Elastic net regression

Elastic net regression is a compromise between ridge regression and lasso regression, in the sense that it excludes irrelevant predictors but keeps both members of a pair of correlated predictors [26]. Mathematically, the regularization term of elastic net regression, $\|\beta\|$, is a linear combination of the L1 Norm ($\|\beta\|_1$ of lasso) and the L2 Norm ($\|\beta\|_2^2$ of ridge), where $\alpha$ is between 0 and 1. The optimal $\alpha$ and $\lambda$ are selected by ten-fold CV.

$$\|\beta\| = \alpha \|\beta\|_1 + \frac{1-\alpha}{2} \|\beta\|_2^2$$
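A sketch with scikit-learn's ElasticNetCV, which tunes both the mixing parameter (the text's $\alpha$, called l1_ratio) and the penalty strength (the text's $\lambda$, called alpha); note that scikit-learn's scaling of the penalty differs slightly from the formula above, and the grids shown are assumptions.

from sklearn.linear_model import ElasticNetCV

# Joint ten-fold CV over the mixing parameter (l1_ratio, the text's alpha)
# and the penalty strength (alpha, the text's lambda).
enet = ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
                    n_alphas=100, cv=10, random_state=0)
enet.fit(X_train, y_train)
print("selected mixing parameter:", enet.l1_ratio_)
print("selected lambda:", enet.alpha_)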
3.2 Regression tree

The regression tree partitions the feature space recursively to create a tree-like structure [27]. At each split in the tree, a node is created to ensure maximum homogeneity of the data being partitioned into the two regions. To train a full tree, we selected features and the corresponding thresholds at each node such that the squared loss function is minimized:

$$\sum_{i \in R_1} (y_i - \hat{y}_{R_1})^2 + \sum_{i \in R_2} (y_i - \hat{y}_{R_2})^2$$

where $R_1$ and $R_2$ denote the two regions separated by the node.
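A minimal sketch of growing such a tree on the hypothetical training data with scikit-learn's DecisionTreeRegressor, whose squared-error criterion matches this splitting rule (recent scikit-learn versions); the settings are assumptions.

from sklearn.tree import DecisionTreeRegressor

# Each split picks the feature and threshold that minimize the summed
# squared error over the two resulting regions; the full tree is grown
# without depth limits before pruning.
full_tree = DecisionTreeRegressor(criterion="squared_error", random_state=0)
full_tree.fit(X_train, y_train)

# Cost-complexity pruning (described next) is exposed via the ccp_alpha
# parameter of DecisionTreeRegressor.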
To avoid overfitting, tree pruning was performed on the full tree to obtain a parsimonious tree (T) such that the following loss function is minimized: