The objective of XGBoost is not only to prevent overfitting but also to maintain an optimal computational speed, which is obtained by simplifying an objective function that combines a predictive term and a regularization term.
The additive learning process in XGBoost is to fit the first learner to the whole space of input data and then to fit a second model to the residuals in order to tackle the drawbacks of the weak learner. This fitting process is repeated until the stopping criterion is met. The ultimate prediction of the model is obtained as the sum of the predictions of the individual learners.
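The following minimal sketch illustrates this residual-fitting loop and the summed prediction; it is illustrative only and not the XGBoost implementation, using scikit-learn's DecisionTreeRegressor as a stand-in weak learner and function names of our own choosing.

    # Sketch of additive (boosting) learning: each new tree is fit to the
    # residuals of the current ensemble; the final prediction is the sum of
    # the stage-wise predictions. Illustrative only, not XGBoost internals.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_additive_ensemble(X, y, n_stages=10, max_depth=3, tol=1e-6):
        trees, residual = [], y.astype(float)
        for _ in range(n_stages):
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
            residual = residual - tree.predict(X)      # update residuals
            trees.append(tree)
            if np.abs(residual).mean() < tol:          # simple stopping criterion
                break
        return trees

    def predict_additive_ensemble(trees, X):
        return sum(tree.predict(X) for tree in trees)  # sum of learner predictions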
To learn the set of functions used in the model, XGBoost minimizes the following regularized objective:

\mathcal{L} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad f_k \in \mathcal{F}

\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}

where l is the loss function, n is the number of observations used, Ω is the regularization term, each f_k is one of the K regression trees drawn from the function space F, T is the number of leaves of a tree, w is the vector of scores in the leaves, λ is the regularization parameter, and γ is the minimum loss reduction needed to further partition a leaf node. Moreover, XGBoost can be extended to any user-defined loss function by defining a function that outputs the gradient and the Hessian (second-order gradient) and passing it through the "objective" hyper-parameter.
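As an illustration of this extension point, the following hedged sketch assumes the xgboost Python package with its scikit-learn wrapper, where a callable passed to the objective parameter must return the per-sample gradient and Hessian; the squared-error objective shown here is only an example.

    # Sketch of a user-defined objective for XGBoost's scikit-learn wrapper.
    # The callable returns, for every sample, the first-order gradient and the
    # second-order gradient (Hessian) of the loss w.r.t. the raw prediction.
    import numpy as np
    import xgboost as xgb

    def squared_error_objective(y_true, y_pred):
        grad = y_pred - y_true          # dl/dy_pred for l = 0.5 * (y_pred - y_true)^2
        hess = np.ones_like(y_true)     # d2l/dy_pred^2
        return grad, hess

    # The custom callable is passed through the "objective" hyper-parameter.
    model = xgb.XGBRegressor(objective=squared_error_objective, n_estimators=100)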
In addition, XGBoost implements several methods to increase the training speed of the decision trees that are not directly related to the accuracy of the ensemble. In particular, XGBoost focuses on reducing the computational complexity of finding the best split, which is the most time-consuming part of decision tree algorithms. Split-finding algorithms typically list all possible candidate splits and select the one with the highest gain, which requires a linear scan over each sorted attribute to find the best split for each node. To avoid repeatedly sorting the data in each node, XGBoost uses a specific compressed column-based structure in which the data is stored pre-sorted. In this way, each attribute needs to be sorted only once, and this column-based storage structure allows finding the best split for each considered attribute in parallel. Instead of scanning all possible candidate splits, XGBoost also implements a method based on percentiles of the data, testing only a subset of the candidate splits and calculating their gain using aggregated statistics. More detailed information and the computational procedures of the XGBoost algorithm can be found in the paper by Tianqi Chen [5].
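In the xgboost Python package, the exact greedy scan and the percentile-based approximate method described above can be selected through the tree_method parameter; the short sketch below is illustrative only (the data is random and the remaining settings are defaults).

    # Sketch: choosing between exact greedy and approximate split finding.
    # "exact" scans every candidate split; "approx" proposes candidate splits
    # from percentiles (quantiles) of each feature and evaluates them with
    # aggregated statistics.
    import numpy as np
    import xgboost as xgb

    X = np.random.rand(10_000, 20)     # illustrative data only
    y = np.random.rand(10_000)
    dtrain = xgb.DMatrix(X, label=y)

    params_exact = {"tree_method": "exact", "objective": "reg:squarederror"}
    params_approx = {"tree_method": "approx", "objective": "reg:squarederror"}

    booster_exact = xgb.train(params_exact, dtrain, num_boost_round=50)
    booster_approx = xgb.train(params_approx, dtrain, num_boost_round=50)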
Fig. 7 – XGBoost level-wise tree growth

3.2.6 LightGBM (LGBM)

LightGBM is a gradient boosting framework that uses tree-based learning algorithms [6]. It was proposed to solve the problems of the Gradient Boosting Decision Tree (GBDT) on mass data. The main difference is that the decision trees in LightGBM are grown leaf-wise, as shown in Fig. 7 and Fig. 8, instead of the traditional level-wise growth that requires checking all of the previous leaves for each new leaf; this improves accuracy and prevents overfitting. Moreover, LightGBM uses a histogram to identify the optimal segmentation point. The histogram replaces the traditional pre-sorting, so in a sense it sacrifices accuracy for speed. There are three aspects of difference between LightGBM and XGBoost.

Fig. 8 – LightGBM leaf-wise tree growth

• First is the computational complexity. Compared with XGBoost, LightGBM introduces two methods to reduce the size of the input data and thereby decrease the computational complexity. Based on a graph algorithm, LightGBM employs Exclusive Feature Bundling (EFB) to reduce the total number of input features. At the same time, LightGBM utilizes Gradient-based One-Side Sampling (GOSS), which ranks the samples according to their gradients: the proportion of samples with large gradients is increased by keeping the samples with large gradients and combining them with a random subset of the samples with smaller gradients.

• Second is the difference in strategy. LightGBM employs a leaf-wise strategy, while XGBoost employs a level-wise one. Resources are wasted in XGBoost because nodes are split indiscriminately at every level even when the gain is minimal. On the other hand, LightGBM only splits the leaf node with the greatest splitting gain. Such a greedy operation can also lead to overfitting and extremely large tree depth, so the tree depth in LightGBM should be constrained (see the sketch after this list).

• Third is the scale of the parallelization operation. Compared with XGBoost, which focuses on the parallelism of features, LightGBM can handle features, data processing, and voting operations in parallel, which makes it able to handle a larger data set.
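The sketch below (assuming the lightgbm Python package; the parameter names are the library's, the values merely illustrative) shows how the points above surface in practice: num_leaves and max_depth constrain leaf-wise growth, max_bin controls the histogram used for split finding, GOSS is selected as the sampling strategy, and EFB is enabled by default.

    # Sketch: LightGBM parameters touching on the points above.
    # Values are illustrative, not tuned recommendations.
    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(10_000, 20)     # illustrative data only
    y = np.random.rand(10_000)
    train_set = lgb.Dataset(X, label=y)

    params = {
        "objective": "regression",
        "num_leaves": 31,    # leaf-wise growth: bounds the number of leaves per tree
        "max_depth": 8,      # constrain depth to limit overfitting of leaf-wise trees
        "max_bin": 255,      # histogram-based split finding (bins per feature)
        "boosting": "goss",  # Gradient-based One-Side Sampling (newer versions also
                             # expose this as data_sample_strategy="goss")
        # Exclusive Feature Bundling (EFB) is enabled by default (enable_bundle=True).
    }
    booster = lgb.train(params, train_set, num_boost_round=50)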