The objective of XGBoost is not only to prevent overfitting, which is obtained by simplifying the objective functions that combine predictive and regularization terms, but also to maintain an optimal computational speed.
The additive learning process in XGBoost is to fit the first learner to the whole space of input data and then to fit a second model to its residuals to tackle the drawbacks of a weak learner. This fitting process is repeated until the stopping criterion is met. The ultimate prediction of the model is obtained as the sum of the predictions of each learner, as the sketch below illustrates.
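The following is a minimal sketch of this residual-fitting loop, assuming scikit-learn's DecisionTreeRegressor as the weak learner; the data, tree depth, learning rate, and round count are illustrative assumptions, not values from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Additive learning: each new tree is fit to the residuals of the
# current ensemble, and the final prediction is the sum of learners.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                 # hypothetical inputs
y = np.sin(3 * X[:, 0]) + X[:, 1]              # hypothetical target

learning_rate, n_rounds = 0.1, 20              # illustrative values
prediction = np.zeros_like(y)
trees = []
for _ in range(n_rounds):                      # fixed-round stopping criterion
    residuals = y - prediction                 # what the ensemble still misses
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)

# The ultimate prediction is the sum of each learner's (scaled) output.
final = sum(learning_rate * t.predict(X) for t in trees)
```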
To learn the set of functions used in the model, XGBoost minimizes the following regularized objective:
$$\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k), \qquad f_k \in \mathcal{F}$$

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \|w\|^{2}$$

where $l$ is the loss function, $n$ is the number of observations used, $\Omega$ is the regularization term, $w$ is the vector of scores in the leaves, $T$ is the number of leaves, $\lambda$ is the regularization parameter, and $\gamma$ is the minimum loss needed to further partition the leaf node. Moreover, XGBoost can be extended to any user-defined loss function by defining a function that outputs the gradient and the Hessian (second-order gradient) and passing it through the "objective" hyper-parameter.
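As a hedged illustration of this extension point: the sketch below defines a squared-error loss through its gradient and Hessian. The scikit-learn wrapper accepts such a callable through its objective parameter; the native training API shown here takes it as the obj argument. The data and hyper-parameter values are assumptions for illustration only.

```python
import numpy as np
import xgboost as xgb

# A user-defined loss only needs to return the gradient and the Hessian
# of the loss with respect to the current predictions.
def squared_error(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # first-order gradient of 0.5*(pred - label)^2
    hess = np.ones_like(preds)     # second-order gradient (constant here)
    return grad, hess

X = np.random.rand(200, 5)                      # hypothetical data
y = X @ np.array([1.0, 2.0, 0.5, 0.0, -1.0])
dtrain = xgb.DMatrix(X, label=y)

# reg_lambda and gamma correspond to the lambda and gamma terms of Omega.
booster = xgb.train(
    {"max_depth": 3, "eta": 0.1, "reg_lambda": 1.0, "gamma": 0.0},
    dtrain,
    num_boost_round=50,
    obj=squared_error,
)
```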
In addition, XGBoost implements several methods to increase the training speed of decision trees that are not directly related to the accuracy of the ensemble. In particular, XGBoost focuses on reducing the computational complexity of finding the best split, which is the most time-consuming part of decision tree algorithms. Split-finding algorithms typically list all possible candidate splits and select the one with the highest gain. This requires a linear scan over each sorted attribute to find the best split for each node. To avoid repeatedly sorting the data in each node, XGBoost uses a specific compressed column-based structure in which the data is stored pre-sorted. In this way, each attribute needs to be sorted only once. This column-based storage structure allows finding the best split for each considered attribute in parallel. Instead of scanning all possible candidate splits, XGBoost implements a method based on percentiles of the data, testing only a subset of the candidate splits and calculating their gain using aggregated statistics, as sketched below. More detailed information and computational procedures of the XGBoost algorithm can be found in the paper by Tianqi Chen [5].
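A minimal sketch of this percentile-based split search follows, assuming per-sample gradients and Hessians are already computed; the candidate count and the lam and gamma parameters (mirroring the λ and γ of the objective above) are illustrative assumptions.

```python
import numpy as np

# Approximate split finding: candidate thresholds come from percentiles
# of a feature rather than from every sorted value, and gain is computed
# from aggregated gradient/Hessian sums.
def candidate_splits(feature, n_candidates=16):
    qs = np.linspace(0, 100, n_candidates + 2)[1:-1]   # interior percentiles
    return np.unique(np.percentile(feature, qs))

def best_split(feature, grad, hess, lam=1.0, gamma=0.0):
    def score(g, h):
        return g * g / (h + lam)
    G, H = grad.sum(), hess.sum()
    best_gain, best_thr = 0.0, None
    for thr in candidate_splits(feature):
        mask = feature <= thr
        GL, HL = grad[mask].sum(), hess[mask].sum()
        GR, HR = G - GL, H - HL                        # aggregated statistics
        gain = 0.5 * (score(GL, HL) + score(GR, HR) - score(G, H)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain
```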
Fig. 7 – XGBoost level-wise tree growth

Fig. 8 – LightGBM leaf-wise tree growth

3.2.6   LightGBM (LGBM)

LightGBM is a gradient boosting framework that uses tree-based learning algorithms [6]. It was proposed to solve the problems of the Gradient Boosting Decision Tree (GBDT) on massive data. The main difference is that decision trees in LightGBM are grown leaf-wise, as shown in Fig. 7 and Fig. 8, instead of in the traditional level-wise manner that requires checking all of the previous leaves for each new leaf, which improves accuracy and prevents overfitting. Moreover, LightGBM uses a histogram to identify the optimal segmentation point. The histogram replaces the traditional pre-sorted approach, so in a sense it sacrifices accuracy for speed. There are three aspects of differences between LightGBM and XGBoost:

• First is the computational complexity. Compared with XGBoost, LightGBM develops two kinds of methods to reduce the dimensions of the input so as to decrease the computational complexity. Based on a graph algorithm, LightGBM employs Exclusive Feature Bundling (EFB) to reduce the total number of input features. At the same time, LightGBM utilizes Gradient-based One-Side Sampling (GOSS) to rank the samples according to their gradients: the proportion of samples with large gradients increases because all samples with large gradients are kept and combined with a random selection of samples with smaller gradients (see the GOSS sketch after this list).

• Second is the difference in growth strategy. LightGBM employs a leaf-wise strategy, while XGBoost employs a level-wise one. Resources are wasted in XGBoost because nodes are split indiscriminately in all layers, even when the gain is minimal. On the other hand, LightGBM only splits the leaf node with the greatest splitting gain. Such a greedy operation can also lead to overfitting and extremely large tree depth, so the tree depth in LightGBM should be constrained.

• Third is the scale of parallelization. Compared with XGBoost, which focuses on feature parallelism, LightGBM can process features, data, and voting operations in parallel, which enables it to handle larger data sets.
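The GOSS idea from the first bullet can be sketched as follows; the sampling ratios a and b and the reweighting factor follow the usual GOSS description, and all values are illustrative assumptions.

```python
import numpy as np

# Gradient-based One-Side Sampling (GOSS): keep every sample with a
# large gradient, subsample the small-gradient rest, and reweight the
# survivors so the gradient distribution is roughly preserved.
def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # rank samples by |gradient|
    n_top, n_rest = int(a * n), int(b * n)
    top = order[:n_top]                         # always kept
    rest = rng.choice(order[n_top:], size=n_rest, replace=False)
    weights = np.ones(n)
    weights[rest] = (1 - a) / b                 # compensate for subsampling
    keep = np.concatenate([top, rest])
    return keep, weights[keep]

# Hypothetical usage on random gradients.
idx, w = goss_sample(np.random.randn(1000))
```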






