Page 34 - Proceedings of the 2018 ITU Kaleidoscope

2018 ITU Kaleidoscope Academic Conference




impairments in a few groups (the clusters) we can characterize a larger number of cases. This aims to help the operations team, which would otherwise spend considerable effort analyzing each cable modem individually.

It is essential that physical damage be identified more rapidly. In addition to the characterization of impairments, field service technicians need a geographical reference to determine whether the damage could be inside a client’s home (if the same pattern is located at relatively distant and random points) or not (if the same pattern occurs in nearby locations).

2.  MEASUREMENT

Telecom Argentina’s FBC tool takes 24,000 measurements of each cable modem in real time, covering the spectrum between 45 MHz and 1,005 MHz. For analysis purposes, the tool is configured to collect and record the data on a daily basis.

For the proof of concept (PoC), we use the data from February 5, 2018. To reduce the computational effort, we ran the cluster analysis only on the frequency bands where we know that ingress is likely to take place. This reduces the number of data points from 24,000 to 4,458 per cable modem. Table 1 shows the frequency bands on which we focused for this PoC.

Table 1 – Frequency bands [5] analyzed in this paper

Frequencies (MHz)    Service
88 - 108             FM radio
518 - 541            Digital public TV (Argentina’s TDA)
703 - 743            LTE uplink
758 - 803            LTE downlink
824 - 849            3G uplink
869 - 894            3G downlink

We analyze the spectrum of all cable modems in a service group. In the HFC plant topology, a service group is the complete set of downstream and upstream channels within a single CMTS (Cable Modem Termination System) that a single cable modem could potentially receive or transmit on [6]. At the time of data retrieval, there were 421 cable modems in the selected service group.

We did all of the data processing in R [7]. For the cluster analysis, we used the MASS package [8]. In addition, we used Google Earth to obtain location data.

3.  METHODOLOGY

Cluster analysis is a term that encompasses a variety of algorithms that aim to group elements so that differences among observations in the same group are minimal and the groups are as different from one another as possible. This kind of algorithm is particularly useful when there is a notion that the observations in a dataset come from K different populations. It reveals the similarities and differences among them.

One of the challenges when applying cluster analysis to a dataset involves defining meaningful dimensions of analysis. In this case, we considered each one of the 4,458 measurements on each cable modem as a new dimension of analysis. Following this line, it is possible to interpret the spectrum as a point in a high-dimensional space.

3.1 The k-means algorithm

The strategy for obtaining the clusters known as k-means consists of executing this algorithm [9]:

1.  Set K points in the p-dimensional space as cluster centers, based on previous experience or in a random fashion.
2.  Calculate distances from all of the observations to the centers.
3.  Assign each observation to the nearest center.
4.  Use some criterion to evaluate the clusters.
5.  Recalculate the clusters’ centers.
6.  Repeat steps two to five until there is no improvement in the evaluation made in step four.

The result may vary according to the centers selected in the initial step. One way to overcome this limitation is to start the routine from different centers and then evaluate whether the result varies.

It is necessary to define a measurement of distance between observations. For a set of observations $\{x_1, x_2, \dots, x_n\}$ we use the Euclidean distance:

$$d(x_i, x_{i'}) = \left[ (x_i - x_{i'})' \cdot (x_i - x_{i'}) \right]^{1/2}$$

Where each $x_i$ is a vector in a p-dimensional space. This definition of distance is sensitive to scale variations. A best practice is to scale measurements when the variables involved in the clustering process have different measurement units. It is better to use the original data when all the values have the same measurement unit. This will serve to identify natural patterns throughout the analysis.

3.2 Cluster evaluation

To evaluate the clusters, as stated in step 4, we use the Within Cluster Sum of Squares (WCSS), which is the sum of the squared Euclidean distances from all observations in each cluster to its center:

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{i=1}^{n_k} d^2(x_{ik}, \bar{x}_k)$$

Where the subscript k means that the observation $x_{ik}$ belongs to cluster k, and $n_k$ is the number of observations in that cluster. This is also equivalent to a weighted sum of all within-cluster variances:
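As an illustration of the six k-means steps in Section 3.1 and the WCSS criterion in Section 3.2, the following is a minimal sketch in plain Python. It is not the code used in this work, which was written in R. The function names (`euclidean`, `wcss`, `kmeans_once`, `kmeans`) and the toy data are hypothetical. Two choices are assumptions: the initial centers are drawn at random from the observations, and each center is recalculated as the mean of its assigned observations, which is the usual k-means choice.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two p-dimensional points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def assign(points, centers):
    """Steps 2-3: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points]


def wcss(points, labels, centers):
    """Step 4: sum of squared distances from each point to its own center."""
    return sum(euclidean(p, centers[k]) ** 2 for p, k in zip(points, labels))


def update(points, labels, centers):
    """Step 5: move each center to the mean of its members (assumed rule)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # An empty cluster keeps its previous center.
        new_centers.append([sum(c) / len(members) for c in zip(*members)]
                           if members else old)
    return new_centers


def kmeans_once(points, k, rng, max_iter=100):
    """One run from a random set of initial centers (step 1, assumed random)."""
    centers = [list(p) for p in rng.sample(points, k)]
    previous = math.inf
    for _ in range(max_iter):
        labels = assign(points, centers)
        score = wcss(points, labels, centers)
        if score >= previous:  # step 6: stop when WCSS no longer improves
            break
        previous = score
        centers = update(points, labels, centers)
    labels = assign(points, centers)
    return labels, centers, wcss(points, labels, centers)


def kmeans(points, k, restarts=10, seed=0):
    """Repeat from different initial centers and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


# Toy 2-dimensional data with two well-separated groups (hypothetical values).
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, score = kmeans(toy, k=2)
```

In the setting described here, each point would be one cable modem's 4,458-value spectrum. Because all values share the same unit, no scaling step is included. The restart loop in `kmeans` reflects the suggestion to start the routine from different centers and compare the results.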


