2018 ITU Kaleidoscope Academic Conference
impairments in a few groups (the clusters) we can characterize a larger number of cases. This aims to help the operations team, which would otherwise spend considerable effort analyzing each cable modem individually.

It is essential that the identification of damage in the physical be more rapid. In addition to the characterization of impairments, field service technicians need a geographical reference to determine whether the damage could be inside a client’s home (if the same pattern is located at relatively distant and random points) or not (if the same pattern occurs in nearby locations).

2. MEASUREMENT

Telecom Argentina’s FBC tool takes 24,000 measurements of each cable modem in real time, covering the spectrum between 45 MHz and 1,005 MHz. For analysis purposes, the tool is configured to collect and record the data on a daily basis.

To perform the proof of concept (PoC), we use the data from February 5, 2018. To reduce the computational effort, we ran the cluster analysis using only the frequency bands where we know that ingress is likely to take place. This reduces the number of data points from 24,000 to 4,458 per cable modem. Table 1 shows the frequency bands on which we focused for this PoC.

Table 1 – Frequency bands [5] analyzed in this paper

Frequencies (MHz)    Service
88 - 108             FM radio
518 - 541            Digital public TV (Argentina’s TDA)
703 - 743            LTE uplink
758 - 803            LTE downlink
824 - 849            3G uplink
869 - 894            3G downlink

We analyze the spectrum of all cable modems in a service group. In the HFC plant topology, a service group is the complete set of downstream and upstream channels within a single CMTS (Cable Modem Termination System) that a single cable modem could potentially receive or transmit on [6]. At the time of data retrieval, there were 421 cable modems in the selected service group.

We performed all of the data processing in R [7]. For the cluster analysis, we used the MASS package [8]. In addition, we used Google Earth to obtain location data.

3. METHODOLOGY

Cluster analysis is a term that encompasses a variety of algorithms that aim to group elements so that differences among observations in the same group are minimal and the groups are as different from one another as possible. This kind of algorithm is particularly useful when there is a notion that the observations in a dataset come from K different populations. It reveals the similarities and differences among them.

One of the challenges when applying cluster analysis to a dataset is defining meaningful dimensions of analysis. In this case, we considered each one of the 4,458 measurements on each cable modem as a separate dimension of analysis. Following this approach, it is possible to interpret the spectrum as a point in a high-dimensional space.

3.1 The k-means algorithm

The clustering strategy known as k-means consists of executing the following algorithm [9]:

1. Set K points in the p-dimensional space as cluster centers, based on previous experience or in a random fashion.
2. Calculate the distances from all of the observations to the centers.
3. Assign each observation to the nearest center.
4. Use some criterion to evaluate the clusters.
5. Recalculate the clusters’ centers.
6. Repeat steps two to five until there is no improvement in the evaluation made in step four.

The result may vary according to the centers selected in the initial step. One way to overcome this limitation is to start the routine from different centers and then evaluate whether the result varies.

It is necessary to define a measure of distance between observations. For a set of observations $\{x_1, x_2, \ldots, x_n\}$ we use the Euclidean distance:

$$d(x_i, x_{i'}) = \left[ (x_i - x_{i'})' \cdot (x_i - x_{i'}) \right]^{1/2}$$

where each $x_i$ is a vector in a p-dimensional space. This definition of distance is sensitive to scale variations. A best practice is to scale the measurements when the variables involved in the clustering process have different measurement units. When all the values share the same measurement unit, it is better to use the original data, since this helps to identify natural patterns throughout the analysis.

3.2 Clusters evaluation

To evaluate the clusters, as stated in step 4, we use the Within Cluster Sum of Squares (WCSS), which is the sum of the squared Euclidean distances from all observations in each cluster to its center:

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{i=1}^{n_k} d^2(x_{ik}, \bar{x}_k)$$

where the subscript k means that the observation belongs to cluster k, $\bar{x}_k$ is the center of cluster k, and $n_k$ is the number of observations in cluster k. This is also equivalent to a weighted sum of all within-cluster variances: