Page 220 - Kaleidoscope Academic Conference Proceedings 2020
P. 220

2020 ITU Kaleidoscope Academic Conference




           D.     Computation Algorithm                                        AN EXAMPLE
           Before  the  value  of  t 1  is overwritten  with  the  value  of  t 2,
           computation on the data has to be performed.  Assign the   Suppose that the maximum allowed values into the sketching
                                                              function is set to 10, as in Table 1.
           value of t1 to a temporary variable ttemp1, the value of t2 to
           temporary  variable  ttemp2  and  the  value  of  t3  to  temporary
           variable ttemp3. Compute the average tavg using (2). Whenever   Each  of  the  above  data  elements  will  pass  through  the
           a new tavg is computed it is added to the last tavg and once Smax   truncation  function  to  remove  the  decimal  points  before
           is reached tavg is divided by the number n of tavg computed to   being passed on to ten hash functions to produce ten unique
           give Zcompute which is a percentage of all the “non-normal”   hash values, but for the sake of brevity let’s assume that we
           data to the total data in the sketch, see (1).     are working with five hash functions each producing a one
                                                              digit hexadecimal representation of the data element; we use
           The total of the anomalous data to the total number of all data   Table 2 to clarify our point. The algorithm can be extended
           inside  the  sketch  can  be  calculated  using  (3)  to  give  an   to ten hash functions each giving a true hexadecimal value.
           indication  of  how  much  of  the  data  is  non-normal,  if  the
           computed data is greater than a preset threshold an alarm is   The data will then enter the sketching algorithm which will
           set at.                                            map each hash value to the appropriate row and the column
                                                              will reflect its place in the table. Go through the data stream
                           Zcompute = (∑tavg) / n        (1)   from  S 1  taking  care  to  only  increment  the  cell  values
                                                              t1,t2,t3,far,close only when they are considered “non-normal”
                                                              using the rules that were mentioned earlier. If we want to
                        tavg = (t temp1 + t temp2) / 2   (2)   map the hash values of the data element 38 that arrived at
                                                              time  0.5,  the  first  thing  we  should  do  is  determine  what
                                                              quadrant the hash value falls under. Since it is considered a
             % anomalous data = (mild +high +critical) / (normal   value in the mild quadrant as we stated earlier in the text we
                           +mild + high+ critical)        (3)   increment all cell values corresponding to the H rows that
                                                              fall under the mild column. According to the first table the
           The values of ttemp1 and ttemp2 t temp3 can also be used to deduce   “non-normal” value 38 arrived at time 0.5 then t2’s value will
           how close the “non-normal” values are to each other. Using   be assigned to t 1 to denote the time of the last “non-normal”
           (4)  we  substitute  the  values  of  ttemp1,  ttemp2  to  compute  a   value  and  t2’s  value  will  be  updated  to  0.5  to  reflect  the
           temporary value for the close column and subtract one from   current “non-normal” arrival time.
           the temporary value at the end of data insertion to calculate
           its  true  value.  Using  (5)  we  substitute the  values  of  t temp3,   To decide whether to update the close or far column we are
           ttemp2 to compute a temporary value for the far column and   going to assign the value of t3 to ttemp3 which in this case is
           divide the temporary value at the end of data insertion by   0.4, the value of t 2 to t temp2 which is 0.5 and the value of t 1 to
           three to calculate its integer value. Once Smax is reached we   ttemp1 which is 0.1 and find that ttemp2 - ttemp1 is equal to 0.4 and
           check using (6) if the percentage of close values is greater   ttemp3 -  t temp2  equal to  -0.1,  so  we  increment  the temporary
           than  a  preset  allowed  maximum  FC threshold  if  it  is  than  an   close variable.
           alarm is raised.
                                                              Once all the data element’s hash values have found their way
                           (t temp2 – t temp1) -1        (4)   into the table along with any supporting variables the table
                                                              will look like Table 3.

                           (t temp3 – t temp2) / 3       (5)   To prevent t1 and t 2 from losing their values assign t1 to ttemp1
                                                              and t2 to ttemp2.

                        close / (far+ close) > FC threshold    (6)   Ttemp1 = 0.1 and ttemp2 = 0.5

           E.     Reset
                                                              Using (2) we calculate tavg to be equal to 0.2. We then add all
                                                              computed averages so far together. Once Smax is
           The main concern of the count-min sketch is that as the data
           volume  increases  the  probability  of  collisions  increases
           proportionally, which could lead to misleading information.
           To  prevent  that  from  happening  the  reset  function  will
           overwrite all the table cells to zero if the data sketch reaches
           Rmax values or the time elapsed from the system start reaches
           reset timer set by the user.  To understand how the algorithm
           works we illustrate the pseudocode in Figure 2.










                                                          – 162 –
   215   216   217   218   219   220   221   222   223   224   225