Page 223 - Kaleidoscope Academic Conference Proceedings 2020

Industry-driven digital transformation




an element is near or far based on arrival time using equations 4 and 5, along with setting the hash depth to 10, contribute to the two-second computation time per data item. However, once the data scales beyond 2,000 items we see a considerable increase in processing time, which indicates that the sketch is starting to saturate. A real problem appears as the data grows to between 10,000 and 15,000 items, where processing time breaks down. Therefore, to keep the sketch efficient and lightweight, we recommend a sketch size of no more than 4,000 data items, depending on the usage scenario, before the sketch is cleared. There are two reasons for this: (a) it reduces the wait time before a system check-up is due; and (b) some situations are time-sensitive and require immediate attention. As mentioned earlier, we incorporated a timer into the design so that the data is checked more frequently; this way, the user does not have to wait for the sketch to reach a preassigned byte size before the data is checked.
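The saturation effect described above can be related to the standard count-min error guarantee. The block below restates that textbook bound; the counter array $C$ of $d$ rows and width $w$, the row hashes $h_j$, the true count $f(x)$, the total count $N$ and the parameters $\varepsilon$ and $\delta$ are the usual notation rather than symbols defined in this paper. The bound concerns estimation accuracy, not the processing time measured above.

```latex
\hat{f}(x) = \min_{1 \le j \le d} C\bigl[j,\, h_j(x)\bigr] \;\ge\; f(x)

\Pr\bigl[\hat{f}(x) > f(x) + \varepsilon N\bigr] \le \delta,
\qquad w = \lceil e/\varepsilon \rceil,\quad d = \lceil \ln(1/\delta) \rceil
```

With a hash depth of $d = 10$, the failure probability is at most $e^{-10} \approx 4.5 \times 10^{-5}$. For a fixed width, however, the additive error term $\varepsilon N$ grows linearly with $N$: taking a hypothetical width of $w = 1000$ (so $\varepsilon = e/w \approx 0.0027$), $\varepsilon N$ is roughly 11 at $N = 4{,}000$ and roughly 41 at $N = 15{,}000$, which is one way to see why clearing the sketch periodically keeps its estimates useful.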

We believe that the solution can be versatile in situations where some time delay is acceptable, such as hospital floor recovery rooms or the checking of fraudulent social network accounts, with the data size adjusted according to preference.

5. CONCLUSION

Our segmented, time-controlled count-min sketch provides a new way of looking at incoming data. It reduces collisions by providing more hash functions, and it offers the choice of clearing the sketch once either a maximum elapsed time or a maximum data item count is reached. Traditionally, the count-min sketch reports the minimum of the counter values that its hash functions map a particular item to, which can produce misleading results, especially when the sketch is saturated; we therefore decided to group the values into categories based on a predetermined range.
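The clearing policy and the standard minimum query described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `TimeControlledCountMin`, the default width of 2,048 counters, the BLAKE2b-based row hashing, the 60-second timer and the `on_clear` hook (where a system check-up would run before the counters are reset) are all hypothetical, and the paper's range-based categories are not reproduced. Only the depth of 10 and the 4,000-item limit follow values discussed in the text.

```python
import hashlib
import time
from typing import Callable, Optional


class TimeControlledCountMin:
    """Count-min sketch cleared after `max_items` insertions or `max_age_s`
    seconds, whichever comes first (hypothetical illustration)."""

    def __init__(self, width: int = 2048, depth: int = 10,
                 max_items: int = 4000, max_age_s: float = 60.0,
                 on_clear: Optional[Callable[[object], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.width, self.depth = width, depth
        self.max_items, self.max_age_s = max_items, max_age_s
        self.on_clear = on_clear  # hypothetical hook: run the check-up here
        self.clock = clock        # injectable so the timer can be tested
        self._reset()

    def _reset(self) -> None:
        # One row of `width` counters per hash function.
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.items = 0            # number of insertions since the last reset
        self.started = self.clock()

    def _index(self, key: str, row: int) -> int:
        # Row-specific hash: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def _limit_reached(self) -> bool:
        # Clear on whichever limit is hit first: item count or elapsed time.
        return (self.items >= self.max_items
                or self.clock() - self.started >= self.max_age_s)

    def add(self, key: str, count: int = 1) -> bool:
        """Insert `key`; return True if the sketch was cleared first."""
        cleared = self._limit_reached()
        if cleared:
            if self.on_clear is not None:
                self.on_clear(self)  # inspect the full sketch before reset
            self._reset()
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count
        self.items += 1
        return cleared

    def estimate(self, key: str) -> int:
        # Standard count-min query: the minimum counter across all rows.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


if __name__ == "__main__":
    sketch = TimeControlledCountMin(
        max_age_s=30.0,
        on_clear=lambda s: print("check-up on", s.items, "items"))
    for reading in ["hr:high", "hr:normal", "hr:high"]:
        sketch.add(reading)
    print(sketch.estimate("hr:high"))  # at least 2: never underestimates
```

In this sketch the limits are checked lazily, on the next insertion, which avoids background threads; a deployment that needs alerts on an idle stream would also have to poll `_limit_reached()` from a timer.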
We also introduced a time element into the sketch as a way of determining the frequency of anomalous data. Finally, we showed the compactness of the sketch design by demonstrating that its behaviour remains consistent as the volume of data increases.