Page 218 - Kaleidoscope Academic Conference Proceedings 2020

2020 ITU Kaleidoscope Academic Conference




by gaining more information on the frequency of anomalous data arrivals, which would aid in situation assessment.

The rest of the paper is organized as follows: Section 2 gives a review of the literature so far. Section 3 introduces the proposed system design. Section 4 describes the implementation of the system design and presents our results with a discussion. Finally, conclusions are presented in Section 5.

2. RELATED WORK

The authors in [3] propose the elastic sketch as a way of controlling the bandwidth, packet rate and packet size. For the bandwidth, rather than monitoring the flow of data packets through the network as a whole, a randomized data selection called sketching is performed on the data, which then goes through compression to free up the link for other data. Next comes the packet rate, which can be impossible to quantify due to the randomness of packet arrivals, so data is measured per unit time, which the authors take as one hour. To optimize memory usage, the authors decided to discard packet headers from the overall analysis of packet size, which falls into one of two categories: mouse data and elephant data. The data is segregated into a predefined bucket by way of a voting system.

The authors of [4] opted for relative entropy, used to detect gradual changes in data, and Pearson correlation, used to detect sudden changes in incoming data. Relative entropy processes data in a sliding window fashion by comparing the data at the present time to that of the previous iteration. Pearson correlation works by measuring data statistically as being positively correlated, negatively correlated or uncorrelated with the data stream. Data is processed in either stream or batch format by the anomaly detection algorithm, and the output is sent to a dashboard for analysis, activating an alarm in the presence of an anomaly.

Unlike the authors in [3] and [4], who considered the data stream from end devices like mobile phones or desktop personal computers, [5] created an anomaly detection algorithm, the robust random cut forest data structure, for data streams arriving from IoT devices such as wearable devices and wireless medical gadgets. The algorithm works by randomly selecting data to build out a tree with the anomaly being a part of it. The anomaly, which the authors call a point, is discovered by the distance it incurs on other data in the tree as it increases in complexity and size.

Going back to anomaly detection of data streams in traditional networks, [6] proposes to use volume anomalies to build out a database for faster identification. The sketches are constructed out of data selected in a quasi-random manner, which then goes through some hash functions.

The use of sketches has become a common theme among researchers because of its ability to substantially reduce dimensionality with some margin of acceptable error. The use of randomness reduces false positives when an acceptable data set is selected from the overall data. Like [6], the authors of [7] decided to use data sketching but propose creating data sketches from existing data sketches. The authors create sketches from the aggregate data stream, which their method then shuffles. Their algorithm then uses a voting mechanism to increase data truthfulness when pinpointing an anomaly and to reduce the chance of mistakenly identifying the data as a threat.

The authors in [8] introduce an unsupervised anomaly detection method which can detect anomalies in a single iteration over the data. This saves considerable time and memory, which makes the method well suited to IoT applications. The authors utilize matrix sketching to create orthogonal vectors of the data stream, which can then be used to spot anomalies by performing a reconstruction error test.

Thus far, data sketching has been used to detect data that is out of place once it arrives at the destination. The authors in [9], however, used sketching to authenticate data at the source and sink in a software-defined networking scenario, because it can scale easily with minimal cost of infrastructure and of time spent setting up the network. The authors used sketching coupled with some probing mechanisms to correctly identify the switches responsible for the malicious activity on the network. At the destination the sketches are compared; if they are different, this means that the data was compromised during transmission, so it is discarded.

As in [4], the authors in [10] use entropy to capture dirty data, which in this case are anomalies. Instead of separating anomalies into data sets, data is mined to detect similar occurrences across the whole network. Entropy, used as a method for summarization, serves to detect unusual traffic patterns across multiple traffic variables spanning different time instances.

The authors in [11] created an encryption algorithm for low-power systems using data sketching which, owing to its low energy consumption, could be used for encrypting data in place of the Advanced Encryption Standard typically used in IoT applications. The algorithm uses a one-time linear projection to protect against known text attacks by encrypting the data, using data sketching to select random data from the stream. The stream is then reconstructed at the slave node before data sketching algorithms are applied again and the result is passed through the one-time linear projection once, to prevent the constructed matrix from being discovered through a man-in-the-middle attack on the network.

The authors in [12] created a framework for ubiquitous healthcare, or u-health, of IoT devices used in the medical setting. The traditional u-health system architecture consists of the body area network, which is in charge of monitoring the patient vitals and sending them to the intelligent medical server. The server is in charge of receiving the patient data and detecting patterns in the data as well as data discrepancies, which it then sends to the hospital system. At the hospital system, data is augmented by authorized personnel. The idea suggested by the authors is to unify all body area network devices regardless of communication protocols, whether that




