Page 218 - Kaleidoscope Academic Conference Proceedings 2020
2020 ITU Kaleidoscope Academic Conference
by gaining more information on the frequency of anomalous data arrivals,
which would aid in situation assessment.

The rest of the paper is organized as follows: Section 2 gives a review of
the literature so far. Section 3 introduces the proposed system design.
Section 4 describes the system design implementation and presents our
results with a discussion. Finally, conclusions are presented in Section 5.

2. RELATED WORK

The authors in [3] propose the elastic sketch as a way of controlling the
bandwidth, packet rate and packet size. For the bandwidth, rather than
monitoring the flow of data packets through the network as a whole,
randomized data selection called sketching is performed on the data, which
then goes through compression to free up the link for other data. Next comes
packet rate, which can be impossible to quantify due to packet arrival
randomness, so data is measured per unit time, which the authors take as one
hour. To optimize memory usage, the authors decided to discard packet headers
from the overall analysis of packet size, which falls into one of two
categories: mouse data and elephant data. The data is segregated into a
predefined bucket by way of a voting system.

The authors of [4] opted for relative entropy, used to detect gradual changes
in data, and Pearson correlation, which is used to detect sudden changes in
incoming data. Relative entropy processes data in sliding window fashion by
comparing its data at the present time to that of the previous iteration.
Pearson correlation works by measuring data statistically as being positively
correlated, negatively correlated or having no correlation to the data
stream. Data is manipulated in either stream or batch format using the
anomaly detection algorithm, and the result is then sent to a dashboard for
analysis, activating an alarm in the presence of an anomaly.

Unlike the authors in [3] and [4], who considered the data stream from end
devices like mobile phones or desktop personal computers, the authors of [5]
created an anomaly detection algorithm, based on the robust random cut forest
data structure, for data streams arriving from IoT devices such as wearable
devices and wireless medical gadgets. The algorithm works by randomly
selecting data to build out a tree with the anomaly being a part of it. The
anomaly, which the authors call a point, is discovered by the distance it
incurs on other data in the tree as it increases in complexity and size.

Going back to anomaly detection of data streams in traditional networks, [6]
proposes to use volume anomalies to build out a database for faster
identification. The sketches are constructed out of data it selects in a
quasi-random manner, which then goes through some hash functions.

The use of sketches has become a common theme among researchers because of
its ability to substantially reduce dimensionality with some margin of
acceptable error. The use of randomness reduces false positives when an
acceptable data set is selected from the overall data. Like [6], the authors
of [7] decided to use data sketching but propose creating data sketches from
existing data sketches. The authors create sketches from the aggregate data
stream, which their algorithm then shuffles. The algorithm then uses a voting
mechanism to increase data truthfulness when pinpointing an anomaly and to
reduce the chance of mistakenly identifying the data as a threat.

The authors in [8] introduce an unsupervised anomaly detection method that
detects anomalies in a single pass over the data, which saves considerable
time and memory and makes it well suited to IoT applications. The authors
utilize matrix sketching to create orthogonal vectors of the data stream,
which can then be used to spot anomalies by performing a reconstruction error
test.

Thus far, data sketching has been used to detect data that is out of place
once it arrives at the destination. However, the authors in [9] used
sketching to authenticate data at the source and sink in a software-defined
networking scenario because it can scale easily with minimal cost of
infrastructure and time setting up the network. The authors used sketching
coupled with some probing mechanisms to correctly identify the switches
responsible for the malicious activity on the network. At the destination the
sketches are compared; if they are different, this means that data was
compromised during transmission, so it is discarded.

As in [4], the authors in [10] use entropy to capture dirty data, which in
this case are anomalies. Instead of separating anomalies into data sets, data
is mined to detect similar data occurrences across the whole network.
Entropy, serving as a method for summarization, is used to detect unusual
traffic patterns across multiple traffic variables spanning different time
instances.

The authors in [11] created an encryption algorithm for low power systems
using data sketching which, due to its low energy consumption, could be used
for encrypting data in place of the Advanced Encryption Standard typically
used in IoT applications. The algorithm uses a one-time linear projection to
protect against known text attacks by encrypting the data using data
sketching to select random data from the stream. The stream is then
reconstructed at the slave node before data sketching algorithms are applied
again and passed through the one-time linear projection once, to prevent the
constructed matrix from being discovered through a man-in-the-middle attack
on the network.

The authors in [12] created a framework for ubiquitous healthcare, or
u-health, of IoT devices used in the medical setting. The traditional
u-health system architecture consists of the body area network, which is in
charge of monitoring the patient vitals and sending them to the intelligent
medical server, which is in charge of receiving the patient data and
detecting patterns in the data as well as data discrepancies, which it will
then send to the hospital system. At the hospital system, data is augmented
by authorized personnel. The idea suggested by the authors is to unify all
body area network devices regardless of communication protocols, whether that