Page 222 - Kaleidoscope Academic Conference Proceedings 2020
Table 3 - Resultant table of example data stream

       NORMAL  MILD  HIGH  CRITICAL  T1    T2    T3    CLOSE  FAR
H1     4       2     1     3         0.9   1.0   0.6   4      1
H2     4       2     1     3
H3     4       2     1     3
H4     4       2     1     3
H5     4       2     1     3
reached, the resulting Zcompute will be calculated using (1):

∑tavg = (0.1+0.5)/2 + (0.5+0.7)/2 + (0.7+0.8)/2 + (0.8+0.9)/2 + (0.9+1.0)/2
      = 0.3 + 0.6 + 0.75 + 0.85 + 0.95
      = 3.45

Zcompute = 3.45 / 5 = 0.69

which gives us an indication that all the "non-normal" data exists in 60% of the m, which in this case is 10.

Using (3) we can calculate the percentage of anomalous data, assuming an allowed threshold of 30%:

(2+1+3) / (4+2+1+3) = 0.66667

which means that about 66% of the data is anomalous, well above our set threshold, so we set off an alarm to fix the system.

Assume we set FCthreshold to 10 percent; then using (6) we can find the frequency of appearance of the "non-normal" data:

4/5 = 0.8 > FCthreshold

The computed value is 0.8, which means that the "non-normal" data has appeared 80% more frequently than our allowed preset threshold value of 10%, and as such we need to send an alarm to alert the responsible persons.

Table 4 - Results of running different data sets

DATA SET   PROCESSING TIME (S)   TABLE SIZE (BYTES)   FILE SIZE (BYTES)
100        208                   8388608              8388688
1000       2037                  8388608              8388688
2000       4039                  8388608              8388688
4000       9351                  8388608              8388688
6000       14692                 8388608              8388688
8000       17913                 8388608              8388688
10000      24407                 8388608              8388688
15000      35171                 8388608              8388688
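The worked arithmetic can be reproduced in a few lines (an illustrative Python sketch: the threshold list and the 4-out-of-5 frequency ratio come from the example above, while the variable names and FCthreshold = 0.10 are our assumptions, not the paper's notation):

```python
# Averaged midpoints of consecutive threshold values, then Zcompute (Eq. 1).
thresholds = [0.1, 0.5, 0.7, 0.8, 0.9, 1.0]
midpoints = [(lo + hi) / 2 for lo, hi in zip(thresholds, thresholds[1:])]
z_compute = sum(midpoints) / len(midpoints)  # 3.45 / 5 = 0.69

# Frequency-of-appearance check (Eq. 6): the "non-normal" value appeared
# in 4 of the 5 rows in the example, compared against FCthreshold = 10%.
fc_threshold = 0.10
frequency = 4 / 5  # 0.8
alarm = frequency > fc_threshold

print(round(z_compute, 2), frequency, alarm)  # 0.69 0.8 True
```

An alarm is raised whenever the computed frequency exceeds the preset threshold, exactly as in the 0.8 > 0.10 case worked through above.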
4. SYSTEM SETUP AND RESULTS
The simulation was executed on a Linux virtual machine with 2 GB of RAM and 4 cores of an Intel i7 2.50 GHz processor. We used the madoka data-sketching library [13], which is built in C++, has its own compiler, and uses MurmurHash3 to compute its hash values.

Figure 3 - Variation of processing time with the increase in set size
The table and graphs show how the processing time increases with the influx of data arriving at the sketch. The processing time rises roughly linearly, at a rate of approximately 2 seconds per data item; after the 2000-item mark it grows faster than the number of items being added to the sketch. Notice that even with the increase in data, the sketch size remains unchanged.

Discussion

At the start of the sketch all the elements are initialized to zero to indicate that the sketch is empty. As data starts coming in and the appropriate sketch elements are incremented, computations are made to determine whether
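The zero-initialization and increment behaviour can be sketched with a minimal count-min-style structure (illustrative Python under our own assumptions; this is not the madoka implementation, and the md5-based row hashing merely stands in for MurmurHash3):

```python
import hashlib

class CountSketch:
    """Minimal count-min-style sketch: depth rows, width counters per row."""

    def __init__(self, depth=5, width=1024):
        self.depth = depth
        self.width = width
        # Every element starts at zero: the sketch is empty.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash per row (md5 here purely for illustration).
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def inc(self, item):
        # Each incoming item increments one cell in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += 1

    def count(self, item):
        # Count-min estimate: the minimum over the rows.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

sketch = CountSketch()
for _ in range(3):
    sketch.inc("CRITICAL")
print(sketch.count("CRITICAL"))  # 3
```

Note that the table allocated in the constructor never grows: after any number of inc() calls it still holds depth x width counters, which mirrors the constant table size reported in Table 4.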