Page 223 - Kaleidoscope Academic Conference Proceedings 2020
Industry-driven digital transformation
an element is near or far based on arrival time using equations 4 and 5, along with setting the hash depth to 10, contribute to the two-second computation time per data item. However, once the data grows beyond 2000 items, processing time increases considerably, which indicates that the sketch is beginning to saturate. A real problem appears as the data reaches between 10000 and 15000 items, where processing time breaks down. Therefore, to keep the sketch efficient and lightweight, we recommend limiting the sketch to no more than 4000 data items, depending on the usage scenario, before clearing it. There are two reasons for this: a) it reduces the wait time before a system checkup has to be done; b) some situations are time-sensitive and require immediate attention. As mentioned earlier, we incorporated a timer into the design so that the data is checked more frequently; this way, the user does not have to wait for the sketch to reach a preassigned byte size before the data is checked.
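The clearing policy described above (reset after an item limit or after a timer expires, whichever comes first) can be illustrated with a minimal Python sketch. It assumes a standard count-min layout with 10 hash rows, matching the hash depth of 10, and the recommended limit of 4000 items; the table width of 1024, the 60-second timer, the class name `TimedCountMinSketch`, the BLAKE2b row hashing, and the `clock` and `on_clear` parameters are all hypothetical choices for illustration, not the authors' implementation.

```python
import hashlib
import time


class TimedCountMinSketch:
    """Count-min sketch that clears itself after max_items insertions or
    max_age_s seconds, whichever limit is reached first.

    Hypothetical illustration: names, defaults, and the hashing scheme
    are assumptions, not the authors' implementation.
    """

    def __init__(self, width=1024, depth=10, max_items=4000,
                 max_age_s=60.0, clock=time.monotonic, on_clear=None):
        self.width = width
        self.depth = depth          # hash rows; the text uses a depth of 10
        self.max_items = max_items  # item-count limit (recommended <= 4000)
        self.max_age_s = max_age_s  # timer limit in seconds (assumed value)
        self.clock = clock          # injectable clock so the timer is testable
        self.on_clear = on_clear    # optional hypothetical "system checkup" hook
        self.resets = 0
        self._clear()

    def _clear(self):
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.items = 0
        self.started = self.clock()

    def _index(self, key, row):
        # One hash per row: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def _maybe_clear(self):
        expired = self.clock() - self.started >= self.max_age_s
        if self.items >= self.max_items or expired:
            if self.on_clear is not None:
                self.on_clear(self)  # inspect the data before it is discarded
            self._clear()
            self.resets += 1

    def add(self, key, count=1):
        self._maybe_clear()
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count
        self.items += 1

    def estimate(self, key):
        # Standard count-min query: the minimum counter across all rows.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))
```

In this sketch the timer is checked lazily on each insertion; a deployment that must inspect the data even when no new items arrive would need a separate background timer that calls the checkup and clear steps on its own.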

We believe that the solution is versatile in situations where some time delay is acceptable, such as hospital-floor recovery rooms or fraudulent-account checking on social networks, with the data size adjusted according to preference.

5. CONCLUSION

Our segmented, time-controlled count-min sketch provides a new way of looking at incoming data. It reduces collisions by providing more hash functions, and it offers the choice to clear the sketch once either a maximum elapsed time or a maximum data item count is reached. Traditionally, the count-min sketch takes the minimum of the counter values produced by the hash functions for a particular item, which can produce misleading results, especially when the sketch is saturated; we therefore decided to section the values into categories based on a predetermined range. We also introduced a time element into the sketch as a way of determining the frequency of the anomalous data. Finally, we demonstrated the compactness of the sketch design by showing that it remains consistent as the data volume increases.
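For context on why a saturated sketch misleads: in the standard count-min analysis (Cormode and Muthukrishnan), with width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉, the minimum-counter estimate never undercounts and overcounts by at most εN with probability at least 1 − δ, where N is the total count inserted. The additive error therefore grows with N as the sketch fills. The minimal Python sketch below contrasts the traditional minimum query with one possible reading of "sectioning the values into categories based on a predetermined range": the estimate is mapped to a labelled range. The range boundaries (10 and 100), the labels, the width of 256, and the names `CountMinSketch` and `categorize` are hypothetical, not taken from the paper.

```python
import bisect
import hashlib


class CountMinSketch:
    """Plain count-min sketch; width and hashing scheme are assumptions."""

    def __init__(self, width=256, depth=10):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Per-row hash: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Traditional query: minimum counter over all rows (never undercounts).
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


# Hypothetical predetermined ranges: [0, 10) low, [10, 100) medium, >= 100 high.
BOUNDARIES = [10, 100]
LABELS = ["low", "medium", "high"]


def categorize(estimate, boundaries=BOUNDARIES, labels=LABELS):
    """Map a raw count estimate to the label of the range it falls in."""
    return labels[bisect.bisect_right(boundaries, estimate)]
```

Reporting a coarse category instead of the exact minimum means that a modest overcount caused by collisions only changes the outcome when it pushes an estimate across a range boundary.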