Summarization Algorithm for Data Stream to Speed up Outlier Data Detection

Document Type: Research Article

Authors

Faculty of Electrical & Computer Engineering, University of Birjand, Birjand, Iran.

Abstract

Outlier detection in data streams is an essential task in data processing. With the massive growth of streaming data driven by the spread of the Internet of Things, outlier detection has become a significant challenge. Much progress has been made with local outlier detection algorithms, such as the density-based Local Outlier Factor (LOF) algorithm, which are suited to static data. Incremental versions of these algorithms are used to detect local outliers in streaming data. However, outlier detection in streaming data faces several challenges: limited memory, high execution time, the inability to access all the data at once, and changes in the data distribution (increasing and decreasing input rates, uncertainty, etc.). In this paper, we propose a density-based summarization algorithm that summarizes the data every time the buffer fills. The proposed algorithm preserves the shape of the clusters at a low computational cost. To this end, larger clusters are selected and the data in their dense areas are reduced, so that the shape of the old clusters is not lost. Compared with the evaluated algorithms, the proposed summarization algorithm reduces execution time and increases precision, recall, and F1 score.
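The abstract does not specify how clusters are found or how "dense areas" are measured, so the following is only a hypothetical sketch of the buffer-summarization idea under stated assumptions. It assumes that eps-connected components stand in for the paper's density-based clustering, that the k-distance (distance to the k-th nearest neighbour, as used by LOF) serves as the density measure, and that "larger clusters" means clusters with at least `min_cluster_size` points. The names `summarize_buffer`, `process_stream`, `min_cluster_size`, and `dense_fraction` are illustrative and not taken from the paper.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_distance(points: Sequence[Point], idx: int, k: int) -> float:
    """Distance from points[idx] to its k-th nearest neighbour (brute force).

    A small k-distance means the point lies in a dense area.
    """
    dists = sorted(math.dist(points[idx], q) for j, q in enumerate(points) if j != idx)
    return dists[min(k, len(dists)) - 1]


def clusters_by_radius(points: Sequence[Point], eps: float) -> List[int]:
    """Label eps-connected components (assumed stand-in for a density-based clusterer)."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def summarize_buffer(points: Sequence[Point], k: int = 3, eps: float = 1.0,
                     min_cluster_size: int = 10, dense_fraction: float = 0.5) -> List[Point]:
    """Hypothetical summarization step, run when the buffer is full.

    Only clusters with at least min_cluster_size members are thinned. Within
    such a cluster, the dense_fraction of points with the smallest k-distance
    (the densest area) is halved by dropping every other one; sparser points,
    which trace the cluster boundary, are all kept. Small clusters and
    isolated points (outlier candidates) are left untouched.
    """
    if k < 1 or min_cluster_size <= k:
        raise ValueError("need k >= 1 and min_cluster_size > k")
    members: Dict[int, List[int]] = {}
    for i, label in enumerate(clusters_by_radius(points, eps)):
        members.setdefault(label, []).append(i)

    keep = set(range(len(points)))
    for idxs in members.values():
        if len(idxs) < min_cluster_size:
            continue  # small cluster or isolated point: keep everything
        cluster = [points[i] for i in idxs]
        # (k-distance, global index) pairs, densest first; ties broken by index
        ranked = sorted((knn_distance(cluster, c, k), idxs[c]) for c in range(len(cluster)))
        dense = ranked[: int(len(idxs) * dense_fraction)]
        for _, i in dense[::2]:  # drop every other point of the dense part
            keep.discard(i)
    return [points[i] for i in sorted(keep)]


def process_stream(stream: Iterable[Point], capacity: int, **kwargs) -> List[Point]:
    """Append arriving points and summarize whenever the buffer reaches capacity."""
    buffer: List[Point] = []
    for p in stream:
        buffer.append(p)
        if len(buffer) >= capacity:
            buffer = summarize_buffer(buffer, **kwargs)
    return buffer
```

For example, a 10 x 10 integer grid plus the isolated point `(100.0, 100.0)`, summarized with `k=3, eps=1.5`, keeps 76 of 101 points: 25 interior points are removed, while the grid corners (which have the largest k-distance) and the isolated point survive. Thinning only the densest part leaves the cluster outline intact, which is the property the abstract emphasizes. Note that this sketch never shrinks a buffer that contains no large cluster, so a practical version would need a fallback for that case.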



  • Receive Date: 07 September 2022
  • Revise Date: 17 February 2023
  • Accept Date: 15 March 2023
  • First Publish Date: 15 March 2023