An Intelligent Framework for Log Anomaly Detection Based on Log Template Extraction


Lei Pan, Huichang Zhu
Copyright: © 2023 | Pages: 23
DOI: 10.4018/JCIT.330145

Abstract

Log anomaly detection is of great significance for computer systems and network security. Information systems and equipment generate large volumes of log data in the background, so automated methods are needed to identify abnormal behavior that may indicate security threats or system malfunctions. Traditional anomaly detection methods usually rely on manual statistical analysis or regular-expression matching, both of which are complex and time-consuming. To prevent system failures, minimize troubleshooting time, and reduce service interruptions, this article proposes a log template-based anomaly detection method. The approach combines log template extraction, log clustering, and classification techniques to detect abnormal events in an information system in a timely manner. The method has been thoroughly evaluated against traditional log anomaly detection systems, and the results demonstrate improvements in the depth of log analysis, event recognition accuracy, and overall efficiency.

Currently, log anomaly detection involves several distinct processing steps. In this section, we review some notable practices associated with each step.

Log Parsing

Raw log files are semi-structured text and cannot be used directly for machine learning and data mining. To enable effective analysis, the raw logs must first be preprocessed by log parsing, which extracts the key information and removes redundant events and irrelevant elements. The traditional approach is to parse logs with hand-written regular expressions, which is time-consuming and difficult to maintain in practice. Several more effective log parsing methods exist. The first category uses similarity-based clustering, computing distances between logs and grouping them by similarity; representative methods include LKE (Fu et al., 2009), LogSig (Mizutani, 2013), LogMine (Hamooni et al., 2016), and SHISO (Zhu et al., 2010). The second category is frequency-based clustering, which includes LFA (Nagappan & Vouk, 2010), SLCT (Vaarandi, 2003), and LogCluster (Vaarandi & Pihelgas, 2015); these methods group log items into clusters based on how frequently they occur. The third category comprises heuristic methods that use specific data structures to parse logs into multiple templates, with representative techniques including FT-Tree (Zhang et al., 2017), Drain (He et al., 2017), Spell (Du & Li, 2016), and Logstamp (Tao et al., 2022). These methods apply heuristic rules and data structures to identify common log patterns and generate templates for log parsing; a simplified sketch of this idea is given below.
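To make template extraction concrete, the following is a minimal, illustrative sketch of similarity-based template extraction in Python. It is not the implementation of any of the parsers cited above or of the proposed framework; the masking regex, the 0.7 similarity threshold, and the `<*>` wildcard are assumptions chosen only to demonstrate the general idea of grouping similar log messages and generalizing differing tokens into a template.

```python
# Minimal sketch of similarity-based log template extraction (illustrative only;
# not the paper's method or a specific parser such as Drain, Spell, or LKE).
import re

WILDCARD = "<*>"

def tokenize(message: str) -> list[str]:
    # Mask obvious variable fields (IP addresses, hex values, numbers), then split on whitespace.
    message = re.sub(r"\b(?:\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|\d+)\b", WILDCARD, message)
    return message.split()

def similarity(tokens: list[str], template: list[str]) -> float:
    # Position-wise token similarity; messages of a different length never match a template.
    if len(tokens) != len(template):
        return 0.0
    same = sum(1 for a, b in zip(tokens, template) if a == b or b == WILDCARD)
    return same / len(tokens)

def merge(tokens: list[str], template: list[str]) -> list[str]:
    # Generalize the template by replacing positions that differ with a wildcard.
    return [a if a == b else WILDCARD for a, b in zip(tokens, template)]

def extract_templates(messages: list[str], threshold: float = 0.7) -> list[list[str]]:
    # Assign each message to the most similar existing template, or start a new one.
    templates: list[list[str]] = []
    for msg in messages:
        tokens = tokenize(msg)
        best_idx, best_sim = -1, 0.0
        for i, tpl in enumerate(templates):
            sim = similarity(tokens, tpl)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_sim >= threshold:
            templates[best_idx] = merge(tokens, templates[best_idx])
        else:
            templates.append(tokens)
    return templates

if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed after 120 ms",
        "Connection from 10.0.0.2 closed after 85 ms",
        "Failed to open file /var/log/app.log",
    ]
    for tpl in extract_templates(logs):
        print(" ".join(tpl))
```

Run on the three sample messages, the sketch yields two templates: a generalized connection template with wildcards in place of the IP address and duration, and a separate template for the file-open failure.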

By applying these log parsing techniques, raw log data can be transformed into a structured format suitable for machine learning and data mining tasks, facilitating efficient analysis and knowledge extraction from log files. A common next step, sketched below, is to encode the parsed events as fixed-length feature vectors for clustering or classification.
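The following minimal sketch is an assumption for illustration, not the pipeline described in this article. It shows one widely used encoding of structured logs: counting how often each extracted template (event ID) occurs within a session or time window, producing an event-count vector that downstream clustering or classification models can consume.

```python
# Minimal sketch (illustrative assumption, not this article's pipeline): turning
# parsed log events into fixed-length event-count vectors for downstream
# clustering or classification.
from collections import Counter

def event_count_vector(event_ids: list[str], vocabulary: list[str]) -> list[int]:
    # Count how often each known template (event ID) occurs in one session or window.
    counts = Counter(event_ids)
    return [counts.get(eid, 0) for eid in vocabulary]

if __name__ == "__main__":
    vocabulary = ["E1", "E2", "E3"]           # template IDs produced by log parsing
    session = ["E1", "E1", "E3", "E2", "E1"]  # parsed events in one session window
    print(event_count_vector(session, vocabulary))  # -> [3, 1, 1]
```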
