Introduction
Security analytics systems rely on data sourced from multiple network infrastructure devices, such as Intrusion Detection and Prevention Systems (IDPS), network firewalls, routers, switches, and various application firewalls. Before this data can be analyzed for possible security threats, it must be collected. Data collection is therefore a critical step in the cyber analytics process. It may also be a performance-critical path for analytics systems (Ramah et al., 2006; Qadeer et al., 2010), especially when there is a need to consume big data or perform analysis in real time.
The figure below presents a typical cyber analytics process.
Figure 1. Typical analytics process
Different analytics applications consume the data collected and processed in the preceding stages. If the analytics applications and downstream processes are time sensitive, the data collection stage must make the ingested data available as quickly as possible, while still collecting sufficient data from which to draw accurate inferences.
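The tension between availability and sufficiency can be illustrated with micro-batching, in which a batch is released either when it is large enough or when it has waited long enough. The following is a minimal sketch under stated assumptions and is not the collection logic of the proposed framework: the function name `micro_batches`, the parameters `max_size` and `max_wait`, and the `(arrival_time, payload)` event shape are all hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, raw event payload).
Event = Tuple[float, str]


def micro_batches(
    events: Iterable[Event], max_size: int, max_wait: float
) -> Iterator[List[str]]:
    """Group events into batches bounded by size and by age.

    A batch is released when it holds max_size events (enough data),
    or when a new event arrives at least max_wait seconds after the
    batch was opened (data must not wait too long). This replay over
    recorded timestamps only checks age when the next event arrives.
    """
    batch: List[str] = []
    opened_at = 0.0
    for arrival, payload in events:
        # Release a batch that has already waited too long.
        if batch and arrival - opened_at >= max_wait:
            yield batch
            batch = []
        if not batch:
            opened_at = arrival
        batch.append(payload)
        # Release a batch that has collected enough events.
        if len(batch) >= max_size:
            yield batch
            batch = []
    # Release whatever remains when the input ends.
    if batch:
        yield batch
```

A smaller `max_wait` makes data available sooner at the cost of smaller batches, while a larger `max_size` gives each analysis step more data at the cost of latency.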
While detection capability remains key in any cyber response system, the timeliness of detection is even more important, because attacks detected too late will already have caused significant damage by the time a reaction takes place.
This paper presents a data collection framework that enables real-time detection of, and response to, cyber threats. The near-total elimination of local long-term storage of collected data removes a significant time cost. The use of state-of-the-art real-time streaming technologies ensures that data is available to analytics applications as soon as practically possible, enabling those applications to react in real time.
Our solution is innovative in several ways. First, the system ingests data from multiple source types, including external cyber threat intelligence, which improves the maturity of the overall security operations. Second, the architecture improves the storage layer by allowing in-memory analytics, which shortens the overall detection and response time. Third, the architecture adopts modern technologies to enable real-time streaming and analysis of security events, mitigating technology limitations prevalent in state-of-the-art solutions.
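The idea of multi-source ingestion into an in-memory layer, with external threat intelligence consulted at analysis time, can be sketched as follows. This is an illustrative sketch only, assuming a thread-per-source design and a standard-library in-memory queue. It does not reproduce the streaming technologies used in the framework, and the names `collect`, `detect`, and `src_ip`, together with the representation of threat intelligence as a set of IP addresses, are hypothetical.

```python
import queue
import threading
from typing import Dict, Iterable, List, Set

# Sentinel that a collector puts on the queue when its source is exhausted.
_DONE = object()


def collect(name: str, records: Iterable[dict], out: "queue.Queue") -> None:
    """Push every record from one source onto the shared in-memory queue."""
    for record in records:
        out.put({"source": name, **record})
    out.put(_DONE)


def detect(sources: Dict[str, Iterable[dict]], intel_ips: Set[str]) -> List[dict]:
    """Consume events as they arrive and flag those matching threat intel.

    Events stay in memory only; nothing is written to local storage.
    """
    events: "queue.Queue" = queue.Queue(maxsize=1000)
    for name, records in sources.items():
        threading.Thread(
            target=collect, args=(name, records, events), daemon=True
        ).start()

    remaining = len(sources)
    alerts: List[dict] = []
    while remaining:
        item = events.get()  # blocks until some collector delivers an event
        if item is _DONE:
            remaining -= 1
            continue
        if item.get("src_ip") in intel_ips:
            alerts.append(item)
    return alerts
```

Because the collectors run concurrently, the order of alerts across sources is not deterministic. The bounded queue applies back-pressure to the collectors if analysis falls behind.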
Our contribution is a flexible, scalable, extensible, multi-source data collection architecture and framework that enables timely detection of, and response to, security threats.
The rest of this paper is organized as follows:
First, we review recent research on data collection for cyber security, critically analyzing it and highlighting the research gaps this paper addresses. We then present our proposed framework, highlighting the architectural pillars that differentiate our work. Next, we illustrate an implementation based on the framework, followed by experimentation and results. Finally, we conclude and propose future work.
Various data collection methods and technologies have been studied in the research literature.
The collection module proposed in (Razaq et al., 2016) populates a local MySQL database with security-related data, after which a Hadoop Sqoop job exports the data to an off-shore data store based on the Hadoop Distributed File System (HDFS). Analytics applications then run on top of the data in the Hadoop system.
The real-time cyber threat detection platform in (Carvalho et al., 2016) collects data from both internal and external sources. After some pre-processing, the data is loaded into multiple databases according to data type (Malware Database, Social Media Database, Email Database, etc.). Big data analytics is then applied using machine learning algorithms that are trained on, and detect threats in, real-time data flows.
Open-source technologies are used in (R. More et al., 2017) to detect threats in real time. Captured sensor data is uploaded to Apache Hadoop clusters and then used for training and classification with Apache Mahout.