A Web Semantic Mining Method for Fake Cybersecurity Threat Intelligence in Open Source Communities

Zhihua Li, Xinye Yu, Yukai Zhao
Copyright: © 2024 | Pages: 22
DOI: 10.4018/IJSWIS.350095

Abstract

To overcome the inadequate classification accuracy of existing fake cybersecurity threat intelligence mining methods and the lack of high-quality public datasets for training classification models, we propose a novel approach. We improve the attention mechanism and design a generative adversarial network based on this improved attention mechanism to generate fake cybersecurity threat intelligence. Additionally, we refine text tokenization techniques and design a detection model for fake cybersecurity threat intelligence. On our STIX-CTIs dataset, our method achieves an accuracy of 96.1%, outperforming current text classification models. Using our generated fake cybersecurity threat intelligence, we successfully mimic data poisoning attacks within open-source communities. Paired with our detection model, this research not only improves detection accuracy but also provides a powerful tool for enhancing the security and integrity of open-source ecosystems.
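The preview does not spell out how the authors modify the attention mechanism. As background only, the textbook scaled dot-product attention they build on can be sketched in dependency-free Python; all inputs below are toy data, not from the paper:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors. The paper's improved variant is not
    described in the preview; this is only the unmodified baseline.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        # weighted combination of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# With identical (zero) keys every value gets equal weight,
# so the output is simply the mean of V.
print(attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]],
                [[2.0, 4.0], [4.0, 6.0]]))  # [[3.0, 5.0]]
```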

Introduction

In recent years, generative artificial intelligence technology has spawned products and services, represented by large language models, that can understand and output multi-modal content. As an essential component of generative artificial intelligence technology, deep fake has been widely applied in many fields due to its “convenience of usage and high generation quality,” (author, year, p.) such as generating various texts, synthesizing speech, and swapping faces in videos. However, while advancing the technology industry, deep fake technology also introduces significant cybersecurity risks. With its powerful generation ability, deep fake technology can generate realistic fake cybersecurity threat intelligence (CTI) (Bai & Wang, 2020; Wang et al., 2024). For example, in February 2023, a CTI about an ESXi ransomware attack claimed that the ESXiArgs ransomware was deployed using a vulnerability in VMware ESXi. In January 2024, a fake CTI alleged that data from several large organizations, including Procter & Gamble and the Toronto municipal government, was stolen by exploiting a zero-day vulnerability in the GoAnywhere platform. As these fake CTIs spread through open-source communities, their misuse as training samples by cybersecurity defense systems can lead to false alarms and missed detections of real cyberattacks. Under these circumstances, effectively detecting fake CTIs in open-source communities is a challenging problem.

Research (Cui et al., 2022; Saxena & Gayathri, 2022) describes the harm caused by fake CTIs. For example, fake CTIs can serve as material for data poisoning attacks that mislead cybersecurity defense models into incorrect judgments. By generation method, fake CTIs can be classified into manually generated fake CTIs, fake CTIs generated from erroneous data caused by cyberattacks, and fake CTIs generated by deep fake technology. In response to these generation methods, various mining methods have been designed (Mavroeidis & Bromander, 2017; Kim et al., 2018; Umar & Felemban, 2021; Islam et al., 2022; Skopik & Pahi, 2020; Choudhary & Singh, 2022; Sahoo et al., 2019; Liu et al., 2022; Ahmed et al., 2023; Devlin et al., 2018). Early research (Mavroeidis & Bromander, 2017; Kim et al., 2018; Umar & Felemban, 2021) used expert knowledge to design structured scoring methods that evaluate the confidence of CTIs and thereby detect fake ones. Although this kind of method can efficiently detect fake CTIs in a short time with high accuracy, it focuses only on a limited set of entity features of CTIs. Therefore, when faced with fake CTIs generated by other methods in the open-source community, its classification accuracy is insufficient.
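The structured-scoring idea above can be illustrated with a minimal sketch: a weighted sum of boolean entity features yields a confidence score, and low-confidence reports are flagged. The feature names and weights here are hypothetical, not taken from the cited papers:

```python
# Hypothetical entity features of a CTI report and illustrative weights;
# real structured scoring methods encode expert knowledge far more richly.
WEIGHTS = {
    "has_known_cve": 0.4,      # references a registered CVE identifier
    "source_reputable": 0.3,   # published by a vetted intelligence feed
    "iocs_resolve": 0.2,       # indicators (IPs, hashes, domains) check out
    "timestamp_recent": 0.1,   # reported close to the alleged incident
}

def confidence(cti: dict) -> float:
    """Weighted sum of boolean features, giving a score in [0, 1]."""
    return sum(w for feat, w in WEIGHTS.items() if cti.get(feat))

def is_suspect(cti: dict, threshold: float = 0.5) -> bool:
    """Flag a report whose confidence falls below the threshold."""
    return confidence(cti) < threshold

report = {"has_known_cve": True, "source_reputable": True,
          "iocs_resolve": False, "timestamp_recent": True}
print(round(confidence(report), 2))  # 0.8
print(is_suspect(report))            # False
```

The weakness the text notes is visible here: a fabricated report that happens to satisfy these few entity checks scores high, so deep-fake-generated CTIs built from real entities slip through.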

To address the classification-accuracy issues caused by the diversity of CTIs in the open-source community, research (Islam et al., 2022; Skopik & Pahi, 2020; Choudhary & Singh, 2022; Sahoo et al., 2019) classifies fake CTIs through data verification. This approach cross-validates multiple data sources to detect fake CTIs derived from cyberattacks, which are difficult to catch with structured scoring methods in the open-source community, and improves classification accuracy to some extent. However, its accuracy depends on the quality and reliability of the selected data sources: when a source contains erroneous or outdated information, accuracy degrades. In addition, since deep fake technology uses real data to generate fake CTIs, this method struggles to detect fake CTIs generated by deep fake technology.
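The cross-validation idea can be sketched as a quorum check: a claim is accepted only if enough independent feeds corroborate it. The feed names and contents below are invented for illustration:

```python
# Hypothetical CTI feeds mapping a source name to the claims it reports;
# in practice each source would be a live threat-intelligence feed.
FEEDS = {
    "feed_a": {"CVE-2021-0001", "CVE-2023-1234"},
    "feed_b": {"CVE-2023-1234"},
    "feed_c": {"CVE-2023-1234", "CVE-2019-9999"},
}

def corroborated(claim: str, feeds: dict, quorum: int = 2) -> bool:
    """Accept a claim only if at least `quorum` independent sources report it."""
    votes = sum(claim in reported for reported in feeds.values())
    return votes >= quorum

print(corroborated("CVE-2023-1234", FEEDS))  # True: all three feeds agree
print(corroborated("CVE-2019-9999", FEEDS))  # False: only one feed reports it
```

This also makes the stated limitation concrete: if the feeds themselves ingest poisoned or outdated records, or if a deep-fake CTI is assembled from genuine entities that the feeds do report, the quorum check passes and the fake goes undetected.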
