Supervised Machine Learning Methods for Cyber Threat Detection Using Genetic Algorithm

Copyright: © 2023 | Pages: 24
DOI: 10.4018/978-1-6684-7702-1.ch002

Abstract

Security threats continue to pose enormous challenges to network and application security, particularly with emerging IoT technologies and cloud computing services. Current intrusion and threat detection schemes still suffer from low detection rates and high false alarm rates. In this study, an optimal set of features was extracted from the CSE-CIC-IDS2018 dataset using a genetic algorithm. Machine learning algorithms, including random forest, k-nearest neighbor, support vector machines, logistic regression, gradient boosting, and naïve Bayes, were employed for classification and the results compared. Evaluation of the proposed cyber security threat detection models found that random forest achieved the highest attack detection accuracy, at 99.99%. Gradient boosting obtained 99.97%, k-nearest neighbor 99.96%, support vector machines 97.39%, and logistic regression 94.94%. The lowest accuracy was obtained by the naïve Bayes model, at 68.84%.
Chapter Preview

Introduction

Cybersecurity measures comprise processes, tools, and technologies, together with various cyber defense systems, designed to protect information systems infrastructure (Wazid et al., 2022) and to ensure the confidentiality, integrity, and availability of information assets (Almasoudy, Al-Yaseen, & Idrees, 2020; Dua & Xian, 2011; Thakkar & Lohiya, 2020). The cyber threat landscape keeps evolving, and threat actors have become more sophisticated, employing advanced persistent threats (APTs) (Colorossi, 2015). In addition, social engineering, ransomware, and fraud are committed through digital identity theft (Wazid et al., 2022). Network intrusion, malware attacks, phishing, unauthorized modification of information, and denial-of-service attacks negatively impact information systems (Arabo, Dijoux, Poulain, & Chevalier, 2020; Vani & Krishnamurthy, 2018). Although detection and prevention systems exist, attackers strive to evade or adapt to detection schemes in order to actively exploit system vulnerabilities. However, anomalies or sudden changes in system and user behavior can be detected if networks are effectively monitored by intrusion detection systems and appropriate action is taken (Muller et al., 2018).

Intrusion detection systems that leverage machine learning (ML) techniques have become crucial in detecting malicious activities (Singh et al., 2022), particularly zero-day attacks (Siddique, Akhtar, Lee, Kim, & Kim, 2017). Machine learning algorithms give computers the ability to learn directly from examples and experiences in the form of data points (The Royal Society, 2017; Wazid et al., 2022). They are capable of processing large amounts of data and then making predictions or decisions (Zou, Cui, Huang, & Zhang, 2008). Despite the recent deployment of ML in cyber threat detection, the development of robust and efficient intrusion detection systems remains an ongoing research problem (Gauthama Raman et al., 2017a). Machine learning methods may be employed in cybersecurity to select features from large datasets in order to improve intrusion detection rates and adaptability (Kunhare et al., 2022). However, most of these techniques suffer from drawbacks, including dependency on domain knowledge, big data issues that result in insufficient learning capability, and an apparent lack of modularity and transferability (Wang, 2018). Moreover, intrusion detection systems may exhibit high error rates, low true positive rates, and low accuracy. These challenges limit the effectiveness of intrusion detection systems in cybersecurity operations.
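The error rate, true positive rate, and accuracy mentioned above can be made concrete with a short sketch. It uses the standard confusion-matrix definitions; the function name `detection_metrics` and the convention that label 1 means "attack" and 0 means "benign" are assumptions made for illustration and are not taken from the chapter.

```python
def detection_metrics(y_true, y_pred):
    """Compute basic detection metrics from binary labels.

    Assumption for this sketch: 1 = attack, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels: three attacks among ten flows, one detected, one false alarm.
example = detection_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
)
print(example)
```

In this hypothetical example the accuracy is 7 out of 10 even though only one of the three attacks is detected, which is why detection rate and false alarm rate are usually reported alongside accuracy.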

Improving the detection of network intrusions depends largely on the comprehensiveness of the dataset used to train the ML models and on the feature set selected from that dataset (Saibene & Gasparini, 2023). Feature selection is a complex NP-hard problem; nonetheless, it is a determinant of the performance of classification models (Abdulhussien et al., 2023; Vijayanand et al., 2018). Although several feature selection methods have been applied to improve ML classifiers, the use of a Genetic Algorithm (GA) for feature reduction and extraction to enhance the performance of intrusion detection systems has not been extensively explored. Therefore, the objectives of this study are to use a Genetic Algorithm to select optimal features from a modern dataset (CSE-CIC-IDS2018), develop ML models, and evaluate the performance of the models with six classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Gradient Boosting, Logistic Regression, and Naïve Bayes.
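The general idea of GA-based feature selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the preview does not specify the GA operators, parameters, or fitness function, so the tournament selection, one-point crossover, bit-flip mutation, and all parameter values below are hypothetical. The data are synthetic rather than CSE-CIC-IDS2018, and a simple nearest-centroid classifier stands in for the six classifiers named above so that the example needs only the Python standard library.

```python
import random


def make_toy_data(n_samples, n_features, informative, rng):
    """Synthetic two-class data; only `informative` columns carry signal."""
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(mask, X_train, y_train, X_val, y_val):
    """Fitness: validation accuracy of a nearest-centroid classifier
    restricted to the features whose bit is set in `mask`."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dists = {
            label: sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[label]))
            for label in (0, 1)
        }
        correct += int(min(dists, key=dists.get) == t)
    return correct / len(y_val)


def ga_select(fitness, n_features, rng, pop_size=20, generations=15,
              crossover_rate=0.8, mutation_rate=0.05):
    """Evolve bit masks over features; return the best mask seen and its score."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, -1.0
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = ind[:], score

        def tournament():
            # Pick two distinct individuals at random and keep the fitter one.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
    return best_mask, best_score


rng = random.Random(0)
INFORMATIVE = (0, 3, 7)  # hypothetical signal-carrying columns
X_train, y_train = make_toy_data(200, 12, INFORMATIVE, rng)
X_val, y_val = make_toy_data(100, 12, INFORMATIVE, rng)


def fitness(mask):
    return centroid_accuracy(mask, X_train, y_train, X_val, y_val)


best_mask, best_score = ga_select(fitness, n_features=12, rng=rng)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("validation accuracy:", round(best_score, 3))
```

In a realistic setting, the fitness function would instead train and validate one of the classifiers under study on the candidate feature subset, and it might also penalize large subsets. Both choices are left open here because the preview does not describe them.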
