Malware Analysis With Machine Learning: Methods, Challenges, and Future Directions

Ravi Singh, Piyush Kumar
Copyright: © 2023 |Pages: 23
DOI: 10.4018/978-1-6684-8666-5.ch010

Abstract

Malware attacks grow year after year as Android and IoT devices join traditional computing devices. Protecting these devices requires malware analysis so that the interests of organizations and individuals are safeguarded. Malware analysis follows several approaches, including static, dynamic, and heuristic analysis. As technology advances, malware authors adopt advanced techniques such as obfuscation and packing, which signature-based static approaches cannot detect. To overcome these problems, the behavior of malware must be analyzed using dynamic approaches. Nowadays, malware authors also use more advanced evasion techniques in which the malware suspends its malicious behavior after detecting a virtual environment. Evasion techniques therefore pose a new challenge to malware analysis, because even the dynamic approach sometimes fails to detect and analyze the malware.

1. Introduction

Malware is any malicious code that can perform actions on a device without the user's consent. The device may be a Windows-based device, an Android device, or an IoT device. Malware can steal or hide information, encrypt information, consume system resources such as battery, memory, and CPU, and take control of the whole system using command-and-control techniques; some malware can even damage the device hardware. Malware is therefore a major threat to system security, because these devices are connected, directly or indirectly, to a local network or the internet. Malware now also uses advanced techniques such as obfuscation, emulation evasion, crypters, and packers, so detecting and analyzing it becomes harder for cyber security practitioners day by day.

Initially, machine learning techniques were used to detect and classify known types of malware. In this setting, the whole dataset is separated into two parts, a training dataset and a testing dataset. After the split, the model is trained on the training dataset and tested on the testing dataset. Sometimes the dataset is not homogeneous, so to make correct predictions we use k-fold cross-validation, in which each of k partitions serves once as the test set, so that the model is evaluated against the overall pattern of the dataset.
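
The split and the k-fold procedure described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `train_test_split` and `k_fold_indices`, the 20% test fraction, and the ten-sample toy dataset are assumptions made for the example and do not come from the chapter.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy dataset of ten sample identifiers (hypothetical).
train, test = train_test_split(range(10), test_fraction=0.2)
folds = list(k_fold_indices(10, k=5))
```

In practice the samples would usually be shuffled (and often stratified by class) before the folds are formed, and the model's scores on the k test folds would be averaged.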

In traditional machine learning, there are different approaches to feature extraction, which is the foundation of classifier development: static, dynamic, and hybrid. In the static approach, we study the structure of malware binaries without executing them, so no virtual environment is needed. In dynamic analysis, we look at how malware binaries behave once they have been run. Studying this behavior requires a safe virtual environment, which protects the host computer from the malware binaries. In the hybrid approach, we combine the static and dynamic approaches. The problem with the hybrid approach is that it increases the complexity of the model.
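
As a small illustration of the static approach, the sketch below derives a few features from the raw bytes of a file without executing it. The choice of features (file size, Shannon byte entropy, number of distinct byte values) and the function names are assumptions made for this example, not the chapter's feature set. Byte entropy is included because packed or encrypted content tends to have entropy close to the 8 bits-per-byte maximum.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1 / p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def static_features(data: bytes) -> dict:
    """Hypothetical static feature vector computed without running the sample."""
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "distinct_bytes": len(set(data)),
    }


# Two synthetic byte strings: every byte value once, and pure zero padding.
uniform = static_features(bytes(range(256)))
padded = static_features(b"\x00" * 1024)
```

A real pipeline would read the bytes from a sample stored in an isolated location and would typically add format-specific features such as section names or imported functions.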

In classification, we assign data points to the classes present in the data. Each class is identified by a label, called the target or category. In ML, a model is trained on a set of data called the training data, which may be labelled or unlabelled depending on whether we are developing a supervised or an unsupervised model, and the accuracy of the model is checked on another set of data called the testing dataset. In this study, we found that different malware detection models reported different techniques as best for extraction, detection, classification, and evaluation.
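
Checking a model against the testing dataset amounts to comparing its predictions with the true labels. The following sketch computes accuracy together with precision, recall, and F1, which are commonly reported alongside it; the label names `"malware"` and `"benign"`, the function names, and the five-sample example are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="malware"):
    """Return (tp, fp, tn, fn) for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive="malware"):
    """Compute accuracy, precision, recall, and F1; empty denominators give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels and model predictions for five test samples.
y_true = ["malware", "malware", "benign", "benign", "malware"]
y_pred = ["malware", "benign", "benign", "malware", "malware"]
metrics = evaluate(y_true, y_pred)
```

Accuracy alone can mislead when one class dominates the test set, which is one reason the choice of metric matters when comparing detection models.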

Yet common to many of these strategies are:

  • Ranking and choosing features by determining feature significance scores.

  • Dimensionality reduction, which reduces bias and noise by transforming features into a lower-dimensional space.

  • Ensemble models, which can be combined with either of the preceding two methods, integrate the output of various base models to improve the overall classification performance.
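
Two of the strategies listed above, feature ranking and ensembling, can be sketched with the standard library alone. The significance score used here (the absolute difference between how often a binary feature occurs in malware and in benign samples), the permission-style feature names, and the function names are all assumptions chosen for illustration; dimensionality reduction is left out to keep the example short.

```python
from collections import Counter


def rank_features(rows, labels, positive="malware"):
    """Rank binary features by |P(feature | malware) - P(feature | benign)|."""
    pos = [r for r, y in zip(rows, labels) if y == positive]
    neg = [r for r, y in zip(rows, labels) if y != positive]
    scores = {}
    for name in rows[0]:
        p_pos = sum(r[name] for r in pos) / len(pos)
        p_neg = sum(r[name] for r in neg) / len(neg)
        scores[name] = abs(p_pos - p_neg)
    # Highest score first; ties keep their original feature order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def majority_vote(predictions):
    """Combine base-model predictions for one sample; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical Android samples described by three binary permission features.
rows = [
    {"SEND_SMS": 1, "INTERNET": 1, "CAMERA": 0},
    {"SEND_SMS": 1, "INTERNET": 1, "CAMERA": 1},
    {"SEND_SMS": 0, "INTERNET": 1, "CAMERA": 1},
    {"SEND_SMS": 0, "INTERNET": 1, "CAMERA": 0},
]
labels = ["malware", "malware", "benign", "benign"]

ranked = rank_features(rows, labels)
top_features = [name for name, _ in ranked[:1]]

# Outputs of three hypothetical base models for a single sample.
ensemble_label = majority_vote(["malware", "benign", "malware"])
```

Selecting the top-ranked features first and then voting over base models trained on them is one way the two methods can be combined.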

The impact of machine learning techniques is growing rapidly in solving real-world problems such as object identification, text recognition, and speech recognition. In the same way, demand for ML models is increasing in malware detection systems, because malware authors use increasingly advanced development techniques such as obfuscation, encryption, packing, and evasion. Traditional malware detection systems cannot defend against these techniques. Hence, the focus is now on advanced machine learning techniques to neutralize them. The problem with machine learning techniques is that there is no common consensus, i.e., no generalization of classification, detection, and evaluation. The absence of a common consensus creates misunderstanding, leads to half-true or even wrong simplifications, and can misinform future work. To mitigate these issues, this work fulfills the following objectives:

  • Provides a thorough mapping of the modern ML methods for Android malware detection systems that have been put forth in the literature.

  • Classifies each contribution according to four different criteria: the metrics selected, the age of the dataset, the classification models, and the performance improvement methods.

  • Motivates researchers to develop a common approach to every aspect of malware detection systems, including detection, classification, and evaluation.
