Laying out the groundwork for the series, making it accessible to amateurs and to specialists in Information Security with no knowledge of Machine Learning, and vice versa. The main focus here is giving an overview of machine learning methods and exploring the nature of the dataset (PE32(+) executable files).
Here we explore the problem by inference, evaluating work completed by others to date. The goals here are centred around ascertaining the state of the art and what needs to be done to drive forward better solutions in ML for static PE malware detection. The largest analysis of works to date was performed and is linked here.
This article builds on what can be drawn from the surveys and analysis that have been completed, creating a roadmap for future development.
The next write-up to be produced will follow on from the third piece, focusing on the next steps in deploying the roadmap laid out there.
Unique points of note in this article series (where novel work has been completed) are as follows:
- A ‘survey of surveys’: a curation and timeline of prior literature reviews on the application of machine learning to static malware detection.
- An extension of the work of Shalaginov et al., producing the largest survey of works completed in the field to date, along with a centralized location for progression and further development in this area.
- Preliminary analysis of patents, a source of information on this topic that has been largely neglected by academics to date.
- Conceptualization of a black box approach to the derivation of a methodology.
- Conceptualization of an agile framework for machine learning research as an alternative to a conventional waterfall-based approach, with the goal of increasing research quality and efficiency.
Research Project Motivation
Despite the growing importance of organisations' ability to combat malware, the ability of antivirus (AV) solutions to detect new species remains limited, and a large range of public information describes how AV can easily be evaded. This series focuses on file-based malware detection as the most important subdivision of the overarching problem, with 65% of attacks on enterprises in 2017 originating from file-based malware. More anecdotally, there is a substantial amount of material online that can be used to create payloads capable of evading all current AV solutions with relative ease.
Click here for a list of resources on performing effective AV evasion.
It seems clear that conventional static detection methodologies are not satisfactory as a means of effective malware detection. Conversely, as the main factors inhibiting the application of machine learning to this problem are reduced, ML looks like an increasingly promising solution.
If you wish to contribute to this knowledge base, feel free to contact us here and we’ll get back to you within 24 hours – [email protected] (All posts will be fully credited).