Continuing from the second piece in the series, we will now use the review work performed so far to lay out a roadmap for future development.
Development from this point onward will take place in two separate but linked sub-projects:
- Model Development
- Development of Associated Technologies to assist with model development.
Development Methodologies and associated research
We noted that the approach taken to model development was waterfall-based in every case (Fig 1). This methodology suits projects with a clearly defined start and finish and assumes consistent resources throughout. For a field such as this, applying machine learning to malware detection, we would suggest that an agile, iterative approach suits the problem much better, ideally one that allows researchers with specialisms in either machine learning or malware detection to contribute quickly to an ongoing project at any time.
On this basis we derived an agile framework for iterative ML model development in this field (Fig 3). It should go a long way towards catering for contributions from people with specialisms in either machine learning or malware (but not necessarily both), as well as those who do not have the time to dedicate to full-time (or even part-time) research and want to make a quick but useful contribution.
While the use of Fig 3 (and developed variants of it) would theoretically provide substantial benefits over a conventional approach, practical application and adoption would remain separate problems. To make Fig 3 more directly applicable in practice, we have conceptualized and performed initial design work for a web application to assist with this. This application will not only assist with developing models for malware detection, but also more generally for models
Model Development – Starting the Lifecycle
Immediate next steps for model development will be to build upon the agile framework documented in the previous article, documenting the process as it progresses. We will also build on the findings of the analysis and the conclusions drawn from it. The first material outcome will be to find malicious and benign datasets and attempt to both standardize and centralize them.
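To make the idea of standardizing and centralizing datasets more concrete, the following is a minimal sketch in Python, not a description of an agreed design. It assumes each sample is identified by the SHA-256 hash of its content and carries one of two labels. The names `SampleRecord`, `make_record`, `centralize`, and the source names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class SampleRecord:
    """One entry in a hypothetical centralized dataset index."""
    sha256: str  # content hash, assumed here to be the unique key
    label: str   # "malicious" or "benign"
    source: str  # name of the dataset the sample came from


def make_record(content: bytes, label: str, source: str) -> SampleRecord:
    """Standardize one raw sample into the common record format."""
    if label not in ("malicious", "benign"):
        raise ValueError(f"unknown label: {label!r}")
    return SampleRecord(hashlib.sha256(content).hexdigest(), label, source)


def centralize(records):
    """Merge records from several sources into one index keyed by hash.

    Returns (index, conflicts): index keeps the first record seen for
    each hash; conflicts lists hashes that appear with differing labels.
    """
    index = {}
    conflicts = set()
    for rec in records:
        existing = index.get(rec.sha256)
        if existing is None:
            index[rec.sha256] = rec
        elif existing.label != rec.label:
            conflicts.add(rec.sha256)
    return index, sorted(conflicts)


# Illustrative input only: placeholder byte strings, not real samples.
records = [
    make_record(b"sample-a", "malicious", "source_one"),
    make_record(b"sample-a", "malicious", "source_two"),  # duplicate
    make_record(b"sample-b", "benign", "source_two"),
    make_record(b"sample-b", "malicious", "source_one"),  # label conflict
]
index, conflicts = centralize(records)
```

Keying on a content hash means the same sample contributed by two datasets is stored once, and samples whose labels disagree across sources are surfaced for review instead of being silently overwritten.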