More than a decade of continual expansion in networking and cloud computing has created an increased demand for cybersecurity solutions. Because of the large number of communicating devices and the volume of content they exchange, these solutions should ideally be automated. Unfortunately, malicious content and activity are often designed to “look” normal, and new attacks are repeatedly developed as older ones become detectable. Furthermore, normal content far outnumbers malicious content, which creates a class imbalance in the data. These adversarial issues create a particularly difficult environment for an automated detection system. In this research, we present a framework for a malicious HTML file detection system that uses several machine learning techniques (feature preprocessing, synthetic sample generation, ensemble classifiers, etc.) to mitigate the data imbalance problem. Additionally, we analyze and compare the performance of the detection techniques used in our framework.