
GAN-based PE file synthesizer for balanced malware detection datasets: safe generation of synthetic benign executables

  • Asad Mehmood
  • Muhammad Rizwan
  • Muhammad Iqbal
  • Muhammad Hanif
  • Muhammad Majid Gulzar*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The performance of machine learning (ML)-based malware detection systems strongly depends on the availability of balanced and well-labeled datasets. However, the scarcity of benign Portable Executable (PE) files often leads to class imbalance, compromising classifier performance. This paper presents a novel PE File Synthesizer (PEFS) framework that employs Generative Adversarial Networks (GANs) to generate synthetic benign data for augmenting malware detection datasets. The proposed method extracts structured PE header features from real benign files and trains a Wasserstein GAN to produce 9,000 synthetic feature vectors. These vectors are then reconstructed into structurally valid but non-executable PE files, designed solely for static analysis and classifier training. The generated samples are validated using sandbox analysis (Cuckoo) and VirusTotal to ensure benign characteristics. When integrated into the training data, these synthesized instances improve classifier performance, particularly in detecting benign samples, as demonstrated using Random Forest metrics such as precision, recall, and F1-score. This work highlights the viability of GAN-based data augmentation for enhancing PE datasets while ensuring safety and privacy through non-executable synthetic file generation.
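The feature-extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list which header fields PEFS uses, so the field selection, the function name `extract_header_features`, and the in-memory demo buffer below are assumptions made for illustration only; the sketch reads the standard 20-byte COFF file header that follows the PE signature and is not the authors' implementation.

```python
import struct

# Field names for the 20-byte COFF file header (illustrative selection).
COFF_FIELDS = (
    "machine",
    "number_of_sections",
    "time_date_stamp",
    "pointer_to_symbol_table",
    "number_of_symbols",
    "size_of_optional_header",
    "characteristics",
)


def extract_header_features(data: bytes) -> dict:
    """Read the COFF file header fields from the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header starts right after the 4-byte signature.
    values = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))


# Hypothetical demo buffer: header bytes only, with no sections or code.
demo = bytearray(0x40 + 24)
demo[:2] = b"MZ"
struct.pack_into("<I", demo, 0x3C, 0x40)
demo[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HHIIIHH", demo, 0x44, 0x8664, 3, 0, 0, 0, 0xF0, 0x0022)

features = extract_header_features(bytes(demo))
```

In a pipeline of the kind the abstract describes, dictionaries like `features` would be converted to numeric vectors before being passed to the generative model; that conversion and the model itself are outside the scope of this sketch.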

Original language: English
Article number: 36
Journal: Journal of Computer Virology and Hacking Techniques
Volume: 22
Issue number: 1
State: Published - Dec 2026

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag France SAS, part of Springer Nature 2026.

Keywords

  • Generative Adversarial Networks (GANs)
  • Machine learning in cybersecurity
  • Malware detection
  • Portable Executable (PE) Files
  • Synthetic data generation

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics
