
VULGEN: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning

  • Yu Nong
  • Yuzhe Ou
  • Michael Pradel
  • Feng Chen
  • Haipeng Cai
  • Washington State University Pullman
  • University of Texas at Dallas
  • University of Stuttgart

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations

Abstract

Building new, powerful data-driven defenses against prevalent software vulnerabilities requires sizable, high-quality vulnerability datasets, as does large-scale benchmarking of existing defense solutions. Automatic data generation is a promising way to meet this need, yet little work has aimed to generate the much-needed high-quality vulnerable samples. Meanwhile, existing techniques that are similar or could be adapted to this purpose suffer from critical limitations. In this paper, we present VULGEN, the first injection-based vulnerability-generation technique that is not limited to a particular class of vulnerabilities. VULGEN combines the strengths of deterministic (pattern-based) and probabilistic (deep-learning/DL-based) program transformation approaches, with each approach offsetting the other's weaknesses. This is achieved through close collaboration between pattern mining/application and DL-based injection localization, which separates the concerns of how and where to inject. By leveraging large pretrained models of programming languages and learning only injection locations, VULGEN reduces its own need for high-quality vulnerability data (used to train the localization model). Extensive evaluations show that VULGEN significantly outperforms a state-of-the-art (SOTA) pattern-based peer technique as well as both Transformer- and GNN-based approaches in terms of the percentage of generated samples that are vulnerable (by 38.0-430.1%) and the percentage that also exactly match the ground truth (by 16.3-158.2%). The VULGEN-generated samples led to substantial performance improvements for two SOTA DL-based vulnerability detectors (up to 31.8% higher F1), close to the improvements brought by the ground-truth real-world samples and much higher than those from the same number of existing synthetic samples.
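The abstract's central design idea, splitting injection into *how* (patterns applied deterministically) and *where* (a learned localization model), can be illustrated with a toy sketch. Everything below is hypothetical: `Pattern`, `inject`, and `toy_score` are illustrative names, patterns are exact single-statement string rewrites rather than mined patterns, and a keyword heuristic stands in for VULGEN's pretrained localization model. It shows only the separation of concerns, not VULGEN's actual algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Pattern:
    """Hypothetical 'how' rule: rewrite one exact statement into a vulnerable variant."""
    fixed: str       # statement as it appears in the safe code (whitespace-stripped)
    vulnerable: str  # statement to emit in its place


# A localization scorer maps (line index, line text) to a score. In VULGEN this
# role is played by a learned model; here it is any plain Python function.
Scorer = Callable[[int, str], float]


def inject(lines: List[str], patterns: List[Pattern], score: Scorer) -> Optional[List[str]]:
    """Return a copy of `lines` with one pattern applied at the best-scoring
    applicable location, or None if no pattern matches anywhere."""
    candidates = [
        (score(i, line), i, pat)
        for i, line in enumerate(lines)
        for pat in patterns
        if line.strip() == pat.fixed  # a pattern is only applicable where its 'fixed' side matches
    ]
    if not candidates:
        return None
    # "Where": pick the highest score; ties go to the earliest line.
    _, idx, pat = max(candidates, key=lambda c: (c[0], -c[1]))
    original = lines[idx]
    indent = original[: len(original) - len(original.lstrip())]
    mutated = list(lines)  # leave the caller's list untouched
    # "How": apply the chosen pattern deterministically, keeping indentation.
    mutated[idx] = indent + pat.vulnerable
    return mutated


def toy_score(i: int, line: str) -> float:
    """Stand-in for a DL localizer: favour lines that start with `if` (purely illustrative)."""
    return 1.0 if line.lstrip().startswith("if") else 0.1


if __name__ == "__main__":
    safe = [
        "void copy(char *dst, const char *src, size_t n) {",
        "    if (n > BUF_SIZE) return;",
        "    memcpy(dst, src, n);",
        "}",
    ]
    off_by_one = Pattern(fixed="if (n > BUF_SIZE) return;",
                         vulnerable="if (n > BUF_SIZE + 1) return;")
    print("\n".join(inject(safe, [off_by_one], toy_score)))
```

Running the module prints the C function with its bounds check loosened to `n > BUF_SIZE + 1`, which admits a one-byte overflow in `memcpy`. Passing the scorer in as a plain function mirrors the abstract's point that only the localization step needs to be learned, while the patterns themselves are applied deterministically.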

Original language: English
Title of host publication: Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023
Publisher: IEEE Computer Society
Pages: 2527-2539
Number of pages: 13
ISBN (Electronic): 9781665457019
State: Published - Jul 26 2023
Event: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 - Melbourne, Australia
Duration: May 15 2023 to May 16 2023

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
Country/Territory: Australia
City: Melbourne
Period: 05/15/23 to 05/16/23

Keywords

  • Software vulnerability
  • bug injection
  • data generation
  • deep learning
  • pattern mining
  • vulnerability detection
