
Generating vulnerable code via learning-based program transformations

  • Washington State University Pullman
  • University of Texas at Dallas

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

5 Scopus citations

Abstract

Software vulnerabilities are a major source of cybersecurity threats. Therefore, it is of paramount importance to defend against (e.g., detect and repair) them. Data-driven approaches, especially those based on machine/deep learning (ML/DL), have demonstrated great potential to that end. To achieve practical efficacy, these approaches rely on a large number of training samples. However, such samples, especially those known to be vulnerable, are currently not richly available, which immediately impedes ML/DL applications for software vulnerability analysis. Moreover, these samples would also meet the critical need for making scientific progress in software assurance through objective benchmarking of existing techniques and tools. In this chapter, we describe a learning-based approach to generating vulnerable code samples, so as to empower both the scientific assessment of extant software security defense solutions and the development of new ones. We formulate the sample generation problem as that of learning the common patterns of code changes that introduce vulnerabilities in existing (seed) samples, followed by applying such changes to given clean programs. We also present empirical results that show the promise of our approach and discuss its gaps, while examining several key factors in the design of effective DL-based sample generation.
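
The two-step formulation above (learn common vulnerability-introducing changes from seed samples, then apply them to clean programs) can be illustrated with a deliberately simplified sketch. This is not the chapter's method: it replaces the learned model with a frequency count over exact single-line replacements, and the names `mine_edit_patterns`, `apply_patterns`, `SEEDS`, and `min_support`, along with the C snippets, are hypothetical and chosen only for illustration.

```python
import difflib
from collections import Counter

def mine_edit_patterns(seed_pairs):
    """Count single-line replacements (clean line -> vulnerable line) in seed pairs.

    Toy stand-in for the learning step: patterns are exact, whitespace-stripped
    lines, not a learned model (assumption made for illustration only).
    """
    counts = Counter()
    for clean, vulnerable in seed_pairs:
        a = [line.strip() for line in clean.splitlines()]
        b = [line.strip() for line in vulnerable.splitlines()]
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only one-line-for-one-line replacements.
            if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                counts[(a[i1], b[j1])] += 1
    return counts

def apply_patterns(program, counts, min_support=2):
    """Rewrite lines of a clean program that match a sufficiently common pattern."""
    mapping = {}
    for (before, after), n in counts.most_common():
        if n >= min_support:
            # The most frequent replacement wins for a given clean line.
            mapping.setdefault(before, after)
    out = []
    for line in program.splitlines():
        stripped = line.strip()
        if stripped in mapping:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + mapping[stripped])
        else:
            out.append(line)
    return "\n".join(out)

# Hypothetical seed pairs: (clean version, vulnerable version).
SEEDS = [
    ("void copy_a(char *dst, const char *src, size_t n) {\n"
     "    strncpy(dst, src, n);\n}",
     "void copy_a(char *dst, const char *src, size_t n) {\n"
     "    strcpy(dst, src);\n}"),
    ("void copy_b(char *dst, const char *src, size_t n) {\n"
     "    strncpy(dst, src, n);\n}",
     "void copy_b(char *dst, const char *src, size_t n) {\n"
     "    strcpy(dst, src);\n}"),
]

CLEAN_TARGET = (
    "void store(char *dst, const char *src, size_t n) {\n"
    "    strncpy(dst, src, n);\n"
    "}"
)

patterns = mine_edit_patterns(SEEDS)
generated = apply_patterns(CLEAN_TARGET, patterns)
print(generated)
```

Exact line matching only transfers a change when the target contains the same statement as the seeds. Generalizing across different identifiers and contexts is the gap a learned transformation model is meant to close.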

Original language: English
Title of host publication: AI Embedded Assurance for Cyber Systems
Publisher: Springer International Publishing
Pages: 123-138
Number of pages: 16
ISBN (Electronic): 9783031426377
ISBN (Print): 9783031426360
State: Published - Dec 12 2023
