AtomicVAD: A tiny voice activity detection model for efficient inference in intelligent IoT systems

  • SUNY Buffalo
  • Universidad del Norte

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

This paper introduces AtomicVAD, an ultra-lightweight, end-to-end voice activity detection (VAD) model designed for inference on resource-constrained microcontrollers at the extreme edge. Existing VAD models often rely on large architectures with thousands of trainable parameters, making them impractical for deployment on the low-power microcontrollers commonly used in Internet of Things (IoT) systems. Even with compression methods such as quantization or pruning, these models typically fail to achieve low-latency performance under strict power and memory limits. AtomicVAD overcomes these limitations by introducing the General Growing Cosine Unit, a trainable oscillatory activation function that embeds feature learning within periodic modulations. This design yields a model with approximately 0.3k trainable parameters, a 99.7% reduction compared to commonly used baselines such as MarbleNet, while maintaining competitive accuracy. Evaluated on the challenging AVA-Speech benchmark, AtomicVAD achieves an AUROC of 0.903 and an F2-score of 0.891, outperforming larger state-of-the-art systems and demonstrating robustness to background noise and music. Optimized for extreme efficiency and facilitated by INT8 quantization, AtomicVAD enables ultra-low-latency inference: as low as 26 ms on a 240 MHz Cortex-M7 and 1.22 s on a 64 MHz Cortex-M4F. Its memory footprint remains below 75 kB of Flash and 65 kB of SRAM. A real-world LoRaWAN field trial further validated its practicality, showing that on-device speech gating eliminates unnecessary, bandwidth-intensive audio uploads, reducing over-the-air delays from minutes to milliseconds. Key use cases include remote monitoring, smart-home control, disaster-response sensor networks, and other long-range, low-power systems requiring efficient, always-on audio processing.
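The abstract reports an F2-score, which is the F-beta measure with beta = 2 and therefore weights recall (not missing speech) more heavily than precision. A minimal sketch of how such a score is computed from frame-level confusion counts follows; the function name `fbeta_from_counts` and the example counts are illustrative assumptions, not values or code from the paper.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Return the F-beta score from true-positive, false-positive and
    false-negative counts. beta > 1 favours recall over precision.

    Hypothetical helper for illustration; not taken from the paper.
    """
    b2 = beta * beta
    # Equivalent to (1 + b2) * P * R / (b2 * P + R), written in terms of
    # counts so that zero precision or recall needs no special case.
    denom = (1.0 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0  # no positive frames and no positive predictions
    return (1.0 + b2) * tp / denom


if __name__ == "__main__":
    # Illustrative counts only: 80 speech frames detected, 20 false alarms,
    # 10 missed speech frames (precision 0.80, recall about 0.89).
    print(f"F1 = {fbeta_from_counts(80, 20, 10, beta=1.0):.3f}")
    print(f"F2 = {fbeta_from_counts(80, 20, 10, beta=2.0):.3f}")
```

With these example counts the F2 value sits closer to the recall than the F1 value does, which is why F2 suits a gating task where a missed utterance costs more than a false alarm.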

Original language: English
Article number: 101822
Journal: Internet of Things (The Netherlands)
Volume: 35
State: Published - Jan 2026

Keywords

  • Internet of things
  • Microcontrollers
  • Oscillatory activation functions
  • TinyML
  • Voice activity detection
