EMLIO: Minimizing I/O Latency and Energy Consumption for Large-Scale AI Training

  • SUNY Buffalo
  • Tennessee Technological University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large-scale deep learning workloads increasingly suffer from I/O bottlenecks as datasets grow beyond local storage capacities and GPU compute outpaces network and disk latencies. While recent systems optimize data-loading time, they overlook the energy cost of I/O, a critical factor at large scale. We introduce EMLIO, an Efficient Machine Learning I/O service that jointly minimizes end-to-end data-loading latency (T) and I/O energy consumption (E) across variable-latency networked storage. EMLIO deploys a lightweight data-serving daemon on storage nodes that serializes and batches raw samples, streams them over TCP with out-of-order prefetching, and integrates seamlessly with GPU-accelerated (NVIDIA DALI) preprocessing on the client side. In exhaustive evaluations over local disk, LAN (0.05 ms and 10 ms round-trip time (RTT)), and WAN (30 ms RTT) environments, EMLIO delivers on average up to 8.6× faster I/O and 10.9× lower energy use compared to state-of-the-art loaders, while maintaining constant performance and energy profiles irrespective of network distance. EMLIO's service-based architecture offers a scalable blueprint for energy-aware I/O in next-generation AI clouds.
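The serve-and-stream pattern described in the abstract (a storage-side daemon that serializes raw samples into batches and streams them over TCP, with the client accepting batches out of order) can be illustrated with a minimal sketch. This is not EMLIO's implementation or wire format: the frame layout and the names `pack_batch`, `unpack_batch`, `serve`, and `receive_all` are hypothetical, "out-of-order" is simulated by shuffling send order, and a local `socket.socketpair()` stands in for a real LAN/WAN connection. The point is only that length-prefixed batch frames tagged with a batch ID let the client consume whichever batch arrives first instead of stalling on a fixed order.

```python
import random
import socket
import struct
import threading

# Hypothetical wire format (an assumption, not EMLIO's actual protocol):
#   frame   := u64 payload_len | payload
#   payload := u64 batch_id | u32 n_samples | n * (u32 sample_len | sample_bytes)
_FRAME_HDR = struct.Struct("!Q")
_BATCH_HDR = struct.Struct("!QI")
_SAMPLE_HDR = struct.Struct("!I")


def pack_batch(batch_id: int, samples: list[bytes]) -> bytes:
    """Serialize a batch of raw samples into one length-prefixed frame."""
    parts = [_BATCH_HDR.pack(batch_id, len(samples))]
    for s in samples:
        parts.append(_SAMPLE_HDR.pack(len(s)))
        parts.append(s)
    payload = b"".join(parts)
    return _FRAME_HDR.pack(len(payload)) + payload


def unpack_batch(payload: bytes) -> tuple[int, list[bytes]]:
    """Inverse of pack_batch, applied to a frame payload (header stripped)."""
    batch_id, n = _BATCH_HDR.unpack_from(payload, 0)
    off = _BATCH_HDR.size
    samples = []
    for _ in range(n):
        (length,) = _SAMPLE_HDR.unpack_from(payload, off)
        off += _SAMPLE_HDR.size
        samples.append(payload[off:off + length])
        off += length
    return batch_id, samples


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP recv may return partial chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return bytes(buf)


def serve(sock: socket.socket, batches: dict[int, list[bytes]], seed: int = 0) -> None:
    """Hypothetical daemon loop: send batches in the order they become 'ready'."""
    order = list(batches)
    random.Random(seed).shuffle(order)  # simulate out-of-order readiness
    for bid in order:
        sock.sendall(pack_batch(bid, batches[bid]))
    sock.shutdown(socket.SHUT_WR)  # signal end of stream


def receive_all(sock: socket.socket, expected: int) -> dict[int, list[bytes]]:
    """Client side: accept batches in arrival order, keyed by batch_id."""
    received: dict[int, list[bytes]] = {}
    while len(received) < expected:
        (length,) = _FRAME_HDR.unpack(recv_exact(sock, _FRAME_HDR.size))
        bid, samples = unpack_batch(recv_exact(sock, length))
        received[bid] = samples  # a real client would hand these to GPU preprocessing
    return received


if __name__ == "__main__":
    data = {i: [f"sample-{i}-{j}".encode() for j in range(4)] for i in range(8)}
    server_sock, client_sock = socket.socketpair()
    t = threading.Thread(target=serve, args=(server_sock, data))
    t.start()
    got = receive_all(client_sock, len(data))
    t.join()
    server_sock.close()
    client_sock.close()
    assert got == data
```

Batching many small samples into one frame amortizes per-request round trips, which is the kind of overhead that grows with RTT; keying frames by batch ID is what makes out-of-order delivery safe for the consumer.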

Original language: English
Title of host publication: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
Publisher: Association for Computing Machinery, Inc
Pages: 2022-2031
Number of pages: 10
ISBN (Electronic): 9798400718717
DOIs
State: Published - Nov 15 2025
Event: 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops - St. Louis, United States
Duration: Nov 16 2025 to Nov 21 2025

Publication series

Name: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops

Conference

Conference: 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
Country/Territory: United States
City: St. Louis
Period: 11/16/25 to 11/21/25

Keywords

  • GPU-accelerated preprocessing
  • I/O latency
  • data-loading
  • deep learning
  • distributed storage
  • energy efficiency
