Predictive Modeling of HPC Job Queue Times: Improving User Decision-Making and Resource Utilization

  • Tufts University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (peer-reviewed)

Abstract

This work presents a framework for estimating job wait times in High-Performance Computing (HPC) scheduling queues, leveraging historical job scheduling data and real-time system metrics. Using machine learning techniques, specifically Random Forest and Multi-Layer Perceptron (MLP) models, we demonstrate high accuracy in predicting wait times, achieving 94.2% reliability within a 10-minute error margin. The framework incorporates key features such as requested resources, queue occupancy, and system utilization, with ablation studies revealing the significance of these features. Additionally, the framework offers users wait time estimates for different resource configurations, enabling them to select optimal resources, reduce delays, and accelerate computational workloads. Our approach provides valuable insights for both users and administrators to optimize job scheduling, contributing to more efficient resource management and faster time to scientific results.
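The abstract names two user-facing ideas: scoring wait-time predictions by whether they fall within a fixed error margin, and comparing estimated waits across alternative resource requests. Both can be sketched in stdlib Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `JobRequest`, `SystemState`, `toy_predict` and `recommend` are hypothetical names. `toy_predict` is an arbitrary linear stand-in for a trained Random Forest or MLP model. Reading the 10-minute margin as an absolute-error tolerance is also an assumption about how the reported 94.2% figure is computed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical resource request; field names are illustrative only."""
    cores: int
    mem_gb: float
    gpus: int


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of real-time system metrics."""
    queue_occupancy: int  # jobs currently pending in the target queue
    utilization: float    # fraction of cluster resources in use, 0.0 to 1.0


def within_margin(predicted: Sequence[float], actual: Sequence[float],
                  margin_min: float = 10.0) -> float:
    """Fraction of predictions whose absolute error is at most margin_min minutes."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= margin_min)
    return hits / len(predicted)


def toy_predict(req: JobRequest, state: SystemState) -> float:
    """Hypothetical stand-in for a trained Random Forest or MLP; coefficients are made up."""
    return (0.05 * req.cores + 0.02 * req.mem_gb + 15.0 * req.gpus
            + 0.5 * state.queue_occupancy + 30.0 * state.utilization)


def recommend(predict: Callable[[JobRequest, SystemState], float],
              options: Sequence[JobRequest],
              state: SystemState) -> tuple[JobRequest, float]:
    """Return the candidate request with the lowest predicted wait (minutes)."""
    if not options:
        raise ValueError("need at least one candidate configuration")
    best = min(options, key=lambda r: predict(r, state))
    return best, predict(best, state)


if __name__ == "__main__":
    # Illustrative inputs only; no values here come from the paper.
    state = SystemState(queue_occupancy=40, utilization=0.85)
    options = [JobRequest(64, 256, 2), JobRequest(32, 128, 1), JobRequest(16, 64, 0)]
    best, wait = recommend(toy_predict, options, state)
    print(f"best={best}, predicted wait={wait:.1f} min")
    print(within_margin([12, 30, 50], [5, 45, 55]))
```

Minimizing predicted wait alone ignores that a smaller allocation may run longer. A fuller comparison would add an estimated runtime to each candidate's wait. In practice, `toy_predict` would be replaced by a model trained on the site's historical scheduler records.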

Original language: English
Title of host publication: PEARC 2025 - Practice and Experience in Advanced Research Computing 2025
Subtitle of host publication: The Power of Collaboration
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400713989
State: Published - Jul 18 2025
Event: 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 - Columbus, United States
Duration: Jul 20 2025 to Jul 24 2025

Publication series

Name: PEARC 2025 - Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration

Conference

Conference: 2025 Practice and Experience in Advanced Research Computing, PEARC 2025
Country/Territory: United States
City: Columbus
Period: 07/20/25 to 07/24/25
