Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources

  • Logan Ward
  • J. Gregory Pauloski
  • Valerie Hayot-Sasson
  • Ryan Chard
  • Yadu Babuji
  • Ganesh Sivaraman
  • Sutanay Choudhury
  • Kyle Chard
  • Rajeev Thakur
  • Ian Foster

  • Argonne National Laboratory
  • The University of Chicago
  • Pacific Northwest National Laboratory

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

9 Scopus citations

Abstract

Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connection between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and a user-configurable steering algorithm to hide data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
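The pass-by-reference idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' software: `ObjectStore`, `Reference`, `simulate`, and `run` are hypothetical names, the store is an in-memory dictionary standing in for a remote or shared data service, and a thread pool stands in for remote workers. The manager places each payload in the store and sends only a small reference through the task queue; the worker resolves the reference when it needs the data.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence
import uuid


class ObjectStore:
    """Hypothetical in-memory stand-in for a shared or remote data store."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, obj: Any) -> str:
        # Store the payload and hand back a short key that identifies it.
        key = uuid.uuid4().hex
        self._data[key] = obj
        return key

    def get(self, key: str) -> Any:
        return self._data[key]


@dataclass(frozen=True)
class Reference:
    """Small token sent through the task queue instead of the payload."""

    key: str

    def resolve(self, store: ObjectStore) -> Any:
        # The payload is fetched only when the worker asks for it.
        return store.get(self.key)


def simulate(ref: Reference, store: ObjectStore) -> float:
    """Worker-side task (assumed workload): average the referenced values."""
    values = ref.resolve(store)
    return sum(values) / len(values)


def run(inputs: Sequence[Sequence[float]], store: ObjectStore) -> List[float]:
    """Manager side: store payloads, submit references, collect results."""
    refs = [Reference(store.put(x)) for x in inputs]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(simulate, ref, store) for ref in refs]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run([[1, 2, 3], [4, 6]], ObjectStore()))
```

In a real deployment the store would sit behind a data transfer service, and a steering policy could submit tasks early enough that transfers overlap with computation; both of those pieces are outside this sketch.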

Original language: English
Title of host publication: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 32-41
Number of pages: 10
ISBN (Electronic): 9798350311990
DOIs
State: Published - 2023
Event: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 - St. Petersburg, United States
Duration: May 15 2023 to May 19 2023

Publication series

Name: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Conference

Conference: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Country/Territory: United States
City: St. Petersburg
Period: 05/15/23 to 05/19/23

Keywords

  • Computational Steering
  • Distributed Systems
  • Function-as-a-Service
  • Heterogeneous Computing
  • Machine Learning
