
Comprehensive job level resource usage measurement and analysis for XSEDE HPC systems

  • SUNY Buffalo
  • University of Texas at Austin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

This paper presents a methodology for comprehensive job-level resource use measurement and analysis, describes applications of the analyses to planning for HPC systems, and reports a case study applying the methodology to the XSEDE Ranger and Lonestar4 systems at the University of Texas. The steps in the methodology are system-wide collection of resource use and performance statistics at the job and node levels, followed by mapping and storage of the resulting job-wise data in a relational database, which eases further implementation and the transformation of data into the formats required by specific statistical and analytical algorithms. Analyses can be carried out at different levels of granularity: per job, per user, or system-wide. Measurements are based on a novel, lightweight, job-centric measurement tool, "TACCStats" [1], which gathers a comprehensive set of metrics on all compute nodes. The data mapping and analysis tools will be an extension to the XDMoD project [2] for the XSEDE community. This paper also reports preliminary results from the analysis of measured data for the Texas Advanced Computing Center's Lonestar4 and Ranger supercomputers. The case studies presented indicate the level of detailed information that will be available for all resources when TACCStats is deployed throughout the XSEDE system. The methodology can be applied to any system that runs the TACCStats measurement tool.
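
As a rough illustration of the pipeline the abstract describes (node-level statistics rolled up into job-wise rows in a relational database, then queried per job, per user, or system-wide), a minimal Python sketch follows. It is not TACCStats or XDMoD code: the table names, columns, sample values, and helper functions are all hypothetical, and an in-memory SQLite database stands in for whatever relational store the authors use.

```python
import sqlite3

# Hypothetical node-level samples (illustrative values, not measured data):
# (job_id, user, node, cpu_user_fraction, mem_used_gb)
SAMPLES = [
    ("1001", "alice", "c101", 0.90, 12.0),
    ("1001", "alice", "c102", 0.80, 10.0),
    ("1002", "bob", "c201", 0.40, 20.0),
    ("1003", "alice", "c301", 0.60, 8.0),
]


def build_db(samples):
    """Store node-level rows, then derive one job-wise row per job."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node_stats "
        "(job_id TEXT, user TEXT, node TEXT, cpu_user REAL, mem_gb REAL)"
    )
    conn.executemany("INSERT INTO node_stats VALUES (?, ?, ?, ?, ?)", samples)
    # Job-wise mapping: average CPU use across nodes, peak memory on any node.
    conn.execute(
        "CREATE TABLE job_stats AS "
        "SELECT job_id, user, COUNT(*) AS nodes, "
        "       AVG(cpu_user) AS cpu_user, MAX(mem_gb) AS peak_mem_gb "
        "FROM node_stats GROUP BY job_id, user"
    )
    return conn


def per_job(conn):
    """Job granularity: one row per job."""
    return conn.execute(
        "SELECT job_id, nodes, cpu_user, peak_mem_gb "
        "FROM job_stats ORDER BY job_id"
    ).fetchall()


def per_user(conn):
    """User granularity: job count and node-weighted mean CPU use."""
    return conn.execute(
        "SELECT user, COUNT(*), SUM(nodes * cpu_user) / SUM(nodes) "
        "FROM job_stats GROUP BY user ORDER BY user"
    ).fetchall()


def system_wide(conn):
    """System granularity: total jobs and node-weighted mean CPU use."""
    return conn.execute(
        "SELECT COUNT(*), SUM(nodes * cpu_user) / SUM(nodes) FROM job_stats"
    ).fetchone()


if __name__ == "__main__":
    db = build_db(SAMPLES)
    print(per_job(db))
    print(per_user(db))
    print(system_wide(db))
```

Keeping the job-wise table separate from the raw node-level table means each level of granularity is a single `GROUP BY` over the same rows, which is one plausible reading of why a relational mapping eases later analysis.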

Original language: English
Title of host publication: Proceedings of the XSEDE 2013 Conference
Subtitle of host publication: Gateway to Discovery
State: Published - 2013
Event: Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2013 - San Diego, CA, United States
Duration: Jul 22 2013 to Jul 25 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2013
Country/Territory: United States
City: San Diego, CA
Period: 07/22/13 to 07/25/13

Keywords

  • Performance analysis
  • System management
  • Tacc-stats
  • Usage analysis
  • XDMoD
  • XSEDE
