Abstract
The unbounded increase in the size of data generated by scientific applications necessitates collaboration and sharing among the nation’s education and research institutions. Simply purchasing high-capacity, high-performance storage systems and adding them to the existing infrastructure of the collaborating institutions does not solve the underlying and highly challenging data-handling problem. Scientists are compelled to spend a great deal of time and energy on basic data-handling issues, such as where the data is physically located, how to access it, and how to move it to visualization or compute resources for further analysis. This chapter presents the design and implementation of a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides lightweight clients that enable easy, transparent, and scalable access. In PetaShare, the authors have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability. The authors also present a high-level cross-domain metadata schema to provide a structured, systematic view of the multiple science domains supported by PetaShare.
| Original language | English |
|---|---|
| Title of host publication | Data Intensive Distributed Computing |
| Subtitle of host publication | Challenges and Solutions for Large-Scale Information Management |
| Publisher | IGI Global |
| Pages | 118-139 |
| Number of pages | 22 |
| ISBN (Electronic) | 9781615209729 |
| ISBN (Print) | 9781615209712 |
| State | Published - Jan 1 2012 |
| Chapter title | Metadata Management in PetaShare Distributed Storage Network |