Provenance-aware versioned dataworkspaces

  • Xing Niu
  • Bahareh Sadat Arab
  • Dieter Gawlick
  • Zhen Hua Liu
  • Vasudha Krishnaswamy
  • Oliver Kennedy
  • Boris Glavic
  • Illinois Institute of Technology
  • Oracle Corporation

Research output: Contribution to conference › Paper › peer-review

5 Scopus citations

Abstract

Data preparation, curation, and analysis tasks are often exploratory in nature, with analysts incrementally designing workflows that transform, validate, and visualize their input sources. This requires frequent adjustments to data and workflows. Unfortunately, in current data management systems, even small changes can require time- and resource-heavy operations like materialization, manual version management, and re-execution. This added overhead discourages exploration. We present Provenance-aware Versioned Dataworkspaces (PVDs), our vision of a sandboxed environment in which users can apply — and more importantly, easily undo — changes to their data and workflows. A PVD keeps a log of the user’s operations in a light-weight version graph structure. We describe a model for PVDs that admits efficient automatic refresh, merging of histories, reenactment, and automated conflict resolution. We also highlight the conceptual and technical challenges that need to be overcome to create a practical PVD.

Original language: English
State: Published - 2016
Event: 8th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2016 - Washington, United States
Duration: Jun 8 2016 to Jun 9 2016

Conference

Conference: 8th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2016
Country/Territory: United States
City: Washington
Period: 06/8/16 to 06/9/16
