CIF21 DIBBs: EI: Vizier, Streamlined Data Curation

Project: Research

Project Details

Description

Big Data promises to have a positive impact on many aspects of our lives, but assembling the data to answer questions or derive predictive models can be challenging. Data scientists must typically go through multiple rounds of curation, or 'wrangling,' where data are organized, refined, cleaned up, and merged together before they can be analyzed. Curation is often slow and costly, but is essential for obtaining useful and trustworthy answers. This project develops a software tool called Vizier that aims to streamline data curation and enable domain experts who do not have computer science expertise to curate their own data. Easier curation magnifies the value of big data by enabling a wide range of users to improve data quality, and in doing so benefits numerous types of data-driven work in government, industry, and science. Vizier features an intuitive interface combining elements of notebooks and spreadsheets, allowing analysts to quickly see, edit, and revise data. This capability is complemented by a framework for automated data cleaning steps that are seamlessly integrated with manual curation operations. The heart of Vizier is a system for managing uncertainty and provenance of curation workflows and data, enabling the user to keep track of higher-level curation operations as well as track the lineage of data. By transparently maintaining the history of all the user's actions and their effect on the curated data, Vizier enables regret-free exploration and curation where any changes to the data and their transitive effects can be undone. By learning from past curation histories, the system will also be able to provide users with context-dependent recommendations for additional curation actions. This award by the Advanced Cyberinfrastructure Division is jointly supported by the NSF Directorate for Social, Behavioral and Economic Sciences (Division of Social and Economic Sciences).
Status: Finished
Effective start/end date: 10/3/16 to 06/30/21

Funding

  • National Science Foundation: $2,749,699.00
