TY - GEN
T1 - SchemaDrill
T2 - 2018 Workshop on Human-In-the-Loop Data Analytics, HILDA 2018
AU - Spoth, William
AU - Xie, Ting
AU - Kennedy, Oliver
AU - Yang, Ying
AU - Hammerschmidt, Beda
AU - Liu, Zhen Hua
AU - Gawlick, Dieter
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
PY - 2018/6/10
Y1 - 2018/6/10
N2 - Ad-hoc data models like JSON make it easy to evolve schemas and to multiplex different data-types into a single stream. This flexibility makes JSON great for generating data, but also makes it much harder to query, ingest into a database, and index. In this paper, we explore the first step of JSON data loading: schema design. Specifically, we consider the challenge of designing schemas for existing JSON datasets as an interactive problem. We present SchemaDrill, a roll-up/drill-down style interface for exploring collections of JSON records. SchemaDrill helps users to visualize the collection, identify relevant fragments, and map it down into one or more flat, relational schemas. We describe and evaluate two key components of SchemaDrill: (1) A summary schema representation that significantly reduces the complexity of JSON schemas without a meaningful reduction in information content, and (2) A collection of schema visualizations that help users to qualitatively survey variability amongst different schemas in the collection.
UR - https://www.scopus.com/pages/publications/85050258950
U2 - 10.1145/3209900.3209908
DO - 10.1145/3209900.3209908
M3 - Conference contribution
AN - SCOPUS:85050258950
BT - Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2018
PB - Association for Computing Machinery, Inc
Y2 - 10 June 2018
ER -