Reducing Ambiguity in Json Schema Discovery

  • William Spoth
  • Oliver Kennedy
  • Ying Lu
  • Beda Hammerschmidt
  • Zhen Hua Liu

  • SUNY Buffalo
  • Oracle Corporation

Research output: Contribution to journal, Conference article, peer-review

20 Scopus citations

Abstract

Ad-hoc data models like Json simplify schema evolution and enable multiplexing various data sources into a single stream. While useful when writing data, this flexibility makes Json harder to validate and query, forcing such tasks to rely on automated schema discovery techniques. Unfortunately, ambiguity in the schema design space forces existing schema discovery systems to make simplifying, data-independent assumptions about schema structure. When these assumptions are violated, most notably by APIs, the generated schemas are imprecise, creating numerous opportunities for false positives during validation. In this paper, we propose Jxplain, a Json schema discovery algorithm with heuristics that mitigate common forms of ambiguity. Although Jxplain is slightly slower than state-of-the-art schema extractors, we show that it produces significantly more precise schemas.
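
The false-positive problem the abstract describes can be illustrated with a small sketch. This is not the Jxplain algorithm or any system evaluated in the paper; it is a hypothetical, deliberately naive extractor (the names `naive_schema` and `validates` and the sample records are assumptions made for illustration) that applies one data-independent rule: union every field seen in the stream and mark a field required only if it appears in every record.

```python
def naive_schema(records):
    """Hypothetical naive extractor: union all fields seen across records.

    A field counts as required only if it occurs in every record.
    """
    fields = {}
    for rec in records:
        for key, value in rec.items():
            # Remember every primitive type name observed for this field.
            fields.setdefault(key, set()).add(type(value).__name__)
    required = {k for k in fields if all(k in r for r in records)}
    return {"fields": fields, "required": required}


def validates(schema, record):
    """Accept a record if it has all required fields and only known field/type pairs."""
    if not schema["required"].issubset(record):
        return False
    return all(
        key in schema["fields"] and type(value).__name__ in schema["fields"][key]
        for key, value in record.items()
    )


# Two unrelated record kinds multiplexed into one stream (made-up data).
stream = [
    {"user": "ada", "email": "ada@example.org"},
    {"order_id": 17, "total": 9.5},
]
schema = naive_schema(stream)

# A record mixing fields from both kinds never occurred in the stream,
# yet the merged schema accepts it: a false positive during validation.
mixed = {"user": "ada", "total": 9.5}
print(validates(schema, mixed))
```

Because no field occurs in both record kinds, the merged schema requires nothing and treats every field as optional, so any combination of known fields passes. A more precise schema would have to tell the two record kinds apart, which is the sort of ambiguity the abstract says its heuristics target.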

Original language: English
Pages (from-to): 1732-1744
Number of pages: 13
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
State: Published - 2021
Event: 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China
Duration: Jun 20 2021 to Jun 25 2021

Keywords

  • entity detection
  • json
  • json-schema
  • semi-structured
