TY - GEN
T1 - Overlay Spreadsheets
AU - Kennedy, Oliver
AU - Glavic, Boris
AU - Brachmann, Michael
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/7/21
Y1 - 2023/7/21
N2 - Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that cannot be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact "patterns" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.
AB - Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that cannot be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact "patterns" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.
UR - https://www.scopus.com/pages/publications/85167947315
U2 - 10.1145/3597465.3605220
DO - 10.1145/3597465.3605220
M3 - Conference contribution
AN - SCOPUS:85167947315
T3 - HILDA 2023 - Workshop on Human-In-the-Loop Data Analytics - Co-located with SIGMOD 2023
BT - HILDA 2023 - Workshop on Human-In-the-Loop Data Analytics - Co-located with SIGMOD 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 Workshop on Human-In-the-Loop Data Analytics, HILDA 2023 - Co-located with SIGMOD 2023
Y2 - 18 June 2023 through 18 June 2023
ER -