TY - GEN
T1 - Towards a distributed infrastructure for data-driven discoveries & analysis
AU - Elshambakey, Mohammed
AU - Khalefa, Mohamed
AU - Tolone, William J.
AU - Das Bhattacharjee, Sreyasee
AU - Lee, Huikyo
AU - Cinquini, Luca
AU - Schlueter, Shannon
AU - Cho, Isaac
AU - Dou, Wenwen
AU - Crichton, Daniel J.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
AB - Big data analytics traditionally involves downloading massive datasets to a common server or cluster for processing. The analytics process slows as the size of the required data grows and as network conditions degrade. Data scientists also need explicit access to data locations to download the required data, and such access may not always be granted for security reasons. To simplify and accelerate the analytics process on distributed big data with security considerations, we proposed the Virtual Information Fabric Infrastructure (VIFI) for data-driven discoveries. Instead of moving large amounts of data to a common place of processing, VIFI automatically transfers the required analytics programs to the distributed data locations for in-place processing of relevant data. VIFI allows data scientists to conduct and coordinate complex analytics processes on distributed data repositories using containerization technology and open-source workflow design tools. VIFI relieves users of the need for detailed knowledge of distributed data locations, as well as of the required dependencies, installation, and configuration of analytical libraries. In this paper, we present our current and future work to improve the VIFI architecture using previous and additional use cases, a data management layer that simplifies the search for relevant datasets through the addition of metadata, integration with security policies at different institutions through the proposed VIFI security layer, and a user-friendly web interface for carrying out different VIFI activities.
UR - https://www.scopus.com/pages/publications/85047760387
U2 - 10.1109/BigData.2017.8258526
DO - 10.1109/BigData.2017.8258526
M3 - Conference contribution
AN - SCOPUS:85047760387
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 4738
EP - 4740
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -