TY - GEN
T1 - A data-aware workflow scheduling algorithm for heterogeneous distributed systems
AU - Yin, Dengpan
AU - Kosar, Tevfik
PY - 2011
Y1 - 2011
N2 - The workflow scheduling problem in heterogeneous distributed systems is hard to solve due to both intermediate data transfer time and the computation time for each task being considered. The heterogeneity of the computing power of distributed computational sites and the bandwidth between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal scheduling so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem. In this problem, each task of the workflow can request data from a remote data site before its execution and can also store important intermediate data to a remote data site after the execution. The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.
AB - The workflow scheduling problem in heterogeneous distributed systems is hard to solve due to both intermediate data transfer time and the computation time for each task being considered. The heterogeneity of the computing power of distributed computational sites and the bandwidth between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal scheduling so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem. In this problem, each task of the workflow can request data from a remote data site before its execution and can also store important intermediate data to a remote data site after the execution. The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.
KW - Data intensive supercomputing
KW - Grid and cluster computing
KW - Large scale scientific computing
KW - Large scale systems
KW - Workflow scheduling
UR - https://www.scopus.com/pages/publications/80053038167
U2 - 10.1109/HPCSim.2011.5999814
DO - 10.1109/HPCSim.2011.5999814
M3 - Conference contribution
AN - SCOPUS:80053038167
SN - 9781612843810
T3 - Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011
SP - 114
EP - 120
BT - Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011
T2 - 2011 International Conference on High Performance Computing and Simulation, HPCS 2011
Y2 - 4 July 2011 through 8 July 2011
ER -