TY - GEN
T1 - Distributed Optimization over Block-Cyclic Data
AU - Ding, Yucheng
AU - Niu, Chaoyue
AU - Yan, Yikai
AU - Zheng, Zhenzhe
AU - Wu, Fan
AU - Chen, Guihai
AU - Tang, Shaojie
AU - Jia, Rongfei
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/12/26
Y1 - 2024/12/26
N2 - We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client’s training data follow block-specific and non-i.i.d. distributions. Such a data structure introduces client and block biases during collaborative training: the single global model becomes biased towards client- or block-specific data. To overcome these biases, we propose a new distributed optimization algorithm called multi-model parallel stochastic gradient descent (MM-PSGD) with a convergence rate of O(1/√(NT)), where N is the total number of clients and T is the total number of iterations, achieving a linear speedup with respect to the number of clients. In particular, MM-PSGD adopts a block-mixed training strategy and creates a specific predictor for each block by averaging the historical global models generated in this block across different cycles. We extensively evaluate our algorithm on the CIFAR-10 dataset. Evaluation results demonstrate that our algorithm significantly outperforms the conventional federated averaging algorithm in terms of test accuracy, and also remains robust to variations in critical parameters.
KW - Block-Cyclic Data
KW - Federated Learning
UR - https://www.scopus.com/pages/publications/85216557763
U2 - 10.1145/3700410.3702128
DO - 10.1145/3700410.3702128
M3 - Conference contribution
AN - SCOPUS:85216557763
T3 - Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops, MMAsia 2024 Workshops
BT - Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops, MMAsia 2024 Workshops
PB - Association for Computing Machinery, Inc
T2 - 6th ACM International Conference on Multimedia in Asia Workshops, MMAsia 2024 Workshops
Y2 - 3 December 2024 through 6 December 2024
ER -