TY - GEN
T1 - Optimizing Long-Term Efficiency and Fairness in Ride-Hailing via Joint Order Dispatching and Driver Repositioning
AU - Sun, Jiahui
AU - Jin, Haiming
AU - Yang, Zhaoxing
AU - Su, Lu
AU - Wang, Xinbing
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/8/14
Y1 - 2022/8/14
N2 - The ride-hailing service offered by mobility-on-demand platforms, such as Uber and Didi Chuxing, has greatly facilitated people's traveling and commuting, and become increasingly popular in recent years. Efficiency (e.g., gross merchandise volume) has always been an important metric for such platforms. However, only focusing on the efficiency inevitably ignores the fairness of driver incomes, which could impair the sustainability of the overall ride-hailing system in the long run. To optimize the aforementioned two essential metrics, order dispatching and driver repositioning play an important role, as they impact not only the immediate, but also the future order-serving outcomes of drivers. Thus, in this paper, we aim to exploit joint order dispatching and driver repositioning to optimize both the long-term efficiency and fairness for ride-hailing platforms. To address this problem, we propose a novel multi-agent reinforcement learning framework, referred to as JDRL, to help drivers make distributed order selection and repositioning decisions. Specifically, to cope with the variable action space, JDRL segments the action space into a fixed number of action groups, and fixes the policy output dimension for order selection as the number of action groups. In terms of the fairness criterion, JDRL adopts the max-min fairness, and augments the vanilla policy gradient to an iterative training algorithm that alternates between a minimization step and a policy improvement step to maximize both the worst and the overall performance of agents. In addition, we provide the theoretical convergence guarantee of our JDRL training algorithm even under non-convex policy networks and stochastic gradient updating. Extensive experiments are conducted with three public real-world ride-hailing order datasets, including over 2 million orders in Haikou, China, over 5 million orders in Chengdu, China, and over 6 million orders in New York City, USA. Experimental results show that JDRL demonstrates a consistent advantage compared to state-of-the-art baselines in terms of both efficiency and fairness. To the best of our knowledge, this is the first work that exploits joint order dispatching and driver repositioning to optimize both the long-term efficiency and fairness in a ride-hailing system.
KW - driver repositioning
KW - joint order dispatching
KW - long-term efficiency and fairness
KW - ride-hailing
UR - https://www.scopus.com/pages/publications/85137151212
U2 - 10.1145/3534678.3539060
DO - 10.1145/3534678.3539060
M3 - Conference contribution
AN - SCOPUS:85137151212
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 3950
EP - 3960
BT - KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Y2 - 14 August 2022 through 18 August 2022
ER -