Abstract
Crowdsourced delivery platforms face challenges in matching couriers to customer orders due to fluctuating demand and uncertain courier availability. The platform's courier workforce has two types: committed couriers, who commit to working for a specific time, and occasional couriers, who log in to the platform at a time of their choice. Traditionally, these platforms establish "offline" schedules in advance for committed couriers based on forecasts of anticipated deliveries within defined time windows. However, since actual order numbers are unpredictable in real time, efficient operations require reactive scheduling to optimally match resources with changing demand trends. A recent paper, Saleh et al. (2024), proposed extending the shifts of committed couriers as needed in response to changes in demand trends throughout the day. That work formulated the problem as a Markov Decision Process (MDP) and used a popular value-based algorithm, Deep Q-Network (DQN), to maximize the platform's expected reward. In this work, we extend Saleh et al. (2024) by investigating two alternative approaches from the class of policy gradient-based algorithms, namely Proximal Policy Optimization (PPO) and Advantage Actor Critic (A2C). PPO and A2C optimize policy parameters to maximize the expected reward. Through a comparative analysis, we evaluate the effectiveness of PPO and A2C versus DQN in addressing the scheduling challenges faced by crowdsourced delivery platforms. The results show that PPO achieved the best results in terms of total rewards, lost requests, and shift extension costs by effectively learning an optimized policy through its use of the clipped objective function. While A2C had a higher expected reward than DQN, it struggled with directly optimizing its policy and prioritizing timely deliveries. Finally, DQN was found to rely excessively on shift extensions.
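The clipping objective credited above for PPO's results can be illustrated with a minimal sketch of the standard PPO clipped surrogate. This is not the authors' implementation: the function name `ppo_clipped_objective` and the clip range `clip_eps=0.2` are assumptions chosen for illustration, and the abstract does not state the hyperparameters used.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clip range (0.2 is an assumed, illustrative value)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms, so the
        # policy gains nothing from moving far away from the old policy.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + clip_eps` contributes no more than the clipped term would. With a negative advantage, a ratio below `1 - clip_eps` is likewise held at the clipped term. Both cases limit the size of each policy update.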
| Original language | English |
|---|---|
| Pages (from-to) | 121-128 |
| Number of pages | 8 |
| Journal | Transportation Research Procedia |
| Volume | 84 |
| DOIs | |
| State | Published - 2025 |
| Event | 1st International Conference on Smart Mobility and Logistics Ecosystems, SMiLE 2024 - Dhahran, Saudi Arabia; Duration: 17 Sep 2024 → 19 Sep 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Authors. Published by ELSEVIER B.V.
Keywords
- A2C
- Crowd-sourced last mile delivery
- DQN
- Deep Reinforcement Learning
- PPO
ASJC Scopus subject areas
- Transportation