LU Xiao, ZHU Yiwei, YANG Muhua, ZHOU Xuanyu, WANG Yaonan. Reinforcement Learning Based End-to-End Autonomous Driving Decision-Making Method by Combining Image and Monocular Depth Features[J]. Geomatics and Information Science of Wuhan University, 2021, 46(12): 1862-1871. DOI: 10.13203/j.whugis20210409

Reinforcement Learning Based End-to-End Autonomous Driving Decision-Making Method by Combining Image and Monocular Depth Features

  • Abstract: Existing end-to-end autonomous driving decision-making methods based on deep reinforcement learning (DRL) have low robustness and pose safety risks, and relying solely on image features makes it difficult to infer the optimal action in complex scenes. To address this, an end-to-end reinforcement learning autonomous driving decision-making scheme combining image and monocular depth features is proposed. First, an end-to-end decision model based on the dueling deep Q-network (Dueling DQN) is established to improve the policy-evaluation capability and robustness of the model. The model obtains the current state from observed data and outputs discrete control quantities for the driving actions (throttle, steering, and brake). Then, building on two-dimensional image features, a state-perception method incorporating monocular depth features is proposed: scene depth features are effectively extracted in a self-supervised manner and combined with the image features to jointly train the agent network and collaboratively optimize the agent's decisions. Finally, the algorithm is validated on different driving environments and tasks in a simulation environment. The results show that the model achieves robust end-to-end autonomous driving decision-making and, compared with methods that rely only on image features, offers stronger state-perception and more accurate decision-making ability.

     

    Abstract:
      Objectives  Existing end-to-end autonomous driving decision-making methods based on deep reinforcement learning (DRL) have low robustness to noise, which can lead to safety problems. Moreover, it is difficult to infer the optimal decision accurately by relying solely on image features in complex scenes.
      Methods  An end-to-end decision-making model based on the dueling deep Q-network (Dueling DQN) is established to improve the ability of decision evaluation and the robustness of the model. It obtains the current state from the observed data and outputs discrete quantities for controlling the vehicle (including throttle, steering, and brake). Monocular depth features are extracted in a self-supervised learning manner and combined with the image features for a better representation of the current state.
      Results  The proposed method is tested in a simulation environment. (1) Comparison with the state-of-the-art A3C model shows that the Dueling DQN-based model is more robust. (2) Comparison with the model based on image features alone shows that combining image and depth features improves decision-making accuracy.
      Conclusions  Training an agent with Dueling DQN helps alleviate the safety risks caused by making different decisions in similar scenes. Training the agent with both image features and depth features enhances its ability to perceive the environment and improves decision-making accuracy.
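The dueling decision head summarized above, which maps a joint image-and-depth state representation to Q-values over discrete driving actions, can be sketched as follows. This is a minimal illustration of the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·) only; the feature dimensions, linear weight stand-ins, and the four-way action discretization are hypothetical placeholders, not the paper's actual network architecture or action set:

```python
import numpy as np

# Illustrative discrete action set (the paper discretizes throttle,
# steering, and brake; the exact bins are not reproduced here)
ACTIONS = ["throttle", "steer_left", "steer_right", "brake"]

def dueling_q_values(state: np.ndarray,
                     w_v: np.ndarray, w_a: np.ndarray) -> np.ndarray:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Linear maps stand in for the value and advantage network heads.
    Subtracting the mean advantage makes the V/A decomposition
    identifiable, which is the core idea of Dueling DQN.
    """
    v = float(state @ w_v)   # scalar state value V(s)
    a = state @ w_a          # advantage A(s, ·), one entry per action
    return v + a - a.mean()

# Joint state: image features concatenated with monocular depth features
img_feat = np.array([0.2, 0.5])
depth_feat = np.array([0.8, 0.1])
state = np.concatenate([img_feat, depth_feat])

# Random placeholder weights for the two streams
w_v = np.full(4, 0.25)
w_a = np.random.default_rng(0).normal(size=(4, len(ACTIONS)))

q = dueling_q_values(state, w_v, w_a)
best_action = ACTIONS[int(np.argmax(q))]  # greedy discrete control
```

Because the mean advantage is subtracted, the Q-values deviate from V(s) with zero mean across actions, so the value stream alone carries the overall state assessment while the advantage stream only ranks actions.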

     
