Citation: ZHAO Weifeng, LI Qingquan, LI Bijun. Spatial Cognition Oriented Optimal Route Planning with Hierarchical Reinforcement Learning[J]. Geomatics and Information Science of Wuhan University, 2012, 37(11): 1271-1275.

Spatial Cognition Oriented Optimal Route Planning with Hierarchical Reinforcement Learning

Abstract: Model-driven route planning for spatial-cognition-oriented optimal routes conflicts with the diversity of people's cognitive preferences. To address this conflict, we present an interactive, learning-based route planning approach built on hierarchical reinforcement learning. The approach translates optimal-route criteria into immediate rewards for the turning decisions made at intersections, and efficiently finds the optimal route policy, the one with the maximal cumulative reward, through a two-stage learning process. The first stage, pre-learning, automatically identifies certain nodes of the road network as subgoals and constructs corresponding subtasks, each containing a locally optimal route policy for reaching its subgoal. The second stage, real-time learning, uses these predefined policies to update the Q values of every available state-action pair efficiently and then traces the optimal route according to the Q values. Experimental results show that the approach is fast enough for real-time use and finds routes close to the globally optimal ones.
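As a hedged illustration of how route criteria can become immediate rewards for turning decisions, and how a route is traced from Q values, here is a minimal flat (non-hierarchical) sketch. Everything in it is an assumption, not the authors' formulation: the state is the pair (previous node, current node) so that the turn taken at the current intersection is defined; the reward of each decision is minus the edge length minus a turn penalty (40 for a left turn, 10 for a right turn, in metre-equivalents, with U-turns excluded); Q values of every state-action pair are swept with learning rate α = 1 and discount γ = 1 until they stop changing; the grid, `COORDS`, and all function names are hypothetical.

```python
import math

# Hypothetical 3x3 grid road network: node -> (x, y) in metres (x east, y north).
COORDS = {
    "A": (0, 0), "B": (100, 0), "C": (200, 0),
    "D": (0, 100), "E": (100, 100), "F": (200, 100),
    "G": (0, 200), "H": (100, 200), "I": (200, 200),
}
EDGES = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"), ("G", "H"), ("H", "I"),
         ("A", "D"), ("D", "G"), ("B", "E"), ("E", "H"), ("C", "F"), ("F", "I")]
DEAD_END = -1e6  # assumed value for a state with no admissible turn


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def turn_penalty(prev, cur, nxt, left=40.0, right=10.0):
    """Assumed cost (metre-equivalents) of turning prev -> cur -> nxt; 0 when going straight."""
    if prev is None:  # first move out of the origin involves no turn
        return 0.0
    ax, ay = COORDS[cur][0] - COORDS[prev][0], COORDS[cur][1] - COORDS[prev][1]
    bx, by = COORDS[nxt][0] - COORDS[cur][0], COORDS[nxt][1] - COORDS[cur][1]
    # Signed turn angle from the cross and dot products; positive means a left turn.
    angle = math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    if abs(angle) < 30:
        return 0.0
    return left if angle > 0 else right


def reward(prev, cur, nxt):
    """Immediate reward of the turning decision at `cur`: minus (edge length + turn cost)."""
    return -(math.dist(COORDS[cur], COORDS[nxt]) + turn_penalty(prev, cur, nxt))


def learn_q(adj, start, goal, alpha=1.0, gamma=1.0, tol=1e-9):
    # State = (previous node, current node), so the turn taken at `cur` is well defined.
    states = [(None, start)] + [(u, v) for u in adj for v in adj[u]]
    # Actions = neighbours of the current node, excluding an immediate U-turn.
    q = {s: {n: 0.0 for n in adj[s[1]] if n != s[0]} for s in states}
    while True:
        delta = 0.0
        for (prev, cur), actions in q.items():
            if cur == goal:
                continue  # terminal state: nothing to update
            for nxt in actions:
                nxt_q = q[(cur, nxt)]
                if nxt == goal:
                    future = 0.0
                else:
                    future = max(nxt_q.values()) if nxt_q else DEAD_END
                target = reward(prev, cur, nxt) + gamma * future
                new = actions[nxt] + alpha * (target - actions[nxt])
                delta = max(delta, abs(new - actions[nxt]))
                actions[nxt] = new
        if delta < tol:
            return q


def trace_route(q, start, goal, max_steps=100):
    """Follow the greedy (max-Q) turning decision from the start until the goal."""
    state, route = (None, start), [start]
    while route[-1] != goal and len(route) <= max_steps:
        nxt = max(q[state], key=q[state].get)
        route.append(nxt)
        state = (state[1], nxt)
    return route


adj = adjacency(EDGES)
q = learn_q(adj, "A", "I")
print(trace_route(q, "A", "I"))  # ['A', 'D', 'G', 'H', 'I']
```

All monotone routes from A to I are 400 m long here, so the turn penalties decide the winner: the route with a single right turn (total reward -410) beats the one with a single left turn (-440). Because the state carries the incoming edge, the same intersection can have different best exits depending on the approach direction, which is what allows turn-based criteria to be expressed as immediate rewards.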

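The two-stage scheme can be sketched with the options framework, where each subtask is a temporally extended action and the real-time stage performs semi-Markov (SMDP) Q backups of the form Q(s, o) ← R + γ^k · max Q(s', o'). This is a hedged toy, not the paper's algorithm. Assumptions not stated in the source: subgoal nodes are chosen by a betweenness-style count of how often a node lies inside shortest paths between node pairs (the abstract does not say how subgoals are detected); each subtask's "locally optimal policy" is the shortest path to its subgoal, found with Dijkstra's algorithm; rewards are negative edge lengths only, with no turn-level state; the graph, `GAMMA`, and all function names are hypothetical.

```python
import heapq

# Hypothetical road network: two neighbourhoods joined only through node X.
# Keys are undirected edges, values are lengths (e.g. in 100 m units).
EDGES = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1,
         ("B", "X"): 2, ("D", "X"): 2, ("X", "E"): 2, ("X", "G"): 2,
         ("E", "F"): 1, ("E", "G"): 1, ("F", "H"): 1, ("G", "H"): 1}
GAMMA = 1.0


def adjacency(edges):
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def dijkstra(adj, src):
    """Distances from src; prev[n] is n's predecessor on a shortest path from src."""
    dist, prev, heap = {src: 0.0}, {src: None}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def find_subgoals(adj, k=1):
    """Assumed heuristic: nodes that most often lie inside shortest paths between node pairs."""
    counts = {n: 0 for n in adj}
    for s in adj:
        _, prev = dijkstra(adj, s)
        for t in adj:
            node = prev[t]
            while node is not None and node != s:
                counts[node] += 1
                node = prev[node]
    return sorted(adj, key=lambda n: (-counts[n], n))[:k]


def build_options(adj, subgoals, gamma):
    """Pre-learning: one subtask per subgoal, following the shortest path to it from any node."""
    options = {}
    for g in subgoals:
        _, prev = dijkstra(adj, g)  # undirected graph: prev[n] is n's next hop toward g
        model = {}
        for n in adj:
            if n == g:
                continue
            path, ret, disc = [n], 0.0, 1.0
            while path[-1] != g:
                nxt = prev[path[-1]]
                ret += disc * -adj[path[-1]][nxt]
                disc *= gamma
                path.append(nxt)
            model[n] = (ret, len(path) - 1, path)  # (discounted reward, duration k, route)
        options[g] = model
    return options


def learn_q(adj, options, goal, gamma, tol=1e-9):
    """Real-time stage: SMDP Q backups over options (listed first) and primitive moves."""
    q = {n: {**{("opt", g): 0.0 for g in options if n in options[g]},
             **{("move", v): 0.0 for v in adj[n]}} for n in adj}
    while True:
        delta = 0.0
        for n, acts in q.items():
            if n == goal:
                continue  # terminal
            for kind, x in acts:
                if kind == "move":
                    ret, k = -adj[n][x], 1
                else:
                    ret, k, _ = options[x][n]
                # Both action types end at node x; back up R + gamma^k * max_a' Q(x, a').
                future = 0.0 if x == goal else max(q[x].values())
                target = ret + gamma ** k * future
                delta = max(delta, abs(target - acts[(kind, x)]))
                acts[(kind, x)] = target
        if delta < tol:
            return q


def trace_route(q, options, start, goal, max_steps=50):
    """Follow the greedy action, expanding an option into the nodes its subtask visits."""
    route = [start]
    while route[-1] != goal and len(route) <= max_steps:
        n = route[-1]
        kind, x = max(q[n], key=q[n].get)
        route.extend([x] if kind == "move" else options[x][n][2][1:])
    return route


adj = adjacency(EDGES)
subgoals = find_subgoals(adj, k=1)             # pre-learning: detect subgoal nodes
options = build_options(adj, subgoals, GAMMA)  # pre-learning: build subtask policies
q = learn_q(adj, options, "H", GAMMA)          # real-time: Q updates with predefined policies
print(subgoals, trace_route(q, options, "A", "H"))  # ['X'] ['A', 'B', 'X', 'G', 'H']
```

The bridge node X is the only link between the two neighbourhoods, so the heuristic selects it as the subgoal. Options let a single backup carry value across a whole subtask, which is one plausible source of the efficiency the abstract reports; this toy does not measure that.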