A Vehicle Driving-Intent Modeling and Trajectory Prediction Method Integrating Vision-Language Models

李必军; 田子岳; 钟浩然; 江水云; 骆嫚; 周剑

doi:10.13203/j.whugis20260089

Autonomous Driving Trajectory Prediction Framework Integrating Visual-Language Model Intention Modeling

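Abstract

In autonomous driving, accurately predicting the future trajectories of surrounding vehicles is an important prerequisite for ensuring driving safety and improving traffic efficiency. However, most existing trajectory prediction methods are designed and evaluated in simulated environments, and they perform inadequately when generalized to real-world scenes. To address this, a vehicle driving-intent inference and trajectory prediction method based on a vision-language model is proposed. The method uses the vision-language model to infer the driving intent of the target vehicle and extracts driving-intent points as an intermediate semantic representation. A trajectory prediction decoder based on these intent points is then designed to perform intent-guided trajectory prediction, which strengthens the model's ability to characterize how vehicle behavior evolves. Experiments on real-world vehicle scene data show that, compared with the baseline model, incorporating driving-intent information effectively improves the prediction accuracy and generalization of the trajectory prediction model in real scenes: the average precision metric improves by 4 percentage points, and the minimum average displacement error and minimum final displacement error decrease by 0.84 m and 1.5 m, respectively.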

Abstract: Objectives: Accurate prediction of the future trajectories of surrounding vehicles is a prerequisite for safe driving and efficient traffic flow in autonomous driving. Most existing trajectory prediction methods are designed and evaluated in simulated environments, and their performance degrades when they are applied to real-world scenes. This study addresses that gap with a vehicle driving-intent inference and trajectory prediction method built on a vision-language model. Methods: A vision-language model first infers the driving intent of the target vehicle, from which intent points are extracted as an intermediate semantic representation of the vehicle's anticipated behavior. A trajectory prediction decoder conditioned on these intent points then produces intent-guided forecasts, which strengthens the model's ability to capture how vehicle behavior evolves. Results: Experiments on real-world driving-scene data show that incorporating driving-intent information improves both prediction accuracy and generalization. Compared with the baseline model, the average precision metric rises by 4 percentage points, the minimum average displacement error falls by 0.84 m, and the minimum final displacement error falls by 1.5 m. Conclusions: Using a vision-language model to inject driving-intent information into trajectory forecasting improves prediction accuracy and generalization in real-world environments. The results indicate that explicit modeling of vehicle intent benefits trajectory prediction, and they point toward further work on applying learned prediction models under real driving conditions.
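The two displacement metrics reported above can be illustrated with a minimal sketch. It assumes the common multi-candidate definition, in which a model outputs several candidate trajectories and each metric is the minimum over candidates; the abstract does not state which variant the authors use, and the function names `displacement_errors` and `min_ade_fde` are hypothetical, chosen only for this illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in metres


def displacement_errors(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step.
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # average over all time steps
    fde = dists[-1]                # error at the final time step only
    return ade, fde


def min_ade_fde(candidates: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (minADE, minFDE) over a set of candidate trajectories.

    Assumption: each minimum is taken independently over the candidates.
    Some benchmarks instead report the ADE of the candidate with the lowest FDE.
    """
    errors = [displacement_errors(c, gt) for c in candidates]
    return min(e[0] for e in errors), min(e[1] for e in errors)


# Toy data: a straight ground-truth path and three candidate predictions.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidate_set = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],  # accurate until the last step
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],  # small final error
]
min_ade, min_fde = min_ade_fde(candidate_set, ground_truth)
```

In this toy case the third candidate is the closest on both measures. A reduction of 0.84 m in minADE, as reported above, would correspond to the best candidate lying on average 0.84 m closer to the true path across the prediction horizon.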
