Autonomous Driving Trajectory Prediction Framework Integrating Visual-Language Model Intention Modeling

LI Bijun; TIAN Ziyue; ZHONG Haoran; JIANG Shuiyun; LUO Man; ZHOU Jian

doi:10.13203/j.whugis20260089

LI Bijun, TIAN Ziyue, ZHONG Haoran, JIANG Shuiyun, LUO Man, ZHOU Jian. Autonomous Driving Trajectory Prediction Framework Integrating Visual-Language Model Intention Modeling[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20260089

Abstract

Objectives: Accurately predicting the future trajectories of surrounding vehicles is vital to driving safety and traffic efficiency in autonomous driving. Many existing trajectory prediction methods perform well in simulated environments but degrade significantly under real-world conditions. This study aims to narrow that gap by proposing a vehicle driving-intent inference and trajectory prediction method built on a vision-language model.

Methods: A vision-language model first infers the target vehicle's driving intent and extracts intent points, an intermediate semantic representation of the vehicle's anticipated behavior. A trajectory prediction decoder guided by these intent points then forecasts the trajectory, so that the prediction is conditioned on the inferred driving intent.

Results: Experiments on real-world driving-scene datasets show that incorporating driving-intent information improves both prediction accuracy and generalization. Compared with baseline models, the proposed model raises mean prediction accuracy by 4 percentage points, reduces the minimum average displacement error (minADE) by 0.84 m, and reduces the minimum final displacement error (minFDE) by 1.5 m. These gains indicate that the approach adapts better to the variability and complexity of real-world driving conditions.

Conclusions: By using a vision-language model to integrate driving-intent information into trajectory forecasting, the proposed method improves prediction accuracy and generalization in real-world environments. The findings highlight the value of modeling vehicle behavior at the level of intent and point toward more reliable autonomous driving systems for complex traffic scenarios.
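The abstract reports its results as minimum average and minimum final displacement errors but does not define them. As a rough illustration only, the sketch below follows the definition commonly used in multi-modal trajectory prediction: for K candidate trajectories, take the smallest mean point-wise Euclidean error and the smallest endpoint error against the ground truth. The function names, the toy coordinates, and the choice to take each minimum independently are assumptions for illustration, not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def displacement_errors(pred: Trajectory, truth: Trajectory) -> Tuple[float, float]:
    """Return (average, final) displacement error for one predicted trajectory."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and equally long")
    # Point-wise Euclidean distance at each future time step.
    dists = [math.dist(p, g) for p, g in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(candidates: Sequence[Trajectory], truth: Trajectory) -> Tuple[float, float]:
    """Minimum average and minimum final displacement error over K candidates.

    Assumption: each minimum is taken independently over the candidates.
    Some benchmarks instead pick a single best candidate by endpoint error.
    """
    errors = [displacement_errors(c, truth) for c in candidates]
    return min(e[0] for e in errors), min(e[1] for e in errors)


# Hypothetical toy data: positions in meters at three future time steps.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the ground truth
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],  # stays close to the ground truth
]
min_ade, min_fde = min_ade_fde(candidates, truth)
print(f"minADE = {min_ade:.3f} m, minFDE = {min_fde:.3f} m")
```

Under this definition, a reduction of 0.84 m in the minimum average displacement error means that the best candidate trajectory lies, on average over the prediction horizon, 0.84 m closer to the ground truth than the baseline's best candidate.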
