Citation: HAN Ting, CHEN Siyu, MA Jin, CAI Guorong, ZHANG Wuming, CHEN Yiping. Road Image Free Space Detection via Learnable Deep Position Encoding[J]. Geomatics and Information Science of Wuhan University, 2024, 49(4): 582-594. DOI: 10.13203/j.whugis20230252
Free space detection is a crucial foundation for scene perception in advanced driver assistance systems. Methods based on convolutional neural networks cannot model global contextual information, which produces holes and discontinuities in the predicted results. Transformer-based methods, in turn, lack local understanding, which leads to misaligned boundaries and predictions that extend beyond the road region.
To this end, we propose a pyramid Transformer architecture with learnable deep position encoding for road free space detection. First, a pyramid Transformer backbone is designed to extract road features from a global perspective. Second, local window attention is employed in dual-Transformer blocks to compensate for the loss of detail. Finally, because traditional non-learnable position encoding ignores the spatial correlation between pixels and the real world, a learnable position encoding is constructed from deep convolutional features to correct the misalignment between attention and semantics.
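The idea of deriving a position encoding from convolutional features can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel feature map, a zero-padded 3x3 convolution whose kernel stands in for the learnable weights, and a residual addition. The function names `conv3x3_zero_pad` and `add_position_encoding` are hypothetical.

```python
def conv3x3_zero_pad(feat, kernel):
    """Zero-padded 3x3 convolution (cross-correlation) over one channel.

    feat   -- H x W feature map as a list of lists of floats
    kernel -- 3 x 3 weights; in a trained model these would be learned
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * feat[y][x]
            out[i][j] = acc
    return out


def add_position_encoding(feat, kernel):
    """Add the convolution-derived encoding to the features (residual form)."""
    pos = conv3x3_zero_pad(feat, kernel)
    return [[f + p for f, p in zip(f_row, p_row)]
            for f_row, p_row in zip(feat, pos)]
```

Because of the zero padding, even a constant feature map yields different encodings at corners, edges, and interior positions, so the encoding carries positional information while its weights remain trainable.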
The model is evaluated on the KITTI road, Cityscapes, and Xiamen road datasets. The results show that our method achieves maximum F-measures of 97.53% and 98.54% on KITTI and Cityscapes, respectively.
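For readers unfamiliar with the metric, the maximum F-measure is commonly taken as the highest F1 score obtained while sweeping a confidence threshold over the per-pixel road probabilities. The sketch below illustrates that definition on flat lists of scores and binary labels. It is a simplified assumption rather than the benchmark's official evaluation code, and the function names and threshold list are hypothetical.

```python
def f_measure(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f_measure(scores, labels, thresholds):
    """Largest F1 over the given thresholds.

    scores     -- predicted road probability per pixel
    labels     -- ground truth per pixel (1 = road, 0 = not road)
    thresholds -- candidate cut-offs; a pixel is road if score >= threshold
    """
    best = 0.0
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
        best = max(best, f_measure(tp, fp, fn))
    return best
```

For example, with scores `[0.9, 0.8, 0.4, 0.2]`, labels `[1, 1, 0, 1]`, and thresholds `[0.1, 0.5]`, the threshold 0.1 gives precision 0.75 and recall 1, while 0.5 gives precision 1 and recall 2/3, so the maximum F-measure is 6/7, about 0.857.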
Our method outperforms existing algorithms on the KITTI road benchmark, offering higher efficiency together with higher stability and accuracy. It also provides high-precision semantic prior information for tasks such as path planning and trajectory prediction in automotive driver assistance systems.