Abstract:
Objectives In recent years, 3D spatiotemporal information has become a cornerstone of China's new infrastructure construction, and airborne point clouds play a critical role in large-scale 3D data acquisition. However, effectively interpreting these point clouds and extracting their rich semantics remain significant challenges. Traditional methods, including geometry-driven feature extraction and data-driven deep learning models, have made some progress but still struggle with the complexity and scale of airborne point clouds. Advances in large-scale foundation models (FMs), particularly in natural language processing and vision-language integration, offer new opportunities for point cloud understanding. Large language models exhibit strong generalization and cross-modal semantic capabilities, which enable a language-centered paradigm for point cloud understanding. By using language models to map point cloud data into a semantically enriched space, this paradigm addresses key limitations of traditional methods.
Methods The evolution of point cloud understanding is reviewed across geometry-driven and data-driven approaches, self-supervised paradigms, and FM-driven methodologies. A language-centered framework for airborne point cloud understanding is then proposed, addressing high-level semantic modeling, cross-modal alignment, and downstream task adaptation.
Results The proposed framework demonstrates enhanced semantic representation, improved generalization, and significant advantages in complex scenarios. These findings provide new insights into point cloud understanding and lay a foundation for integrating large-scale models into 3D applications.
Conclusions These contributions offer new perspectives and technical solutions for advancing point cloud technologies in national infrastructure projects.