Citation: YANG Bisheng, CHEN Yiping, ZOU Qin. Opportunities and Challenges of Spatiotemporal Information Intelligent Processing of Surveying and Mapping in the Era of Large Models[J]. Geomatics and Information Science of Wuhan University, 2023, 48(11): 1756-1768. DOI: 10.13203/j.whugis20230378
Spatiotemporal information and positioning and navigation services have become important new infrastructure. Driven by progress toward general artificial intelligence, an era of intelligence led by large models has arrived, and increasingly powerful large models will play an ever more important role in the intelligent processing and application of spatiotemporal information. Taking large models as the research paradigm, this paper first summarizes the current status and progress of large models in the intelligent processing of spatiotemporal information in surveying and mapping, and then analyzes the challenges they face in this field. We elaborate three key technologies for spatiotemporal information large models in surveying and mapping: multi-modal fusion and understanding architecture design, prompt engineering optimization through fine-tuning, and human-in-the-loop guidance for decision-making. Finally, we address the depth of industry understanding, data security risks, content credibility, and the optimization of training and deployment costs to forecast the development trends of spatiotemporal information large models.
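The abstract names prompt engineering optimization through fine-tuning as one of the key technologies. As a minimal sketch of what such parameter-efficient fine-tuning can look like in practice, the example below implements a LoRA-style low-rank adapter in PyTorch. This is an illustrative assumption rather than the authors' method: the class name `LoRALinear`, the rank `r`, and the scaling factor `alpha` are hypothetical choices. The pretrained weights stay frozen and only two small low-rank matrices are trained, which is one common way to adapt a large pretrained model to a domain task at a fraction of the full fine-tuning cost.

```python
# Illustrative LoRA-style low-rank adaptation sketch (not the paper's implementation).
# A frozen linear layer is augmented with a trainable low-rank update B @ A, so
# fine-tuning touches only r * (d_in + d_out) parameters instead of d_in * d_out.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)          # stands in for a pretrained layer, kept frozen
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.lora_b = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-initialized
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


if __name__ == "__main__":
    layer = LoRALinear(d_in=512, d_out=512, r=8)
    x = torch.randn(4, 512)
    y = layer(x)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(y.shape, "trainable params:", trainable)  # only the two LoRA factors train
```

Zero-initializing the up-projection means training starts from the unmodified pretrained model, so the adapter departs from the frozen baseline only as far as the fine-tuning data warrants.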