YUE Peng, LIU Ruixiang, SHANGGUAN Boyi, CAO Zhipeng, LIU Shuaiqi, XU Hanwen. GeoAI Training Data: Model, Quality, and Services[J]. Geomatics and Information Science of Wuhan University, 2023, 48(10): 1616-1631. DOI: 10.13203/j.whugis20230125

GeoAI Training Data: Model, Quality, and Services

More Information
  • Received Date: April 09, 2023
  • Available Online: April 19, 2023
  • The data-driven research paradigm creates a strong demand for sharing training data in geospatial artificial intelligence (GeoAI). However, the content and organization of training data differ widely across GeoAI applications, so a unified information model is needed as the foundation for training data sharing and interoperability. Based on an analysis of the common features and core attributes of different GeoAI training data, this paper proposes an information model for training data and explores training data quality elements and evaluation methods. The results provide a reference for the development of GeoAI training data stores and sharing services.
