YU Xin, ZHENG Zhaobao, LI Linyi. Oblique Factor Model for Selecting Training Samples[J]. Geomatics and Information Science of Wuhan University, 2022, 47(11): 1870-1877. DOI: 10.13203/j.whugis20200631

Oblique Factor Model for Selecting Training Samples

Funds: The National Key Research and Development Program of China (2018YFC0407804)

  • Author Bio:

    YU Xin, PhD, professor, specializes in photogrammetry and remote sensing, image interpretation and artificial intelligence. E-mail: china_yuxin@163.com

  • Corresponding author:

    LI Linyi, PhD, associate professor. E-mail: lilinyi@whu.edu.cn

  • Received Date: December 09, 2020
  • Available Online: November 15, 2022
  • Published Date: November 04, 2022
  •   Objectives  The quality of training samples affects the effectiveness of the training phase and, in turn, the overall classification accuracy in the testing phase. The representativeness (typicalness) of training samples reflects their quality to some extent. Currently popular deep learning methods in particular require thousands or even millions of training samples, so reducing the number of training samples they need has become an important problem. From a practical standpoint, collecting so many samples is also very expensive. We therefore propose a method that reduces the number of training samples as far as possible, based on their representativeness.
      Methods  A training sample selection method based on the oblique factor model is proposed. It relaxes the condition in the orthogonal factor model that the common factors be independent of one another, and can therefore describe the real world better.
      Results  Experimental results show that the proposed method is feasible and effective. It selects more representative training samples than selection based on the orthogonal factor model and achieves better overall classification accuracy and stability. The selected samples are more dispersed and more reasonably distributed, and the overall classification accuracy improves by about 3% on average.
      Conclusions  The proposed method provides theoretical support for optimizing data collection and can also guide effective data collection in practical applications.
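
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how sample selection with an oblique factor model might look. It assumes a Q-mode setup (factors extracted from a cosine similarity matrix between samples), uses promax as one common oblique rotation, and picks the sample with the largest absolute loading on each factor. All of these choices, the synthetic data, and the function names `varimax` and `promax` are illustrative assumptions, not the authors' published procedure.

```python
import numpy as np


def varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation (standard SVD-based iteration)."""
    n, k = loadings.shape
    rot = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rot
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        grad = loadings.T @ (rotated ** 3 - rotated @ col_ss / n)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return loadings @ rot


def promax(loadings, power=4):
    """Oblique promax rotation: pattern loadings and factor correlations."""
    base = varimax(loadings)
    # Target matrix: raise loadings to a power while keeping their signs.
    target = base * np.abs(base) ** (power - 1)
    # Least-squares transformation from the varimax solution to the target.
    u, *_ = np.linalg.lstsq(base, target, rcond=None)
    # Rescale columns so that the rotated factors have unit variance.
    u = u @ np.diag(np.sqrt(np.diag(np.linalg.inv(u.T @ u))))
    pattern = base @ u
    phi = np.linalg.inv(u.T @ u)  # correlations among the oblique factors
    return pattern, phi


# Synthetic data (assumption): 30 samples, 6 features, 3 loose groups.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(3, 6))
labels = np.repeat(np.arange(3), 10)
samples = centers[labels] + 0.05 * rng.normal(size=(30, 6))

# Q-mode: cosine similarity between samples, then the top-3 principal factors.
unit = samples / np.linalg.norm(samples, axis=1, keepdims=True)
similarity = unit @ unit.T
eigvals, eigvecs = np.linalg.eigh(similarity)
top = np.argsort(eigvals)[::-1][:3]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

pattern, phi = promax(loadings)

# Assumed selection rule: one representative sample per oblique factor.
picked = [int(np.argmax(np.abs(pattern[:, j]))) for j in range(3)]
print("factor correlations:\n", np.round(phi, 2))
print("selected sample indices:", picked)
```

In an orthogonal model the factor correlation matrix would be the identity. Here `phi` is allowed to have nonzero off-diagonal entries, which is the relaxation the Methods paragraph describes.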