Research on an Oblique Factor Model for Training Sample Selection

虞欣; 郑肇葆; 李林宜

doi:10.13203/j.whugis20200631

Oblique Factor Model for Selecting Training Samples

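Abstract

The quality of training samples directly affects the quality (or effectiveness) of training and, in turn, affects to some extent the classification accuracy in the testing phase. The representativeness and typicality of training samples reflect an important aspect of their quality. For the currently very popular deep learning models, how to reduce the number of training samples as far as possible has become a thorny problem and, from the standpoint of practical application, also an economic or cost problem. This paper proposes an oblique factor model method for training sample selection. The method relaxes the assumption, made by Q-type factor analysis and correspondence analysis, that the common factors are independent, and, building on the oblique reference solution, proposes an approximate solution of the oblique rotation suited to training sample selection. Experimental results show that the proposed method is feasible and effective. Compared with the method based on the orthogonal factor model, it describes or approximates real conditions better, selects more reasonable and more representative typical training samples, and achieves satisfactory classification accuracy. The oblique factor model method outperforms the selection method based on the orthogonal factor model: the selected training samples are distributed more dispersedly and more reasonably, and the overall classification accuracy improves by about 3% on average.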

Abstract:
Objectives The quality of training samples affects the effectiveness of the training phase and, in turn, the overall classification accuracy in the testing phase. The representativeness or typicality of training samples reflects, in part, their quality. Currently popular deep learning methods in particular need thousands or millions of training samples, so how to reduce the number of training samples for deep learning has become a very important problem. From the standpoint of practical application, collecting so many samples is also very expensive. We therefore propose a method that reduces the number of training samples as far as possible, based on their representativeness or typicality.
Methods We propose selecting training samples with an oblique factor model. It relaxes the orthogonal factor model's condition that the common factors are independent, and so can better describe the real world.
Results Experimental results show that the proposed method is feasible and effective. It selects more representative training samples than selection based on the orthogonal factor model and achieves better overall classification accuracy and stability. The selected samples are distributed more dispersedly and more reasonably, and the overall classification accuracy improves by about 3% on average.
Conclusions The proposed method not only supports, in theory, optimizing how data are captured, but can also guide effective data capture in practical applications.
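
To make the Methods statement concrete, the difference between the orthogonal and the oblique factor model can be written in the textbook form below. This is a generic sketch, not the paper's own notation: the symbols x, \mu, \Lambda, f, \varepsilon, \Psi, \Phi and \Sigma are assumptions introduced here for illustration. In a Q-type analysis the observed vector would presumably be indexed by samples rather than by variables.

```latex
% Common factor model (assumed notation): observation x, mean \mu,
% loading matrix \Lambda, common factors f, unique factors \varepsilon.
\begin{aligned}
x &= \mu + \Lambda f + \varepsilon, \qquad
\mathrm{E}[f] = 0,\quad \mathrm{E}[\varepsilon] = 0,\quad
\mathrm{Cov}(f, \varepsilon) = 0,\quad
\mathrm{Cov}(\varepsilon) = \Psi \ \text{(diagonal)} \\[4pt]
% Orthogonal model: common factors are uncorrelated.
\text{orthogonal:}\quad \mathrm{Cov}(f) &= I
  \;\Longrightarrow\; \Sigma = \Lambda \Lambda^{\top} + \Psi \\[4pt]
% Oblique model: common factors may be correlated, with correlation matrix \Phi.
\text{oblique:}\quad \mathrm{Cov}(f) &= \Phi
  \;\Longrightarrow\; \Sigma = \Lambda \Phi \Lambda^{\top} + \Psi
\end{aligned}
```

Setting \Phi = I recovers the orthogonal model, so the oblique model is the more general of the two; the off-diagonal entries of \Phi are the factor correlations that the orthogonal model fixes at zero.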
