Indoor Landmark Salience Evaluation Model Using Random Forest with Data Balance Consideration

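  • Abstract: To address the problem of evaluating the salience of indoor points of interest (POI), a random forest (RF) model based on data balancing is proposed. Given the limitations of existing models in handling imbalanced data and in modeling the complex nonlinear relationship between salience and its influencing indicators, an indicator system of 34 features is constructed across three dimensions: visual, semantic, and structural. The RF evaluation model is formed by applying the synthetic minority over-sampling technique (SMOTE) to mitigate data imbalance and by optimizing features according to their importance weights. Experimental results show that the model performs very well on the indoor POI dataset, with accuracy, precision, recall, weighted F1 score, and area under the curve reaching 0.987, 0.984, 0.987, 0.987, and 0.999, respectively. Compared with the RF model without data balancing, performance doubled; compared with other models (such as support vector machine and genetic programming), performance improved by 15% and 5%, respectively. The model also showed good generalization on the test set.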

     

    Abstract:
    Objectives With the continuous expansion of urban complexes, subterranean infrastructures, and multilevel transportation hubs, building interiors have become increasingly three-dimensional and intricate. Traditional GPS signals are severely attenuated by floor slabs, which makes them inadequate for reliable indoor positioning. In this context, indoor points of interest (POI) have emerged as pivotal spatial anchors for seamless navigation and location-based services, and the rigorous selection of salient indoor POI has become a prominent research focus.
    Methods To address the challenge of quantifying indoor POI saliency, we propose a data-balance-aware random forest (RF) framework. Existing approaches have evident shortcomings in handling class imbalance and in capturing the highly non-linear dependencies between saliency and multi-dimensional predictors. Therefore, a comprehensive indicator set comprising 34 fine-grained features is systematically constructed from three complementary perspectives: visual saliency, semantic saliency, and structural saliency. The synthetic minority over-sampling technique (SMOTE) is employed to mitigate class imbalance, and a feature selection scheme driven by importance weighting is subsequently applied. The final RF model is trained on the rebalanced data and the optimized feature set for indoor POI saliency evaluation.
    Results Experimental results show that the proposed model performs very well on the indoor POI dataset. The accuracy, precision, recall, weighted F1 score, and area under the curve reach 0.987, 0.984, 0.987, 0.987, and 0.999, respectively. Compared with the RF model without data balancing, the performance of the proposed model is doubled. Compared with other traditional models, such as support vector machine and genetic programming, the performance of the proposed model is improved by 15% and 5%, respectively. Moreover, the proposed model generalizes well on the test set.
    Conclusions We introduce a data-balance-aware RF model that offers a unified framework for indicator selection, imbalanced-sample treatment, and robust model construction in indoor POI saliency assessment, providing both theoretical and practical support for high-precision indoor positioning and intelligent navigation services.
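
The rebalancing step named in Methods can be illustrated with a minimal sketch of SMOTE-style oversampling. It is not the authors' implementation: the helper names `smote` and `balance`, the two-feature toy data, and the choice of `k` are assumptions made for illustration, and only the Python standard library is used. Each synthetic sample is an interpolation between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and their nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label until it matches the largest class."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    target = max(Counter(y).values())
    new = smote(minority, target - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Hypothetical toy data: label 1 stands for the rare "salient" class.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.25],
     [0.25, 0.15], [0.05, 0.10], [0.90, 0.80], [0.80, 0.95]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y, minority_label=1)
print(Counter(y_bal))
```

In a workflow like the one described, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the test set; the balanced data would then be passed to the random forest classifier.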

     
