Two Algorithms with High Breakdown Points Applied in Linear Regression EIV Model

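  • Summary: Mixed least squares and total least squares is a rigorous method for solving linear regression errors-in-variables (EIV) models whose coefficient matrix contains fixed columns, and combining it with M-estimation further increases its robustness. However, M-estimation results depend on the initial values and can easily converge to a wrong solution. To address this problem, two robust estimation algorithms from the Gauss-Markov model are extended to the EIV model, yielding two algorithms with high breakdown points: the weighted total least median of squares (WTLMS) method and the weighted total least trimmed squares (WTLTS) method. The equivariance properties and breakdown points of the two algorithms are analyzed, a formula for evaluating the standard deviation of unit weight is given, and the parameter estimates are obtained by a resampling method and a feasible set algorithm, respectively. Unlike existing high-breakdown-point algorithms, the proposed algorithms account for fixed columns in the coefficient matrix and place fewer restrictions on the stochastic model. Results on simulated and real data confirm that both algorithms yield robust and reliable estimates from observations heavily contaminated by gross errors.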

     

    Abstract:
    Objectives The linear regression model is a basic model in geodesy. To account for a coefficient matrix that contains fixed columns, the mixed least squares and total least squares method is used. However, this method is sensitive to outliers. Combining it with an M-estimator helps, but M-estimator results depend on the initial values and are prone to converging to a wrong solution. To increase robustness, we propose two algorithms with high breakdown points for linear regression errors-in-variables (EIV) models: the weighted total least median of squares (WTLMS) method and the weighted total least trimmed squares (WTLTS) method.
    Methods The two algorithms extend traditional algorithms for the Gauss-Markov model and use a more general stochastic model. Their breakdown points are close to 50%, and both have two equivariance properties: scale equivariance and affine equivariance. An estimation formula for the variance components is given. Because their objective functions are not differentiable, WTLMS and WTLTS are solved in the EIV model by a resampling algorithm and a feasible set algorithm, respectively.
    Results The results show that: (1) The M-estimator result deviates heavily from the true line, while the two proposed algorithms obtain results close to the true values. They perform significantly better than the M-estimator in terms of root mean square error and standard deviation. The efficiency of the two algorithms is not high, but it can be improved by using their results as the initial values of the M-estimator. The breakdown points of the two algorithms are close to 50% in the real data, which indicates strong robustness. (2) In the experiment with LiDAR data, the proposed methods perform better than the M-estimator.
    Conclusions The two proposed algorithms are highly robust, but their complexity is high and their efficiency is not ideal. Future work will focus on finding a simpler solution with higher efficiency.
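
    The resampling idea mentioned under Methods can be illustrated with a minimal sketch. It is not the WTLMS or WTLTS algorithm of the paper: it assumes an unweighted 2D line fit with errors in both coordinates, no fixed columns, and orthogonal residuals, and it scores both the median-type and the trimmed-type criteria on the same random two-point candidates (the paper uses a feasible set algorithm for the trimmed case). The names `orthogonal_residuals`, `resample_line`, `make_demo_data`, and the coverage `h = n // 2 + 1` are assumptions made for this illustration only.

```python
import numpy as np


def orthogonal_residuals(points, normal, offset):
    """Signed orthogonal distances to the line normal . (x, y) = offset."""
    return points @ normal - offset


def resample_line(points, n_trials=500, criterion="lms", seed=0):
    """Search random two-point candidate lines for the smallest robust score.

    criterion="lms": h-th smallest squared orthogonal residual (median-type).
    criterion="lts": sum of the h smallest squared residuals (trimmed-type).
    Returns (normal, offset, score) of the best candidate found.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    h = n // 2 + 1  # assumed coverage: just over half of the observations
    best = (None, None, np.inf)
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)
        d = points[j] - points[i]
        length = np.hypot(d[0], d[1])
        if length == 0.0:
            continue  # coincident points do not define a line
        normal = np.array([-d[1], d[0]]) / length
        offset = normal @ points[i]
        r2 = np.sort(orthogonal_residuals(points, normal, offset) ** 2)
        score = r2[h - 1] if criterion == "lms" else r2[:h].sum()
        if score < best[2]:
            best = (normal, offset, score)
    return best


def make_demo_data(seed=1):
    """30 points near y = 2x + 1 plus 20 gross outliers far above the line."""
    rng = np.random.default_rng(seed)
    x_in = rng.uniform(0.0, 10.0, 30)
    inliers = np.column_stack([x_in + rng.normal(0.0, 0.02, 30),
                               2.0 * x_in + 1.0 + rng.normal(0.0, 0.02, 30)])
    outliers = np.column_stack([rng.uniform(0.0, 10.0, 20),
                                rng.uniform(30.0, 60.0, 20)])
    return np.vstack([inliers, outliers])


def slope_intercept(normal, offset):
    """Convert the normal form to y = slope * x + intercept (non-vertical line)."""
    return -normal[0] / normal[1], offset / normal[1]


if __name__ == "__main__":
    pts = make_demo_data()
    for crit in ("lms", "lts"):
        normal, offset, _ = resample_line(pts, criterion=crit)
        print(crit, slope_intercept(normal, offset))
```

    With 40% of the points replaced by gross outliers, both criteria should still recover a line close to y = 2x + 1 in this demo, because every candidate is scored only on the better-fitting half of the observations. The cost is the one noted in the Conclusions: many candidates must be evaluated, so the search is slow compared with a least squares fit.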

     
