Abstract:
For observational data contaminated by both systematic and gross errors, this paper presents a method for locating gross errors and estimating their values by introducing the mixed Cook distance into the semiparametric model. First, by constructing a penalized least squares function, applying a Taylor expansion, and using the equivalence of the mean shift model and the data deletion model, penalized least squares estimates of the parametric and non-parametric components are obtained for the data with the ith observation deleted, which is useful for locating gross errors. Second, with the mixed Cook distance as a diagnostic statistic, the corresponding formulas for the parametric and non-parametric components are derived in order to improve the accuracy of gross error location. Common forms of the parameters Q and C are given; these parameters directly influence the mixed Cook distance, as different choices yield different results. By selecting appropriate parameters and calculating the Cook distances of the parametric and non-parametric components, the positions of the gross errors are determined, so that the systematic errors and gross errors can be separated from the observed data. Simulated computations and a real example show that the method can effectively determine the positions and values of gross errors.