Abstract:
When learning from high-dimensional sample data in big-data settings, non-parametric kernel methods apply a uniform metric and are therefore prone to the curse of dimensionality. Identifying the low-dimensional geometric structure embedded in the high-dimensional space helps reveal the manifold structure of the data distribution, so that a limited number of high-dimensional samples can be used to approximate the true distribution of the data in a low-dimensional subspace. Based on this observation, this paper proposes a new low-dimensional manifold representative point method for non-parametric density estimation of high-dimensional data, which estimates the density by mining the geometric structure of the data distribution in the high-dimensional space. First, the local covariance matrix is computed, and the local data distribution is characterized by finding points in each local region that represent the principal direction of the manifold structure. Then, the contribution of each sample point to the density is weighted to account for the different effects of points lying on or near the manifold structure. Experimental results show that, compared with traditional kernel density estimation and the manifold kernel density method, the proposed method performs density estimation quickly and robustly and reflects the true distribution of the data.
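The two steps outlined above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how representative points are selected or how the weights are defined, so the sketch assumes a k-nearest-neighbour local PCA (local mean plus leading eigenvectors of the local covariance matrix) as a stand-in for the representative points, and a Gaussian down-weighting of samples by their distance from the local principal subspace. The function names and the parameters `k`, `d`, `scale`, and `h` are all hypothetical.

```python
import numpy as np

def local_pca(X, k=10, d=1):
    """Local mean and top-d principal directions of each point's k nearest neighbours.

    Assumes X has shape (n, D) with D >= 2 and k >= 2.
    """
    n, D = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(sq, axis=1)[:, :k]  # neighbourhood includes the point itself
    means = np.empty((n, D))
    dirs = np.empty((n, D, d))
    for i in range(n):
        nb = X[idx[i]]
        means[i] = nb.mean(axis=0)
        cov = np.cov(nb, rowvar=False)       # local covariance matrix, shape (D, D)
        _, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
        dirs[i] = vecs[:, -d:]               # eigenvectors of the d largest eigenvalues
    return means, dirs

def manifold_weights(X, means, dirs, scale):
    """Assumed weighting: samples far from the local principal subspace count less."""
    r = X - means
    proj = np.einsum("nij,ni->nj", dirs, r)          # coordinates within the subspace
    resid = r - np.einsum("nij,nj->ni", dirs, proj)  # component orthogonal to it
    off = np.linalg.norm(resid, axis=1)
    w = np.exp(-0.5 * (off / scale) ** 2)
    return w / w.sum()                               # normalise so weights sum to 1

def weighted_kde(query, X, weights, h):
    """Gaussian kernel density estimate with per-sample weights and bandwidth h."""
    D = X.shape[1]
    sq = ((query[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq / h**2) / ((2 * np.pi) ** (D / 2) * h**D)
    return K @ weights
```

With uniform weights, `weighted_kde` reduces to ordinary kernel density estimation, which makes it a convenient baseline for checking what the assumed manifold weighting changes.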