WANG Shuliang, LI Ying, GENG Jing. A Low-Dimensional Manifold Representative Point Method to Estimate the Non-parametric Density for High-Dimensional Data[J]. Geomatics and Information Science of Wuhan University, 2021, 46(1): 65-70. DOI: 10.13203/j.whugis20160115

A Low-Dimensional Manifold Representative Point Method to Estimate the Non-parametric Density for High-Dimensional Data

Funds: 

The National Key Research and Development Program of China 2020YFC0832600

The National Natural Science Foundation of China 62076027

  • Author Bio:

    WANG Shuliang, PhD, professor, specializes in spatial data mining. E-mail: slwang2005@whu.edu.cn

  • Corresponding author:

    GENG Jing, postdoctoral fellow. E-mail: janegeng@bit.edu.cn

  • Received Date: May 24, 2019
  • Published Date: January 04, 2021
  • When learning from high-dimensional sample data in big-data settings, non-parametric kernel methods use a single uniform metric, which makes them prone to the curse of dimensionality. Uncovering the low-dimensional geometric characteristics embedded in the high-dimensional space helps reveal the manifold structure of the data distribution, so that a limited number of high-dimensional samples can be used to approximate the true distribution of the data in a low-dimensional subspace. On this basis, this paper proposes a new low-dimensional manifold representative point method for non-parametric density estimation of high-dimensional data, which estimates the density by mining the geometric structure of the data distribution in the high-dimensional space. First, the local covariance matrix is calculated, and the local data distribution is characterized by finding points in each local area that represent the main direction of the manifold structure. Then, the contribution of each sample point to the density is weighted to account for the different effects of data points on or near the manifold structure. Experimental results show that, compared with the traditional kernel density estimation method and the manifold kernel density method, the proposed method performs density estimation quickly and robustly and reflects the true distribution of the data.
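The two steps named in the abstract (local covariance estimation, then weighted per-point contributions) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It is a simplified variant in the spirit of Manifold Parzen windows, in which every sample carries a Gaussian shaped by the covariance of its nearest neighbours. The function names, the neighbourhood size `k`, the regularizer `reg`, and the weighting rule (share of local variance along the leading direction) are all assumptions made for illustration.

```python
import numpy as np

def local_covariances(X, k=10, reg=1e-3):
    """Regularized covariance of the k nearest neighbours around each sample."""
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Skip column 0 of the sorted indices: it is the sample itself.
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    covs = np.empty((n, d, d))
    for i in range(n):
        diffs = X[idx[i]] - X[i]
        # The ridge term keeps the matrix invertible off the manifold.
        covs[i] = diffs.T @ diffs / k + reg * np.eye(d)
    return covs

def manifold_weights(covs):
    """Hypothetical weighting: share of local variance on the main direction."""
    eigvals = np.linalg.eigvalsh(covs)          # ascending, shape (n, d)
    ratio = eigvals[:, -1] / eigvals.sum(axis=1)
    return ratio / ratio.sum()                  # normalized to sum to 1

def manifold_kde(query, X, covs, weights=None):
    """Weighted mixture of local Gaussians evaluated at the query points."""
    n, d = X.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)           # plain average by default
    dens = np.zeros(len(query))
    for i in range(n):
        diff = query - X[i]
        # Mahalanobis distance under the local covariance of sample i.
        sol = np.linalg.solve(covs[i], diff.T).T
        maha = (diff * sol).sum(axis=1)
        _, logdet = np.linalg.slogdet(covs[i])
        log_norm = -0.5 * (d * np.log(2.0 * np.pi) + logdet)
        dens += weights[i] * np.exp(log_norm - 0.5 * maha)
    return dens
```

Calling `local_covariances(X)`, then `manifold_weights(covs)`, then `manifold_kde(query, X, covs, weights)` gives densities that stay high along the local main directions and fall off quickly away from them. An isotropic kernel with a single bandwidth cannot make that distinction.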
