Similarity Measurement in High-Dimensional Space Based on Unequally Spaced Partition

Abstract: Because of the curse of dimensionality, traditional similarity measures are unsuitable for high-dimensional spaces. Moreover, an equally spaced partition of each dimension cannot represent the data distribution, so similarity measures built on it cannot reasonably compute the similarity between high-dimensional data. To address these problems, existing improved similarity measures for high-dimensional space are first introduced and their shortcomings analyzed. The similarity measurement function PIDist(X,Y,kd) is then improved using an unequally spaced partition of each dimension. Finally, comparative clustering experiments on the heart-statlog and vehicle data sets from the UCI machine learning repository verify the correctness and effectiveness of the improved similarity measure.

     
