Abstract:
Because of the curse of dimensionality, traditional similarity measurements are ill-suited to high-dimensional space. Moreover, an equally spaced partition of each dimension cannot represent the data distribution, so similarity measurements based on such a partition cannot reasonably compute the similarity between high-dimensional data. To address these problems, the existing improved similarity measurements for high-dimensional space are first introduced and their shortcomings are analyzed. Then, the similarity measurement PIDist(X,Y,kd) is improved based on an unequally spaced partition of each dimension. Finally, experimental results from clustering the heart-statlog and vehicle data sets from the UCI repository demonstrate the validity of the proposed similarity measurement.