利用等边长正交格网进行层次聚合聚类

Hierarchical Agglomerative Clustering Algorithm with Equilateral Orthogonal Grid

  • 摘要: 层次聚合聚类的典型算法可以体现研究数据的多尺度特征,但是典型算法的时空复杂度太高。通过将数据所在空间划分成等边长正交格网,结合3点间距离的传递性排除冗余计算,并将其推广到N维空间。设计了一种与典型算法遵循相同的单链规则,可即时计算类间距离且无需计算距离矩阵的算法,在获得与典型算法相同的多尺度聚类序列的同时,所需内存远小于典型算法。实验结果表明,该算法无需人工干预且不使用距离矩阵,能大幅降低层次聚合聚类的运行时间,但是效率优势随空间维数增长逐渐降低。

     

    Abstract: The process of the typical algorithm of hierarchical agglomerative clustering (HAC) reflects the multi-scale property of studying data, which is crucial in the research of Geography, Cartography and Remote Sensing. However, the typical algorithm is inefficient and costs too much memory space. In this paper, considering the transitivity of distances among points, the studying data are divided into equilateral orthogonal grids to avoid redundant computation; moreover, the feasibility of the proposed algorithm is proved in theory and extended to N-dimensional space. The proposed algorithm follows the same single-link clustering rule as the typical algorithm; and therefore, it generates the same multi-scale clustering series as the typical algorithm. Since there is no distance-matrix, the proposed algorithm costs much less memory space. Experimental results show that although without any manual intervention and distance-matrix, the proposed algorithm obviously improves the efficiency of HAC. However, the advantage decreases as the extending of the dimension number.

     

/

返回文章
返回