A Spatio-temporal Clustering Method Based on Permutation Test

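  • Summary: Spatio-temporal clustering that integrates spatio-temporal proximity with thematic attribute similarity is an important means of mining the spatio-temporal evolution of geographical phenomena. Many of the clustering parameters required by existing methods are difficult to obtain, which undermines both the operability of the methods and the reliability of their results. This paper proposes a spatio-temporal clustering method based on the permutation test. First, permutation tests are used to discover homogeneous sub-regions in the spatio-temporal dataset. Next, the spatio-temporal entities within the homogeneous sub-regions are merged into spatio-temporal clusters under the mean squared error criterion, and an intra-cluster permutation test automatically identifies the termination condition for merging. Finally, spatio-temporal topological relations are used to develop a fast permutation test that improves running efficiency while preserving the accuracy of the results. Experiments and comparisons show that the method can discover spatio-temporal clusters of different shapes and sizes, with clustering quality better than that of the classical ST-DBSCAN method, and that the subjectivity of manually set parameters is significantly reduced, which improves the operability of the clustering method.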

     

    Abstract: Spatio-temporal clustering is an important technique for mining dynamic patterns of geographical phenomena. It aims to group data so that intra-cluster similarity is maximized and inter-cluster similarity is minimized, and it has been a hot topic in spatio-temporal data mining and knowledge discovery. However, the performance of existing methods depends heavily on a series of user-specified parameters, and the significance of the discovered clusters cannot be evaluated objectively. To address this, this paper develops a spatio-temporal clustering method based on the permutation test that considers both spatio-temporal proximity and attribute similarity. First, homogeneous sub-regions with similar attributes are identified in the dataset using a permutation test. Then, the mean squared error criterion is adopted to group these homogeneous sub-regions into larger clusters, and a permutation testing procedure is developed to evaluate the significance of the detected clusters. Finally, to improve the efficiency of the proposed method without losing accuracy, a fast permutation testing method is developed using the topological information among the entities. Experiments on both simulated and real-life datasets show that the proposed method effectively detects spatio-temporal clusters of different shapes and sizes with similar thematic attributes, and that the subjectivity in clustering is significantly reduced. The proposed method is applied successfully to find spatio-temporal clusters in China's monthly average precipitation database, and the detected dynamic patterns (i.e., spatio-temporal clusters) can help to investigate and interpret the developmental trends of precipitation.
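As a rough illustration of the first step, a permutation test for the homogeneity of one neighborhood might look like the sketch below. The test statistic (mean squared deviation of attribute values inside the neighborhood), the function names, the permutation count, and the one-sided p-value formula are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import random


def neighborhood_mse(values, neighbors):
    """Mean squared deviation of the attribute values at `neighbors` from their own mean."""
    local = [values[i] for i in neighbors]
    mean = sum(local) / len(local)
    return sum((v - mean) ** 2 for v in local) / len(local)


def permutation_p_value(values, neighbors, n_permutations=999, seed=0):
    """One-sided p-value: how often does a random relabeling look at least as homogeneous?

    `values` holds one attribute value per entity; `neighbors` lists the indices
    of the entities in the candidate sub-region (hypothetical inputs).
    """
    rng = random.Random(seed)
    observed = neighborhood_mse(values, neighbors)
    shuffled = list(values)
    hits = 0
    for _ in range(n_permutations):
        # Reassign attribute values to entities at random, keeping locations fixed.
        rng.shuffle(shuffled)
        if neighborhood_mse(shuffled, neighbors) <= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Five identical values surrounded by 40 distinct ones: a homogeneous neighborhood.
homogeneous_values = [5.0] * 5 + [float(x) for x in range(20, 60)]
p_homogeneous = permutation_p_value(homogeneous_values, [0, 1, 2, 3, 4])

# Alternating extremes: the least homogeneous neighborhood this dataset allows.
mixed_values = [0.0, 100.0, 0.0, 100.0] + [50.0] * 40
p_mixed = permutation_p_value(mixed_values, [0, 1, 2, 3])
```

A small p-value suggests the neighborhood is more homogeneous than chance would explain, so it could be kept as a candidate sub-region; a large one suggests it should not be treated as homogeneous.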

     
