-
摘要: 针对复杂居民地多边形的信息挖掘问题,提出了一种多级图划分聚类分析方法,构造居民地多边形的图模型,并通过对图模型进行粗化匹配与重构、初始化分和细化得到聚类结果。首先构建研究区域内居民地建筑物的Delaunay三角网, 生成包含研究对象之间的邻接信息图;然后结合空间认知准则和人类认知的特点,采用形状狭长度、面积比、凹凸性、距离和连通性5个指标度量邻接图的相似性;最后应用多级图划分方法,得到聚类结果。采用中国上海地区的居民地建筑物矢量数据进行聚类分析实验,并对比了改进的k均值算法(k-Means++)、具有噪声鲁棒性的基于密度的空间聚类算法(density-based spatial clustering of applications with noise,DBSCAN)和最小生成树(minimum spanning tree, MST)聚类算法得到的轮廓系数以及视觉效果。实验结果表明,基于多级图划分的居民地多边形聚类分析的结果更加符合人类认知。Abstract: In order to solve the problems of information mining of complex residential polygons, a multi-level graph partition clustering method is proposed to construct the graph model of residential polygons, and the clustering results are obtained by coarsen, matching and reconstruction, initialization and refinement of the graph model. Firstly, the Delaunay triangular network of residential buildings in the study area is constructed to generate the adjacent information graph including the research objects. Then, the similarity of the neighborhood graph is measured by five indexes of shape narrow length, size, convexity, distance and connectivity combined with the characteristics of spatial cognition and human cognition in this paper. Finally, the clustering results are obtained by using the multi-level graph partition method. In the experiment, the vector data of residential buildings in Shanghai are used for clustering analysis, and the silhouette coefficients and visual effects of improved k-Means algorithm (k-Means + +), density-based spatial clustering of applications with noise (DBSCAN) and minimum spanning tree (MST) clustering algorithms with noise robustness are compared. The experimental results show that the results of polygonal clustering analysis based on multi-level graph partition are more consistent with human cognition.
-
-
表 1 空间相似性度量指标
Table 1 Measurement Indicies of Spatial Similarity
度量指标 定义 说明 形状狭长度 $ \text{shp}\left( x, y \right)=\frac{\text{min}\left( \text{asp}\left( x \right), \text{asp}\left( y \right) \right)}{\text{max}\left( \text{asp}\left( x \right), \text{asp}\left( y \right) \right)} $ $ \text{asp}\left( x \right)=\frac{\text{max}\left( {{a}_{x}}, {{b}_{x}} \right)}{\text{min}\left( {{a}_{x}}, {{b}_{x}} \right)} $,其中ax、bx表示多边形x的最小外接矩形的长和宽;同理可得sap(y)。 面积比 $ \text{size}\left( x, y \right)=\frac{\text{Area}\left( x \right)}{\text{Area}\left( y \right)} $ Area(x) < Area(y),Area ( )为多边形的面积。 凹凸性 $ \text{cvx}\left( x, y \right)=\frac{\min \left( \text{cp}\left( x \right), \text{cp}\left( y \right) \right)}{\max \left( \text{cp}\left( x \right), \text{cp}\left( y \right) \right)} $ $ \text{cp}\left( x, y \right)=\frac{\text{Area}\left( x \right)}{\text{Peri}\left( x \right)} $,其中Area(x)、Peri(x)分别为多边形x的面积和周长;同理可得cp(y)。 距离 $ \text{dist}\left( x, y \right)=\frac{n}{\sum\limits_{i=0}^{n}{l_{i}^{x, y}}} $ 如图 3(a)所示,dist(x, y)表示多边形x、y之间的距离。 连通性 con(x, y)=L(Landscape(x, y)) 如图 3(b)所示,con(x, y)表示多边形x、y之间的连通性。 表 2 空间相似性数值(部分)
Table 2 Spatial Similarity Values (Part)
邻接多边形组 相似性度量指标 空间相似值 连接对象ID 连接对象ID 形状狭长度 面积比 凹凸性 距离 连通性 15 17 0.808 8 0.275 3 0.588 4 0.105 2 0.128 4 0.368 0 45 50 0.862 3 0.772 8 0.921 3 0.116 4 0.521 3 0.618 8 274 276 0.919 5 0.748 6 0.860 6 0.156 6 0.272 8 0.570 2 540 560 0.936 5 0.846 3 0.894 6 0.179 9 0.424 1 0.638 8 815 823 0.760 4 0.842 5 0.890 9 0.189 2 0.189 7 0.544 1 1325 1394 0.292 2 0.841 9 0.918 1 0.204 8 0.245 9 0.454 7 1367 1368 0.814 1 0.771 2 0.991 0 0.106 7 0.375 7 0.579 9 表 3 各多边形的轮廓系数(部分)
Table 3 Silhouette Coefficient of Each Polygon (Part)
对象ID 轮廓系数 1 -0.120 8 2 -0.102 2 3 -0.078 3 ⋮ ⋮ 696 -0.113 2 697 0.070 9 698 0.086 2 699 0.066 7 ⋮ ⋮ 1392 -0.395 9 1393 -0.168 6 1394 -0.196 0 表 4 多级图划分过程中的各阶段耗时/s
Table 4 Consumed Time During the Progress of Multi-level Graph Partition Algorithm /s
划分阶段 体积不同的矢量建筑物多边形群组 平均耗时占比/% 200 250 300 350 400 450 500 550 600 构图 1.33 1.41 1.48 1.57 1.64 1.71 1.77 1.83 1.88 11 粗化 7.86 8.33 8.74 9.28 9.69 10.11 10.46 10.82 11.11 65 初始化分 0.97 1.02 1.08 1.14 1.19 1.24 1.29 1.33 1.37 8 细化 1.93 2.05 2.15 2.28 2.38 2.49 2.57 2.65 2.75 16 表 5 实验区1局部聚类结果对比
Table 5 Comparison of Local Clustering Results in Experiment Area 1
聚类算法 局部研究区域 A B C D 本文算法 k-Means++算法 MST算法 DBSCAN算法 表 6 实验区2的局部聚类结果对比
Table 6 Comparison of Local Clustering Results in Experiment Area 2
聚类算法 局部研究区域 A B C D E 本文算法 k-Means++算法 MST算法 DBSCAN算法 -
[1] 朱杰, 孙毅中, 陈律余, 等.顾及属性空间分布不均的空间聚类方法——以城市商业中心的提取为例[J].武汉大学学报·信息科学版, 2017, 42(12): 1 696-1 702 doi: 10.13203/j.whugis20150590 [1] Zhu Jie, Sun Yizhong, Chen Lvyu, et al. A Spatial Clustering Method Based on Uneven Distribution of Non-spatial Attributes—Identifying City Commercial Center[J].Geomatics and Information Science of Wuhan University, 2017, 42(12):1 696-1 702 doi: 10.13203/j.whugis20150590
[2] Estivillcastro V, Lee I. Multi-level Clustering and Its Visualization for Exploratory Spatial Analysis[M]. Norwell:Kluwer Academic Publishers, 2002
[3] 焦利民, 刘耀林, 刘艳芳.区域城镇基准地价水平的空间自相关格局分析[J].武汉大学学报·信息科学版, 2009, 34(7):873-877 http://ch.whu.edu.cn/article/id/1299 Jiao Limin, Liu Yaolin, Liu Yanfang. Spatial Autocorrelation Pattern of Datum Land Prices of Cities in a Region [J].Geomatics and Information Science of Wuhan University, 2009, 34(7):873-877 http://ch.whu.edu.cn/article/id/1299
[4] Pei T, Jasra A, Hand D J, et al. DECODE: A New Method for Discovering Clusters of Different Densities in Spatial Data[J].Data Mining and Knowledge Discovery, 2009, 18(3):337-369 doi: 10.1007/s10618-008-0120-3
[5] 贾奋励, 游雄, 刘芳.多尺度表达的地图设计模型[J].测绘科学技术学报, 2011, 28(2):153-156 doi: 10.3969/j.issn.1673-6338.2011.02.019 Jia Fenli, You Xiong, Liu Fang.Map Design Model for Multi-scale Representation[J].Journal of Geomatics Science and Technology, 2011, 28(2):153-156 doi: 10.3969/j.issn.1673-6338.2011.02.019
[6] Wang W, Du S, Guo Z, et al. Polygonal Clustering Analysis Using Multilevel Graph‐Partition[J].Transactions in GIS, 2015, 19(5):716-736 doi: 10.1111/tgis.12124
[7] Guo D, Wang H. Automatic Region Building for Spatial Analysis[J].Transactions in GIS, 2011, 15(s1):29-45
[8] Wang S, Eick C F. A Polygon-Based Clustering and Analysis Framework for Mining Spatial Datasets[M]. Norwell:Kluwer Academic Publishers, 2014
[9] 余莉.克服双重约束的面目标位置聚类方法[J].测绘学报, 2016, 45(10):1 250-1 259 doi: 10.11947/j.AGCS.2016.20150491 Yu Li. Position Clustering for Polygon Object Under Dual-constrains[J].Acta Geodaetica et Cartographica Sinica , 2016, 45(10): 1 250-1 259 doi: 10.11947/j.AGCS.2016.20150491
[10] Chu S C, Roddick J F, Pan J S. Novel Multi-centroid, Multi-run Sampling Schemes for K-Mediods-Based Algorithms[M].Berlin, Heidelberg:Springer Berlin, 2004
[11] 程绵绵, 孙群, 李少梅, 等.顾及密度对比的多层次聚类点群选取方法[J].武汉大学学报·信息科学版, 2019, 44(8): 1 131-1 137 doi: 10.13203/j.whugis20180043 Cheng Mianmian, Sun Qun, Li Shaomei, et al.A Point Group Selecting Method Using Multi-level Clustering Considering Density Comparison[J].Geomatics and Information Science of Wuhan University , 2019, 44(8): 1 131-1137 doi: 10.13203/j.whugis20180043
[12] 林恒, 龚威, 史硕, 等.利用等边长正交格网进行层次聚合聚类[J].武汉大学学报·信息科学版, 2018, 43(5): 786-791 doi: 10.13203/j.whugis20150668 Lin Heng, Gong Wei, Shi Shuo, et al.Hierarchical Agglomerative Clustering Algorithm with Equilateral Orthogonal Grid[J].Geomatics and Information Science of Wuhan University, 2018, 43(5): 786-791 doi: 10.13203/j.whugis20150668
[13] Ester M, Kriegel H P, Xu X. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications[J].Data Mining and Knowledge Discovery, 1998, 2(2):169-194 doi: 10.1023/A:1009745219419
[14] 职露, 余旭初, 李光强, 等.滚圆法用于空间点聚类的研究[J].武汉大学学报·信息科学版, 2018, 43(8): 1 193-1 198 doi: 10.13203/j.whugis20160287 Zhi Lu, Yu Xuchu, Li Guangqiang, et al. Spatial Point Clustering Analysis Based on the Rolling Circle[J].Geomatics and Information Science of Wuhan University, 2018, 43(8): 1 193-1 198 doi: 10.13203/j.whugis20160287
[15] Gholamhosein S, Surojit C, Zhang A D. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases[C]. The 24th International Conference on Very Large Data Bases (VLDB'98), San Francisco, CA, USA, 1998
[16] 邓敏, 刘启亮, 李光强, 等.一种基于似最小生成树的空间聚类算法[J].武汉大学学报·信息科学版, 2010, 35(11): 1 360-1 364 http://ch.whu.edu.cn/article/id/1115 Deng Min, Liu Qiliang, Li Guangqiang, et al. A Spatial Clustering Algorithm Based on Minimum Spanning Tree-like[J].Geomatics and Information Science of Wuhan University, 2010, 35(11): 1 360-1 364 http://ch.whu.edu.cn/article/id/1115
[17] Caruso G, Hilal M, Thomas I. Measuring Urban Forms from Inter-building Distances: Combining MST Graphs with a Local Index of Spatial Association[J].Landscape and Urban Planning, 2017, 16(3):80-89 https://www.sciencedirect.com/science/article/abs/pii/S0169204617300518
[18] Yan X. A Graph Convolutional Neural Network for Classification of Building Patterns Using Spatial Vector Data[J].ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 150:259-273 doi: 10.1016/j.isprsjprs.2019.02.010
[19] Karypis G, Kumar V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs[J].SIAM Journal on Scientific Computing, 1998, 20(1):359-370 doi: 10.1137/S1064827595287997
[20] Kernighan B W, Lin S. An Efficient Heuristic Procedure for Partitioning Graphs[J].Bell System Technical Journal, 1970, 49(2):291-307 doi: 10.1002/j.1538-7305.1970.tb01770.x
[21] Robnik-Šikonja M, Kononenko I. Theoretical and Empirical Analysis of ReliefF and RReliefF[J].Machine Learning, 2003, 53(1): 23-69
[22] Chen Z, Ma X, Wu L, et al. An Intuitionistic Fuzzy Similarity Approach for Clustering Analysis of Polygons[J].ISPRS International Journal of Geo-Information, 2019, 8(2): 98-106 doi: 10.3390/ijgi8020098
[23] 于剑, 程乾生.模糊聚类方法中的最佳聚类数的搜索范围[J].中国科学, 2002, 32(2):274-280 https://www.cnki.com.cn/Article/CJFDTOTAL-JEXK200202014.htm Yu Jian, Cheng Qiansheng. The Upper Bound of the Optimal Number of Clusters in Fuzzy Clustering[J].Science in China, 2002, 32(2):274-280 https://www.cnki.com.cn/Article/CJFDTOTAL-JEXK200202014.htm
[24] Fisher J. Visualizing the Connection Among Convex Hull, Voronoi Diagram and Delaunay Triangulation[C]. Midwest Instruction and Computing Symposium, Lawrence, Canada, 2004