图划分支持下的大规模点要素并行缓冲分析方法

亢晓琛; 刘纪平

doi:10.13203/j.whugis20210011

图划分支持下的大规模点要素并行缓冲分析方法

亢晓琛^1,,
刘纪平^{1, 2}

1.
中国测绘科学研究院, 北京, 100036
2.
河南省科学院地理研究所, 河南郑州, 450052

基金项目:

国家自然科学基金 41701461

中国测绘科学研究院基本科研业务费项目 AR2001

详细信息

作者简介:
亢晓琛，博士，副研究员，主要从事高性能地理计算和地理国情监测理论与方法研究。kangxc@casm.ac.cn

中图分类号: P208
计量
- 文章访问数: 306
- HTML全文浏览量: 78
- PDF下载量: 66
出版历程
- 收稿日期: 2022-12-11
- 网络出版日期: 2023-06-11
- 发布日期: 2023-06-04

Parallel Buffer Analysis of Large Scale Point Features Based on Graph Partitioning

KANG Xiaochen^1,,
LIU Jiping^{1, 2}

1.
Chinese Academy of Surveying and Mapping, Beijing 100036, China
2.
Institute of Geographical Sciences, Henan Academy of Sciences, Zhengzhou 450052, China

摘要

摘要: 缓冲分析是解决邻近度问题的基础工具，由于算法本身包含大量的复杂运算，处理效率亟待优化。针对大规模点要素的缓冲分析，引入图表达建立了面向数据和分析过程的空间计算域，通过图划分实现了任务的均衡分割。图式化的空间计算域首先从图节点和图边两个角度定义了点要素及其空间关系的处理函数，然后对相应的时间复杂度进行拟合，获取了图节点和图边的计算权重，最后利用图划分方法实现了缓冲分析的均衡分割，从而构建与计算资源相匹配的并行任务。实验结果表明，基于图划分实现的并行缓冲分析方法在负载均衡性和整体性能方面优于主流的四叉树和规则格网划分方法，可为大规模矢量数据的空间分析优化提供参考。
- 矢量数据 /
- 缓冲分析 /
- 空间计算域 /
- 图划分
Abstract:
Objectives Buffer analysis is a common tool of spatial analysis, which deals with the problem of proximity. Due to numerous and complex operations in the algorithm, the computational efficiency needs to be optimized.
Methods To process large scale point features, a graph-based representation model is proposed, which establishes the spatial computational domain for data and analysis, and develops a well-balanced task-partitioning method by partitioning the graph. First, the proposed model defines processing functions of point features and their spatial relationships from the perspectives of graph nodes and graph edges, and provides a logic description for buffer zone generation around point features. Second, the computational weights of graph nodes and graph edges are obtained by fitting the time complexity of the above processing functions. Finally, graph partitioning is adopted to divide the buffer task, which contributes to multiple parallel tasks matching with the computational resources.
Results The experimental results show that graph-based buffer analysis can achieve better load balance and overall efficiency, which is superior to the mainstream partitioning methods, regular-grid and quadtree.
Conclusions The proposed method can provide a reference for optimization of spatial analysis methods when processing large scale vector data.
- vector data /
- buffer analysis /
- spatial computational domain /
- graph partitioning

HTML全文

图 1 点要素距离依赖关系搜索

Figure 1. Searching for Distance Dependent Relationships of Point Features

下载: 全尺寸图片幻灯片

图 2 BGR实例化流程

Figure 2. Procedure for BGR Instantiation

下载: 全尺寸图片幻灯片

图 3 Metis加权图存储格式

Figure 3. Storage Format for Metis Weighted Graph

下载: 全尺寸图片幻灯片

图 4 规则格网划分与多级划分对比

Figure 4. Comparison of Regular-grid and Multi-level Partitioning

下载: 全尺寸图片幻灯片

图 5 基于BGR-Log的缓冲融合区并行生成

Figure 5. Parallel Dissolved Buffer Zone Generation Based on BGR-Log

下载: 全尺寸图片幻灯片

图 6 等间距点要素缓冲区融合的拟合函数时间

Figure 6. Fitting Function Time of Dissolved Buffer Zone Generation with Equidistant Point Features

下载: 全尺寸图片幻灯片

图 7 不同缓冲半径下3类空间划分方法时间对比

Figure 7. Partitioning Time Comparison of Three Methods with Different Radii

下载: 全尺寸图片幻灯片

图 8 不同划分方法局部效果图

Figure 8. Local Effects of Different Partitioning Methods

下载: 全尺寸图片幻灯片

图 9 空间划分均衡性与割边处理时间

Figure 9. Balance of Spatial Partitioning and Cut-Edge Processing Time

下载: 全尺寸图片幻灯片

表 1 多种半径下串行BGR与并行BGR计算时间

Table 1 Computation Time of Serial BGR and Parallel BGR with Different Radii

序号	缓冲半径/m	结果多边形数量	融合比例/%	串行时间/s	并行时间/s	加速比
1	50	672 151	1.29	106.47	81.65	1.30
2	100	660 333	3.03	109.74	92.01	1.19
3	150	638 495	6.23	116.10	103.21	1.12
4	200	610 145	10.40	127.32	112.67	1.13
5	250	579 401	14.91	140.21	94.98	1.48
6	300	548 306	19.48	152.51	98.84	1.54
7	350	516 911	24.09	165.01	102.51	1.61
8	400	484 913	28.79	179.45	112.78	1.59
9	450	452 125	33.60	194.92	118.52	1.64
10	500	419 060	38.46	209.65	120.52	1.74
11	550	385 672	43.36	233.66	127.92	1.83
12	600	352 021	48.30	289.91	138.79	2.09
13	650	319 934	53.02	392.93	148.13	2.65
14	700	289 453	57.49	1 038.99	296.12	3.51
15	750	261 232	61.64	2 925.87	409.83	7.14
16	800	235 772	65.38	6 883.81	576.84	11.93
17	850	212 723	68.76	18 782.26	1 082.78	17.35
18	900	192 058	71.80	26 595.51	1 392.72	19.10
19	950	173 827	74.47	31 561.23	1 541.99	20.47
20	1 000	157 414	76.88	36 191.83	1 518.41	23.84

下载: 导出CSV

表 2 BGR高强度运算测试

Table 2 Intensive Computation Test of BGR

序号	缓冲半径/m	串行时间/s	并行时间/s	加速比
1	10 000	54 516.13	2 391.54	22.80
2	20 000	56 593.58	2 496.18	22.67
3	40 000	133 143.17	5 735.34	23.21

下载: 导出CSV

参考文献(31)

[1]	李清泉, 李德仁. 大数据GIS[J]. 武汉大学学报(信息科学版), 2014, 39(6): 641-644. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201406005.htm Li Qingquan, Li Deren. Big Data GIS[J]. Geomatics and Information Science of Wuhan University, 2014, 39(6): 641-644. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201406005.htm
[2]	刘纪平, 董春, 亢晓琛, 等. 大数据时代的地理国情统计分析[J]. 武汉大学学报(信息科学版), 2019, 44(1): 68-76. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201901008.htm Liu Jiping, Dong Chun, Kang Xiaochen, et al. National Geographical Conditions Statistical Analysis in the Era of Big Data[J]. Geomatics and Information Science of Wuhan University, 2019, 44(1): 68-76. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201901008.htm
[3]	赵春宇. 高性能并行GIS中矢量空间数据存取与处理关键技术研究[D]. 武汉: 武汉大学, 2006. Zhao Chunyu. Studying on the Technolgies of Storage and Processing of Spatial Vector Data in High-performance Parallel GIS[D]. Wuhan: Wuhan University, 2006.
[4]	Nekola J C, White P S. The Distance Decay of Similarity in Biogeography and Ecology[J]. Journal of Biogeography, 1999, 26(4): 867-878. doi: 10.1046/j.1365-2699.1999.00305.x
[5]	Egenhofer M J. A Formal Definition of Binary Topological Relationships[M]//Foundations of Data Organization and Algorithms. Berlin, Heidelberg: Springer, 1989: 457-472.
[6]	郭仁忠, 陈业滨, 赵志刚, 等. ICT时代地图的科学概念及表达框架[J]. 武汉大学学报(信息科学版), 2022, 47(12): 1978-1987. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202212002.htm Guo Renzhong, Chen Yebin, Zhao Zhigang, et al. Scientific Concept and Representation Framework of Maps in the ICT Era[J]. Geomatics and Information Science of Wuhan University, 2022, 47(12): 1978-1987. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202212002.htm
[7]	黄良珂, 李琛, 谢劭峰, 等. 顾及垂直递减率的中国区域Tm格点产品空间插值[J]. 武汉大学学报(信息科学版), 2023, 48(2): 295-300. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202302013.htm Huang Liangke, Li Chen, Xie Shaofeng, et al. Spatial Interpolation of Atmospheric Weighted Mean Temperature Grid Products in China with Conside-ration of Vertical Lapse Rate[J]. Geomatics and Information Science of Wuhan University, 2023, 48(2): 295-300. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202302013.htm
[8]	应申, 靳凤攒, 李霖, 等. 基于ArcGIS Engine的矢量数据分层分块技术研究[J]. 测绘地理信息, 2014, 39(6): 50-53. https://www.cnki.com.cn/Article/CJFDTOTAL-DBCH202305057.htm Ying Shen, Jin Fengzan, Li Lin, et al. Hierarchical Block of Vector Data Based on ArcGIS Engine[J]. Journal of Geomatics, 2014, 39(6): 50-53. https://www.cnki.com.cn/Article/CJFDTOTAL-DBCH202305057.htm
[9]	Longley P A, Tobón C. Spatial Dependence and Heterogeneity in Patterns of Hardship: An Intra-urban Analysis[J]. Annals of the Association of American Geographers, 2004, 94(3): 503-519. doi: 10.1111/j.1467-8306.2004.00411.x
[10]	Rodríguez A M, Egenhofer M J. Comparing Geospatial Entity Classes: An Asymmetric and Context-dependent Similarity Measure[J]. International Journal of Geographical Information Science, 2004, 18(3): 229-256. doi: 10.1080/13658810310001629592
[11]	Yang C W, Wu H Y, Huang Q Y, et al. Using Spatial Principles to Optimize Distributed Computing for Enabling the Physical Science Discoveries[J]. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108(14): 5498-5503. doi: 10.1073/pnas.0909315108
[12]	Werner M. Parallel Processing Strategies for Big Geospatial Data[J]. Frontiers in Big Data, 2019, 2: 44. doi: 10.3389/fdata.2019.00044
[13]	Wang S W. A CyberGIS Framework for the Synthesis of Cyberinfrastructure, GIS, and Spatial Analysis[J]. Annals of the Association of American Geographers, 2010, 100(3): 535-557. doi: 10.1080/00045601003791243
[14]	Zhang J T, You S M. CudaGIS: Report on the Design and Realization of a Massive Data Parallel GIS on GPUs[C]//The 3rd ACM SIGSPATIAL International Workshop on GeoStreaming, Redondo Beach, USA, 2012.
[15]	Aji A, Wang F S, Vo H, et al. HadoopGIS: A High Performance Spatial Data Warehousing System over MapReduce[J]. Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, 2013, 6(11): 1009.
[16]	Kang X C, Liu J P, Lin X G. Streaming Progressive TIN Densification Filter for Airborne LiDAR Point Clouds Using Multi-core Architectures[J]. Remote Sensing, 2014, 6(8): 7212-7232.
[17]	郭明强, 谢忠, 黄颖. 集群并发环境下大规模矢量数据负载均衡算法[J]. 武汉大学学报(信息科学版), 2013, 38(9): 1131-1134. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201309027.htm Guo Mingqiang, Xie Zhong, Huang Ying. Content Grid Load Balancing Algorithm for Large-scale Vector Data in the Server Cluster Concurrent Environment[J]. Geomatics and Information Science of Wuhan University, 2013, 38(9): 1131-1134. https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201309027.htm
[18]	Shen J X, Chen L, Wu Y, et al. Approach to Acce-lerating Dissolved Vector Buffer Generation in Distributed In-memory Cluster Architecture[J]. ISPRS International Journal of Geo-Information, 2018, 7(1): 26.
[19]	Fan J F, Ji M, Gu G M, et al. Optimization Approaches to Mpi and Area Merging-based Parallel Buffer Algorithm[J]. Boletim De Ciências Geodésicas, 2014, 20(2): 237-256.
[20]	Wu Y, Ma M Y, Chen L. HiVewshed: An Interactive Online Viewshed Analysis System for Multiple Observers[C]//The 3rd International Conference on Computer Science and Application Engineering, Sanya, China, 2019.
[21]	Guo M Q, Han C D, Guan Q F, et al. A Universal Parallel Scheduling Approach to Polyline and Polygon Vector Data Buffer Analysis on Conventional GIS Platforms[J]. Transactions in GIS, 2020, 24(6): 1630-1654.
[22]	Hennessy J L, Patterson D A, Arpaci-Dusseau A C. A Quantitative Approach[M]. Boston: Morgan Kaufmann, 2011.
[23]	Bernstein A J. Analysis of Programs for Parallel Processing[J]. IEEE Transactions on Electronic Computers, 1966, EC-15(5): 757-763.
[24]	Wang S W, Armstrong M P. A Theoretical Approach to the Use of Cyberinfrastructure in Geographical Analysis[J]. International Journal of Geographical Information Science, 2009, 23(2): 169-193.
[25]	Lipton R J, Tarjan R E. A Separator Theorem for Planar Graphs[J]. SIAM Journal on Applied Mathematics, 1979, 36(2): 177-189.
[26]	Skiena S S. The Algorithm Design Manual[M]. London: Springer, 2008.
[27]	许宝刚. 图的划分: 一些进展与未解决问题[J]. 数学进展, 2016, 45(1): 1-20. https://www.cnki.com.cn/Article/CJFDTOTAL-SXJZ201601001.htm Xu Baogang. Graph Partitions: Recent Progresses and some Open Problems[J]. Advances in Mathematics, 2016, 45(1): 1-20. https://www.cnki.com.cn/Article/CJFDTOTAL-SXJZ201601001.htm
[28]	Wieling M, Nerbonne J. Bipartite Spectral Graph Partitioning for Clustering Dialect Varieties and Detecting their Linguistic Features[J]. Computer Speech & Language, 2011, 25(3): 700-715.
[29]	Plimpton S J, Moore S G, Borner A, et al. Direct Simulation Monte Carlo on Petaflop Supercomputers and Beyond[J]. Physics of Fluids, 2019, 31(8): 086101.
[30]	Stanton I, Kliot G. Streaming Graph Partitioning for Large Distributed Graphs[C]// The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA, 2012.
[31]	王鑫, 陈蔚雪, 杨雅君, 等. 知识图谱划分算法研究综述[J]. 计算机学报, 2021, 44(1): 235-260. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202101012.htm Wang Xin, Chen Weixue, Yang Yajun, et al. Research on Knowledge Graph Partitioning Algorithms: A Survey[J]. Chinese Journal of Computers, 2021, 44(1): 235-260. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202101012.htm

施引文献(7)

期刊类型引用(4)

1.	朱杰，郑加柱，陈红华，杨静，胡平昌，陆敏燕. 结合POI数据的南京市商业中心识别与集聚特征研究. 现代测绘. 2022(06): 34-39 . 百度学术
2.	金澄，安晓亚，陈占龙，马啸川. 矢量居民地多边形多级图划分聚类方法. 武汉大学学报(信息科学版). 2021(01): 19-29 . 百度学术
3.	张铭龙，何贞铭. 基于因子分析法的城市商业中心抽取研究. 地理空间信息. 2021(08): 58-60+64+5 . 百度学术
4.	李卫东，张铭龙，段金龙. 基于POI数据的南京市空间格局定量研究. 世界地理研究. 2020(02): 317-326 . 百度学术