Parallel Algorithm for Partitioning Massive Spatial Vector Data in a Cloud Environment

Abstract: Spatial data partitioning plays an important role in spatial indexing and data storage for spatial big data. To address the inherent shortcomings of the Hadoop cloud computing platform in partitioning and storing spatial data, this paper presents a parallel algorithm, based on the Hilbert space-filling curve, for partitioning massive spatial vector data. The partitioning phase takes several influencing factors into account: the spatial relationships between adjacent objects, the size of each spatial vector object, and the number of objects that fall into the same coded block. Following the principle of merging small coded blocks and sub-splitting large ones, the algorithm partitions massive spatial vector data in parallel in a cloud environment. Experimental results show that the proposed algorithm not only improves the efficiency of R-tree indexing over massive spatial vector data, but also resolves the data-skew problem, keeping data well balanced across the Hadoop distributed file system (HDFS).
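
The mechanics sketched in the abstract can be made concrete: each object's centroid is snapped to a grid cell, the cell is encoded with a Hilbert curve, objects sharing a code form a coded block, and blocks are then merged or sub-split to keep partitions balanced. Below is a minimal, hypothetical sketch of that idea in Python. The `hilbert_d` encoder is the standard iterative formulation of the Hilbert mapping, while `partition`, its `max_size` capacity threshold, and the simplified handling of oversized blocks are illustrative assumptions, not the paper's actual implementation; the sketch is also sequential for clarity, whereas the paper runs the equivalent steps in parallel on Hadoop.

```python
from collections import defaultdict


def hilbert_d(order, x, y):
    """Map grid cell (x, y) to its distance along a Hilbert curve of the
    given order (grid side = 2**order). Standard iterative formulation."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Rotate/flip the quadrant so the curve remains continuous.
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition(objects, order=4, max_size=64):
    """Group objects into coded blocks by the Hilbert code of their centroid
    cell, then apply the 'merge small blocks, sub-split large blocks' idea.

    objects:  iterable of (object_id, size, (x, y)), with x and y already
              snapped to integer grid coordinates in [0, 2**order).
    max_size: hypothetical capacity of one partition (e.g. one HDFS block).
    """
    blocks = defaultdict(list)
    for oid, size, (x, y) in objects:
        blocks[hilbert_d(order, x, y)].append((oid, size))

    partitions, current, current_size = [], [], 0
    for code in sorted(blocks):  # Hilbert order keeps neighbours together
        block_size = sum(s for _, s in blocks[code])
        if block_size > max_size:
            # An oversized coded block would skew one partition. The paper
            # sub-splits such blocks at a finer Hilbert order; for brevity
            # this sketch just emits the block as a partition of its own.
            partitions.append(blocks[code])
            continue
        if current_size + block_size > max_size:
            partitions.append(current)  # close the current partition
            current, current_size = [], 0
        current.extend(blocks[code])  # merge small adjacent blocks
        current_size += block_size
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # Tiny demo on a 16x16 grid: two small nearby objects merge into one
    # partition, while the large object becomes a partition of its own.
    objs = [("a", 10, (0, 0)), ("b", 20, (1, 0)), ("c", 70, (15, 15))]
    for i, part in enumerate(partition(objs, order=4, max_size=64)):
        print(i, part)
```

Processing coded blocks in ascending Hilbert order is what preserves spatial locality: objects merged into the same partition are also close in space, which is the property the paper relies on for both R-tree index efficiency and balanced HDFS storage.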