Abstract:
Spatial data partitioning plays an important role in spatial indexing methods and data storage strategies for spatial big data. In this paper, to address the inherent shortcomings of spatial data partitioning and data storage on the Hadoop cloud computing platform, a parallel algorithm based on the Hilbert space-filling curve is presented for partitioning massive spatial vector data. In the partitioning phase, we take several influencing factors into full consideration, including the spatial relationship between adjacent objects, the size of each spatial vector object, and the number of spatial objects in the same coded block. Following the partitioning principle of merging small coded blocks and sub-splitting large ones, we implement the parallel partitioning algorithm in a cloud environment. Experimental results show that the proposed algorithm not only improves the efficiency of the spatial R-tree index for massive spatial vector data, but also achieves good data balance in the Hadoop Distributed File System (HDFS).
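The core idea can be illustrated with a minimal, sequential sketch in Python (all names here are hypothetical; the paper's actual algorithm runs in parallel on Hadoop and also weighs the byte size of each vector object, which this sketch reduces to a simple object count per block):

```python
from collections import defaultdict

def xy2d(n, x, y):
    """Distance of grid cell (x, y) along a Hilbert curve over an
    n-by-n grid (n a power of two); standard bitwise formulation."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_partition(objects, order, capacity):
    """Group objects by their Hilbert-coded block, then merge small
    blocks and sub-split large ones so that no partition exceeds
    `capacity` objects. `objects` are (x, y) grid cells at the
    given curve order (hypothetical simplification)."""
    n = 1 << order
    blocks = defaultdict(list)
    for x, y in objects:
        blocks[xy2d(n, x, y)].append((x, y))

    partitions, current = [], []
    for code in sorted(blocks):           # walk blocks in curve order
        group = blocks[code]
        while len(group) > capacity:      # sub-split an oversized block
            partitions.append(group[:capacity])
            group = group[capacity:]
        if current and len(current) + len(group) > capacity:
            partitions.append(current)    # close the filled partition
            current = []
        current.extend(group)             # merge small adjacent blocks
    if current:
        partitions.append(current)
    return partitions
```

Because blocks are visited in Hilbert-curve order, objects that are adjacent in space tend to fall into the same partition, which is what keeps the resulting HDFS blocks both balanced in size and spatially compact.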