Geohash-Trees:一种用于组织大规模轨迹的自适应索引

Geohash-Trees: An Adaptive Index Which can Organize Large-Scale Trajectories

  • 摘要: 蕴含着挖掘价值的轨迹数据分布在世界各地,且规模庞大。如何在全球范围内组织轨迹数据并支持高效范围查询成为难题。一种自适应索引组织框架被提出来管理查询全球范围大规模轨迹数据集,其基本思想为:针对不同轨迹数据集,根据Geohash编码,生成层数最深的Geohash格网覆盖住整个轨迹数据集范围;以格网作为根节点,生成Geohash-Trees;为了加快查询定位到对应索引,根据编码前缀相同的特点设计了字典查询树。Geohash-Trees是一种基于格网划分的空间索引,它能够根据轨迹密度自适应使用多种剖分策略划分空间,提高范围查询效率。为了支持索引动态更新,设计了增量插入和更新算法。同时,该索引被移植到商用数据库Oracle中,利用数据库性能高效管理查询轨迹数据。实验结果表明,该方法在范围查询以及占用空间等方面明显优于Oracle内置的R树索引。

     

    Abstract: Trajectory data which contains mining valve is widely distributed and large-scale. How to organize trajectory data and retrieve trajectory data efficiently becomes very difficult to solve. We present a framework of adaptive index based on Geohash to organize the worldwide and large-scale trajectory data set. Different trajectory data sets will be covered by Geohash grid which is deepest, and then we take the grid as root node to generate adaptive Geohash-Trees. In order to quickly locate the corresponding index, we design trie on the basis of the feature of Geohash. Adaptive Geohash-Trees is a spatial index based on grid. It can divide the space according to the track density by adopting a variety of strategies which improves the efficiency of range query. Meanwhile, we design the algorithm of incremental insertion and update for the supporting of real-time update of trajectory data. Furthermore, this framwork has been migrated in Oracle. The experiment results verify that our approach in several aspects such as range query and occupied disk size performs much better than R-Trees.

     

/

返回文章
返回