SHEN Dannan, GUAN Xuefeng, HAN Linxu, XIANG Longgang, WU Huayi. VA-HBase: An Adaptive Distributed Management Scheme for Vector Data[J]. Geomatics and Information Science of Wuhan University, 2023, 48(12): 1999-2008. DOI: 10.13203/j.whugis20200417

VA-HBase: An Adaptive Distributed Management Scheme for Vector Data

  • Objectives With the rapid development of Earth observation networks, the volume of accumulated spatial data is growing explosively. However, current distributed spatial data management systems focus on discrete point sets (e.g., points of interest) or point sequences (e.g., vehicle trajectories) and provide insufficient support for complex polyline and polygon objects. To address this problem, we propose an adaptive, vector-oriented management method based on HBase, named VA-HBase.
    Methods First, a novel two-level spatial index is designed for complex vector objects. The primary index adaptively determines an appropriate storage level for each vector object according to its spatial characteristics and encodes the object independently with a customized Z-curve encoding scheme. This scheme interleaves the spatial coordinates into a bit sequence following the Z-curve and converts the resulting sequence into a byte code using a proposed simplest byte conversion scheme. The secondary index adopts fixed-level grid partitioning and precomputes statistics on storage levels to support efficient spatial queries. A middle level is chosen for grid generation according to the level distribution of the stored objects, and the minimum storage level of the objects within each grid cell is recorded. Second, based on this two-level spatial index, an HBase storage schema is proposed that comprises four tables: a metadata table, a primary index table, a secondary index table, and a raw object table. Finally, an efficient range query algorithm is designed on top of this scheme. By combining the adaptive-level primary index with the fixed-level secondary index, parallel queries are executed efficiently through HBase's filter mechanism.
    Results Experiments on three real datasets show that: (1) VA-HBase achieves about 2 to 10 times higher query efficiency than GeoMesa and other related methods. (2) For complex polyline and polygon objects, the adaptive indexing of VA-HBase quickly filters out objects that are duplicated or lie outside the query rectangle, and its false positive proportion is much lower than that of other related methods. (3) As the input data size grows from 7 GB to 300 GB, the query time remains at about 200 ms, showing that VA-HBase scales very well. (4) Thanks to the simplest byte encoding scheme, the index storage space for various vector objects is effectively compressed.
    Conclusions VA-HBase effectively supports the management of complex vector objects in distributed environments and maintains efficient, stable query performance on large-volume datasets.
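The two-level index described under Methods can be sketched in a few functions. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Coordinates are assumed normalized to [0, 1). The storage level is assumed to be the deepest quadtree level whose single cell contains the object's bounding box. The byte layout (one level byte followed by the Z-code in the fewest whole big-endian bytes) is a hypothetical stand-in for the paper's simplest byte conversion scheme. The names `storage_level`, `interleave`, `primary_key`, `secondary_index`, and `MAX_LEVEL` are illustrative only.

```python
from collections import defaultdict

# Hypothetical finest level; coordinates are assumed normalized to [0, 1).
MAX_LEVEL = 16


def storage_level(xmin, ymin, xmax, ymax, max_level=MAX_LEVEL):
    """Deepest quadtree level whose single cell contains the bounding box."""
    for level in range(max_level, -1, -1):
        n = 1 << level  # cells per axis at this level
        if int(xmin * n) == int(xmax * n) and int(ymin * n) == int(ymax * n):
            return level
    return 0  # not reached for inputs in [0, 1): level 0 is one cell


def interleave(cx, cy, level):
    """Z-curve (Morton) code: bit i of cx -> bit 2i, bit i of cy -> bit 2i+1."""
    code = 0
    for i in range(level):
        code |= ((cx >> i) & 1) << (2 * i)
        code |= ((cy >> i) & 1) << (2 * i + 1)
    return code


def primary_key(xmin, ymin, xmax, ymax):
    """Hypothetical row key: one level byte, then the Z-code in the fewest bytes."""
    level = storage_level(xmin, ymin, xmax, ymax)
    n = 1 << level
    code = interleave(int(xmin * n), int(ymin * n), level)
    nbytes = (2 * level + 7) // 8  # 2 bits per level, rounded up to whole bytes
    return bytes([level]) + code.to_bytes(nbytes, "big")


def secondary_index(bboxes, middle_level):
    """Minimum storage level of the objects overlapping each middle-level cell."""
    n = 1 << middle_level
    min_level = defaultdict(lambda: MAX_LEVEL)
    for xmin, ymin, xmax, ymax in bboxes:
        level = storage_level(xmin, ymin, xmax, ymax)
        for cx in range(int(xmin * n), int(xmax * n) + 1):
            for cy in range(int(ymin * n), int(ymax * n) + 1):
                min_level[(cx, cy)] = min(min_level[(cx, cy)], level)
    return dict(min_level)


# A small polygon and a long polyline that crosses most of the extent.
boxes = [(0.10, 0.10, 0.11, 0.11), (0.05, 0.40, 0.90, 0.45)]
print(primary_key(*boxes[0]).hex())  # 05000f: level 5, Z-code 15
print(primary_key(*boxes[1]).hex())  # 00: only level 0 contains it
print(secondary_index(boxes, middle_level=2))
```

Big-endian packing means that keys of the same level, which share one length, sort in Z-order under HBase's lexicographic row-key ordering. In this sketch, a range query could consult the secondary index to find, for each middle-level cell it touches, the coarsest level that actually holds objects there, and skip scanning any coarser levels. How VA-HBase combines this step with HBase filters is not reproduced here.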