CHEN Mian, LI Longhai, XIE Peng, FU Shaofeng, HE Liesong, ZHOU Xiaodong. A Data Management System for Big Geospatial Data Based on Phoenix[J]. Geomatics and Information Science of Wuhan University, 2020, 45(5): 719-727. DOI: 10.13203/j.whugis20180435

A Data Management System for Big Geospatial Data Based on Phoenix

Funds: 

The Open Fund of State Key Laboratory of Geo-information Engineering SKLGIE2014-M-4-1

The National Natural Science Foundation of China 41301527

  • Author Bio:

    CHEN Mian, master, lecturer, specializes in distributed computing and mobile intelligent computing. E-mail: chen_mian@mail.xidian.edu.cn

  • Corresponding author:

    LI Longhai, PhD, associate professor. E-mail: lhli@xidian.edu.cn

  • Received Date: March 21, 2019
  • Published Date: May 04, 2020
  • HBase, a NoSQL database, has been adopted in many applications for storing and managing huge datasets. However, it does not directly support spatial data. We therefore present GS-Phoenix, a data management system for big geospatial data built on two open-source projects, Phoenix and HBase. As geospatial data are inserted, GS-Phoenix automatically generates a spatial index based on a space-filling curve, stored either in the primary keys of the data table or as a secondary index. Using this spatial index, GS-Phoenix implements several basic spatial query operations, including rectangular range queries, irregular-area queries, and k-nearest-neighbor (kNN) queries, all of which are essential primitives for building complex spatial queries. GS-Phoenix uses user-defined functions and server-side sorting to push most spatial filtering to the server side during query processing, which reduces the computing burden on the client. It also applies a query optimization method based on spatial distribution statistics, which further improves spatial query efficiency. Experimental results show that GS-Phoenix deployed on a small-scale cluster sustains a throughput of over 170 000 insertions per second while serving spatial range queries and kNN queries with response times as low as hundreds of milliseconds. The experiments demonstrate that GS-Phoenix suits a wide range of location-based applications that demand high insertion throughput and real-time spatial queries.
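
The idea of using a space-filling curve as a row key can be illustrated with a small sketch. The abstract does not say which curve or grid resolution GS-Phoenix uses, so this example assumes a Z-order (Morton) curve on a 16-bit grid, and a sorted in-memory list stands in for an HBase table ordered by row key. All names (`to_cell`, `interleave`, `z_key`, `range_query`) are hypothetical and are not part of GS-Phoenix, Phoenix, or HBase.

```python
from bisect import bisect_left, bisect_right

BITS = 16  # assumed grid resolution per axis; not taken from the paper


def to_cell(lon, lat, bits=BITS):
    """Quantize a lon/lat pair to integer grid coordinates."""
    n = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * n)
    y = int((lat + 90.0) / 180.0 * n)
    return x, y


def interleave(x, y, bits=BITS):
    """Interleave the bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def z_key(lon, lat):
    """Curve position used here as the sortable key of a point."""
    return interleave(*to_cell(lon, lat))


def range_query(rows, min_lon, min_lat, max_lon, max_lat):
    """Rectangular range query over rows sorted by key.

    rows: list of (key, lon, lat) tuples sorted by key.
    Every point inside the rectangle has a key between the keys of the
    lower-left and upper-right corners, so one key-range scan followed by
    an exact coordinate filter returns the full answer.
    """
    lo = z_key(min_lon, min_lat)
    hi = z_key(max_lon, max_lat)
    keys = [r[0] for r in rows]
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [r for r in rows[start:end]
            if min_lon <= r[1] <= max_lon and min_lat <= r[2] <= max_lat]


# Build a small "table" of points on a regular lon/lat lattice.
points = [(lon, lat) for lon in range(-180, 181, 7) for lat in range(-90, 91, 5)]
table = sorted((z_key(lon, lat), lon, lat) for lon, lat in points)
hits = range_query(table, 100.0, 20.0, 125.0, 45.0)
```

The single key-range scan can still touch many rows that lie outside the rectangle, which is why the exact coordinate filter is needed. In a system such as the one the abstract describes, that filter would run on the server side rather than in the client.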
