CHEN Mian, LI Longhai, XIE Peng, FU Shaofeng, HE Liesong, ZHOU Xiaodong. A Data Management System for Big Geospatial Data Based on Phoenix[J]. Geomatics and Information Science of Wuhan University, 2020, 45(5): 719-727. DOI: 10.13203/j.whugis20180435

A Data Management System for Big Geospatial Data Based on Phoenix

Abstract: The NoSQL database HBase has been adopted by many applications as a solution for storing and managing massive datasets. However, HBase does not directly support geospatial data. We therefore present GS-Phoenix, a data management system for big geospatial data built on two open-source projects, Phoenix and HBase. As spatial data are inserted, GS-Phoenix automatically generates a spatial index based on a space-filling curve, stored either in the primary key of the data table or as a secondary index. Using this spatial index, GS-Phoenix implements rectangular range queries, irregular range queries, and k nearest neighbor (kNN) queries, the basic operations needed to build complex spatial queries. GS-Phoenix uses user-defined functions and server-side sorting to place the main computation of a spatial query on the server side, which effectively reduces the computing burden on the client. GS-Phoenix also applies a query optimization method based on statistics of the spatial distribution of the data, which further improves spatial query efficiency. Experimental results show that GS-Phoenix deployed on a small-scale cluster sustains over 170 000 data insertions per second, while common spatial range queries and kNN queries complete with response times of a few hundred milliseconds. GS-Phoenix is therefore suitable for a wide range of location-based applications that require high insertion throughput and real-time spatial queries.
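The idea of using a space-filling-curve value as a sortable key, then answering a rectangular range query with a key-range scan followed by an exact filter, can be illustrated with a minimal Python sketch. The abstract does not say which curve GS-Phoenix uses; the sketch assumes a Z-order (Morton) curve purely for illustration. The sorted in-memory list stands in for HBase's ordered row keys, and every function name here (`morton_encode`, `row_key`, `build_table`, `rect_query`) is hypothetical rather than part of GS-Phoenix or Phoenix.

```python
from bisect import bisect_left, bisect_right


def morton_encode(cx: int, cy: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of cx and cy into one Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((cx >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((cy >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def row_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize a lon/lat pair to a grid cell and return its Z-order code."""
    scale = (1 << bits) - 1
    cx = int((lon + 180.0) / 360.0 * scale)
    cy = int((lat + 90.0) / 180.0 * scale)
    return morton_encode(cx, cy, bits)


def build_table(points, bits: int = 16):
    """Return rows (key, lon, lat) sorted by key, mimicking an ordered key-value store."""
    return sorted((row_key(lon, lat, bits), lon, lat) for lon, lat in points)


def rect_query(table, min_lon, min_lat, max_lon, max_lat, bits: int = 16):
    """Scan the key range spanned by the rectangle's corners, then filter exactly.

    A Z-order code never decreases when either coordinate increases, so every
    point inside the rectangle has a key between the keys of the two corners.
    The scan may still touch points outside the rectangle, hence the filter.
    """
    lo = row_key(min_lon, min_lat, bits)
    hi = row_key(max_lon, max_lat, bits)
    keys = [k for k, _, _ in table]
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [
        (lon, lat)
        for _, lon, lat in table[start:end]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]
```

A single corner-to-corner scan is the simplest correct strategy but can read many rows that the filter then discards. A production system would more plausibly split the rectangle into several tighter key ranges and run the exact filter on the server, which is the role the abstract assigns to user-defined functions.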

     
