Abstract:
HBase, a NoSQL database, has been adopted in many applications for storing and managing huge datasets. However, it does not directly support spatial data. We therefore present GS-Phoenix, a data management system for big geospatial data. GS-Phoenix builds on two open-source projects, Phoenix and HBase. As geospatial data are inserted, GS-Phoenix automatically generates a spatial index based on a space-filling curve, stored either in the primary keys of the data table or as a secondary index. Using this spatial index, GS-Phoenix supports several basic spatial query operations, including rectangular range queries, non-regular area queries, and k-nearest-neighbor (kNN) queries, all of which are essential primitives for building complex spatial queries. GS-Phoenix uses user-defined functions and server-side sorting to push most spatial filtering to the server side during query processing, which reduces the computing burden on the client. GS-Phoenix also applies a query optimization method based on spatial distribution statistics, which further improves spatial query efficiency. Experimental results show that GS-Phoenix deployed on a small-scale cluster sustains an I/O throughput of over 170 000 data insertions per second while serving spatial range queries and kNN queries with response times as low as hundreds of milliseconds. The experiments demonstrate that GS-Phoenix is applicable to a wide range of geospatial position-related applications that demand high insertion throughput and real-time spatial queries.
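
To make the idea of a space-filling-curve index embedded in a primary key concrete, the following is a minimal sketch in Python. It is not the GS-Phoenix implementation: the abstract does not state which curve is used, so a Z-order (Morton) curve is assumed here, and the names `quantize`, `interleave`, `zorder_key`, and `row_key`, the 16-bit resolution, and the key layout are all hypothetical choices made for illustration.

```python
def quantize(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so that value == hi stays in range


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y (x in even bit positions, y in odd)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def zorder_key(lon, lat, bits=16):
    """Assumed Z-order value for a longitude/latitude pair (not the paper's curve)."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


def row_key(lon, lat, record_id, bits=16):
    """Hypothetical row key: fixed-width hex Z-order value, then a record id."""
    width = (2 * bits + 3) // 4  # hex digits needed for 2*bits bits
    return f"{zorder_key(lon, lat, bits):0{width}x}_{record_id}"
```

Because the curve value forms a fixed-width key prefix, points in the same grid cell share that prefix and sort next to each other, which is what allows a rectangular range query to be served by a small number of contiguous key-range scans followed by exact filtering.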