Geo-spatial Big Data Storage Based on NoSQL Database

LI Shaojun, YANG Haijun, HUANG Yaohuan, ZHOU Qin

Citation: LI Shaojun, YANG Haijun, HUANG Yaohuan, ZHOU Qin. Geo-spatial Big Data Storage Based on NoSQL Database[J]. Geomatics and Information Science of Wuhan University, 2017, 42(2): 163-169. DOI: 10.13203/j.whugis20140774

Funding:

National Natural Science Foundation of China (Young Scientists Fund), No. 51309210

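Details
    About the author:

    LI Shaojun, PhD candidate, works mainly on geographic information system software. lishaojun@supermap.com

    Corresponding author:

    YANG Haijun, PhD, senior engineer, works mainly on remote sensing applications for the ecological environment. yanghj@lreis.ac.cn

  • CLC number: P208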

More Information
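  • Abstract (Chinese-language version): Storing and processing spatial data in relational databases is the mainstream approach in geographic information systems (GIS). However, with the development of the Internet of Things, mobile Internet, cloud computing, and spatial data acquisition technologies, spatial data have shifted from being merely massive to exhibiting big data characteristics, which poses new challenges for spatial data storage and management in both data volume and processing mode. This paper first analyzes the limitations of the traditional centralized storage and management model in processing and applying big data, including the adaptability of storage objects, the scalability of storage capacity, and the demand for highly concurrent processing. Then, based on an analysis of the characteristics of several current mainstream NoSQL databases, it points out the limitations of storing spatial big data solely in a NoSQL database with respect to data operation modes, query modes, and efficient data management. Finally, considering the requirements that spatial big data storage in GIS places on the scalability of database storage capacity and on highly concurrent data processing and access, it proposes a distributed storage and integrated processing strategy for spatial big data based on an in-memory database and a NoSQL database. A prototype system was developed to verify the feasibility and effectiveness of the proposed strategy.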
    Abstract: Geospatial data in databases have taken on the characteristics of big data in tandem with the development of the Internet, mobile Internet, cloud computing, and especially spatial data acquisition technologies. In handling spatial big data, traditional spatial database management techniques based on relational database management systems have encountered problems, including the unstructured nature of spatial objects, the need for highly scalable storage capacity, and the high concurrency of big data application environments. This paper examines mainstream NoSQL databases, which deal successfully with unstructured big data and are widely used in Internet applications but lack spatial capabilities; their data operation and query modes cannot meet the requirements of GIS applications. To resolve this problem, this paper proposes a strategy that uses a NoSQL database as a warehouse for spatial big data and a traditional spatial database as the application server. The storage system architecture, key technologies, and solutions are discussed. A prototype system was developed based on MongoDB, PostgreSQL, and SQLite to verify the feasibility and effectiveness of the strategy.
  • Figure 1. Application Diagram of the System of Spatial Big Data Storage and Management

    Figure 2. Distributed Storage Architecture of Spatial Big Data

    Figure 3. Spatial Database Storage in MongoDB Sharding

    Figure 4. Index Collection of the Tile Index

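    Table 1   Object Types Mapping in Database

    Item                                                SQLite:Mem   PostgreSQL   MongoDB
    Storage format of spatial location information      Text         bin          BSON
    Database object corresponding to a data source      database     database     database
    Database object corresponding to a layer            table        table        collection
    Database object corresponding to a spatial object   row          row          document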
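
    The mapping in Table 1 can be written down as a small lookup structure. The Python sketch below is illustrative only: the names `OBJECT_MAPPING` and `db_object_for`, and the keys such as `"layer"` and `"spatial_object"`, are hypothetical and are not taken from the prototype system. Only the cell values come from the table.

```python
# Cell values copied from Table 1; the structure, key names, and
# function name are illustrative assumptions, not the authors' code.
OBJECT_MAPPING = {
    "SQLite:Mem": {
        "location_format": "Text",
        "data_source": "database",
        "layer": "table",
        "spatial_object": "row",
    },
    "PostgreSQL": {
        "location_format": "bin",
        "data_source": "database",
        "layer": "table",
        "spatial_object": "row",
    },
    "MongoDB": {
        "location_format": "BSON",
        "data_source": "database",
        "layer": "collection",
        "spatial_object": "document",
    },
}


def db_object_for(backend: str, gis_concept: str) -> str:
    """Return the database object that holds a GIS concept in a backend."""
    return OBJECT_MAPPING[backend][gis_concept]


if __name__ == "__main__":
    for backend in OBJECT_MAPPING:
        print(backend, "stores a layer as a", db_object_for(backend, "layer"))
```

    A lookup of this kind makes the one difference between the backends explicit: the two relational stores hold a layer as a table of rows, while MongoDB holds it as a collection of documents.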

    Table 2   Average Time Consumption for One Shape File

    Step                               Time/s   Remarks
    Import into in-memory database     20.6     -
    Split in-memory layer into tiles   2.3      12 columns × 3 rows
    Append to target layer             46.7     -
    Total                              69.6     -

    Table 3   Data Extraction Based on Spatial Index

    Step                       Time/s   Remarks
    Metadata query             0.6      -
    Tile query within layers   2.3      8 layers
    Tile data append           978      3,861 tiles
    Total                      981      12,574,739 objects

    Table 4   Time Consumption on Overlay Analysis

    Layer   Tiles   Objects     Time A/s   Time B/s (in-memory database)
    L1      73      253,454     192        126
    L2      385     1,264,189   643        397
    L3      656     2,159,563   1072       658
    L4      656     2,111,632   1123       744
    L5      658     2,176,057   1070       723
    L6      762     2,475,838   1119       691
    L7      581     1,879,148   649        419
    L8      90      254,858     215        141
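
    The two timing columns of Table 4 can be compared per layer with a few lines of arithmetic. The Python sketch below copies the rows from the table and computes the ratio of time A to time B. The table labels only column B as using the in-memory database, so treating column A as the baseline is an assumption, and the names `TABLE_4`, `speedups`, and `totals` are hypothetical.

```python
# Rows copied from Table 4: (layer, tiles, objects, time_a_s, time_b_s).
# Assumption: time A is the baseline run and time B the in-memory run.
TABLE_4 = [
    ("L1", 73, 253_454, 192, 126),
    ("L2", 385, 1_264_189, 643, 397),
    ("L3", 656, 2_159_563, 1072, 658),
    ("L4", 656, 2_111_632, 1123, 744),
    ("L5", 658, 2_176_057, 1070, 723),
    ("L6", 762, 2_475_838, 1119, 691),
    ("L7", 581, 1_879_148, 649, 419),
    ("L8", 90, 254_858, 215, 141),
]


def speedups(rows):
    """Return the ratio time A / time B for each layer."""
    return {layer: a / b for layer, _tiles, _objects, a, b in rows}


def totals(rows):
    """Return summed (tiles, objects, time_a_s, time_b_s) over all layers."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


if __name__ == "__main__":
    for layer, ratio in speedups(TABLE_4).items():
        print(f"{layer}: {ratio:.2f}x")
    tiles, objects, time_a, time_b = totals(TABLE_4)
    print(f"all layers: {tiles} tiles, {objects} objects, "
          f"{time_a / time_b:.2f}x overall")
```

    The summed tile and object counts (3,861 and 12,574,739) agree with the totals reported in Table 3, which suggests both tables describe the same eight layers.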
Publication history
  • Received: 2015-04-16
  • Published: 2017-02-04
