YU Liebing, XIANG Longgang, SUN Shangyu, GUAN Xuefeng, WU Huayi. kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage[J]. Geomatics and Information Science of Wuhan University, 2021, 46(5): 736-745. DOI: 10.13203/j.whugis20200136
Citation: YU Liebing, XIANG Longgang, SUN Shangyu, GUAN Xuefeng, WU Huayi. kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage[J]. Geomatics and Information Science of Wuhan University, 2021, 46(5): 736-745. DOI: 10.13203/j.whugis20200136

kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage

Funds: 

The National Key Research and Development Program of China 2018YFB2100603

the National Natural Science Foundation of China 41771474

the National Natural Science Foundation of China 41930107

More Information
  • Author Bio:

    YU Liebing, master, specializes in geographic spatial-temporal data storage and computing. E-mail: liebingyu@whu.edu.cn

  • Corresponding author:

    SUN Shangyu, PhD. E-mail: shangyu_sun@whu.edu.cn

  • Received Date: March 30, 2020
  • Published Date: May 04, 2021
  •   Objectives  With the advancement of mobile object tracking technologies, huge volumes of spatiotemporal data have been accumulated in the form of location trajectories. Such data bring us new opportunities and challenges in efficient trajectory retrieval. Among them, point to trajectory k nearest neighbor (P2T_kNN) query plays a pivotal role in many mobile object-oriented analyses and applications. It has been widely used in tracing the source of infectious diseases, recommending travel routes, and generating and updating local road networks based on trajectories.
      Methods  Aiming at the requirement of efficient P2T_kNN query processing for large-scale trajectory data, a trajectory organization method combining spatiotemporal segmentation and trajectory segmentation is proposed. The core idea is to use the discrete global grid system (DGGS) to segment the trajectory in spatial based on the previous time division. The spatiotemporal cell encoding is applied to index the trajectory segments falling in it. In order to realize efficient encoding of spatiotemporal cells, three concatenated encoding structures of spatiotemporal divide and conquer are proposed to solve the huge difference in dimension and scale between temporal dimension and spatial dimension. Based on above, a P2T_kNN query processing framework for distributed column-oriented storage is designed, and an adaptive spatial cell search algorithm that takes into account the spatiotemporal distribution of trajectory data is proposed. The step size for spatial cell searching is adaptively set according to the spatial stratified heterogeneity of the trajectory data under the given time constraint, which greatly improves the search efficiency of areas with sparse trajectory data.
      Results  Experimental results based on billion-level trajectories show that the method is suitable for P2T_kNN query processing of big trajectory data. (1) Different encoding structures are suitable for different query scenarios. Appropriate encoding structure can be used to make query response time less than 1 s. (2) Adaptive spatial cell search algorithm can significantly improve query efficiency, making the average query response delay of random one thousand queries less than 1 second on both dense and sparse areas. (3) The spatial cell?s k rate has an important influence on the query. After the k rate reaches 0.5, the query response time tends to balance.
      Conclutions  We propose a nearest neighbor query processing framework and its search algorithm for distributed column-oriented storage. Firstly, the framework organizes the trajectory in segments based on the spatiotemporal cells, and uses concatenated spatiotemporal encoding to realize spatiotemporal cluster storage of trajectory segments.Secondly, an algorithm that can perceive the spatial differentiation characteristics of trajectory data under a given time constraint and dynamically adjust the search step size of the spatial cell is designed, which greatly improves the processing efficiency of the data sparse area. The adaptive search strategy is applicable to any DGGS based on recursive subdivision, and it has strong scalability and can seamlessly connect with other subdivision frameworks, such as S2 Geometry, H3, etc.
  • [1]
    高强, 张凤荔, 王瑞锦, 等. 轨迹大数据: 数据处理关键技术研究综述[J]. 软件学报, 2017, 28(4): 959-992 https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB201704015.htm

    Gao Qiang, Zhang Fengli, Wang Ruijin, et al. Trajectory Big Data: A Review of Key Technologies in Data Processing[J]. Journal of Software, 2017, 28(4): 959-992 https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB201704015.htm
    [2]
    Zheng Y. Trajectory Data Mining: An Overview[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(3): 1-41 http://pure.ltu.se/portal/en/publications/trajectory-data-mining(47e55f7e-db63-4ee4-a2a8-917c82999856)/export.html
    [3]
    李德仁, 邵振峰, 于文博, 等. 基于时空位置大数据的公共疫情防控服务让城市更智慧[J]. 武汉大学学报∙信息科学版, 2020, 45(4): 475-488 https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202004001.htm

    Li Deren, Shao Zhenfeng, Yu Wenbo, et al. Public Epidemic Prevention and Control Services Based on Big Data of Spatiotemporal Location Make Cities more Smart[J]. Geomatics and Information Science of Wuhan University, 2020, 45(4): 475-488 https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH202004001.htm
    [4]
    李晓旭, 于亚新, 张文超, 等. Coteries轨迹模式挖掘及个性化旅游路线推荐[J]. 软件学报, 2018, 29(3): 587-598 https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB201803005.htm

    Li Xiaoxu, Yu Yaxin, Zhang Wenchao, et al. Mining Coteries Trajectory Patterns for Recommending Personalized Travel Routes[J]. Journal of Software, 2018, 29(3): 587-598 https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB201803005.htm
    [5]
    Chen Z, Shen H T, Zhou X, et al. Searching Trajectories by Locations: An Efficiency Study[C]//ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA, 2010
    [6]
    Qi S, Bouros P, Sacharidis D, et al. Efficient Point-Based Trajectory Search[C]//International Symposium on Spatial and Temporal Databases, Hong Kong, China, 2015
    [7]
    Tang L A, Zheng Y, Xie X, et al. Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations[C]//International Symposium on Spatial and Temporal Databases, Minneapolis, MN, USA, 2011
    [8]
    Han J, Haihong E, Le G, et al. Survey on NoSQL Database[C]//International Conference on Pervasive Computing, San Francisco, USA, 2011
    [9]
    李绍俊, 杨海军, 黄耀欢, 等. 基于NoSQL数据库的空间大数据分布式存储策略[J]. 武汉大学学报∙信息科学版, 2017, 42(2): 163-169 https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201702004.htm

    Li Shaojun, Yang Haijun, Huang Yaohuan, et al. Geo-spatial Big Data Storage Based on NoSQL Database[J]. Geomatics and Information Science of Wuhan University, 2017, 42(2): 163-169 https://www.cnki.com.cn/Article/CJFDTOTAL-WHCH201702004.htm
    [10]
    Nishimura S, Das S, Agrawal D, et al. MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services[C]//The 12th International Conference on Mobile Data Management, Lulea, Sweden, 2011
    [11]
    Nishimura S, Das S, Agrawal D, et al. MD-HBase: Design and Implementation of an Elastic Data Infrastructure for Cloud-Scale Location Services[J]. Distributed and Parallel Databases, 2013, 31(2): 289-319 doi: 10.1007/s10619-012-7109-z
    [12]
    van Le H, Takasu A. G-HBase: A High Performance Geographical Database Based on HBase[J]. IEICE Transactions on Information and Systems, 2018, 101(4): 1 053-1 065 http://www.researchgate.net/publication/324133897_G-HBase_A_High_Performance_Geographical_Database_Based_on_HBase
    [13]
    Apache HBase Team. Apache HBase TM Reference Guide[EB/OL]. [2019-12-18]. https: //hbase.apache.org/book.html
    [14]
    Fox A, Eichelberger C, Hughes J N, et al. Spatiotemporal Indexing in Non-relational Distributed Databases[C]//International Conference on Big Data, Santa Clara, CA, USA, 2013
    [15]
    Hughes J N, Annex A, Eichelberger C N, et al. GeoMesa: A Distributed Architecture for Spatio-temporal Fusion[C]// Geospatial Informatics, Fusion, and Motion Video Analytics, Washington, USA, 2015
    [16]
    Fecher R, Whitby M A. Optimizing Spatiotemporal Analysis Using Multidimensional Indexing with Geo- Wave[C]// Free and Open Source Software for Geospatial Conference Proceedings, Chicago, USA, 2017
    [17]
    Whitby M A, Fecher R, Bennight C. GeoWave: Utilizing Distributed Key-value Stores for Multidimensional Data[C]//Gertz M. Advances in Spatial and Temporal Databases. Switzerland: Springer, 2017
    [18]
    Haverkort H, van Walderveen F. Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves[J]. Computational Geometry, 2010, 43(2): 131-147 doi: 10.1016/j.comgeo.2009.06.002
    [19]
    Qin J, Ma L, Niu J. THBase: A Coprocessor-Based Scheme for Big Trajectory Data Management[J]. Future Internet, 2019, 11(1): 10 doi: 10.3390/fi11010010
    [20]
    Zheng Yu, Zhou Xiaofang. Computing with Spatial Trajectories[M]. New York: Springer, 2011
    [21]
    Chang F, Dean J, Ghemawat S, et al. Bigtable: A Distributed Storage System for Structured Data[J]. ACM Transactions on Computer Systems, 2008, 26(2): 1-26 http://www.mendeley.com/research/distributed-storage-system-structured-data/
    [22]
    Sahr K, White D, Kimerling A J, et al. Geodesic Discrete Global Grid Systems[J]. Cartography and Geographic Information Science, 2003, 30(2): 121-134 doi: 10.1559/152304003100011090
    [23]
    Google. S2 Geometry[EB/OL]. [2019-12-18]. https://s2geometry.io/
    [24]
    Uber. H3[EB/OL]. [2019-12-18]. https://uber.github.io/h3
  • Related Articles

    [1]ZHAI Ruoming, HAN Xianquan, GAN Xiaoqing, ZOU Jingui, ZOU Shuangchao, WAN Peng, LI Jianzhou. Extraction of Line Segments from Indoor Point Clouds under Building Geometric Regularization Constraints[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20240384
    [2]LIU Yawen, ZHANG Ying, CHEN Quan. Vehicle Point Cloud Data Enhancement Method Combined with Panoramic Image[J]. Geomatics and Information Science of Wuhan University, 2020, 45(7): 1015-1020. DOI: 10.13203/j.whugis20180332
    [3]YU Anbin, MEI Wensheng. An Efficient Management Method for Massive Point Cloud Data of Metro Tunnel Based on R-tree and Grid[J]. Geomatics and Information Science of Wuhan University, 2019, 44(10): 1553-1559. DOI: 10.13203/j.whugis20170419
    [4]ZHU Qing, LI Shiming, HU Han, ZHONG Ruofei, WU Bo, XIE Linfu. Multiple Point Clouds Data Fusion Method for 3D City Modeling[J]. Geomatics and Information Science of Wuhan University, 2018, 43(12): 1962-1971. DOI: 10.13203/j.whugis20180109
    [5]LU Xiaoping, ZHU Ningning, LU Fengnian. An Elliptic Cylindrical Model for Tunnel Filtering[J]. Geomatics and Information Science of Wuhan University, 2016, 41(11): 1476-1482. DOI: 10.13203/j.whugis20140389
    [6]FANG Fang, CHENG Xiaojun. A Fast Data Reduction Method for Massive Scattered Point Clouds Based on Slicing[J]. Geomatics and Information Science of Wuhan University, 2013, 38(11): 1353-1357.
    [7]YING Shen, MAO Zhengyuan, LI Lin, XU Guang. Point Cloud Segmentation of 3D Rabbit Base 3D Voronoi[J]. Geomatics and Information Science of Wuhan University, 2013, 38(3): 358-361.
    [8]TUO Lei, KANG Zhizhong, XIE Yuancheng, WANG Baoqian. Continuously Vertical Section Abstraction for Deformation Monitoring of Subway Tunnel Based on Terrestrial Point Clouds[J]. Geomatics and Information Science of Wuhan University, 2013, 38(2): 171-175,185.
    [9]SUI Lichun, ZHANG Yibin, ZHANG Shuo, CHEN Wei. Filtering of Airborne LiDAR Point Cloud Data Based on Progressive TIN[J]. Geomatics and Information Science of Wuhan University, 2011, 36(10): 1159-1163.
    [10]ZHAN Qingming ZHOU Xingang, XIAO Yinghui, YU Liang, . 对古建筑激光扫描点云进行分割、识别,并利用Hough变换和最小二乘法从点云中提取直线和圆,取得了较满意的结果。对两种算法的提取效果进行了比较。[J]. Geomatics and Information Science of Wuhan University, 2011, 36(6): 674-677.
  • Cited by

    Periodical cited type(22)

    1. 万冠军,苏涛. 移动式三维激光扫描在轨道交通结构变形监测中的应用. 北京测绘. 2024(07): 1015-1019 .
    2. 李佳田,阿晓荟,王聪聪,高鹏,朱志浩,晏玲. 利用高帧频相机检测运动控制轴精细形变. 武汉大学学报(信息科学版). 2022(03): 388-395 .
    3. 赵立都,张双成,向中富,马下平,周银,胡川,冯红刚,陈茂霖,蒋俊秋. 地面三维激光扫描点云应用于滑坡监测中基准统一研究. 灾害学. 2022(02): 84-88 .
    4. 甘立彬. 基于三维激光扫描技术的隧道工程测量与建模. 工程勘察. 2021(06): 58-61 .
    5. 严慧敏. 三维激光扫描在隧道断面测量中应用研究——以宁杭高速公路梯子山隧道为例. 测绘地理信息. 2021(06): 108-111 .
    6. 张辛,向巍,丁涛,宋韬. 基于三维激光扫描技术的穿黄隧洞形变检测研究. 人民长江. 2020(09): 129-134 .
    7. 浦仕贵,甘淑,杨敏. 面向不同分辨率点云的NURBS曲面构建偏差分析比较. 地质灾害与环境保护. 2020(03): 72-76 .
    8. 王峰,王清泉,王红新,温立委. 三维激光扫描技术在地铁工程测量的应用综述. 工程勘察. 2019(01): 56-60 .
    9. 吴俊杰. 高负荷光栅传感网络异常节点数据挖掘方法研究. 激光杂志. 2019(02): 68-72 .
    10. 陈智聪. 桥梁模型的三维激光扫描变形分析及精度评定. 四川建材. 2019(05): 205-206 .
    11. 祝明然. 三维激光测量技术在大型复杂钢结构工程建造中的应用. 测绘通报. 2019(08): 92-95 .
    12. 郝进锋,姜月利,祝庭,唐楠. 基于激光扫描数据的建筑工程质量评估. 激光杂志. 2018(05): 53-56 .
    13. 郑跃骏,岳仁宾. 基于激光扫描的交通隧洞几何形变监测方法. 北京测绘. 2018(11): 1318-1321 .
    14. 马伟丽,王健,孙文潇,陈喆. 基于NURBS曲面模型的滑坡点云形变分析. 中国科技论文. 2018(21): 2408-2412 .
    15. 王峰. 集成RTK的三维激光扫描技术测量地形的方法. 测绘通报. 2017(03): 71-75 .
    16. 梁周雁,赵富燕,孙文潇,邵为真. 基于三维激光扫描技术的地表变形监测方法研究. 测绘与空间地理信息. 2017(06): 213-216+219 .
    17. 葛聪. 激光扫描的物流条形码识别系统. 激光杂志. 2017(07): 188-191 .
    18. 唐奇军. 三维激光扫描中隧道变形监测方法分析. 中国高新技术企业. 2017(10): 218-220 .
    19. 常明,康志忠. 利用激光点云的规则面微小变形统计分析. 测绘科学. 2016(03): 138-144+57 .
    20. 林鸿,欧海平,王峰. 地面激光扫描技术在建筑变形测量中的应用探讨. 测绘通报. 2016(06): 73-76 .
    21. 曾繁轩,李亮. 基于Lagrange算子与曲面拟合的点云滤波研究. 激光杂志. 2016(08): 75-78 .
    22. 孔金玲,杨笑天,吴哲超,邵永军,袁雷,赵绍兵,先涛. 基于三维激光扫描技术的高速公路滑坡体建模及应用. 公路交通科技(应用技术版). 2015(12): 12-14 .

    Other cited types(14)

Catalog

    Article views PDF downloads Cited by(36)
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return