YU Liebing, XIANG Longgang, SUN Shangyu, GUAN Xuefeng, WU Huayi. kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage[J]. Geomatics and Information Science of Wuhan University, 2021, 46(5): 736-745. DOI: 10.13203/j.whugis20200136
Citation: YU Liebing, XIANG Longgang, SUN Shangyu, GUAN Xuefeng, WU Huayi. kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage[J]. Geomatics and Information Science of Wuhan University, 2021, 46(5): 736-745. DOI: 10.13203/j.whugis20200136

kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage

  •   Objectives  With the advancement of mobile object tracking technologies, huge volumes of spatiotemporal data have been accumulated in the form of location trajectories. Such data bring us new opportunities and challenges in efficient trajectory retrieval. Among them, point to trajectory k nearest neighbor (P2T_kNN) query plays a pivotal role in many mobile object-oriented analyses and applications. It has been widely used in tracing the source of infectious diseases, recommending travel routes, and generating and updating local road networks based on trajectories.
      Methods  Aiming at the requirement of efficient P2T_kNN query processing for large-scale trajectory data, a trajectory organization method combining spatiotemporal segmentation and trajectory segmentation is proposed. The core idea is to use the discrete global grid system (DGGS) to segment the trajectory in spatial based on the previous time division. The spatiotemporal cell encoding is applied to index the trajectory segments falling in it. In order to realize efficient encoding of spatiotemporal cells, three concatenated encoding structures of spatiotemporal divide and conquer are proposed to solve the huge difference in dimension and scale between temporal dimension and spatial dimension. Based on above, a P2T_kNN query processing framework for distributed column-oriented storage is designed, and an adaptive spatial cell search algorithm that takes into account the spatiotemporal distribution of trajectory data is proposed. The step size for spatial cell searching is adaptively set according to the spatial stratified heterogeneity of the trajectory data under the given time constraint, which greatly improves the search efficiency of areas with sparse trajectory data.
      Results  Experimental results based on billion-level trajectories show that the method is suitable for P2T_kNN query processing of big trajectory data. (1) Different encoding structures are suitable for different query scenarios. Appropriate encoding structure can be used to make query response time less than 1 s. (2) Adaptive spatial cell search algorithm can significantly improve query efficiency, making the average query response delay of random one thousand queries less than 1 second on both dense and sparse areas. (3) The spatial cell?s k rate has an important influence on the query. After the k rate reaches 0.5, the query response time tends to balance.
      Conclutions  We propose a nearest neighbor query processing framework and its search algorithm for distributed column-oriented storage. Firstly, the framework organizes the trajectory in segments based on the spatiotemporal cells, and uses concatenated spatiotemporal encoding to realize spatiotemporal cluster storage of trajectory segments.Secondly, an algorithm that can perceive the spatial differentiation characteristics of trajectory data under a given time constraint and dynamically adjust the search step size of the spatial cell is designed, which greatly improves the processing efficiency of the data sparse area. The adaptive search strategy is applicable to any DGGS based on recursive subdivision, and it has strong scalability and can seamlessly connect with other subdivision frameworks, such as S2 Geometry, H3, etc.
  • loading

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return