LI Zuo-lin, XIANG Long-gang, YU Lie-bing, WU Hua-yi, GUAN Xue-feng. Distributed Join Query Method for Large-Scale Spatial Data Streams[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20230040

Distributed Join Query Method for Large-Scale Spatial Data Streams

More Information
  • Received Date: September 10, 2023
  • Available Online: May 12, 2024
  • Objectives: Spatial join is one of the basic operations for processing and analyzing spatial data. With the explosive growth of spatial data, join techniques for massive spatial data have attracted much attention. Existing work has explored join queries over spatial data streams, but the research is still in its infancy and remains insufficiently systematic and general. There is an urgent need to define the spatial data stream join problem in general terms and to provide a systematic solution.
Methods: Drawing on the common real-time spatial processing problems behind a range of practical application scenarios, this paper builds on the theory and methods of spatial data processing from the batch processing field, takes into account the long-running, continuous nature of stream processing, and formalizes the spatial data stream join problem. Two types of join operators are defined: the "stream-table" join and the "stream-stream" join. On this basis, a two-layer index structure that combines global grid partitioning with a local spatial index is proposed to support distributed joins over spatial streams. For the "stream-table" join, the paper proposes two physical implementations of the spatial dimension table and, for the memory-based storage method, designs a topological relationship evaluation algorithm optimized with a two-level R-tree. For the "stream-stream" join, the paper proposes a partition boundary data redundancy algorithm so that joins involving data near partition boundaries are computed correctly across partitions. In addition, to meet the caching requirements of interval time semantics, a BinR-tree structure that balances management and retrieval efficiency is proposed.
Results: (1) The "stream-table" spatial join implemented in this paper achieves a throughput of more than 60,000 at a parallelism of one, with an average overall latency of about 90 milliseconds. Compared with the native method, average throughput is 9 times higher and latency is reduced by an average of 20 milliseconds. (2) As with the "stream-table" join, the performance of the "stream-stream" join method improves steadily as parallelism increases, showing good horizontal scalability. (3) In the experimental comparison between the window join and the interval join, join latency increases as the window grows and is much higher than that of the stream interval join. This experiment also shows that the stream interval join, which has no bounded time semantics, is better suited to stream processing logic. (4) Grid partitioning has little effect on throughput; its main role is to balance load and preserve the spatial proximity of data. (5) When the bin size is smaller than the window size, a query spans multiple BinR-trees, which reduces query efficiency. Moderately increasing the bin size improves query efficiency; once the bin size is greater than or equal to the time interval, query efficiency no longer improves substantially.
Conclusions: Extensive experiments show that the proposed spatial data join method achieves a good linear speedup and significantly improves join query efficiency compared with the baseline method.
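
The partitioning and caching ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a uniform grid, a distance-within-d join predicate, and one R-tree per time bin, and a nested loop stands in for the local spatial index. All function names and parameters (home_cell, cells_with_replication, partitioned_distance_join, bins_spanned, cell, d, bin_size) are hypothetical. Points of one stream are sent only to their home cell, while points of the other stream are copied to every cell within d, so a pair that straddles a cell border is still found, and found exactly once.

```python
from collections import defaultdict
from math import floor


def home_cell(x, y, cell):
    """Grid cell that contains the point (x, y); cell is the cell edge length."""
    return (floor(x / cell), floor(y / cell))


def cells_with_replication(x, y, cell, d):
    """All grid cells that intersect the square of half-width d around (x, y).

    Assumed redundancy rule: a point is copied to each of these cells.
    """
    i0, j0 = floor((x - d) / cell), floor((y - d) / cell)
    i1, j1 = floor((x + d) / cell), floor((y + d) / cell)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def partitioned_distance_join(left, right, cell, d):
    """Join (id, x, y) records from two inputs whose distance is at most d.

    Left records go to their home cell only; right records are replicated.
    Each grid cell is then joined independently, as a parallel task would be.
    """
    parts = defaultdict(lambda: ([], []))
    for rec in left:
        parts[home_cell(rec[1], rec[2], cell)][0].append(rec)
    for rec in right:
        for c in cells_with_replication(rec[1], rec[2], cell, d):
            parts[c][1].append(rec)

    matches = set()
    for lefts, rights in parts.values():
        # Nested loop as a stand-in for a local spatial index lookup.
        for lid, lx, ly in lefts:
            for rid, rx, ry in rights:
                if (lx - rx) ** 2 + (ly - ry) ** 2 <= d * d:
                    matches.add((lid, rid))
    return matches


def bins_spanned(t_lo, t_hi, bin_size):
    """Indices of the time bins that an interval query [t_lo, t_hi] touches.

    Assumes integer timestamps and one index (e.g. an R-tree) per bin.
    """
    return list(range(t_lo // bin_size, t_hi // bin_size + 1))


if __name__ == "__main__":
    left = [("a", 0.9, 0.5)]
    right = [("b", 1.1, 0.5), ("c", 3.0, 3.0)]
    print(partitioned_distance_join(left, right, cell=1.0, d=0.5))
    for size in (2, 10, 20):
        print(size, bins_spanned(15, 25, size))
```

Under these assumptions, a query interval of length w touches at most two bins once bin_size is at least w, which is consistent with result (5): enlarging the bins beyond the interval length cannot reduce the number of per-bin indexes visited much further.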
