路网环境下的混合数据最近邻查询算法

张丽平; 张晓娇; 金飞虎; 李松

doi:10.13203/j.whugis20200075

摘要: 路网环境下的k最近邻查询方法在地理信息系统、智慧城市、数据挖掘、医疗营救和物流配送等领域都有着较为重要的作用，已有路网环境下的最近邻查询方法无法直接解决查询对象为点而数据对象为点和线段混合的复杂数据的近邻查询问题，为了弥补已有方法的不足，提出了路网环境下混合复杂数据的最近邻查询算法。将查询过程分为预处理、数据集约减和数据集精炼3个部分，并与3种对比算法进行对比实验，研究了测试数据对象的数量、路网规模的大小对中央处理器运行时间以及输入/输出代价的影响。结果表明，所提算法能有效地处理路网环境下混合数据的最近邻查询问题。

Abstract:

Objectives The k nearest neighbor query method under the road network environment plays an important role in geographic information system, point of interest discovery, smart city, data mining and material distribution. The existing methods mainly deal with the query problem that the query object and data object are single data type. However, in reality, the data object type of the target object set is often not single, and the query problem will become more complex. The existing nearest neighbor query methods in road network cannot directly solve the neighbor query problem in which the query object is a point and the data object includes the data mixed with points and line segments. In order to make up for the shortcomings of the existing methods, the nearest neighbor query algorithm of mixed complex data in the road network is proposed.

Methods The query process is divided into three parts: Preprocessing, data set reduction and data set refining. Firstly, the data set is preprocessed, the line segments are transformed into points. A mapping relationship is formed between the intersection of line segments in the road network and line segments, and the corresponding line segments are found by querying the location of the points. The query points, data points and Hilbert values are stored using arrays. Secondly, the data object set is refined preliminarily by pruning method and shortest path theorem to reduce the number of data objects, and the data set filtering algorithm is given. Finally, the points in the data set are further pruned to calculate the shortest path network distance from the query point to the points in the candidate set. Compared with these distances, the shortest path distance can be determined, and then the nearest neighbor is accurately found.

Results Experimental results show that the proposed algorithm effectively reduces the data set and the amount of calculation in the query process, further refines the nearest neighbor candidate set and reduces the query scope and the query time. There is no need to store the distance information between all boundary nodes in the road network, so a large amount of redundant distance information is eliminated.The data points and road network data that meet the pruning conditions around the query points are pruned. Therefore, the increase of road network scale has little impact on the time cost of the algorithm.

Conclusions The proposed algorithm can directly deal with the nearest neighbor query of mixed data in the road network environment. When the number of data objects and the scale of the road network are large, the algorithm has obvious advantages in central processing unit running time and input/output cost.

路网环境下的混合数据最近邻查询算法

Nearest Neighbor Query Algorithm of Mixed Data in Road Network