Citation: WANG Hongyan, GUAN Xuefeng, WU Huayi. A Collaborative Parallel Spatial Interpolation Algorithm Oriented Towards the Heterogeneous CPU/GPU System[J]. Geomatics and Information Science of Wuhan University, 2017, 42(12): 1688-1695. DOI: 10.13203/j.whugis20150361

A Collaborative Parallel Spatial Interpolation Algorithm Oriented Towards the Heterogeneous CPU/GPU System

  • Abstract: Heterogeneous CPU/GPU systems have become a common high-performance computing platform, but most existing parallel spatial interpolation algorithms use only the CPU or only the GPU for acceleration, which leaves part of the available parallel resources idle. A collaborative parallel algorithm is needed to make full use of heterogeneous computing resources and further improve interpolation efficiency. Taking thin plate spline interpolation as an example, this paper proposes a collaborative CPU/GPU parallel interpolation algorithm to accelerate the generation of digital elevation models (DEM) from massive light detection and ranging (LiDAR) point clouds. The input point cloud is first decomposed into a collection of discrete blocks, and each block is encapsulated as a general task object, which shields the differences between the execution models of the underlying processing units. On top of a multi-level collaborative parallel framework, a dynamic scheduling strategy named Greedy-SET is then designed; it takes the differing computing capabilities of the CPU and GPU into account in order to fully use the heterogeneous parallel resources and achieve good load balance. Experimental results show that the collaborative parallel algorithm reaches a speedup of approximately 19.6 on a high-performance workstation, and that its efficiency improves by 54% and 44% over the CPU-only and GPU-only parallel algorithms, respectively.
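The abstract does not give the details of Greedy-SET, so the following is only a generic illustration of capability-aware greedy scheduling, not the authors' method. It is a minimal Python sketch under stated assumptions: each block of points becomes a `Task` with a hypothetical work estimate `cost`, each processing unit is a `Worker` with a hypothetical relative throughput `speed`, and every task goes to the worker that would finish it earliest. Unlike the dynamic strategy described in the paper, this sketch assigns all tasks up front.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One block of points to interpolate (hypothetical abstraction)."""
    block_id: int
    cost: float  # assumed work estimate, in arbitrary work units


@dataclass
class Worker:
    """One processing unit, e.g. a CPU thread pool or a GPU (hypothetical)."""
    name: str
    speed: float  # assumed throughput, in work units per second
    busy_until: float = 0.0  # time at which the worker becomes free
    assigned: list = field(default_factory=list)


def greedy_schedule(tasks, workers):
    """Assign tasks, largest first, to the worker with the earliest finish time.

    Returns the makespan, i.e. the time at which the last worker finishes.
    """
    for task in sorted(tasks, key=lambda t: t.cost, reverse=True):
        # Estimated finish time depends on the current queue and device speed.
        best = min(workers, key=lambda w: w.busy_until + task.cost / w.speed)
        best.busy_until += task.cost / best.speed
        best.assigned.append(task.block_id)
    return max(w.busy_until for w in workers)


if __name__ == "__main__":
    workers = [Worker("cpu", speed=1.0), Worker("gpu", speed=4.0)]
    tasks = [Task(0, 10.0), Task(1, 6.0), Task(2, 4.0), Task(3, 3.0)]
    makespan = greedy_schedule(tasks, workers)
    for w in workers:
        print(w.name, w.assigned, w.busy_until)
    print("makespan:", makespan)
```

With the assumed speeds above, the faster device receives most of the blocks while the slower one still contributes, which is the kind of load balance a collaborative scheme aims for.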

     
