Volume 43 Issue 10
Oct.  2018

LI Shuang, ZHAI Liang, SANG Huiyong, ZHOU Bin, FANG Xin, ZHEN Yunpeng. An Improved LUR-based Spatial Distribution Simulation for the Large Area PM2.5 Concentration[J]. Geomatics and Information Science of Wuhan University, 2018, 43(10): 1574-1579, 1587. doi: 10.13203/j.whugis20170042

An Improved LUR-based Spatial Distribution Simulation for the Large Area PM2.5 Concentration

doi: 10.13203/j.whugis20170042
Funds:

Thematic Monitoring of the First Package of National Geographic Situation Monitoring Project in 2017 (2nd Batch) Q1722

the National Natural Science Foundation of China 41701213

Basic Research Funding in CASM 7771716

Open Fund from the Key Laboratory for National Geographic Census and Monitoring, National Administration of Surveying, Mapping and Geoinformation 2016NGCMZD03

the National Natural Science Youth Fund Project of China 41501192

  • Author Bio:

    LI Shuang, master, specializes in the information statistics of geographical conditions monitoring. E-mail:ls02020029@163.com

  • Corresponding author: SANG Huiyong, PhD. E-mail:huiyong.sang@casm.ac.cn
  • Received Date: 2017-02-28
  • Publish Date: 2018-10-05
  • Abstract: When the traditional land use regression (LUR) model simulates air pollutant concentrations, part of the information in its predictor variables is lost. To simulate the spatial distribution of PM2.5 over a large area, this study developed an improved LUR model, denoted PCA+SMLR. It combines principal component analysis (PCA) with stepwise multiple linear regression (SMLR). First, correlation analysis was conducted to screen out effective predictor variables. Second, PCA was employed to transform the effective predictor variables into principal components. Finally, all principal components were entered into SMLR to simulate the spatial distribution of PM2.5. The reliability of the improved model was tested in the Beijing-Tianjin-Hebei urban agglomeration, and the experimental results of three models were compared and analyzed: principal component regression (PCR), SMLR and PCA+SMLR. By increasing the contribution of the predictor variables, the PCA+SMLR model achieves an adjusted R² of 0.883. It also outperforms the traditional model in both accuracy indicators and mapping results. PCA+SMLR is therefore a promising PM2.5 modeling method and could be very useful for air pollution mapping.
  • [1] Zou B, Pu Q, Bilal M, et al. High-Resolution Satellite Mapping of Fine Particulates Based on Geographically Weighted Regression[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(4): 495-499. doi: 10.1109/LGRS.2016.2520480
    [2] Silva R A, West J J, Zhang Y, et al. Global Premature Mortality Due to Anthropogenic Outdoor Air Pollution and the Contribution of Past Climate Change[J]. Environmental Research Letters, 2013, 8(3): 034005. doi: 10.1088/1748-9326/8/3/034005
    [3] Lim J M, Jeong J H, Lee J H, et al. The Analysis of PM2.5 and Associated Elements and Their Indoor/Outdoor Pollution Status in an Urban Area[J]. Indoor Air, 2011, 21(2): 145-155. doi: 10.1111/ina.2011.21.issue-2
    [4] Zou Bin, Xu Shan, Zhang Jing. Spatial Variation Analysis of Urban Air Pollution Using GIS: A Land Use Perspective[J]. Geomatics and Information Science of Wuhan University, 2017, 42(2): 216-222. http://ch.whu.edu.cn/CN/abstract/abstract3391.shtml
    [5] Deng Min, Chen Ti, Yang Wentao. A New Method of Modeling Spatio-temporal Sequence by Considering Spatial Characteristics[J]. Geomatics and Information Science of Wuhan University, 2015, 40(12): 1625-1632. http://ch.whu.edu.cn/CN/abstract/abstract3391.shtml
    [6] Zou B, Luo Y, Wan N, et al. Performance Comparison of LUR and OK in PM2.5 Concentration Mapping: A Multidimensional Perspective[J]. Sci Rep, 2015, 5(5): 8698
    [7] Briggs D J, Collins S, Elliott P, et al. Mapping Urban Air Pollution Using GIS: A Regression-based Approach[J]. International Journal of Geographical Information Science, 1997, 11(7): 699-718. doi: 10.1080/136588197242158
    [8] Jiao Limin, Xu Gang, Zhao Suli, et al. LUR-based Simulation of the Spatial Distribution of PM2.5 of Wuhan[J]. Geomatics and Information Science of Wuhan University, 2015, 40(8): 1088-1094. http://ch.whu.edu.cn/CN/abstract/abstract3417.shtml
    [9] Zou B, Xu S, Sternberg T, et al. Effect of Land Use and Cover Change on Air Quality in Urban Sprawl[J]. Sustainability, 2016, 8(7): 677. doi: 10.3390/su8070677
    [10] Olvera H A, Garcia M, Li W W, et al. Principal Component Analysis Optimization of a PM2.5 Land Use Regression Model with Small Monitoring Network[J]. Sci Total Environ, 2012, 425: 27-34. doi: 10.1016/j.scitotenv.2012.02.068
    [11] Ul-Saufie A Z, Yahaya A S, Ramli N A, et al. Future Daily PM10 Concentrations Prediction by Combining Regression Models and Feedforward Backpropagation Models with Principle Component Analysis (PCA)[J]. Atmospheric Environment, 2013, 77: 621-630. doi: 10.1016/j.atmosenv.2013.05.017
    [12] Li S, Zhai L, Zou B, et al. A Generalized Additive Model Combining Principal Component Analysis for PM2.5 Concentration Estimation[J]. ISPRS International Journal of Geo-Information, 2017, 6: 248. doi: 10.3390/ijgi6080248
    [13] Ghosh D, Manson S M. Robust Principal Component Analysis and Geographically Weighted Regression Urbanization in the Twin Cities Metropolitan Area of Minnesota[J]. J Urban Reg Inf Syst Assoc, 2008, 20(1): 15-25. https://www.researchgate.net/publication/243970515_Robust_Principal_Component_Analysis_and_Geographically_Weighted_Regression_Urbanization_in_the_Twin_Cities_Metropolitan_Area_of_Minnesota
    [14] Zhu Jianping, Yin Ruifei. Application of SPSS in Statistical Analysis[M]. Beijing: Tsinghua University Press, 2007
    [15] Zheng Yongmei, Zhang Jun, Chen Xingdan, et al. Research on Model and Wavelength Selection of Near Infrared Spectral Information[J]. Spectrosc Spect Anal, 2004, 24(6): 675-678. doi: 10.3321/j.issn:1000-0593.2004.06.010
    [16] Zhai L, Zou B, Fang X, et al. Land Use Regression Modeling of PM2.5 Concentrations at Optimized Spatial Scales[J]. Atmosphere, 2017, 8(1): 1-15. https://www.researchgate.net/publication/311880812_Land_Use_Regression_Modeling_of_PM25_Concentrations_at_Optimized_Spatial_Scales
    [17] Fang X, Zou B, Liu X, et al. Satellite-based Ground PM2.5 Estimation Using Timely Structure Adaptive Modeling[J]. Remote Sensing of Environment, 2016, 186: 152-163. doi: 10.1016/j.rse.2016.08.027
    [18] Rodriguez J D, Perez A, Lozano J A. Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation[J]. IEEE Trans Pattern Anal Mach Intell, 2010, 32(3): 569-575. doi: 10.1109/TPAMI.2009.187
    [19] Olvera H A, Garcia M, Li W W, et al. Principal Component Analysis Optimization of a PM2.5 Land Use Regression Model with Small Monitoring Network[J]. Science of the Total Environment, 2012, 425(3): 27-34. https://www.ncbi.nlm.nih.gov/pubmed/22464030
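  • Fine particulate matter, PM2.5 (particles with an aerodynamic diameter of 2.5 μm or less), is one of the major atmospheric pollutants and is closely associated with haze [1]. The Ambient Air Quality Standards issued by the Ministry of Environmental Protection set a limit of 35 μg/m³ on the annual mean PM2.5 concentration in residential areas. Measured against this limit, only 8 of the 114 cities on the national PM2.5 ranking for January 2017 met the standard. Studies have shown that PM2.5 raises the incidence of cardiovascular, respiratory and other diseases and seriously harms human health [2-3]. PM2.5 pollution has become a serious social problem and has drawn wide attention from the public and from government environmental protection authorities [4].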

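    Simulating PM2.5 concentrations can support decision-making by environmental protection authorities in controlling air pollution, and accurate PM2.5 simulation has become an active research topic [5-6]. The land use regression (LUR) model [7] is one of the main methods for simulating air pollutant concentrations. Studies of this type take one of two approaches. The first applies stepwise multiple linear regression (SMLR) directly to the predictor variables that are significantly correlated with the dependent variable [8-9]. The second first transforms the predictors with principal component analysis (PCA). It then builds the regression model by principal component regression (PCR) on the first few principal components, namely those with eigenvalues greater than 1 or those that together reach a cumulative variance contribution of 80% [10-12]. Both approaches have weaknesses. The predictors used in SMLR are partly collinear, and the stepwise procedure removes from the model some predictors that are significantly correlated with the dependent variable. PCR removes the collinearity among predictors, but it builds the model directly from the first few principal components without screening them.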
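    Conventional PCR uses one of two rules to decide how many leading components enter the model: keep components with eigenvalues greater than 1, or keep enough components to reach a cumulative variance contribution of 80%. The sketch below illustrates both rules. It is not code from the paper. The function names are hypothetical, and the eigenvalues are assumed to come from a PCA of the predictors, for example of their correlation matrix.

```python
def n_components_kaiser(eigvals):
    """Count components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return sum(1 for v in eigvals if v > 1.0)


def n_components_cumvar(eigvals, threshold=0.80):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    vals = sorted(eigvals, reverse=True)  # leading (largest-variance) components first
    total = sum(vals)
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running / total >= threshold:
            return k
    return len(vals)


# Illustrative eigenvalues only (not taken from the paper)
eigvals = [6.0, 4.0, 2.5, 1.5, 1.0]
print(n_components_kaiser(eigvals), n_components_cumvar(eigvals))
```

    For these illustrative eigenvalues, the Kaiser rule keeps 4 components and the 80% rule keeps 3, so the two conventional rules need not agree. Neither rule considers how each component relates to PM2.5. That is the gap the component screening described next is meant to fill.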

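    To address these shortcomings, this study combines PCA and SMLR in three steps. First, correlation analysis screens the predictors that are significantly correlated with PM2.5. Next, PCA is applied to the screened predictors. Finally, all principal components are retained as candidates for SMLR, which selects the optimal driving factors and builds the regression model used to simulate the spatial distribution of PM2.5 concentration.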
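    The first step keeps only the predictors whose Pearson correlation with PM2.5 is statistically significant, and might look like the sketch below. The paper does not state the significance level or software settings it used for this step. The following are therefore assumptions of this sketch: a two-sided 5% level, a t-test on r, and an approximate critical value t_crit = 1.99 (about 76 degrees of freedom, i.e. 78 stations). screen_predictors is a hypothetical name.

```python
import numpy as np


def screen_predictors(X, y, t_crit=1.99):
    """Return indices of columns of X whose Pearson correlation with y is significant.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom and compares |t|
    with t_crit. The default 1.99 approximates the two-sided 5% critical value for about
    76 degrees of freedom (78 stations); it is an assumption, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    keep = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]                   # Pearson correlation
        t = r * np.sqrt((n - 2) / max(1.0 - r * r, 1e-12))  # guard against |r| == 1
        if abs(t) > t_crit:
            keep.append(j)
    return keep
```

    Calling keep = screen_predictors(X, y) and then taking X[:, keep] would give the screened predictor matrix that is passed to PCA.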

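  • This paper builds an improved LUR model on the basis of the traditional LUR model, which loses part of the information carried by the predictor variables. The improved model has two advantages. It removes the collinearity among predictors and so avoids redundant information. It also lets every predictor that is significantly correlated with PM2.5 take part in building the regression model, which increases the predictors' contribution to the model. At its core, the improved LUR model combines PCA and SMLR: PCA first removes the collinearity among the predictors, and SMLR then enters the transformed predictors into the regression model step by step.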

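    The basic idea of PCA is to recombine many original, partly correlated variables into a set of mutually uncorrelated new variables that replace the originals [10, 13]. The new variables are called principal components and are chosen to retain as much of the information in the original variables as possible. Mathematically, PCA forms linear combinations of the original variables X to generate new composite variables P, with the following structure [11]:

    $$P_i = l_{1i}X_1 + l_{2i}X_2 + \cdots + l_{ni}X_n$$

    where $P_i$ is the $i$-th principal component and $l_{ni}$ is the loading of predictor $X_n$. Because the predictors $X_n$ have different units, they are first normalized to the 0-1 range. PCA then converts them into principal components, which removes the collinearity among the original predictors. Earlier PCR approaches simply take the first few components according to eigenvalue or variance contribution. This study does not do that; instead, SMLR screens the principal components.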
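    The NumPy sketch below follows the description above: each predictor is normalized to 0-1 (min-max), and the normalized predictors are projected onto all principal components. Two choices here are assumptions of the sketch: centring the scaled data, and taking eigenvectors of its covariance matrix. Statistical packages may instead standardize the data and use the correlation matrix. pca_transform is a hypothetical name. Column $i$ of the returned L holds the loadings $l_{1i}, \dots, l_{ni}$ of the formula.

```python
import numpy as np


def pca_transform(X):
    """0-1 normalize each predictor, then project onto all principal components.

    Returns (P, L): P[:, i] holds the scores of component P_i and L[n, i] the loading of
    predictor X_n on P_i, so that P = Z @ L for the normalized, centred predictors Z.
    Components are ordered by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # 0-1 (min-max) normalization
    Z = Z - Z.mean(axis=0)                                      # centre before PCA (assumption)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                           # largest variance first
    L = eigvecs[:, order]
    P = Z @ L                                                   # mutually uncorrelated component scores
    return P, L
```

    All components are returned rather than a leading subset, because the later stepwise step decides which ones to keep. The score columns are centred and mutually uncorrelated. As a result, an ordinary least-squares coefficient on one component does not change when other components are entered or dropped.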

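    Once the $P_i$ are obtained, SMLR is used to build the regression model. SMLR extends conventional multiple linear regression. Each time a new independent variable is entered in the forward direction, the variables already in the model are re-examined to judge whether they are still worth keeping [14]. A variable is entered or removed according to the F test of its partial regression sum of squares or the adjusted coefficient of determination (adjusted R², Adj_R2). Entry and removal alternate until no remaining variable is significant enough to enter and no variable in the model has become insignificant enough to remove [15]. The SMLR model is:

    $$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_kX_k + \varepsilon$$

    where $Y$ is the dependent variable, $X_i$ are the independent variables, $\beta_i$ are the regression coefficients, and $\varepsilon$ is the random error of the model. With the $P_i$ as independent variables, SPSS 22.0 was used to enter and remove variables automatically and build the final regression model.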
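    A bidirectional stepwise search can illustrate the entry and removal loop. This sketch uses the adjusted R² criterion mentioned above, not the F-test entry and removal rules of SPSS, so it will not necessarily select the same components as SPSS 22.0. adj_r2 and stepwise_adj_r2 are hypothetical names, and ordinary least squares with an intercept is assumed.

```python
import numpy as np


def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on the columns of X plus an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def stepwise_adj_r2(X, y, max_iter=100):
    """Bidirectional stepwise selection of columns of X by adjusted R^2.

    Each round: (1) forward, add the candidate giving the highest adjusted R^2 if it beats
    the current model; (2) backward, drop the selected column whose removal gives the
    highest adjusted R^2 if that beats the current model. Stop when nothing improves.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best = [], -np.inf
    for _ in range(max_iter):
        changed = False
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            score, j = max((adj_r2(X[:, selected + [j]], y), j) for j in candidates)
            if score > best:
                selected.append(j)
                best, changed = score, True
        if len(selected) > 1:
            score, j = max(
                (adj_r2(X[:, [k for k in selected if k != j]], y), j) for j in selected
            )
            if score > best:
                selected.remove(j)
                best, changed = score, True
        if not changed:
            break
    return sorted(selected), best
```

    In this setting, X would be the matrix of principal component scores and y the PM2.5 values at the stations. Adding a variable raises adjusted R² whenever the absolute value of its t statistic exceeds 1. This criterion is therefore more permissive than typical F-test thresholds, and a stricter rule would usually retain fewer components.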

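  • This study simulates PM2.5 concentration with the improved LUR model that combines PCA and SMLR. The work consists of four sub-processes: predictor screening, regression modeling, model validation, and mapping of the spatial distribution of annual mean PM2.5 concentration. First, candidate predictors were extracted according to existing studies [6, 16-17], and the Pearson correlation coefficient was used to screen those significantly correlated with PM2.5. Next, PCA was applied to the screened predictors, and all $P_i$ were retained for SMLR to build the regression model. Model performance was then evaluated with three indicators [17]: root mean square error (RMSE), mean prediction error (MPE) and mean relative prediction error (MRPE). Each indicator was computed for both the fitted model and cross-validation [18]. Finally, a 10 km × 10 km grid of points was generated over the study area, and ordinary kriging was used to map the annual mean PM2.5 concentration over the entire Beijing-Tianjin-Hebei region. The workflow is shown in Figure 1.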

    Figure 1.  Technical Flowchart
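    The sketch below shows one way to compute the three accuracy indicators and the cross-validated predictions used in this workflow. The paper does not give the formulas or the number of folds, so several assumptions apply. MPE is taken as the mean absolute error, and MRPE as the mean absolute error relative to the observed value, in percent. Cross-validation is k-fold with a hypothetical k = 10, and an OLS model is refitted on each training fold. accuracy and kfold_ols_predictions are illustrative names.

```python
import numpy as np


def accuracy(y_obs, y_pred):
    """Return (RMSE, MPE, MRPE%), assuming MPE = mean absolute error and
    MRPE = mean of |error| / observed, expressed in percent."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mpe = float(np.mean(np.abs(err)))
    mrpe = float(np.mean(np.abs(err) / y_obs) * 100.0)
    return rmse, mpe, mrpe


def kfold_ols_predictions(X, y, k=10, seed=0):
    """Out-of-fold OLS predictions: each station is predicted by a model fitted without its fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred[test] = np.column_stack([np.ones(len(test)), X[test]]) @ beta
    return pred
```

    To obtain validation accuracy, pass the observed values and the out-of-fold predictions to accuracy(); to obtain fitting accuracy, pass the in-sample fitted values instead. In this sketch the selected predictors stay fixed and only the coefficients are refitted, which is a simplification. Repeating the variable selection inside each fold gives a more conservative error estimate.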

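  • The Beijing-Tianjin-Hebei region borders the Bohai Sea to the east, the Taihang Mountains to the west and the Yanshan Mountains to the north. Its terrain is high in the northwest and low in the southeast, and it covers about 216 000 km² (Figure 2). The region has developed rapidly economically, and it is enclosed by mountains on three sides. Together, these conditions have made it one of the most severely air-polluted regions in China.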

    Figure 2.  Sketch Map of the Study Area

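    The data used in this study fall into five groups: real-time PM2.5 concentrations at monitoring stations, aerosol optical depth (AOD), meteorological factors, geographic factors, and pollution-source factors.
    • Real-time PM2.5 concentrations come from the national real-time urban air quality publishing platform of the China National Environmental Monitoring Centre.
    • AOD data are the MOD04_L2 aerosol product downloaded from the NASA data center website.
    • Meteorological factors (wind speed, air pressure, temperature, precipitation and humidity) all come from the China Surface Climate Daily Dataset.
    • Geographic factors include a DEM, road data and land-cover data.
    • Pollution-source factors include fugitive-dust surface sources derived from high-resolution remote sensing imagery or aerial orthophotos, and industrial enterprise sources compiled from a database of registered enterprises.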

  • This section compares and analyzes the experimental results of three methods: PCR, SMLR and PCA+SMLR.

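  • For the traditional PCR method, Figure 3 shows how the goodness of fit of the regression model relates to the number of principal components. The goodness of fit levels off once 8 components are included and peaks at 0.880 when all components enter the model. However, studies have shown [19] that too many predictors cause overfitting. A model is considered reasonable when the number of observations is 10 to 15 times the number of predictors. With 78 PM2.5 monitoring stations in this study, 5 to 7 predictors are therefore appropriate. This paper builds regression model PCR1 from the 6 principal components with eigenvalues greater than 1. To examine overfitting, it also builds regression model PCR2 with all 17 principal components. The SMLR and PCA+SMLR models are built as well.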
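    The sample-size rule above reduces to simple arithmetic on the 78 stations. The bounds below were computed here and are not reported in the paper. They show that the stated range of 5 to 7 predictors rounds the interval 5.2 to 7.8, and they show how far PCR2 falls outside it:

    $$\frac{78}{15} = 5.2 \le k \le \frac{78}{10} = 7.8, \qquad \frac{78}{6} = 13, \qquad \frac{78}{17} \approx 4.6$$

    PCR1 and PCA+SMLR each use 6 components, giving 13 observations per predictor, which lies inside the 10 to 15 band. PCR2 uses 17 components, leaving fewer than 5 observations per predictor.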

    Figure 3.  Goodness of Fit of the Regression Model versus the Number of Principal Components

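    Table 1 lists the parameters and goodness of fit of the four models. SMLR retains only 5 of the predictors related to PM2.5, so the other 12 predictors strongly correlated with PM2.5 contribute nothing to the model. PCR1 and PCR2 are driven by principal components, so all 17 strongly correlated predictors contribute to the regression. However, PCR1 fits relatively poorly, and PCR2 contains so many variables that it may overfit. PCA+SMLR instead enters and removes principal components step by step through SMLR. It reaches an adjusted R² of 0.883, clearly higher than PCR1 (0.793) and SMLR (0.832). It is also slightly higher than PCR2 (0.880), which needs all 17 components to reach a similar fit.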

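    Model       Parameters              Adj_R2
    PCR1        P1 P2 P3 P4 P5 P6       0.793
    PCR2        P1-P17 (all 17)         0.880
    SMLR        X1 X2 X3 X4 X5          0.832
    PCA+SMLR    P1 P2 P4 P5 P8 P17      0.883
    Note: Pi is the i-th principal component. X1 is aerosol optical depth; X2 is precipitation; X3 is the proportion of cultivated land within an 8 000 m buffer around the monitoring station; X4 is the proportion of building area within an 8 000 m buffer around the monitoring station; X5 is the proportion of open-pit mining area within a 5 000 m buffer around the monitoring station.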

    Table 1.  Comparison of Parameterization and Model Fitting for Four Models

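  • Figure 4 shows scatter plots of fitted against measured values for the four regression models, and Table 2 compares their accuracy indicators. All results fall within a reasonable range. For fitting accuracy, PCR2 performs best, with RMSE, MPE and MRPE all better than those of the other three models, and PCA+SMLR comes second, only slightly behind PCR2. For cross-validation accuracy, however, PCA+SMLR performs best. Its indicators also change very little between fitting and validation, which demonstrates the model's reliability and stability. The validation and fitting accuracies of PCR1 and SMLR are also fairly close. PCR2, by contrast, degrades considerably from fitting to validation, and its validation MRPE is the worst of the four models, which indicates that PCR2 overfits.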

    Figure 4.  Scatter Plots of Fitted versus Measured Results

    Model       Fitting accuracy                      Validation accuracy
                RMSE      MPE       MRPE              RMSE      MPE       MRPE
                (μg·m⁻³)  (μg·m⁻³)  (%)               (μg·m⁻³)  (μg·m⁻³)  (%)
    PCR1        8.942     6.869     9.920             10.394    7.102     9.300
    PCR2        6.248     5.000     6.983             8.780     6.950     10.266
    SMLR        8.104     6.225     8.628             9.303     6.628     8.509
    PCA+SMLR    6.721     5.278     7.389             7.391     5.912     8.419

    Table 2.  Comparison of Accuracy Indicators for Four Models
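    The computation below quantifies the change between fitting and validation accuracy discussed above. It takes the RMSE values directly from Table 2 and reports how much each model's RMSE grows under cross-validation. The variable names are illustrative.

```python
# (fitting RMSE, cross-validation RMSE) in μg/m3, copied from Table 2
rmse = {
    "PCR1": (8.942, 10.394),
    "PCR2": (6.248, 8.780),
    "SMLR": (8.104, 9.303),
    "PCA+SMLR": (6.721, 7.391),
}

# Relative RMSE increase from fitting to validation, in percent
increase = {model: (cv - fit) / fit * 100.0 for model, (fit, cv) in rmse.items()}

for model, pct in sorted(increase.items(), key=lambda item: item[1]):
    print(f"{model:<9} {pct:5.1f}%")
```

    RMSE rises by about 10% for PCA+SMLR, 15% for SMLR and 16% for PCR1, but by about 41% for PCR2. This is consistent with the overfitting diagnosis for PCR2.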

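  • Figure 5 shows the simulated spatial distribution of annual mean PM2.5 concentration from the four models. In all four, PM2.5 concentration decreases from the southeast toward the northwest. SMLR, however, simulates poorly (Figure 5(c)). Its concentrations are low overall in cities such as Beijing and Tangshan and low in the urban centers of cities such as Cangzhou and Tianjin, the opposite of the actual situation. The other three models give similar results, with the Taihang-Yanshan mountain ranges separating high concentrations in the southeast from low concentrations in the northwest. Compared with PCR1 (Figure 5(a)), PCR2 (Figure 5(b)) and PCA+SMLR (Figure 5(d)) show a clearer decrease in PM2.5 concentration from city centers toward city edges. In addition, PM2.5 concentration is relatively high in central Zhangjiakou, contrary to earlier studies that found low concentrations there. This difference is mainly attributed to the effects of Zhangjiakou's preparations for the 2022 Winter Olympics.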

    Figure 5.  Spatial Distribution of Annual Mean PM2.5 Concentrations Estimated by the Four Models
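    According to the workflow, the maps in Figure 5 were produced by predicting PM2.5 at 10 km grid points and interpolating with ordinary kriging. The paper does not give the variogram model or its parameters. The sketch below therefore assumes an exponential semivariogram with zero nugget and hypothetical sill and range values, with coordinates in km. exp_variogram and ordinary_kriging are illustrative names, not functions from the paper or from a particular library.

```python
import numpy as np


def exp_variogram(h, sill=1.0, vrange=50.0):
    """Exponential semivariogram with zero nugget; sill and practical range (km) are hypothetical."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / vrange))


def ordinary_kriging(xy, z, xy_new, sill=1.0, vrange=50.0):
    """Ordinary kriging estimates at xy_new from values z observed at 2-D coordinates xy."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    n = len(z)
    # Left-hand side [[Gamma, 1], [1^T, 0]]; the last row/column enforces sum(weights) == 1
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, sill, vrange)
    A[n, n] = 0.0
    estimates = np.empty(len(xy_new))
    for i, p in enumerate(xy_new):
        b = np.ones(n + 1)
        b[:n] = exp_variogram(np.linalg.norm(xy - p, axis=1), sill, vrange)
        weights = np.linalg.solve(A, b)[:n]  # drop the Lagrange multiplier
        estimates[i] = weights @ z
    return estimates
```

    With zero nugget, the estimator reproduces the input values exactly at the input locations. In practice, the sill and range would be fitted to the empirical variogram of the grid-point predictions, not fixed in advance.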

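  • The experimental results lead to four observations. In the traditional SMLR model, only some of the predictors contribute, and its PM2.5 simulation is relatively poor. PCR1 fits relatively poorly. PCR2 uses so many principal components that it overfits. The improved LUR (PCA+SMLR) model proposed here performs well in both model accuracy and PM2.5 simulation. Moreover, once the PCA+SMLR model is built, the inverse principal component transformation recovers the relationship between PM2.5 concentration and the 17 original strongly correlated predictors. This identifies the variables that mainly drive PM2.5 concentration in the study area. The inverse transformation shows that meteorological factors such as temperature, air pressure and precipitation have a large influence on PM2.5 concentration, followed by elevation, polluting enterprises and roads. Geographic factors that change little in the short term, such as the various land-cover classes, have a small influence.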
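    The inverse transformation follows from substituting the PCA relation into the fitted model. Suppose the fitted model is $\hat{Y} = \beta_0 + \sum_{k \in S} \beta_k P_k$ over the selected components $S$, and $P_k = \sum_j l_{jk} X_j$. Then $\hat{Y} = \beta_0 + \sum_j \gamma_j X_j$, where $\gamma_j = \sum_{k \in S} \beta_k l_{jk}$. The minimal sketch below assumes the loadings are stored as a matrix L[j, k]. It also assumes the predictors share the 0-1 scale, so that the magnitudes of the $\gamma_j$ can be compared. The paper does not detail how it ranked variable influence, and back_transform and rank_predictors are hypothetical names.

```python
import numpy as np


def back_transform(beta, L, selected):
    """Coefficients on the normalized original predictors implied by a PC regression.

    beta[m] is the fitted coefficient of component selected[m]; L[j, k] is the loading of
    predictor j on component k. Returns gamma with gamma[j] = sum_m beta[m] * L[j, selected[m]].
    """
    return np.asarray(L, dtype=float)[:, list(selected)] @ np.asarray(beta, dtype=float)


def rank_predictors(gamma, names):
    """Predictor names ordered by absolute back-transformed coefficient, largest first."""
    order = np.argsort(-np.abs(np.asarray(gamma, dtype=float)))
    return [names[i] for i in order]
```

    For PCA+SMLR, `selected` would hold the indices of P1, P2, P4, P5, P8 and P17, and `names` would hold the 17 screened predictors.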

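  • This study combines PCA and SMLR into an improved LUR model for mapping the spatial distribution of annual mean PM2.5 concentration. The case study shows that PCA+SMLR resolves the collinearity among predictors and also remedies the information loss of the traditional LUR model. Its goodness of fit, accuracy indicators and simulated concentrations are all better than those of the traditional LUR model. The study also reveals the spatial pattern of PM2.5 concentration in the Beijing-Tianjin-Hebei region, which provides information support for joint regional prevention and control of PM2.5.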

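    However, this paper uses only local pollution sources as predictors and does not consider pollution transported from outside the region. Future work could consider both local and imported pollution sources to study PM2.5 concentration simulation in more depth.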
