Population Spatialization by Considering Pixel‐Level Attribute Grading and Spatial Association
Abstract:
Objectives: Existing population spatialization methods mostly train a regression model on administrative-unit-level data and then apply it at the grid-cell level to allocate population. However, the large difference in scale between the analysis units used for training and those used for estimation makes this cross-scale model transfer error-prone. In addition, cell-level feature modeling usually considers only the attributes of the current cell, which severs the inherent spatial association between cells and treats each cell in isolation.
Methods: This paper proposes a random-forest-based population spatialization method that considers pixel-level attribute grading and spatial association (PAG-SA). Cell-level features are modeled in three ways. First, nighttime light grading features constrained by building-area category are constructed with the natural breaks method, and the proportion of grid cells at each grade within each administrative unit is used as training input to reduce cross-scale error. Second, kernel density estimation is used to model the influence of neighboring points of interest (POIs) on the current cell, including its distance-decay effect. Third, overlay analysis counts the POIs of each category contained within the outlines of each building-area type, which refines the feature modeling.
Results: To verify the effectiveness of the proposed method, Wuhan was selected as the experimental area, and its spatialization accuracy was compared at the subdistrict (street) scale with the WorldPop, GPW and PopulationGrid_China (China kilometer-grid population) datasets.
The results show that the mean absolute error of PAG-SA is only 1/6 to 1/3 of that of the comparison datasets. The influence of feature composition, grid size and kernel density bandwidth on accuracy is also discussed.
Conclusions: By fusing multi-source data and considering pixel-level attribute grading and spatial association, PAG-SA achieves population spatialization in urban areas at finer grid sizes and with higher accuracy. It can also serve as a reference for spatializing other geographic attributes that face the same scale-mismatch problem in spatial regression modeling.
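The cross-scale idea behind the first feature type can be illustrated with a minimal sketch. It assumes that the "building-area category constraint" means natural breaks are computed separately on the nighttime light values of each building-area type, and that a single cell's feature vector is the one-hot indicator of its (type, grade) pair, so that cells and administrative units share one feature space. The function names `jenks_breaks` and `grade_features`, the `(unit_id, building_type, light)` input layout and the number of classes are hypothetical illustrations, not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def jenks_breaks(values, n_classes):
    """Jenks natural breaks via Fisher's dynamic programming.

    Returns [min, lower bound of class 2, ..., lower bound of class k, max].
    """
    data = sorted(values)
    n = len(data)
    if n < n_classes:
        raise ValueError("need at least as many values as classes")
    inf = float("inf")
    lower = [[0] * (n_classes + 1) for _ in range(n + 1)]  # 1-based class starts
    var = [[inf] * (n_classes + 1) for _ in range(n + 1)]  # best total within-class SS
    for j in range(1, n_classes + 1):
        lower[1][j] = 1
        var[1][j] = 0.0
    for l in range(2, n + 1):
        s1 = s2 = w = 0.0
        v = 0.0
        for m in range(1, l + 1):
            i3 = l - m + 1  # candidate start (1-based) of the last class
            val = data[i3 - 1]
            s1 += val
            s2 += val * val
            w += 1
            v = s2 - s1 * s1 / w  # sum of squared deviations of data[i3-1:l]
            i4 = i3 - 1
            if i4 != 0:
                for j in range(2, n_classes + 1):
                    if var[l][j] >= v + var[i4][j - 1]:
                        lower[l][j] = i3
                        var[l][j] = v + var[i4][j - 1]
        lower[l][1] = 1
        var[l][1] = v
    breaks = [0.0] * (n_classes + 1)
    breaks[n_classes] = data[-1]
    k = n
    for c in range(n_classes, 0, -1):  # backtrack the optimal class starts
        breaks[c - 1] = data[lower[k][c] - 1]
        k = lower[k][c] - 1
    return breaks


def grade_features(cells, n_classes=3):
    """cells: list of (unit_id, building_type, light). Hypothetical layout."""
    # Assumed constraint: grade nighttime light separately per building type.
    lights_by_type = defaultdict(list)
    for _, btype, light in cells:
        lights_by_type[btype].append(light)
    inner_breaks = {
        btype: jenks_breaks(vals, min(n_classes, len(vals)))[1:-1]
        for btype, vals in lights_by_type.items()
    }
    # Grade index 0..k-1 = number of interior breaks <= the cell's light value.
    keys = [f"{btype}_g{bisect_right(inner_breaks[btype], light)}"
            for _, btype, light in cells]
    # Unit-level training features: share of the unit's cells in each (type, grade).
    counts = defaultdict(Counter)
    for (unit, _, _), key in zip(cells, keys):
        counts[unit][key] += 1
    unit_features = {
        unit: {key: c / sum(cnt.values()) for key, c in cnt.items()}
        for unit, cnt in counts.items()
    }
    # Cell-level features: same keys, with the cell's own (type, grade) at 1.0.
    cell_features = [{key: 1.0} for key in keys]
    return unit_features, cell_features
```

Under this assumption a single cell is simply a "unit" containing one cell, so a model trained on unit-level proportions receives inputs of the same form when it is applied to cells, which is the scale-consistency argument stated in the abstract.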
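Table 1 Datasets Used in This Study
Data type | Source | Year | Format | Description
Nighttime light | U.S. National Centers for Environmental Information (NCEI) | 2015 | Raster | Annual composite of monthly NPP/VIIRS nighttime light images; resolution about 500 m
Building areas from the geographic national conditions census | Wuhan Geomatics Institute | 2015 | Vector | Derived from multi-source airborne and spaceborne remote sensing images with sub-meter resolution; building-area types used: high-density multi-storey and taller buildings, low-density multi-storey and taller buildings, high-density low-rise buildings, low-density low-rise buildings
POI | AutoNavi Software Co., Ltd. | 2017 | Vector | 8 POI categories: leisure and entertainment, accommodation, hospitals, residential communities, scientific research and education, shopping, financial services, catering
Wuhan administrative divisions | Wuhan Geomatics Institute | 2015 | Vector | Boundaries of Wuhan's districts/counties and subdistricts (streets), with the corresponding resident population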
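The third feature type, counting the POIs of each category inside each type of building-area outline, reduces to a point-in-polygon overlay of the POI and building-area layers listed in Table 1. The sketch below uses an even-odd ray-casting test in plain Python; the names `point_in_polygon` and `overlay_counts`, the tuple layouts and the category labels are hypothetical. Assigning building polygons to grid cells (clipping) is not shown, and for a city-wide dataset a GIS overlay with a spatial index would be the practical choice.

```python
from collections import Counter


def point_in_polygon(pt, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices, implicitly closed.

    Points exactly on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_counts(buildings, pois):
    """Count POIs per (building_type, poi_category) inside building outlines.

    buildings: list of (building_type, polygon); pois: list of (category, x, y).
    """
    counts = Counter()
    for btype, polygon in buildings:
        for category, x, y in pois:
            if point_in_polygon((x, y), polygon):
                counts[(btype, category)] += 1
    return counts
```

Each `(building_type, poi_category)` count then becomes one feature, which is finer-grained than counting POIs per cell regardless of the building type they fall in.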
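Table 2 Relative Optimal Bandwidth Ranges and the Selected Bandwidths for Different POI Types
POI type | Bandwidth range (km) | MAE | RMSE | R² | Selected bandwidth (km)
Hospital | 3.0-4.0 | 7994 | 12515 | 0.93705 | 4
Scientific research and education | 0.2-5.0 | 7840 | 12055 | 0.94159 | 5
Accommodation | 0.2-5.0 | 7928 | 12349 | 0.93870 | 5
Financial services | 0.2-5.0 | 7915 | 12179 | 0.94038 | 5
Leisure and entertainment | 3.0 | 7973 | 12280 | 0.93939 | 3
Catering | 1.0-2.0 | 7905 | 12330 | 0.93889 | 2
Residential community | 0.9-1.0 | 7933 | 12343 | 0.93876 | 1
Shopping | 2.0 | 7961 | 12434 | 0.93784 | 2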
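The bandwidths selected in Table 2 can be plugged into a per-category kernel density feature that captures the distance-decayed influence of neighboring POIs. The kernel function is not stated here, so the sketch assumes a quartic (biweight) kernel, whose weight falls smoothly to zero at the bandwidth, and assumes projected coordinates in kilometres. The dictionary keys and function names are illustrative labels, not identifiers from the paper.

```python
import math

# Selected bandwidths (km) from Table 2; the keys are illustrative labels.
SELECTED_BANDWIDTH_KM = {
    "hospital": 4.0,
    "research_education": 5.0,
    "accommodation": 5.0,
    "financial_services": 5.0,
    "leisure_entertainment": 3.0,
    "catering": 2.0,
    "residential_community": 1.0,
    "shopping": 2.0,
}


def quartic_kde(center, points, bandwidth):
    """Quartic-kernel density (points per km^2) at `center`.

    Each point within `bandwidth` contributes (1 - (d/h)^2)^2, so its weight
    decays smoothly with distance d and vanishes for d >= h.
    """
    cx, cy = center
    h2 = bandwidth * bandwidth
    total = 0.0
    for x, y in points:
        u2 = ((x - cx) ** 2 + (y - cy) ** 2) / h2
        if u2 < 1.0:
            total += (1.0 - u2) ** 2
    # 3 / (pi h^2) normalizes the 2-D quartic kernel so it integrates to 1.
    return 3.0 / (math.pi * h2) * total


def kde_features(cell_centers, pois, bandwidths=SELECTED_BANDWIDTH_KM):
    """One density feature per POI category for each grid-cell center.

    pois: list of (category, x_km, y_km) in an assumed projected system.
    """
    points_by_cat = {}
    for cat, x, y in pois:
        points_by_cat.setdefault(cat, []).append((x, y))
    return [
        {cat: quartic_kde(center, points_by_cat.get(cat, []), h)
         for cat, h in bandwidths.items()}
        for center in cell_centers
    ]
```

Because every POI contributes to all cells within its category's bandwidth, a cell's feature reflects its neighborhood rather than only the POIs that fall inside it, which is the spatial association the method aims to preserve.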