Citation: LIN Sen, LIU Beibei, LI Jianwen, LIU Xu, QIN Kun, GUO Guizhen. Social Media Information Classification of Earthquake Disasters Based on BERT Transfer Learning Model[J]. Geomatics and Information Science of Wuhan University, 2024, 49(9): 1661-1671. DOI: 10.13203/j.whugis20220167
With the rapid development of the Internet, social media has become an important source of information on emergency events. However, social media content contains substantial duplication, errors, and even malicious material, so it must be classified effectively to provide more accurate information for disaster emergency response.
Deep learning has greatly improved the accuracy and efficiency of text classification. Taking earthquake disasters as an example, this paper builds a multi-label classification model based on transfer learning with bidirectional encoder representations from transformers (BERT). More than 50 000 posts about five earthquakes were collected as training samples from Sina Weibo, a very popular social media platform in China. Each sample was manually assigned one or more labels, such as hazard information, loss information, rescue information, public opinion information, and useless information.
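To make the multi-label setup concrete, the following is a minimal sketch of how a post tagged with several of the labels above could be encoded and how per-label scores could be turned back into labels. It is not the authors' implementation: the identifiers in `LABELS`, their order, the 0.5 threshold, and the use of an independent sigmoid per label with binary cross-entropy are all assumptions, chosen because they are a common way to set up multi-label classification on top of an encoder such as BERT.

```python
import math

# Label identifiers follow the categories named in the abstract;
# the names and their order are assumptions made for this sketch.
LABELS = ["hazard", "loss", "rescue", "public_opinion", "useless"]


def encode_labels(tags):
    """Turn a collection of label names into a multi-hot vector (one slot per label)."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in tags else 0 for name in LABELS]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_logits(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def binary_cross_entropy(logits, target):
    """Mean per-label binary cross-entropy between logits and a multi-hot target."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, target):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(target)
```

Because each label gets its own sigmoid rather than sharing a softmax, a single post can carry, for example, both loss and rescue information at once, which is what distinguishes multi-label from single-label classification.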
After fine-tuning, the classification accuracy of the proposed model reaches 97% on the training dataset and 92% on the test dataset. The area under the curve (AUC) score of each label ranges from 0.952 to 0.998.
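A per-label AUC of the kind reported above can be computed by treating each label as its own binary problem. The sketch below uses the pairwise-ranking definition of ROC AUC (the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half). The function names and the toy data layout are assumptions for illustration, and the paper's own evaluation code is not reproduced here.

```python
def roc_auc(y_true, scores):
    """ROC AUC for one label: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auc(y_true_rows, y_score_rows):
    """Compute one AUC per label column from multi-hot targets and score rows."""
    n_labels = len(y_true_rows[0])
    return [
        roc_auc(
            [row[j] for row in y_true_rows],
            [row[j] for row in y_score_rows],
        )
        for j in range(n_labels)
    ]
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; for a dataset of tens of thousands of posts a rank-based implementation from an established library would be the more practical choice.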
The results show that multi-label classification using BERT transfer learning is highly reliable. The proposed model can be applied to emergency management services for earthquake events, supporting rapid disaster rescue and relief.