LIN Sen, LIU Beibei, LI Jianwen, LIU Xu, QIN Kun, GUO Guizhen. Social Media Information Classification of Earthquake Disasters Based on BERT Transfer Learning Model[J]. Geomatics and Information Science of Wuhan University, 2024, 49(9): 1661-1671. DOI: 10.13203/j.whugis20220167

Social Media Information Classification of Earthquake Disasters Based on BERT Transfer Learning Model

  • Received Date: October 20, 2022
  • Available Online: September 05, 2022
  • Objectives 

    With the rapid development of the Internet, social media has become an important source of information on emergency events. However, social media contains a great deal of duplicated, erroneous and even malicious content, which must be classified effectively to provide more accurate information for disaster emergency response.

    Methods 

    Deep learning has greatly improved the accuracy and efficiency of text classification. Taking earthquake disasters as an example, this paper builds a multi-label classification model based on transfer learning with bidirectional encoder representations from transformers (BERT). Over 50 000 posts about 5 earthquakes are collected as training samples from Sina Weibo, a very popular social media platform in China. Each sample is manually annotated with one or more labels, such as hazard information, loss information, rescue information, public opinion information and useless information.

    Results 

    After fine-tuning, the classification accuracy of the proposed model reaches 97% on the training dataset and 92% on the test dataset. The area under the curve (AUC) score of each label ranges from 0.952 to 0.998.

    Conclusions 

    The results show that multi-label classification using BERT transfer learning is highly reliable. The proposed model can be applied to emergency management services for earthquake events, which benefits rapid disaster rescue and relief.
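
The multi-label setup and the per-label AUC reported in the abstract can be illustrated with a minimal sketch. The abstract does not describe the model's output layer, so the code assumes a common arrangement: one logit per label from the fine-tuned encoder, an independent sigmoid per label, and a binary cross-entropy loss. The label order, the 0.5 threshold and all function names are hypothetical and chosen only for illustration; the BERT encoder itself is not included.

```python
import math

# Label names follow the abstract; their order here is an arbitrary assumption.
LABELS = ["hazard", "loss", "rescue", "public_opinion", "useless"]


def encode_labels(tags):
    """Turn a set of label names into a multi-hot vector (one slot per label)."""
    return [1 if name in tags else 0 for name in LABELS]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_logits(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    The threshold of 0.5 is an assumption, not a value taken from the paper.
    """
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


def binary_cross_entropy(logits, target):
    """Mean per-label binary cross-entropy for a single sample."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, target):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(target)


def auc(scores, truths):
    """Area under the ROC curve for one label.

    Computed as the fraction of (positive, negative) sample pairs in which
    the positive sample scores higher, with ties counting one half.
    """
    pos = [s for s, t in zip(scores, truths) if t == 1]
    neg = [s for s, t in zip(scores, truths) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `encode_labels({"loss", "rescue"})` gives `[0, 1, 1, 0, 0]`, so a single post can carry several labels at once. Calling `auc` once per label on held-out scores yields a per-label figure of the kind the abstract reports; the pairwise form is quadratic in the number of samples and is meant for illustration rather than large test sets.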
