Citation: LIN Sen, LIU Beibei, LI Jianwen, LIU Xu, QIN Kun, GUO Guizhen. Social media information classification of earthquake disasters based on BERT transfer learning model[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20220167

Social media information classification of earthquake disasters based on BERT transfer learning model

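  • Abstract (Chinese version): Social media data are timely, spread quickly, are rich in information, cheap to obtain, and large in volume, and they have become an important source of information for analyzing sudden disaster events. However, social media data also have drawbacks: uneven quality, redundancy combined with incompleteness, uneven coverage, a lack of unified standards, and privacy and security that are hard to control. To use social media data as a precise basis for disaster emergency response, techniques that can screen social media content and classify it effectively are urgently needed. To address this problem, this paper proposes a transfer learning method based on Bidirectional Encoder Representations from Transformers (BERT). Oriented toward post-disaster emergency needs, the method uses a model pre-trained on a massive corpus to perform multi-label text classification of Weibo data posted within the "golden" 72 hours after earthquake events. Posts are divided into five types (hazard information, loss information, rescue and relief information, public opinion information, and useless information), which yields fine-grained information usable for disaster analysis. The model reaches classification accuracies of 97% and 92% on the training and test sets, respectively, effectively improving the classification accuracy of Weibo text data. The evaluation results show that the model extracts earthquake disaster information from social media well and can be applied to rapid disaster assessment for earthquake events, compensating for the lag of traditional information acquisition methods.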

     

    Abstract: Objectives:  In recent years, extreme weather events have increased and sudden disasters have occurred frequently, which places higher demands on disaster emergency response. Once a disaster happens, information collection is the key to response decision-making. With the rapid development of the Internet, social media platforms have become an important source of emergency disaster information. However, social media platforms accumulate large amounts of duplicated, erroneous, and even malicious content within a short time. Social media content therefore needs to be screened effectively by technical means to provide a basis for accurate disaster emergency response.   Methods:  The development of deep learning has greatly improved the accuracy and efficiency of text-processing tasks. Taking earthquake disasters as an example, this study collected more than 50,000 microblog posts published in the 72 hours after 5 major earthquakes in China during 2013-2022. A multi-label classification model was built by transfer learning from a pre-trained BERT model. Each sample was manually annotated with one or more of five labels: hazard information, loss information, rescue information, public opinion information, and useless information.   Results:  After fine-tuning, the classification accuracy of the model on the training set and the test set reached 95% and 91%, respectively. Per-label AUC scores ranged from 0.952 to 0.998.   Conclusions:  Both metrics indicate that the model is highly reliable. The model can be applied to emergency management in sudden disaster events, where it can support rapid disaster assessment.
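To make the evaluation described in the abstract concrete, the following minimal sketch shows how multi-label annotations can be encoded and how an accuracy and a per-label AUC can be computed from model scores. It is an illustration under stated assumptions, not the authors' implementation: the label identifiers, the per-label sigmoid with a 0.5 threshold, the per-decision definition of accuracy, and the toy scores are all hypothetical, since the abstract does not specify them. The BERT model itself is omitted; the `logits` list stands in for its outputs.

```python
import math

# Identifiers for the five categories named in the abstract (names are illustrative).
LABELS = ["hazard", "loss", "rescue", "public_opinion", "useless"]


def to_multi_hot(tags):
    """Encode a set of label names as a 0/1 vector, one slot per label."""
    return [1 if name in tags else 0 for name in LABELS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Decide each label independently, so one post may receive several labels."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def label_accuracy(y_true, y_pred):
    """Fraction of (sample, label) decisions that are correct (assumed metric)."""
    pairs = [(t, p) for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(t == p for t, p in pairs) / len(pairs)


def auc(truth, scores):
    """ROC AUC for one label: chance that a positive sample outscores a negative one."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy annotations and scores, invented for illustration only (not from the paper).
y_true = [
    to_multi_hot({"hazard", "loss"}),
    to_multi_hot({"rescue"}),
    to_multi_hot({"useless"}),
    to_multi_hot({"loss", "public_opinion"}),
]
logits = [
    [2.0, 1.5, -3.0, -2.0, -4.0],
    [-2.5, -1.0, 3.0, -0.5, -3.0],
    [-3.0, -2.0, -2.5, 0.4, 1.0],
    [-1.0, 0.8, -2.0, 0.2, -3.5],
]

y_pred = [predict(row) for row in logits]
print("accuracy:", label_accuracy(y_true, y_pred))
for j, name in enumerate(LABELS):
    column_truth = [row[j] for row in y_true]
    column_scores = [row[j] for row in logits]
    print(name, "AUC:", auc(column_truth, column_scores))
```

Scoring each label separately is what allows a single post to carry, for example, both a loss label and a public opinion label, and it is also why AUC is reported per label rather than as one number. Other accuracy definitions, such as exact match over all five labels, would give different values on the same predictions.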

     
