Extraction of News Location with Best Spatial Scale Based on Toponymic Tree

  • Abstract: Online news gives users timely access to current affairs, and the location of a news event, one of the key elements of news, is an important source of multi-source geographic information. Accurately extracting that location from plain news text requires recognizing the place names in the text and determining both the best spatial scale and the place to which the event belongs. To this end, we propose a method for extracting news event locations at the best spatial scale based on a toponymic tree of administrative affiliation relations. After the text is preprocessed with toponym entity recognition and disambiguation, a method that takes the structure of the news article into account removes noise such as semantic interference. A reasonable and accurate affiliation toponymic tree is then built by introducing a virtual parent node, the best spatial scale is selected using the concept of the minimum bounding box, and the candidate event locations are ranked by toponym entity weight and entity correlation, so that the event location is determined sensibly. Experiments show that the method obtains the best spatial scale of a news item and the corresponding event location with relatively high accuracy. The paper discusses and addresses the spatial scale problem involved in extracting geographic information from news text, making the extracted information more usable and interpretable. Besides enriching the sources of geographic information, the method enables automatic regional classification of the exponentially growing volume of online news and helps people follow and understand the spatial situation of all kinds of events.
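To make the idea of a toponymic tree with a virtual parent node more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a tiny hand-written gazetteer (`PARENT`), uses the lowest common ancestor of the mentioned places as a stand-in for the minimum-bounding-box scale selection, and uses simple mention counts as a stand-in for the entity weights; the names `VIRTUAL_ROOT`, `path_to_root`, `best_scale_node` and `rank_candidates` are all hypothetical.

```python
from collections import Counter

# Hypothetical virtual parent node that joins all top-level places into one tree.
VIRTUAL_ROOT = "<ROOT>"

# Hypothetical child -> parent gazetteer (an assumption for illustration only).
PARENT = {
    "China": VIRTUAL_ROOT,
    "Hubei": "China",
    "Hunan": "China",
    "Wuhan": "Hubei",
    "Yichang": "Hubei",
    "Changsha": "Hunan",
}


def path_to_root(place):
    """Return the chain [VIRTUAL_ROOT, ..., place] for a known place name."""
    path = [place]
    while path[-1] != VIRTUAL_ROOT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def best_scale_node(mentions):
    """Return the deepest node that contains every mentioned place.

    This lowest-common-ancestor rule is an assumed stand-in for the
    minimum-bounding-box selection described in the abstract.
    """
    paths = [path_to_root(m) for m in set(mentions)]
    common = VIRTUAL_ROOT
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def rank_candidates(mentions):
    """Rank places at or below the chosen scale by a simple count-based weight."""
    weights = Counter()
    for m in mentions:
        # Each mention also counts toward every real ancestor of the place.
        for node in path_to_root(m)[1:]:
            weights[node] += 1
    scale = best_scale_node(mentions)
    candidates = [n for n in weights if scale in path_to_root(n)]
    return sorted(candidates, key=lambda n: (-weights[n], n))


mentions = ["Wuhan", "Wuhan", "Yichang"]
print(best_scale_node(mentions))
print(rank_candidates(mentions))
```

In this sketch, two mentions of Wuhan and one of Yichang select the province-level node Hubei as the scale, because it is the smallest unit in the toy gazetteer that covers both cities. A real system would replace the count-based weight with weights derived from news structure and entity correlation, as the abstract describes.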

     
