Abstract:
Online news gives users timely access to current affairs. As one of the key elements of news, location also plays an important role in multi-source geographic information. Extracting news location from text involves two key tasks: recognizing toponyms in the text and determining the best spatial scale for the extracted location. To this end, we present an approach that extracts news location at the best spatial scale based on an administrative district relation tree. After place names are identified, a method is applied to remove semantic interference and eliminate ambiguities. By introducing a virtual parent node, the minimum bounding box, node weights, and association relationships, toponymic trees can be built accurately and news locations selected at a reasonable scale. Experiments show that the proposed method yields news locations that are accurate and at a suitable scale. In this paper, we discuss and address the problem of spatial scale in the extraction of geographic information from news text, making that information more useful and interpretable. Beyond enriching geographic information, the method helps people form spatial awareness of events and helps tag the exponentially increasing volume of online news by region.
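The idea of choosing a scale from a toponymic tree can be illustrated with a minimal sketch. It is not the method of this paper: the hierarchy in `PARENT`, the mention-count weighting, the `threshold` parameter, and the descent rule in `select_location` are all assumptions made for illustration, and the sketch leaves out toponym recognition, disambiguation, the minimum bounding box, and association relationships.

```python
from collections import Counter

# Hypothetical administrative hierarchy (child -> parent), for illustration only.
PARENT = {
    "China": None,
    "Hubei": "China",
    "Hunan": "China",
    "Wuhan": "Hubei",
    "Yichang": "Hubei",
    "Changsha": "Hunan",
    "Japan": None,
    "Tokyo": "Japan",
}

# Assumed stand-in for a virtual parent node: joins places with no real common parent.
VIRTUAL_ROOT = "<virtual-root>"


def path_to_root(place):
    """Return [place, parent, ..., VIRTUAL_ROOT]."""
    path = []
    node = place
    while node is not None:
        path.append(node)
        node = PARENT[node]
    path.append(VIRTUAL_ROOT)
    return path


def subtree_weights(mentions):
    """Add each toponym's mention count to itself and to all of its ancestors."""
    weights = Counter()
    for place, count in mentions.items():
        for node in path_to_root(place):
            weights[node] += count
    return weights


def select_location(mentions, threshold=0.6):
    """Descend from the virtual root while a single child carries at least
    `threshold` of the total weight; stop at the first node where none does."""
    weights = subtree_weights(mentions)
    total = weights[VIRTUAL_ROOT]

    # Group the nodes that actually appear in this article by their parent.
    children = {}
    for node in weights:
        if node == VIRTUAL_ROOT:
            continue
        parent = PARENT[node] or VIRTUAL_ROOT
        children.setdefault(parent, []).append(node)

    current = VIRTUAL_ROOT
    while True:
        kids = children.get(current, [])
        if not kids:
            return current
        best = max(kids, key=lambda k: weights[k])
        if weights[best] / total < threshold:
            return current
        current = best
```

Under these assumptions, an article that mentions Wuhan and Yichang equally often would be located at Hubei, while one that mentions Wuhan five times and Changsha once would be located at Wuhan. Mentions split evenly across two countries fall back to the virtual root, meaning no single administrative unit dominates.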