ZHANG Sanqiang, SONG Guomin, JIA Fenli, CHEN Lingyu. Character Life-Track Information Model and Information Extraction Method[J]. Geomatics and Information Science of Wuhan University, 2022, 47(5): 700-706. DOI: 10.13203/j.whugis20190424

Character Life-Track Information Model and Information Extraction Method

  •   Objectives  In human-related geographic information systems (GIS), the spatiotemporal analysis of character information has received increasing attention. It helps GIS users generate thematic maps and visualize human geographic content. As GIS develops toward greater intelligence, it is important to combine GIS requirements with natural language processing (NLP) methods and build a character information model.
      Methods  First, we review the state of research on character information models in GIS and NLP and put forward the concept of the life-track, which is mainly composed of a series of character event mentions. Second, taking the feasibility of existing information extraction methods into account, we define a conceptual character life-track information model. This model centers on event information to highlight the spatiotemporal elements of a character, and it also includes character attribute and relationship information. Finally, we design a complete information extraction process for the model, with online character encyclopedia pages as the data source. This paper focuses on two sub-tasks in the process: one uses time features and OpenHowNet semantic calculations to identify event mentions, and the other uses linguistic features and a conditional random field (CRF) model to extract location information.
      Results  Experimental results show that the event mention identification method has an accuracy of 91.8%. Although the average F1 value of location information extraction is only 78% with a limited labeled corpus, analyzing the weights of the transition matrix of the CRF model yields some valuable conclusions: (1) The location phrase and its adjacent words have a clear effect as features. (2) Dependency syntactic parsing and the relative position of a word in the sentence have little influence on the extraction. (3) The target of location information extraction is the place where the event occurred, but in a few cases some location phrases are irrelevant to the location of the event. This is the main reason for the low accuracy.
      Conclusions  Combining GIS with NLP is a promising direction for intelligent GIS development. The character life-track information model provides an example of the large-scale use of ubiquitous internet information. Improving the methods applied in the extraction process and applying them to more types of online text are the focus of our team's subsequent research.
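
The abstract reports that a location phrase and its adjacent words are the features that matter most for the CRF. A minimal sketch of such window-based token features follows. It is not the authors' implementation: the feature names, the part-of-speech tags, the English example sentence, and the helpers `token_features` and `sentence_features` are all assumptions made for illustration. The per-token dictionaries it produces are the kind of input a CRF sequence labeler would typically be trained on.

```python
from typing import Dict, List, Tuple

# (word, part-of-speech tag); the tag set here is illustrative only.
Token = Tuple[str, str]


def token_features(sentence: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i from the token itself and its neighbours."""
    word, pos = sentence[i]
    feats = {"word": word.lower(), "pos": pos}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sentence):
            neighbour_word, neighbour_pos = sentence[j]
            feats[f"word[{offset:+d}]"] = neighbour_word.lower()
            feats[f"pos[{offset:+d}]"] = neighbour_pos
        else:
            # Mark positions that fall outside the sentence boundary.
            feats[f"word[{offset:+d}]"] = "<PAD>"
    return feats


def sentence_features(sentence: List[Token], window: int = 1) -> List[Dict[str, str]]:
    """Feature dicts for every token in the sentence, in order."""
    return [token_features(sentence, i, window) for i in range(len(sentence))]


# Hypothetical event mention, pre-tokenized and pre-tagged.
example: List[Token] = [
    ("He", "PRON"), ("studied", "VERB"), ("in", "ADP"),
    ("Wuhan", "PROPN"), ("in", "ADP"), ("1998", "NUM"),
]
features = sentence_features(example)
```

Dependency relations and relative word position, which the abstract finds to have little influence, are left out of this sketch on purpose; they could be added as further dictionary keys.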