中国省域综合地图集通用内容结构知识构建方法研究

Research on the Method of Constructing General Content Knowledge System for Provincial Comprehensive Atlas of China

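  • Abstract (translated from Chinese): As a systematic scientific work, an atlas requires a content structure that follows a rigorous logical framework and core themes. Content design is the core stage of atlas compilation and lays the foundation for the scientific quality of the whole atlas. To address the lack of a standardized reference for atlas content design, this paper proposes a method for constructing a general content structure system for provincial comprehensive atlases that combines Natural Language Processing (NLP) and Knowledge Graph (KG) techniques, with the aim of advancing the automation and intelligence of atlas content design. A structured database is built from 98 regional comprehensive atlases collected from China and abroad, and subgroup texts are clustered using Pre-trained Language Models (PLMs). The atlas dataset and knowledge graph embeddings are used to strengthen the semantic understanding of the PLMs. Joint training on atlas knowledge and the knowledge graph improves the accuracy and robustness of the PLMs on the thematic subgroup clustering task, raising clustering accuracy by up to 11.46%. The result is a three-level hierarchical content structure system of 'groups – subgroups – maps', which can provide a scientific basis for provincial atlas content design, promote content standardization and structural regularity, and improve the completeness and scientific quality of atlases.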

     

    Abstract: Objectives: As a systematic scientific work, an atlas needs a content structure that follows a rigorous logical framework and core themes. Content design is a core step of atlas compilation and lays the foundation for the scientific quality of the atlas as a whole. Because atlas content design lacks a standardized reference, this paper proposes a method for constructing a general content structure system for provincial comprehensive atlases that integrates Natural Language Processing (NLP) and Knowledge Graph (KG) techniques, with the aim of promoting the automation and intelligence of atlas content design. Methods: A structured database was built by collecting and organizing 98 regional comprehensive atlases from China and abroad. A general content structure system for provincial atlases was then derived from their subgroups, with the subgroup texts clustered using pre-trained language models (PLMs). The atlas dataset and knowledge graph embeddings were used to enhance the semantic understanding of the PLMs and thereby improve the clustering accuracy for thematic subgroups. Results: (1) A three-tier, hierarchical, standardized content structure system of 'groups – subgroups – maps' was constructed, comprising 4 groups, 55 subgroups, and 289 maps. (2) Hierarchical clustering of 2,319 atlas subgroup texts with a PLM showed that the clustering metrics have inflection points within the typical range for the number of subgroups in a thematic group. (3) Integrating atlas knowledge and the KG improved the accuracy and robustness of PLMs on the thematic subgroup clustering task, raising clustering accuracy (ACC) by up to 11.46%.
Conclusions: The general content structure system can provide a scientific basis for the content design of provincial atlases, promote the standardization of content and structure, and enhance the completeness and scientific quality of atlases. The paper also proposes a method for improving PLM performance by integrating knowledge graph embeddings, which offers a framework for fine-tuning PLMs for text clustering with the aim of improving clustering accuracy.
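
The abstract reports clustering accuracy (ACC) without defining it. The sketch below assumes the common best-match definition: each predicted cluster is mapped one-to-one to a reference label so that the number of agreeing items is maximized, and ACC is that count divided by the number of items. The function name `clustering_accuracy` and the subgroup labels in the usage example are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best-match clustering accuracy (ACC), assuming the standard definition.

    Every cluster is mapped to at most one reference label (one-to-one),
    and the mapping that maximizes the number of agreeing items is used.
    Reference labels must not be None, since None marks an unmatched cluster.
    """
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")

    classes = list(dict.fromkeys(true_labels))   # unique labels, input order
    clusters = list(dict.fromkeys(cluster_ids))  # unique cluster ids
    overlap = Counter(zip(cluster_ids, true_labels))

    # Pad with None so that surplus clusters can remain unmatched.
    candidates = classes + [None] * max(0, len(clusters) - len(classes))

    best = 0
    # Exhaustive search over mappings: factorial cost, small inputs only.
    for assignment in permutations(candidates, len(clusters)):
        matched = sum(overlap[(c, lab)] for c, lab in zip(clusters, assignment))
        best = max(best, matched)
    return best / len(true_labels)


# Hypothetical subgroup labels and cluster ids for six map titles.
reference = ["climate", "climate", "soil", "soil", "population", "population"]
predicted = [2, 2, 0, 0, 1, 0]
acc = clustering_accuracy(reference, predicted)  # 5 of 6 items agree
```

The exhaustive search is only practical for a handful of clusters. At the scale reported in the abstract (55 subgroups), an assignment solver such as SciPy's `linear_sum_assignment` would be the usual substitute for the permutation loop.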

     
