Method of Constructing General Content Knowledge System for Provincial Comprehensive Atlas of China

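  • Summary: As a systematic scientific work, an atlas must organize its content around a rigorous logical framework and a core theme. Content design is the central step in atlas compilation and lays the foundation for the scientific quality of the whole atlas. To address the lack of a standardized reference for atlas content design, this paper proposes a method for constructing a general content structure system for provincial comprehensive atlases that combines natural language processing with knowledge graph technology, with the aim of making atlas content design more automated and intelligent. A structured database was built from 98 regional comprehensive atlases collected from China and abroad, and subgroup texts were clustered using pre-trained language models (PLM), whose semantic understanding was strengthened with an atlas dataset and knowledge graph embedding. Joint training on atlas knowledge and the knowledge graph improved the accuracy and robustness of PLM on the thematic subgroup clustering task, raising clustering accuracy by up to 11.46%. The resulting three-level hierarchical content structure (group, subgroup, map) can provide a scientific basis for provincial atlas content design, promote content standardization and structural regularity, and improve the completeness and scientific rigor of atlases.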

     

    Abstract:
    Objectives The content structure of an atlas, as a systematic scientific work, must adhere to a rigorous logical framework and coherent thematic organization. Content design is the core step in atlas compilation and underpins the scientific quality of the whole atlas. However, the absence of a standardized reference for structuring atlas content limits consistency and automation. We propose a general content structure system for provincial comprehensive atlases that integrates natural language processing and knowledge graph technologies, aiming to advance intelligent and automated atlas content design.
    Methods A structured database was constructed from 98 regional comprehensive atlases collected from domestic and international sources. A general content structure framework for provincial atlases was then derived through subgroup-based analysis. Hierarchical clustering of atlas subgroup texts was conducted using pre-trained language models (PLM). Atlas domain knowledge was incorporated through knowledge graph embedding techniques to improve the semantic understanding capacity of PLM.
    Results A three-tier hierarchical and standardized content structure system, organized as group-subgroup-map, was established, comprising 4 groups, 55 subgroups, and 289 map categories. Hierarchical clustering of 2 319 atlas subgroup texts using PLM revealed distinct inflection points in clustering evaluation metrics, corresponding to the typical range of subgroup numbers within thematic groups. Furthermore, the integration of atlas domain knowledge through knowledge graph embedding enhanced clustering robustness and precision, improving overall clustering accuracy by up to 11.46%.
    Conclusions The proposed general content structure system provides a scientific foundation for provincial atlas content design, enhances structural standardization, and improves the completeness and scientific rigor of atlas compilation. Furthermore, the integration of knowledge graph embedding offers a novel framework for enhancing PLM performance in domain-specific text clustering tasks, contributing to improved clustering accuracy and intelligent atlas design methodologies.
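    The clustering step summarized under Methods and Results can be illustrated with a minimal sketch. The abstract does not state which PLM, linkage rule, distance, or evaluation metric was used, so everything below is an assumption: average-linkage agglomerative clustering, cosine distance, and the silhouette score as the evaluation metric. The subgroup titles and their three-dimensional vectors are invented toy values standing in for real PLM embeddings.

```python
import math
from itertools import combinations

# Hypothetical subgroup titles with toy 3-d vectors standing in for PLM embeddings.
# Real embeddings would come from a language model; these values are invented.
SUBGROUPS = {
    "Landform": [0.9, 0.1, 0.0],
    "Climate": [0.8, 0.2, 0.1],
    "Population": [0.1, 0.9, 0.1],
    "Ethnic groups": [0.0, 0.8, 0.2],
    "Agriculture": [0.1, 0.1, 0.9],
    "Industry": [0.2, 0.0, 0.8],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Precompute the symmetric pairwise cosine-distance matrix."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = cosine_distance(vectors[i], vectors[j])
    return dist


def agglomerative(dist, k):
    """Average-linkage agglomerative clustering; merge until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # combinations yields a < b, so deleting index b leaves index a valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(dist, labels):
    """Mean silhouette score; singleton clusters contribute 0 by convention."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(dist[i][j] for j in same) / len(same)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


names = list(SUBGROUPS)
dist = distance_matrix([SUBGROUPS[name] for name in names])

# Scan candidate cluster counts and keep the one with the highest silhouette.
scores = {k: silhouette(dist, agglomerative(dist, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
best_labels = agglomerative(dist, best_k)
```

    On this toy data the silhouette score peaks at three clusters, which pairs the physical, social, and economic titles. Scanning the score over a range of cluster counts is one simple way to look for the kind of inflection point the Results describe; it is not necessarily the procedure the authors followed.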

     
