A Text-Based WMS Domain Themes Extraction and Metadata Extension Method

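  • Summary: Because Web map service (WMS) metadata lacks an explicit mechanism for describing domain themes, users find it difficult to discover map data resources in a target domain accurately and comprehensively. This paper proposes a text-based WMS domain theme extraction and metadata extension method for geographic information resource retrieval. First, an unsupervised text classification algorithm is designed that uses the Semantic Web for Earth and Environmental Terminology (SWEET) and WordNet, a large lexical semantic network of English, to jointly compute the semantic relatedness between domain themes and both the geoscience terms and the general-purpose vocabulary in WMS capabilities documents, and thereby extracts multi-label themes for a WMS and its layers. Then, based on the ISO 19115:2003 geographic information metadata standard, the WMS metadata organization model is extended with domain themes. Experimental results show that the proposed WMS metadata theme classification algorithm achieves high precision and recall and, overall, clearly outperforms methods such as naive Bayes, linear support vector machine (SVM), and logistic regression. The method is expected to be applicable to current geographic information portals and catalogue services, helping users locate map service resources in a target domain quickly and accurately.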

     

    Abstract: Since there is no explicit metadata description mechanism that defines map domain themes for a Web map service (WMS), end users cannot easily discover the desired map resources in a target domain. We propose a text-based WMS domain theme extraction and metadata extension method to better support geographic information retrieval. Specifically, we present a new unsupervised multi-label text classification algorithm that measures the semantic relevance between feature words in a WMS capabilities document and multiple domain themes defined by the GEOSS societal benefit areas (SBAs). The Semantic Web for Earth and Environmental Terminology (SWEET) and WordNet are used to calculate the shortest semantic path to a given theme for both Earth science terms and general terms. In addition, we extend the WMS domain theme description by adding theme tags to the ISO 19115:2003 geographic information metadata standard in a flexible and standard-conformant way. Experimental results indicate that the proposed multi-label text classification method achieves higher recall and precision than other text classification methods, such as naive Bayes, linear support vector machine (SVM), and logistic regression, and than methods that use SWEET or WordNet alone.
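The shortest-semantic-path idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the tiny hand-made graph `EDGES` stands in for SWEET and WordNet, the theme names in `THEMES` are hypothetical rather than the GEOSS SBAs, and the relevance formula `1 / (1 + distance)`, the mean aggregation, and the `threshold` value are all assumptions chosen only to show how path lengths could yield multiple theme labels.

```python
from collections import deque

# Hypothetical toy semantic graph; a real system would draw these
# relations from SWEET and WordNet instead.
EDGES = [
    ("rainfall", "precipitation"),
    ("precipitation", "weather"),
    ("precipitation", "water"),
    ("river", "water"),
    ("crop", "agriculture"),
    ("soil", "agriculture"),
    ("soil", "water"),
]

# Hypothetical domain themes (placeholders, not the GEOSS SBAs).
THEMES = ["weather", "water", "agriculture"]


def build_graph(edges):
    """Build an undirected adjacency map from a list of term pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relevance(graph, word, theme):
    """Assumed relevance score: shorter paths give higher relevance."""
    dist = shortest_path_length(graph, word, theme)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def classify(graph, words, themes, threshold=0.3):
    """Assign every theme whose mean word relevance reaches the threshold."""
    known = [w for w in words if w in graph]
    if not known:
        return []
    labels = []
    for theme in themes:
        score = sum(relevance(graph, w, theme) for w in known) / len(known)
        if score >= threshold:
            labels.append(theme)
    return labels
```

Calling `classify(build_graph(EDGES), ["rainfall", "river", "soil"], THEMES)` on this toy graph assigns more than one theme to the same word list, which is the multi-label behavior the abstract describes. Words absent from the graph are simply skipped here; how the actual method treats out-of-vocabulary terms is not stated in the abstract.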

     
