一种基于复合特征的中文地名识别方法

A Method of Chinese Place Name Recognition Based on Composite Features

Abstract: Chinese place name recognition is an important research topic in named entity recognition and a key to improving the application of geographic information systems. Traditional place name recognition relies mainly on part-of-speech or place name element features, so the range of feature types is limited. This paper proposes a Chinese place name recognition method based on composite features. It mines the characteristics of Chinese place names in natural language and designs four syntactic features: type, path, distance, and number. A conditional random field (CRF) model is trained on three kinds of features combined, namely place name element features, part-of-speech (POS) features, and syntactic features, and is then used to recognize Chinese place names. Comparative experiments show that the composite features effectively improve the accuracy and recall of Chinese place name recognition, with particularly good results for complex place names.
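To make the idea of a composite feature set concrete, the following is a minimal sketch of how element, POS, and syntactic features might be assembled per token before being passed to a CRF toolkit. The abstract does not define the four syntactic features, so the definitions used here are assumptions for illustration only: type is taken as the token's dependency relation, path as the chain of relation labels up to the root, distance as the length of that chain, and number as the count of the token's dependents. The `Token` type, the suffix list, and the toy sentence with its tags are likewise hypothetical.

```python
from typing import Dict, List, NamedTuple, Union


class Token(NamedTuple):
    text: str
    pos: str   # part-of-speech tag
    head: int  # index of the syntactic head, -1 for the root
    rel: str   # dependency relation to the head


# Illustrative place name suffix characters (an assumed, incomplete list).
SUFFIXES = ("省", "市", "县", "区", "镇", "村", "路", "街")

Feature = Union[str, int, bool]


def path_to_root(sent: List[Token], i: int) -> List[str]:
    """Collect relation labels from token i up to the root."""
    labels: List[str] = []
    seen = set()
    while i != -1 and i not in seen:  # the seen set guards against cycles
        seen.add(i)
        labels.append(sent[i].rel)
        i = sent[i].head
    return labels


def token_features(sent: List[Token], i: int) -> Dict[str, Feature]:
    """Build one composite feature dict: element + POS + syntactic features."""
    tok = sent[i]
    path = path_to_root(sent, i)
    return {
        "element": tok.text.endswith(SUFFIXES),  # place name element feature
        "pos": tok.pos,                          # part-of-speech feature
        "syn_type": tok.rel,                     # assumed: dependency relation
        "syn_path": "/".join(path),              # assumed: labels up to root
        "syn_distance": len(path),               # assumed: arcs up to root
        "syn_number": sum(1 for t in sent if t.head == i),  # assumed: dependents
    }


def sentence_features(sent: List[Token]) -> List[Dict[str, Feature]]:
    return [token_features(sent, i) for i in range(len(sent))]


# Toy sentence with hand-written tags and dependency arcs.
sentence = [
    Token("我", "r", 1, "SBV"),
    Token("住在", "v", -1, "HED"),
    Token("武汉市", "ns", 1, "VOB"),
]
features = sentence_features(sentence)
```

A list of per-token dicts like `features` is the kind of input that dict-based CRF toolkits commonly accept; the paper's actual feature templates and toolkit are not stated in the abstract.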

     
