Adaptive Multi-level Feature Fusion for Scene Ancient Chinese Text Recognition

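  • Abstract: Ancient Chinese character images in natural scenes have complex backgrounds, a very large character set, and diverse script styles. The large number of characters and script styles leads to widely varying structural complexity, and existing methods do not specifically address the recognition of structurally complex ancient Chinese characters. To address this problem, this paper proposes an adaptive multi-level feature fusion network. First, according to the structural complexity of each ancient Chinese character, the network adaptively selects and fuses shallow detail information and high-level semantic information to obtain highly discriminative features, improving the model's recognition ability. Second, a maximum-margin cosine loss is used to enlarge the inter-class distance between ancient Chinese characters, improving the model's ability to discriminate characters with similar structures. Experimental results show that the proposed method achieves a Top-1 recognition accuracy of 79.58% on a multi-scene ancient Chinese character dataset, 3.27% higher than the current best method.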

     

    Abstract: Objectives: Ancient Chinese text is widely found on inscriptions, couplets, stone engravings, and in other scenes, and is characterized by complex backgrounds, a large number of characters, and diverse writing forms. The large number of characters and writing forms leads directly to differences in the structural complexity of the text. Methods: To address the difficulty of recognizing ancient Chinese text with complex structures, we propose an adaptive multi-level feature fusion network. First, ResNet152 serves as the backbone network; its depth provides more parameters for learning the features of ancient Chinese text, and its residual structure avoids model degradation. Second, according to the structural complexity of the ancient Chinese text, the importance of each feature map is learned automatically, so that the model adaptively selects and merges shallow detail information and high-level semantic information, obtains highly discriminative features, and improves its recognition ability. Finally, the maximum-margin cosine loss is used to minimize the cosine similarity between different ancient Chinese text classes, increasing the inter-class distance and reducing the intra-class distance. Combined with the cross-entropy loss as the training objective, it improves the model's ability to discriminate ancient Chinese text with similar structures. Results: The experiments show that adding the multi-level feature fusion module increases Top-1 accuracy by 1.59%, and adding the maximum-margin cosine loss increases Top-1 accuracy by 1.09%. The best Top-1 recognition accuracy on the multi-scene ancient Chinese character dataset is 79.58%, which is 3.27% higher than the current best method. Conclusions: This paper designs a multi-level feature fusion network to improve the feature extraction ability of the model, and introduces the maximum-margin cosine loss to increase the inter-class distance and reduce the intra-class distance of ancient Chinese text.
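The two components named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the fusion is shown as a softmax-weighted sum over per-level importance scores (a stand-in for the learned feature-map importance), and the loss follows the commonly used margin cosine formulation, in which a margin m is subtracted from the target-class cosine before scaling by s. The names adaptive_fusion, margin_cosine_loss, s, and m are hypothetical and not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(feature, class_weights):
    """Cosine similarity between one feature vector and each class weight vector."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(w))) for w in class_weights]


def margin_cosine_loss(feature, class_weights, label, s=30.0, m=0.35):
    """Assumed margin cosine loss: softmax cross-entropy over scaled cosines,
    with margin m subtracted from the target-class cosine only."""
    cos = cosine_logits(feature, class_weights)
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cos)]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def adaptive_fusion(level_features, level_scores):
    """Assumed fusion rule: softmax the per-level importance scores, then
    take the weighted sum of same-sized feature vectors from each level."""
    peak = max(level_scores)
    exps = [math.exp(a - peak) for a in level_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(level_features[0])
    return [sum(w * f[i] for w, f in zip(weights, level_features)) for i in range(dim)]


if __name__ == "__main__":
    shallow = [0.9, 0.1, 0.3]   # hypothetical shallow-level (detail) feature
    deep = [0.2, 0.8, 0.5]      # hypothetical high-level (semantic) feature
    fused = adaptive_fusion([shallow, deep], level_scores=[0.2, 1.1])
    classes = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.4]]  # hypothetical class weights
    print(fused, margin_cosine_loss(fused, classes, label=1))
```

In this sketch the margin makes the target class harder to satisfy during training, which is what pushes classes apart in cosine space; in a real model the importance scores and class weights would be trained jointly with the backbone rather than fixed by hand.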

     
