Adaptive Multi-level Feature Fusion for Scene Ancient Chinese Text Recognition
-
Abstract
Objectives: Ancient Chinese texts are widely found in scenes such as inscriptions, couplets, and stone engravings, and are characterized by complex backgrounds, a large number of character classes, and diverse writing forms. The large number of characters and the diversity of writing forms lead directly to wide variation in the structural complexity of the text. Methods: To address the difficulty of recognizing ancient Chinese text with complex structures, we propose an adaptive multilevel feature fusion network. First, ResNet152 serves as the backbone network; its depth gives it enough parameters to learn the features of ancient Chinese text, while its residual structure prevents model degradation. Second, to account for the structural complexity of ancient Chinese text, the importance of each feature map is learned automatically, so that the model adaptively selects and fuses shallow detail information and high-level semantic information, yielding highly discriminative features of ancient Chinese text and improving the recognition ability of the model. Finally, the maximum boundary cosine loss is used to minimize the cosine similarity between different ancient Chinese characters, increasing the inter-class distance and reducing the intra-class distance. Used together with the cross-entropy loss, it improves the model's ability to discriminate ancient Chinese characters with similar structures. Results: Experimental results show that adding the multistage feature fusion module to the proposed method increases Top-1 accuracy by 1.59%, and adding the maximum boundary cosine loss increases Top-1 accuracy by 1.09%. The best Top-1 recognition accuracy on the multi-scene ancient Chinese character dataset is 79.58%, which is 3.27% higher than that of the current best-performing method for scene ancient Chinese text recognition. Conclusions: This paper designs a multistage feature fusion network to improve the feature extraction ability of the model, and introduces the maximum boundary cosine loss to increase the distance between different ancient Chinese characters and reduce the distance within the same character class.
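The adaptive fusion step described in the Methods can be illustrated with a minimal sketch. The abstract does not say how the importance of each feature map is parameterized, so this sketch assumes one learnable scalar per feature level, normalized with a softmax, and assumes the multilevel feature maps have already been resized or projected to a common shape and flattened into vectors. The names `softmax` and `adaptive_fuse` are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features, level_logits):
    """Fuse same-shape feature vectors using learned per-level importance.

    features: list of L feature vectors ordered shallow to deep, each already
        resized/projected to the same length D (an assumption of this sketch).
    level_logits: list of L learnable scalars (hypothetical parameterization);
        the softmax turns them into non-negative weights that sum to 1.
    Returns the fused vector and the normalized level weights.
    """
    if not features or len(features) != len(level_logits):
        raise ValueError("need one logit per feature level")
    weights = softmax(level_logits)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        if len(feat) != dim:
            raise ValueError("all feature levels must share one shape")
        for i, v in enumerate(feat):
            fused[i] += w * v  # weighted sum across levels
    return fused, weights


# Toy usage: three levels (shallow detail to deep semantics), D = 3.
feature_levels = [
    [0.2, 0.8, 0.1],  # shallow stage (toy values)
    [0.5, 0.4, 0.3],  # middle stage
    [0.9, 0.1, 0.7],  # deep stage
]
level_logits = [0.0, 0.5, 1.0]  # learned during training; fixed here for illustration
fused, weights = adaptive_fuse(feature_levels, level_logits)
```

With equal logits every level contributes equally; during training, gradients on `level_logits` would shift weight toward whichever levels best separate the character classes. A per-image variant, closer to selecting features by each character's structural complexity, would predict these logits from the input rather than learning them as free parameters.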
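The combined training objective can be sketched in the same way. This sketch assumes that the maximum boundary cosine loss follows the large margin cosine loss (CosFace) formulation, in which a margin `m` is subtracted from the cosine between a feature and its target-class weight before all cosines are scaled by `s`, and that it is added to an ordinary cross-entropy loss with a weight `lam`. The values of `s`, `m`, and `lam`, and all function names, are illustrative assumptions rather than settings from the paper.

```python
import math


def _log_softmax_at(logits, target):
    """Return log(softmax(logits)[target]) using the log-sum-exp trick."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return logits[target] - log_sum


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cross_entropy(feature, class_weights, target):
    """Softmax cross-entropy on plain linear logits W_j . x (no bias term)."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in class_weights]
    return -_log_softmax_at(logits, target)


def large_margin_cosine_loss(feature, class_weights, target, s=30.0, m=0.35):
    """CosFace-style loss: target logit is s*(cos - m), others are s*cos."""
    cosines = [cosine(row, feature) for row in class_weights]
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    return -_log_softmax_at(logits, target)


def combined_loss(feature, class_weights, target, lam=1.0, s=30.0, m=0.35):
    """Cross-entropy plus lam times the large margin cosine loss (assumed weighting)."""
    ce = cross_entropy(feature, class_weights, target)
    lmc = large_margin_cosine_loss(feature, class_weights, target, s=s, m=m)
    return ce + lam * lmc


# Toy usage: two character classes, one weight row per class.
W = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.5]  # feature of a sample labeled as class 0
loss = combined_loss(x, W, target=0, lam=0.5)
```

Because the margin lowers only the target-class logit, a sample is penalized unless its cosine to the correct class exceeds the others by at least `m`, which is what pushes structurally similar characters apart in the normalized feature space.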
-