Abstract:
Objectives Text in street view images is a key clue for perceiving and understanding scene information. Low-resolution street view images lack detail in text regions, leading to poor recognition accuracy. Super-resolution can be introduced as a pre-processing step to reconstruct edge and texture details in text regions. To improve text recognition accuracy, we propose a text super-resolution network combining an attention mechanism with sequential units.
Methods A hybrid residual attention structure is proposed to extract spatial and channel information from image text regions, learning multi-level feature representations. A sequential unit is proposed to capture the sequential prior between characters in the image through bidirectional gated recurrent units. Using gradient prior knowledge as a constraint, a gradient prior loss is designed to sharpen character boundaries.
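The abstract does not give implementation details; as an illustration only, the following minimal PyTorch sketch shows one plausible form of the two components named above: a sequential unit that runs a bidirectional GRU over feature-map rows, and a gradient prior loss implemented as an L1 penalty on image gradient fields. Class names, tensor shapes, and hyperparameters here are assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class SequentialUnit(nn.Module):
    """Hypothetical sketch: scan each row of the feature map with a
    bidirectional GRU so features at one horizontal position can see
    characters on either side (the sequential prior)."""

    def __init__(self, channels: int):
        super().__init__()
        # Hidden size channels // 2 per direction keeps the output
        # channel count equal to the input channel count.
        self.gru = nn.GRU(channels, channels // 2,
                          bidirectional=True, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape                               # x: (B, C, H, W)
        # Treat every row as a left-to-right sequence of width W.
        seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.gru(seq)                             # (B*H, W, C)
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)


class GradientPriorLoss(nn.Module):
    """Hypothetical sketch: penalize the L1 difference between the
    gradient fields of the super-resolved (sr) and high-resolution
    (hr) images, pushing the network toward sharp character edges."""

    def __init__(self):
        super().__init__()
        self.l1 = nn.L1Loss()

    @staticmethod
    def gradient_field(img: torch.Tensor):
        dx = img[..., :, 1:] - img[..., :, :-1]   # horizontal differences
        dy = img[..., 1:, :] - img[..., :-1, :]   # vertical differences
        return dx, dy

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        sr_dx, sr_dy = self.gradient_field(sr)
        hr_dx, hr_dy = self.gradient_field(hr)
        return self.l1(sr_dx, hr_dx) + self.l1(sr_dy, hr_dy)
```

In a training loop, such a loss term would typically be weighted and added to a pixel-wise reconstruction loss, e.g. `loss = l2(sr, hr) + lambda_gp * GradientPriorLoss()(sr, hr)`, where `lambda_gp` is an assumed weighting hyperparameter.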
Results To verify the effectiveness of the proposed method, we conduct comparative experiments on real scene text images from TextZoom and on synthetic text images. Experimental results show that, compared with the baseline and state-of-the-art general-purpose super-resolution algorithms, our model reconstructs sharper text edges and clearer texture details in visual perception, and achieves higher recognition accuracy.
Conclusions Our method makes better use of the prior knowledge of text regions in images, which helps reconstruct text details and improves the accuracy of the text recognition task.