WEI Haodong, YI Yaohua, YU Changhui, LIN Liyu. Text Super-Resolution Method with Attentional Mechanism and Sequential Units[J]. Geomatics and Information Science of Wuhan University, 2024, 49(7): 1120-1129. DOI: 10.13203/j.whugis20220158

Text Super-Resolution Method with Attentional Mechanism and Sequential Units

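  • Abstract: Text in street view images is a key cue for perceiving and understanding scenes, but text regions in low-resolution street view images lack detail, which lowers text recognition accuracy. Text super-resolution improves recognition accuracy by enhancing the edges and texture details of text regions, and this paper proposes a street view image text super-resolution method that fuses attention with sequential units. First, a hybrid residual attention structure extracts spatial and channel information from the text regions and fuses the resulting features, while the sequential unit extracts sequential prior information between the text in the image through a bidirectional gated recurrent structure. Then, gradient prior knowledge is used as a constraint to reconstruct the text regions of the street view image. Comparative analysis on TextZoom real-scene images and synthetic text images shows that the reconstructed text regions have clear edges and rich texture details, and that the method improves text recognition accuracy for street view images.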
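The sequential unit mentioned in the abstract can be illustrated with a minimal bidirectional gated recurrent pass over the columns of a text image. This is only a sketch under stated assumptions: the input is reduced to one scalar per image column, the hidden state is a single scalar, the gate equations follow the common GRU formulation, and the names `gru_step`, `run_gru`, `bigru` and all weight values are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One GRU update for a scalar input x and a scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])            # reset gate
    n = math.tanh(p["wn"] * x + r * (p["un"] * h) + p["bn"])    # candidate state
    return (1.0 - z) * n + z * h                                # blend old and new state


def run_gru(xs, p):
    """Run the GRU over a sequence, returning the hidden state at every step."""
    h, states = 0.0, []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def bigru(xs, p_fwd, p_bwd):
    """Pair a left-to-right pass with a right-to-left pass at every position."""
    fwd = run_gru(xs, p_fwd)
    bwd = run_gru(xs[::-1], p_bwd)[::-1]   # re-align the backward states with xs
    return list(zip(fwd, bwd))


# Hypothetical input: mean intensity of each column of a text-line image.
column_profile = [0.05, 0.80, 0.75, 0.10, 0.85, 0.20]
# Hypothetical weights; a trained model would learn these.
params = {"wz": 0.5, "uz": 0.3, "bz": 0.0,
          "wr": 0.4, "ur": 0.2, "br": 0.0,
          "wn": 1.0, "un": 0.6, "bn": 0.0}

features = bigru(column_profile, params, params)
for t, (f, b) in enumerate(features):
    print(f"column {t}: forward={f:+.3f} backward={b:+.3f}")
```

Each position ends up with one state summarizing the columns to its left and one summarizing the columns to its right, which is the sense in which a bidirectional recurrent unit can carry context along a line of text.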

     

    Abstract:
    Objectives Text in street view images is a key clue for perceiving and understanding scene information. Low-resolution street view images lack detail in text regions, which lowers recognition accuracy. Super-resolution can be introduced as a pre-processing step to reconstruct the edge and texture details of text regions. To improve text recognition accuracy, we propose a text super-resolution network that combines an attention mechanism with sequential units.
    Methods A hybrid residual attention structure is proposed to extract spatial and channel information from the image text region and to learn multi-level feature representations. A sequential unit is proposed to extract sequential prior information between text in the image through bidirectional gated recurrent units. With gradient prior knowledge as the constraint, a gradient prior loss is designed to sharpen character boundaries.
    Results To verify the effectiveness of the proposed method, we conduct comparative experiments on real scene text images from TextZoom and on synthetic text images. Experimental results show that, compared with the baseline and state-of-the-art general super-resolution algorithms, our model reconstructs visually sharper text edges and clearer texture details, and achieves higher recognition accuracy.
    Conclusions Our method makes better use of the prior knowledge of text regions in images, which helps reconstruct text details and improves the accuracy of the text recognition task.
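The gradient prior constraint described under Methods can be illustrated with a small worked example. The abstract does not give the loss formula, so the sketch below assumes one common choice: the mean absolute difference between the gradient-magnitude maps of the super-resolved and high-resolution images. The function names `gradient_magnitude` and `gradient_prior_loss` and the toy images are hypothetical.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Border pixels are replicated so every position has two neighbours.
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            gx = (right - left) / 2.0
            gy = (down - up) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def gradient_prior_loss(sr, hr):
    """Assumed form: mean absolute difference between the two gradient maps."""
    g_sr, g_hr = gradient_magnitude(sr), gradient_magnitude(hr)
    total = sum(abs(a - b)
                for row_sr, row_hr in zip(g_sr, g_hr)
                for a, b in zip(row_sr, row_hr))
    return total / (len(hr) * len(hr[0]))


# Toy 3x4 images: a sharp vertical stroke edge and two blurred reconstructions.
hr_sharp = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
sr_blurry = [[0.0, 0.25, 0.75, 1.0] for _ in range(3)]
sr_better = [[0.0, 0.125, 0.875, 1.0] for _ in range(3)]

print("blurry:", gradient_prior_loss(sr_blurry, hr_sharp))
print("better:", gradient_prior_loss(sr_better, hr_sharp))
```

Under this assumed form, a reconstruction whose edge profile is closer to the sharp target receives a lower loss, which is how such a term pushes the network toward crisper character boundaries.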

     
