LI Xiaolin, LI Gang, ZHANG Enqi, GU Guanghua. Determinant Point Process Sampling Method for Text‑to‑Image Generation[J]. Geomatics and Information Science of Wuhan University, 2024, 49(2): 246-255. DOI: 10.13203/j.whugis20210373

Determinant Point Process Sampling Method for Text‑to‑Image Generation

  • Objectives In recent years, generative adversarial networks (GAN) have brought major breakthroughs in text-to-image generation. Such models generate images that match the semantic information of the input text, and they have great application value. However, the generated images usually lack specific texture details and often suffer from mode collapse and a lack of diversity.
    Methods This paper proposes a determinantal point process for generative adversarial networks (GAN-DPP) to improve the quality of the generated samples, and implements it on two baseline models, StackGAN++ and ControlGAN. During training, GAN-DPP uses a determinantal point process kernel to model the diversity of the real data and of the synthetic data, and a penalty loss encourages the generator to produce data whose diversity is similar to that of the real data. This improves the clarity and diversity of the generated samples and reduces problems such as mode collapse. No extra calculations are added during training.
    Results This paper compares the generated results using quantitative indicators. For the inception score, a higher value indicates better image clarity and diversity. On the Oxford-102 dataset, the score of GAN-DPP-S is 3.1% higher than that of StackGAN++, and the score of GAN-DPP-C is 3.4% higher than that of ControlGAN. On the CUB dataset, the score of GAN-DPP-S increases by 8.2%, and the score of GAN-DPP-C increases by 1.9%. For the Fréchet Inception Distance score, a lower value indicates better image generation quality. On the Oxford-102 dataset, the score of GAN-DPP-S is reduced by 11.1%, and the score of GAN-DPP-C is reduced by 11.2%. On the CUB dataset, the score of GAN-DPP-S is reduced by 6.4%, and the score of GAN-DPP-C is reduced by 3.1%.
    Conclusions The qualitative and quantitative comparative experiments show that the proposed GAN-DPP method improves the performance of generative adversarial network models. The images generated by the model have richer texture details, and their diversity is significantly improved.
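
The diversity penalty described under Methods can be illustrated with a small sketch. This is not the paper's loss. It assumes a linear DPP kernel L = ΦΦᵀ built from per-sample feature vectors, and it measures diversity by the log-determinant of that kernel, penalizing the squared gap between the real and generated batches. The function names (`dpp_kernel`, `log_det`, `diversity_penalty`), the `jitter` term, and the toy feature vectors are all hypothetical and chosen only for illustration.

```python
import math


def dpp_kernel(features, jitter=1e-6):
    """Linear DPP kernel L = Phi Phi^T for a batch of feature vectors.

    A small jitter is added to the diagonal (an assumption of this sketch)
    so the kernel stays positive definite even for a collapsed batch.
    """
    n = len(features)
    kernel = [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        kernel[i][i] += jitter
    return kernel


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def diversity_penalty(real_features, fake_features):
    """Squared gap between the log-det diversity of real and generated batches.

    Illustrative stand-in for a DPP-based penalty loss, not the authors' formula.
    """
    gap = log_det(dpp_kernel(real_features)) - log_det(dpp_kernel(fake_features))
    return gap * gap


# Toy batches: two orthogonal "real" features, a diverse and a collapsed fake batch.
real = [[1.0, 0.0], [0.0, 1.0]]
diverse_fake = [[0.0, 1.0], [1.0, 0.0]]
collapsed_fake = [[1.0, 0.0], [1.0, 0.0]]

print(diversity_penalty(real, diverse_fake))    # small: diversity matches the real batch
print(diversity_penalty(real, collapsed_fake))  # large: the batch has collapsed to one mode
```

In a training loop the feature vectors would come from a learned network, and the penalty would be added to the generator loss with a weighting factor. Both choices are assumptions here and are not taken from the abstract.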