Citation: Li Xiaolin, Li Gang, Zhang Enqi, Gu Guanghua. Determinant Point Process Sampling Method for Text-to-Image Generation[J]. Geomatics and Information Science of Wuhan University. doi: 10.13203/j.whugis20210373

Determinant Point Process Sampling Method for Text-to-Image Generation

doi: 10.13203/j.whugis20210373
Funds:

The National Natural Science Foundation of China (No. 62072394)

  • Received Date: 2021-06-21
  • Abstract: Objectives: In recent years, text-to-image generation based on generative adversarial networks (GANs) has made great breakthroughs: such models can generate images that match the semantic content of a text description and have considerable application value. However, the generated images still tend to lack fine texture details and often suffer from mode collapse and a lack of diversity. Methods: This paper proposes GAN-DPP, a method that introduces a determinantal point process (DPP) into GAN training to improve the quality of the generated samples, and implements it on two baseline models, StackGAN++ and ControlGAN. During training, a DPP kernel models the diversity of the real and the synthetic data, and a penalty loss encourages the generator to produce samples whose diversity matches that of the real data (a minimal sketch of this penalty follows the abstract). This improves the clarity and diversity of the generated samples, alleviates mode collapse, and adds no extra computation during training. Results: The generated results are compared quantitatively. For the Inception Score (IS), a higher value indicates better image clarity and diversity. On the Oxford-102 dataset, GAN-DPP-S improves the IS by 3.1% over StackGAN++, and GAN-DPP-C improves it by 3.4% over ControlGAN. On the CUB dataset, GAN-DPP-S and GAN-DPP-C improve the IS by 8.2% and 1.9%, respectively. For the Fréchet Inception Distance (FID), a lower value indicates better image generation quality. On the Oxford-102 dataset, GAN-DPP-S reduces the FID by 11.1% and GAN-DPP-C by 11.2%; on the CUB dataset, the reductions are 6.4% and 3.1%, respectively. Conclusions: Qualitative and quantitative experiments show that the proposed GAN-DPP method improves the performance of generative adversarial network models: the images it generates have richer texture details, and their diversity is significantly improved.
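
The DPP penalty described in the Methods is in the spirit of the GDPP loss of Elfeki et al. [27]: a DPP kernel is built over a batch of real features and a batch of generated features, and the generator is penalized for the mismatch between the two kernels' eigenvalues (diversity magnitude) and eigenvectors (diversity structure). The sketch below illustrates such a penalty in PyTorch; the function name, feature shapes, and normalization details are assumptions made for illustration, not the authors' released implementation.

    # Hedged sketch of a GDPP-style DPP diversity penalty (after [27]).
    # Assumes real_feats and fake_feats are (batch, dim) feature tensors,
    # e.g. taken from the discriminator's penultimate layer.
    import torch
    import torch.nn.functional as F

    def dpp_diversity_penalty(real_feats, fake_feats):
        # DPP kernels over each batch: L = phi @ phi^T (batch x batch, PSD)
        L_real = real_feats @ real_feats.t()
        L_fake = fake_feats @ fake_feats.t()

        # Eigendecomposition of the symmetric kernels
        eval_r, evec_r = torch.linalg.eigh(L_real)
        eval_f, evec_f = torch.linalg.eigh(L_fake)

        # Normalize eigenvalues so the two spectra are on a comparable scale
        eval_r_hat = eval_r / (eval_r.max() + 1e-8)
        eval_f_hat = eval_f / (eval_f.max() + 1e-8)

        # Diversity magnitude: squared error between the eigenvalue spectra
        loss_magnitude = F.mse_loss(eval_f_hat, eval_r_hat)

        # Diversity structure: cosine similarity of paired eigenvectors
        # (columns of evec_*), weighted by the normalized real eigenvalues
        cos = F.cosine_similarity(evec_r, evec_f, dim=0)
        loss_structure = -(eval_r_hat * cos).sum()

        return loss_magnitude + loss_structure

In a setup like the one described, this scalar would simply be added to the generator's loss in the baseline model (StackGAN++ or ControlGAN); because the kernels are built from features the discriminator computes anyway, the added cost per training step is small.

For reference, the two metrics quoted in the Results are the standard ones from [29] and [31], where $p(y\mid x)$ is the Inception-v3 label posterior for image $x$, and $(\mu_r,\Sigma_r)$, $(\mu_g,\Sigma_g)$ are the means and covariances of Inception features of real and generated images:

$$\mathrm{IS}=\exp\!\left(\mathbb{E}_{x\sim p_g}\,D_{\mathrm{KL}}\!\left(p(y\mid x)\,\|\,p(y)\right)\right),\qquad \mathrm{FID}=\lVert\mu_r-\mu_g\rVert_2^2+\mathrm{Tr}\!\left(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2}\right)$$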
Reference (32)

[1] Wang M, Ai T, Yan X, et al. Grid Pattern Recognition in Road Networks Based on Graph Convolution Network Model[J]. Geomatics and Information Science of Wuhan University, 2020, 45(12): 1960-1969
[2] Zheng C X, Cham T J, Cai J F. Pluralistic Image Completion[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA, 2019: 1438-1447
[3] Karnewar A, Wang O. MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis[OL]. https://arxiv.org/abs/1903.06048, 2019
[4] Li Y T, Gan Z, Shen Y L, et al. StoryGAN: A Sequential Conditional GAN for Story Visualization[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA, 2019: 6322-6331
[5] Xu K, Ba J L, Kiros R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention[C]//Proceedings of the 32nd International Conference on Machine Learning. 2015: 2048-2057
[6] Wei Y C, Zhao Y, Lu C Y, et al. Cross-Modal Retrieval with CNN Visual Features: A New Baseline[J]. IEEE Transactions on Cybernetics, 2017, 47(2): 449-460
[7] Goldberg Y. Neural Network Methods for Natural Language Processing[M]. San Rafael: Morgan & Claypool Publishers, 2017
[8] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative Adversarial Networks[J]. Communications of the ACM, 2020, 63(11): 139-144
[9] Mirza M, Osindero S. Conditional Generative Adversarial Nets[OL]. https://arxiv.org/abs/1411.1784, 2014
[10] Odena A, Olah C, Shlens J. Conditional Image Synthesis with Auxiliary Classifier GANs[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia, 2017
[11] Reed S, Akata Z, Yan X C, et al. Generative Adversarial Text to Image Synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. 2016: 1060-1069
[12] Isola P, Zhu J Y, Zhou T H, et al. Image-to-Image Translation with Conditional Adversarial Networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 5967-5976
[13] Nilsback M E, Zisserman A. Automated Flower Classification over a Large Number of Classes[C]//2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India, 2008: 722-729
[14] Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 Dataset[R]. California Institute of Technology, 2011
[15] Zhang H, Xu T, Li H S, et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 5908-5916
[16] Zhang H, Xu T, Li H S, et al. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962
[17] Mao Q, Lee H Y, Tseng H Y, et al. Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA, 2019: 1429-1437
[18] Srivastava A, Valkov L, Russell C, et al. VEEGAN: Reducing Mode Collapse in GANs Using Implicit Variational Learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 3310-3320
[19] Xu T, Zhang P C, Huang Q Y, et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 1316-1324
[20] Li B W, Qi X J, Lukasiewicz T, et al. Controllable Text-to-Image Generation[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada, 2019
[21] Borodin A. Determinantal Point Processes[OL]. https://arxiv.org/abs/0911.1153, 2009
[22] Reed S, Akata Z, Lee H, et al. Learning Deep Representations of Fine-Grained Visual Descriptions[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 49-58
[23] Zhou J, Xu W. End-to-End Learning of Semantic Role Labeling Using Recurrent Neural Networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China, 2015
[24] Macchi O. The Coincidence Approach to Stochastic Point Processes[J]. Advances in Applied Probability, 1975, 7(1): 83-122
[25] Hough J B, Krishnapur M, Peres Y, et al. Determinantal Processes and Independence[J]. Probability Surveys, 2006, 3(1): 206-229
[26] Gong B Q, Chao W L, Grauman K, et al. Diverse Sequential Subset Selection for Supervised Video Summarization[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014: 2069-2077
[27] Elfeki M, Couprie C, Riviere M, et al. GDPP: Learning Diverse Generations Using Determinantal Point Process[OL]. https://arxiv.org/abs/1812.00068v1, 2018
[28] Kulesza A, Taskar B. Structured Determinantal Point Processes[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. 2010: 1171-1179
[29] Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training GANs[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016: 2234-2242
[30] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the Inception Architecture for Computer Vision[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 2818-2826
[31] Heusel M, Ramsauer H, Unterthiner T, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6629-6640
[32] Zhang Z Z, Xie Y P, Yang L. Photographic Text-to-Image Synthesis with a Hierarchically-Nested Adversarial Network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 6199-6208
