23 May 2023 Semantic layout aware generative adversarial network for text-to-image generation
Jieyu Huang, YongHua Zhu, Zhuo Bi, Wenjun Zhang
Proceedings Volume 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022); 126041W (2023)
Event: 2nd International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 2022, Guangzhou, China
Text-to-image (T2I) generation methods aim to synthesize a high-quality image that is semantically consistent with a given text description. Previous T2I generative adversarial networks generally first create a low-resolution image with rough shapes and colors, and then refine that initial image into a high-resolution image. Most stacked architectures still suffer from two main problems. (1) The final images generated by these methods depend heavily on the quality of the initial image. If the initial image is not generated correctly, the resulting image looks like a simple combination of visual features from several image scales. (2) The cross-modal fusion methods widely adopted in previous work are limited in how well they fuse text and image features. In this paper, we propose a novel generation model that introduces a one-stage backbone, which directly generates high-quality images without multiple generators, and a novel semantic layout deep fusion network that sufficiently fuses text features and image features. Experiments on the challenging CUB and COCO-Stuff datasets demonstrate the ability of our model to generate images, with respect to both semantic consistency with the input text description and visual fidelity.
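The abstract does not say how the one-stage backbone or the fusion network is built, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a stand-in fusion mechanism, text-conditioned channel-wise affine modulation, applied after every upsampling step of a single generator, so no intermediate low-resolution image is ever produced. All names (`fuse`, `one_stage_generator`, `init_block`), the tensor sizes, and the random weights are hypothetical, and the semantic layout component is not modeled.

```python
import random


def linear(vec, weights, bias):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def fuse(feature_map, text_emb, params):
    """Assumed fusion step: scale and shift each channel using the text.

    feature_map is a list of C channels, each an H x W list of lists.
    """
    gamma = linear(text_emb, params["w_gamma"], params["b_gamma"])
    beta = linear(text_emb, params["w_beta"], params["b_beta"])
    return [[[(1.0 + g) * v + b for v in row] for row in channel]
            for channel, g, b in zip(feature_map, gamma, beta)]


def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of every channel."""
    out = []
    for channel in feature_map:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def one_stage_generator(noise_map, text_emb, blocks):
    """Single generator: the text is fused at every resolution."""
    x = noise_map
    for params in blocks:
        x = upsample2x(x)
        x = fuse(x, text_emb, params)
    return x


def init_block(channels, text_dim, rng):
    """Hypothetical random parameters for one fusion block."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(text_dim)]
                for _ in range(channels)]
    return {"w_gamma": mat(), "b_gamma": [0.0] * channels,
            "w_beta": mat(), "b_beta": [0.0] * channels}


rng = random.Random(0)
channels, text_dim = 3, 4
blocks = [init_block(channels, text_dim, rng) for _ in range(3)]
noise_map = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
             for _ in range(channels)]
text_emb = [rng.gauss(0, 1) for _ in range(text_dim)]

image = one_stage_generator(noise_map, text_emb, blocks)
print(len(image), len(image[0]), len(image[0][0]))  # 3 32 32
```

In a stacked design, each stage would emit an image that the next stage refines, so an error in the first image propagates. Here the text embedding conditions every resolution directly, which is one way to read the "one-stage" and "deep fusion" claims in the abstract.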
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jieyu Huang, YongHua Zhu, Zhuo Bi, and Wenjun Zhang "Semantic layout aware generative adversarial network for text-to-image generation", Proc. SPIE 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 126041W (23 May 2023);
Image fusion
Education and training
Computer vision technology
Image processing
Image quality
Data modeling
