Semantic Image Synthesis From Natural Language Descriptions Using Adaptive Multimodal GANs

S. Rubin Bose (SRM Institute of Science and Technology, India), B. Anthati Sai Gopi Krishna (SRM Institute of Science and Technology, Ramapuram, India), N. S. Vishnu Shankar (SRM Institute of Science and Technology, Ramapuram, India), S. Nishesh (SRM Institute of Science and Technology, Ramapuram, India), Palanivel Rathinasabapathi Velmurugan (Berlin School of Business and Innovation, Berlin, Germany), S. Saranya (Dhaanish Ahmed College of Engineering, India), and Rahul Chauhan (The Fitch Group, East Rutherford, USA)
Copyright: © 2026 | Pages: 22
DOI: 10.4018/979-8-3373-1987-2.ch004

Abstract

Semantic image synthesis from natural language descriptions is a challenging yet important task at the intersection of computer vision (CV) and natural language processing (NLP). This work proposes a new approach based on Adaptive Multimodal Generative Adversarial Networks (GANs) for generating high-fidelity, semantically coherent images from textual descriptions. A dynamic attention mechanism (DAM) fuses the textual and visual modalities, enabling the generator to focus on the most relevant linguistic features as it synthesizes the image. A multi-stage refinement (MSR) process then progressively aligns image features with the input text. Beyond enforcing visual realism, a modality-aware discriminator assesses the semantic conformity of the generated image with the description. Extensive experiments on benchmark datasets show that the method outperforms existing models in image quality, diversity, and text-image consistency (TIC). This work also points to promising application directions, including design prototyping, art generation, and content-creation assistance.
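
As a rough illustration of the architecture outlined in the abstract, the sketch below shows how a text-conditioned generator can attend over word embeddings before decoding an image, and how a discriminator can jointly score an image together with a sentence embedding. The module names, tensor dimensions, and the single-stage decoder are assumptions made for exposition only; the chapter's adaptive multimodal GAN, DAM, and MSR pipeline are not reproduced here.

```python
# Illustrative sketch of a text-conditioned GAN with attention-based fusion.
# NOT the authors' implementation: all names and dimensions are assumptions.
import torch
import torch.nn as nn

class DynamicAttention(nn.Module):
    """Attends over word embeddings using the current image feature as query."""
    def __init__(self, word_dim: int, img_dim: int):
        super().__init__()
        self.query = nn.Linear(img_dim, word_dim)

    def forward(self, img_feat, word_feats):
        # img_feat: (B, img_dim), word_feats: (B, T, word_dim)
        q = self.query(img_feat).unsqueeze(1)               # (B, 1, word_dim)
        scores = torch.bmm(q, word_feats.transpose(1, 2))   # (B, 1, T)
        attn = torch.softmax(scores, dim=-1)
        return torch.bmm(attn, word_feats).squeeze(1)       # (B, word_dim)

class Generator(nn.Module):
    """Maps noise plus an attended text context to a low-resolution image."""
    def __init__(self, noise_dim=100, word_dim=256, img_feat_dim=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Linear(noise_dim, img_feat_dim), nn.ReLU())
        self.attn = DynamicAttention(word_dim, img_feat_dim)
        self.decode = nn.Sequential(
            nn.Linear(img_feat_dim + word_dim, 64 * 64 * 3), nn.Tanh())

    def forward(self, noise, word_feats):
        h = self.stem(noise)
        ctx = self.attn(h, word_feats)
        x = self.decode(torch.cat([h, ctx], dim=1))
        return x.view(-1, 3, 64, 64)

class Discriminator(nn.Module):
    """Scores both realism and image-text consistency (modality-aware critic)."""
    def __init__(self, sent_dim=256):
        super().__init__()
        self.img_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2))
        self.joint = nn.Linear(256 + sent_dim, 1)

    def forward(self, image, sent_feat):
        h = self.img_enc(image)
        return self.joint(torch.cat([h, sent_feat], dim=1))

# Smoke test with random tensors standing in for encoded captions.
words = torch.randn(4, 12, 256)   # 12 word embeddings per caption
sent = words.mean(dim=1)          # crude sentence embedding
fake = Generator()(torch.randn(4, 100), words)
score = Discriminator()(fake, sent)
print(fake.shape, score.shape)    # torch.Size([4, 3, 64, 64]) torch.Size([4, 1])
```

A multi-stage refinement pipeline would typically chain several such generator stages, each upsampling the previous output while re-attending to the word features; that detail is omitted here for brevity.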