A Comparative Study of Generative Adversarial Networks for Text-to-Image Synthesis

Muskaan Chopra, Sunil K. Singh, Akhil Sharma, Shabeg Singh Gill
DOI: 10.4018/IJSSCI.300364

Abstract

Text-to-image synthesis refers to the conversion of a textual description into a semantically similar image. The automatic synthesis of high-quality images from text descriptions is both exciting and useful. Current AI systems have shown significant advances in the field, but the work is still far from complete. Recent advances in deep learning have resulted in the introduction of generative models that are capable of generating realistic images when trained appropriately. In this paper, the authors review the advancements in architectures for solving the problem of image synthesis from a text description. They begin by studying the standard GAN and how the DCGAN has been applied to the task, followed by the StackGAN, which uses a stack of two GANs to generate an image through iterative refinement, and StackGAN++, which uses multiple GANs in a tree-like structure, making the task of generating images from text more generalized. Finally, they look at the AttnGAN, which uses an attention model to generate sub-regions of an image based on the description.

1. Introduction

Text-to-image synthesis, or automatically producing pictures from a textual description, is a complex machine learning and computer vision problem that has witnessed a lot of interesting research in recent years. This is partly because the automatic creation of images from natural language descriptions can have a significant impact on various other fields. For instance, text-to-image generation can be used in tasks such as pictorial art generation (Elgammal et al., 2017), video game generation (Isola et al., 2018), computer-aided design, and so on, starting from a rich, visual natural language description of the object. Earlier, text-to-image synthesis was carried out using a process that combined the concepts of supervised learning and search (Zhu et al., 2007). Methods like these were interesting since they combined concepts from computer vision, machine learning, computer graphics, and natural language processing. However, they did not generate original images; they simply manipulated existing ones.

A GAN involves two independent networks: one called the Generator and the other called the Discriminator. The Generator creates synthetic samples from random noise (sampled from a latent space), while the Discriminator is a binary classifier that distinguishes whether an input sample is real (outputting a scalar value of 1) or fake (outputting a scalar value of 0). Samples created by the Generator are labeled as fake. The beauty of this design is the adversarial relationship between the Generator and the Discriminator. The Discriminator tries to do its job as well as possible: when a fake sample produced by the Generator is presented to it, it should call it out as fake. The Generator, in turn, tries to produce samples such that the Discriminator mistakes them for real ones. In a sense, the Generator is trying to fool the Discriminator.
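Formally, this adversarial game is the minimax objective of Goodfellow et al. (2014):

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]

The sketch below shows the alternating updates in PyTorch on a toy two-dimensional "real" distribution; the network sizes, learning rates, and the Gaussian stand-in for real data are illustrative assumptions rather than settings from any of the papers surveyed here.

import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 2, 64

# Generator G: maps latent noise z to a synthetic sample.
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                  nn.Linear(64, data_dim))

# Discriminator D: outputs the probability that its input is real.
D = nn.Sequential(nn.Linear(data_dim, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_real(n):
    # Hypothetical stand-in for a real dataset: a Gaussian blob at (2, 2).
    return torch.randn(n, data_dim) + 2.0

for step in range(2000):
    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    real = sample_real(batch)
    fake = G(torch.randn(batch, latent_dim)).detach()  # detach: do not update G here
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update (non-saturating form): make D label fresh fakes as real.
    fake = G(torch.randn(batch, latent_dim))
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

In practice the Generator is usually trained to maximize log D(G(z)), as above, rather than to minimize log(1 - D(G(z))), since the latter saturates early in training when the Discriminator easily rejects fakes.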

Generative networks, on the other hand, have shown significant progress in learning to generate visual data from a sample or training distribution. It is also interesting to note that most state-of-the-art solutions for text-to-image synthesis are based on a generative adversarial architecture. The authors therefore first examine the architecture and workings of a standard generative adversarial network, and in the following sections explore various derivatives of this standard architecture that aim to solve the problem.
