1. Introduction
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can perform. Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action (Meng et al., 2021). This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Sub-domains of computer vision include image inpainting, image enhancement, event detection, object recognition, 3D pose estimation, learning, indexing, motion estimation, and image restoration. Generative Adversarial Networks can be used effectively in the area of computer vision (Jiancheng et al., 2020).
“Generative Adversarial Network is the most interesting idea in the last ten years in machine learning,” said Yann LeCun, VP and Chief AI Scientist at Facebook. The Generative Adversarial Network (GAN) is an idea arising from game theory; GANs were introduced to the machine learning community in 2014 by Ian J. Goodfellow et al. in the article “Generative Adversarial Nets”.
A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks. Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data, so that the model can be used to generate new examples that could plausibly have been drawn from the original dataset.
GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator, which is trained to generate new examples, and the discriminator, which tries to classify examples as either real (drawn from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator is fooled about half the time, meaning that the generator is producing plausible examples.
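The zero-sum game described above is usually written as the minimax objective given by Goodfellow et al. (2014). The symbols below are not defined elsewhere in this article and are introduced only for illustration: G is the generator, D is the discriminator, p_data is the distribution of real examples, and p_z is the noise prior from which the generator's input z is sampled.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

For a fixed generator with output distribution p_g, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). When p_g matches p_data, this equals 1/2 everywhere, which corresponds to the discriminator being fooled about half the time.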
The core idea of a GAN is “indirect” training through the discriminator, which is itself updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator, which enables the model to learn in an unsupervised manner. Recent work in this area has led to the Recurrent Generative Adversarial Network (RGAN) (Qiang et al., 2021) for tackling the limitations of GANs.
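The indirect training signal can be illustrated with a minimal sketch of the two losses. It assumes binary cross-entropy and the commonly used "non-saturating" generator loss; the function names (`bce`, `discriminator_loss`, `generator_loss`) are hypothetical and do not come from the cited works. Note that the generator's loss takes only the discriminator's scores as input and never a target image.

```python
import math


def bce(pred, label, eps=1e-12):
    """Binary cross-entropy for one probability `pred` and a 0/1 `label`."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants scores near 1 on real and near 0 on generated samples."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """The generator wants the discriminator to score its samples as real.

    Only discriminator scores are used; there is no distance to a target image.
    Assumption: this is the non-saturating variant of the generator loss.
    """
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical discriminator scores for a batch of generated samples.
weak_generator = generator_loss([0.1, 0.2])    # discriminator is rarely fooled
strong_generator = generator_loss([0.8, 0.9])  # discriminator is often fooled
```

In an actual training loop, these two losses would be minimized alternately with respect to the discriminator's and the generator's parameters, respectively.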
Image inpainting, as defined by Varuni et al. (2018), is the process of reconstructing lost or deteriorated parts of images and videos; that is, the missing regions of an image are filled in. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g., object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. Inpainting plays a vital role in situations such as occlusion removal or image restoration. The occluded area is reconstructed so that it resembles the data on which the model was trained. In recent years, image inpainting has become a hot topic of research in computer vision.
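As a rough illustration of what reconstructing the missing regions means in practice, the sketch below uses a binary mask to mark the occluded pixels and keeps the original pixels everywhere else. This is a common convention in inpainting pipelines and is not necessarily the method of the cited works; the helper names `apply_mask` and `composite` are hypothetical, and the "predicted" image stands in for the output of a trained model.

```python
def apply_mask(image, mask, fill=0):
    """Simulate an occlusion: pixels where mask == 1 are replaced by `fill`."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def composite(original, predicted, mask):
    """Take predicted pixels inside the hole (mask == 1), original pixels elsewhere."""
    return [
        [p if m else o for o, p, m in zip(o_row, p_row, m_row)]
        for o_row, p_row, m_row in zip(original, predicted, mask)
    ]


# A 2x2 grayscale image with one missing pixel in the bottom-right corner.
image = [[10, 20], [30, 40]]
mask = [[0, 0], [0, 1]]
occluded = apply_mask(image, mask)

# Stand-in for a model's output; a real generator would produce this image.
predicted = [[11, 19], [29, 38]]
restored = composite(occluded, predicted, mask)
```

Only the masked pixel is taken from the prediction, so the known parts of the image are left unchanged.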