Generation of Synthetic Data: A Generative Adversarial Networks Approach

André Ferreira, Ricardo Magalhães, Victor Alves
DOI: 10.4018/978-1-7998-9172-7.ch010

Abstract

Artificial intelligence is growing rapidly, but techniques such as deep learning require more data than is usually available, especially in the medical context. The available data sets are often not representative of reality, so more samples must be acquired, which is very costly. As a result, demand has grown for tools that can generate as much data as needed. Traditional data augmentation expands the available data, but it cannot generate genuinely new data. Using generative adversarial networks to generate synthetic data has proven very useful for big data, as it increases the amount of available data at little cost. To this end, an adaptation of alpha-GAN for 3D MRI scans was developed, yielding a pipeline that generates as many synthetic rat brain scans as needed. The usefulness of the synthetic data was evaluated in a segmentation task, and its realism through visual assessment.

Introduction

Every day, various medical imaging techniques, e.g., Computed Tomography (CT), X-rays and Magnetic Resonance Imaging (MRI), are used to assess the health of many patients. MRI has several advantages over the other methods: it has no known side effects; it can produce multiplanar and three-dimensional images of in vivo structures with high spatial resolution; and it does not expose the patient to ionizing radiation (Cleary & Guimarães, 2014). These images, called scans, are normally evaluated by specialists who can detect anomalies, but this process can be very time-consuming. To address this problem, several artificial intelligence-based decision support systems are being developed. Deep Learning (DL), a specific branch of artificial intelligence, has already been used in many applications, including decision support systems (Kose et al., 2021). However, DL usually requires a large amount of data to achieve high performance.

Collecting large amounts of medical imaging data can be very expensive, time-consuming or even impossible because of regulatory and ethical constraints, such as the 3Rs principle (Replacement, Reduction, Refinement) for animal research (Russell & Burch, 1959). These data also often cannot be shared freely because of data protection laws (Foroozandeh & Eklund, 2020; Shin et al., 2018). The UK Biobank (Collins, 2020) is a successful attempt to overcome data scarcity by building very large, long-term data repositories. Its data are available to researchers in both the public and private sectors. The initiative has been very successful and supports many researchers with a wide range of data.

Traditional data augmentation (Nalepa et al., 2019) and generative models have also been used to address the lack of data. These techniques have improved DL models, but only to a very limited extent (Foroozandeh & Eklund, 2020; Kodali et al., 2017). Conventional data augmentation does not fill all the existing gaps in the data set distribution, and some generative models are not sufficiently realistic or representative. There is therefore a need for better tools that can generate large amounts of data to fill these gaps. Such a tool can be built using Generative Adversarial Networks (GANs) (I. J. Goodfellow et al., 2014): with a well-trained generator, it is possible to create as many realistic scans as necessary.
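
The limitation of conventional augmentation can be illustrated with a short Python sketch. It applies classic label-preserving operations (random flips and 90-degree rotations) to a 3D volume. This is an illustration only and is not the augmentation used in the chapter. The function name `augment_volume` and the random stand-in volume are hypothetical. Because these operations only rearrange existing voxels, the output contains exactly the same intensity values as the input, so no genuinely new sample is added to the distribution.

```python
import numpy as np


def augment_volume(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip and rotate a 3D volume by multiples of 90 degrees.

    Hypothetical helper for illustration. Every output voxel is an input
    voxel, so the operation only rearranges data. For non-cubic volumes,
    rot90 swaps the two axes of the rotation plane.
    """
    out = volume
    # Flip along each spatial axis with probability 0.5.
    for axis in range(3):
        if rng.random() < 0.5:
            out = np.flip(out, axis=axis)
    # Rotate by k * 90 degrees in a randomly chosen plane.
    axes = [(0, 1), (0, 2), (1, 2)][int(rng.integers(3))]
    out = np.rot90(out, k=int(rng.integers(4)), axes=axes)
    return np.ascontiguousarray(out)


rng = np.random.default_rng(0)
scan = rng.random((32, 32, 32)).astype(np.float32)  # stand-in for an MRI volume
aug = augment_volume(scan, rng)
print(aug.shape)
# The sorted intensities are identical: augmentation only permutes voxels.
print(np.allclose(np.sort(scan, axis=None), np.sort(aug, axis=None)))
```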

This chapter investigates the use of GANs, in particular 𝛼-GAN (Kwon et al., 2019; Rosca et al., 2017), to generate synthetic MRI scans. It reviews the literature on the subject, analyses successful experiments and presents the authors' own experiments. The advantages and disadvantages of this method for generating large amounts of data are also discussed. Data scarcity is not unique to human MRI scans, so the study was conducted with rat brain MRI scans. The contributions of this work are:

  • a new architecture that can be used to train a generator to produce realistic synthetic MRI scans of the rat brain;

  • a pipeline that generates as many synthetic MRI scans of rat brains as needed;

  • a demonstration that the synthetic data are realistic and can be applied successfully.
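
Once a generator has been trained, a pipeline that produces "as many scans as needed" comes down to repeatedly sampling latent vectors and decoding them. The Python sketch below shows that loop under explicit assumptions. It uses a standard normal prior z ~ N(0, I) and a hypothetical latent size of 100. A hypothetical `toy_generator` (a single random linear layer followed by tanh) stands in for the chapter's trained 3D generator, whose architecture is not reproduced here.

```python
import numpy as np

LATENT_DIM = 100              # hypothetical latent size, not taken from the chapter
VOLUME_SHAPE = (16, 16, 16)   # toy output size; real scans are much larger


def toy_generator(z, weights):
    """Hypothetical stand-in for a trained 3D generator G(z).

    Maps latent vectors of shape (B, LATENT_DIM) to volumes of shape
    (B, *VOLUME_SHAPE) with one linear layer and tanh (values in [-1, 1]).
    """
    flat = np.tanh(z @ weights)
    return flat.reshape((z.shape[0],) + VOLUME_SHAPE)


def generate_scans(n_scans, generator, batch_size=8, seed=0):
    """Lazily yield n_scans synthetic volumes, sampling z ~ N(0, I) per batch."""
    rng = np.random.default_rng(seed)
    remaining = n_scans
    while remaining > 0:
        b = min(batch_size, remaining)
        z = rng.standard_normal((b, LATENT_DIM))
        for volume in generator(z):
            yield volume
        remaining -= b


# Usage: random weights play the role of a trained model here.
w = np.random.default_rng(42).standard_normal((LATENT_DIM, int(np.prod(VOLUME_SHAPE)))) * 0.1
scans = list(generate_scans(20, lambda z: toy_generator(z, w)))
print(len(scans), scans[0].shape)  # 20 (16, 16, 16)
```

`generate_scans` yields volumes lazily. The number of synthetic scans is therefore bounded only by compute and storage, not by the size of the training set. In a real pipeline, each volume would typically be written to disk rather than collected in a list.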

Key Terms in this Chapter

Alpha-GANs: Combination of a variational autoencoder (VAE) architecture with a GAN architecture, intended to solve the mode collapse and blurriness problems.

Synthetic Data: Information generated artificially, i.e., not obtained through measurement or the usual acquisition process.

Generative Models: Models that learn the distribution of the data set itself and can therefore generate synthetic data. They model the joint distribution of inputs and labels, in contrast to discriminative models, which model only the conditional distribution of labels given the inputs.

GANs: Generative Adversarial Networks, a specific family of generative networks.

MRI Scans: Images from a medical imaging modality used to observe the soft tissues of the body. Very useful for studying the brain, breasts, joints, heart and other organs.

Big Data: Large amounts of information on a topic that can be difficult to manage. It can be structured or unstructured. The term is often associated with databases and deep learning.

Adversarial Networks: Networks that compete with each other. In GANs, the generator tries to fool the discriminator by producing realistic data, while the discriminator tries to distinguish real data from synthetic data and so penalizes poor synthetic samples.

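Static Magnetic Field: The constant magnetic field produced by the main magnet coil surrounding the in vivo object. This field must be constant and homogeneous over the entire object volume. The stronger the field, the higher the signal-to-noise ratio and hence the achievable spatial resolution.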
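
The distinction in the Generative Models entry can be illustrated with a toy Python example. It is not from the chapter, and its data and names are hypothetical. A class-conditional Gaussian model estimates the joint distribution p(x, y) = p(y) p(x | y). It can therefore sample new synthetic (x, y) pairs, and it can also recover, via Bayes' rule, the conditional p(y | x) that a discriminative model such as logistic regression would learn directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labelled data: one scalar feature x and a binary label y.
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

# Generative model of the joint p(x, y) = p(y) * p(x | y), with Gaussian p(x | y).
prior = np.array([np.mean(y == k) for k in (0, 1)])
mu = np.array([x[y == k].mean() for k in (0, 1)])
sigma = np.array([x[y == k].std() for k in (0, 1)])


def sample_joint(n, rng):
    """Draw n synthetic (x, y) pairs from the fitted joint distribution."""
    ys = rng.choice(2, size=n, p=prior)
    xs = rng.normal(mu[ys], sigma[ys])
    return xs, ys


def posterior(x_new):
    """p(y = 1 | x) recovered from the joint with Bayes' rule.

    This conditional is what a discriminative model would learn directly.
    The 1/sqrt(2*pi) Gaussian constant cancels in the ratio, so it is omitted.
    """
    lik = np.exp(-0.5 * ((x_new[:, None] - mu) / sigma) ** 2) / sigma
    joint = lik * prior
    return joint[:, 1] / joint.sum(axis=1)


xs, ys = sample_joint(10, rng)   # new synthetic samples, not copies of x
print(np.round(xs, 2), ys)
print(np.round(posterior(np.array([-1.0, 1.5, 4.0])), 3))
```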
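
For reference, the competition described in the Adversarial Networks entry is formalized in the original GAN paper (I. J. Goodfellow et al., 2014) as a two-player minimax game. Here D(x) is the discriminator's estimate of the probability that x is real, and G(z) maps a noise vector z, drawn from a prior p_z, to a synthetic sample:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

In practice, Goodfellow et al. also suggest training G to maximize log D(G(z)) rather than to minimize log(1 − D(G(z))). The latter saturates early in training, when D easily rejects the generator's samples.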
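
The Alpha-GANs entry can be made more concrete by writing out the four losses of the original α-GAN formulation. The NumPy sketch below follows our reading of Rosca et al. (2017). It involves an encoder E, a generator G, an image discriminator D and a code discriminator C, and it combines an L1 reconstruction term (weighted by a hypothetical λ = 1) with non-saturating adversarial terms. The sketch only computes loss values from given discriminator outputs; the networks and optimizers are omitted. The 3D adaptation in this chapter, which builds on Kwon et al. (2019), may use different adversarial losses.

```python
import numpy as np

EPS = 1e-7  # guard against log(0)


def _log(p):
    """Numerically safe natural log of a probability."""
    return np.log(np.clip(p, EPS, 1.0 - EPS))


def alpha_gan_losses(x, x_rec, d_real, d_rec, d_gen, c_prior, c_enc, lam=1.0):
    """Batch-averaged alpha-GAN losses (our reading of Rosca et al., 2017).

    x, x_rec             : real volumes and reconstructions G(E(x)), shape (B, ...)
    d_real, d_rec, d_gen : image discriminator outputs D(x), D(G(E(x))), D(G(z))
    c_prior, c_enc       : code discriminator outputs C(z) for prior samples, C(E(x))
    lam                  : weight of the L1 reconstruction term (hypothetical default)
    """
    # L1 reconstruction error summed over voxels: the autoencoder part.
    rec = lam * np.abs(x - x_rec).reshape(len(x), -1).sum(axis=1)
    # Encoder: reconstruct well and make E(x) look like a prior sample to C.
    enc = np.mean(rec - _log(c_enc) + _log(1.0 - c_enc))
    # Generator: reconstruct well and make G(E(x)) and G(z) look real to D.
    gen = np.mean(rec - _log(d_rec) + _log(1.0 - d_rec)
                  - _log(d_gen) + _log(1.0 - d_gen))
    # Image discriminator: real volumes vs reconstructions and samples.
    disc = np.mean(-_log(d_real) - _log(1.0 - d_rec) - _log(1.0 - d_gen))
    # Code discriminator: prior samples z vs encoded codes E(x).
    code = np.mean(-_log(c_prior) - _log(1.0 - c_enc))
    return enc, gen, disc, code


rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 8))                       # four toy 3D "volumes"
x_rec = x + 0.01 * rng.standard_normal(x.shape)    # near-perfect reconstructions
p = np.full(4, 0.5)                                # undecided discriminators
enc, gen, disc, code = alpha_gan_losses(x, x_rec, p, p, p, p, p)
print(enc, gen, disc, code)
```

When every discriminator output is 0.5, the discriminators cannot tell real from fake. In that case the adversarial terms of the encoder and generator cancel, so both losses reduce to the reconstruction term, and the image and code discriminator losses equal 3 log 2 and 2 log 2 respectively.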
