Fundamentals of Genetic Programming

Fundamentals of Genetic Programming

DOI: 10.4018/978-1-5225-6005-0.ch001

Abstract

In the living world, all species share one very important property, they receive it right after the birth, and it is called the survival instinct. Since the middle of the twentieth century, scientists have been applying the phenomenon in engineering in order to define computer algorithms which follow the principles of biological evolution of species. Eighty years later, scientists and engineers are still applying the phenomenon in order to solve today's most complex and wide variety of problems. This chapter introduces evolutionary algorithms and motivates the reader to start a journey into genetic programming (GP). The chapter starts with the introduction and detailed insights into GP by describing its main parts and terminology in order to define and mimic biological terms with terms in genetic programming. Then the reader is introduced with the historical evolution of GP and the main and the most popular genetic programming variants, it may find dozens of cited references in it. The chapter continues with detailed introduction on the chromosomes, population, initial and selection methods, main genetic operators, various types of fitness functions, termination criteria, etc. Since GP is processor intensive algorithm, it requires parallel execution to increase its performance which is described at the end of the chapter.
Chapter Preview
Top

Introduction

We are in the information technology century with fast-growing information industry. Statistics show that the most valuable firms in the world are those dealing with information technology, and the data is the key resource for them. In the 21st century, the data is more valuable than ever and undoubtedly power belongs to those who have more data. Tremendous amount of data is generated nonstop every second and in parallel, engineers are expected to develop new tool and techniques that increase human capabilities of extracting useful knowledge from such huge amount of data. This book explores the concept of genetic programming (GP), as supervised machine learning (ML) technique, from theory to computer implementation, and from the implementation to application, and presents the complete process of using such ML technique in some real-world engineering problems. There are many GP books, each of which provide general introduction to the field or they describe a theoretical aspect of GP. Despite increasing interest in the applications of GP in diverse set of problem domains, so far, no efforts have been made to provide an introductory book on the GP applications in different engineering problems. This book attempts to fill that gap by providing an optimized open source GP software, which increases current GP software capability of solving different sets of problems using numerical, binary and categorical data. To this end, this chapter puts forward the fundamentals of GP in a simple and easy to understand way.

Before describing GP, let us remind ourselves about the principle of “survival of the fittest” in nature which is the core concept of GP. In nature, all species share one very important property, they receive it right after the birth which is called survival instinct. The survival instinct is expressed differently for different kind of species. A little turtle born on a dry island has the instinct which guides it to move toward the sea very quickly. Otherwise, it could become a food for another animal. As another, example when a fish was born, it might have better chance to survive if it is a member of a shoal. By forming bigger shoals, they are safer in their traveling to places where they can find food. In case that they are traveling alone, they almost have no chance to reach the food, since they are so small and weak. Every species has specific instinct of survival which consists of one or more inherited behavior. In the food chain of nature, some species are food for others. So, which one finishes its life as food, depends on many factors, but a healthy and fast individual undoubtedly has higher chance to survive. It is well known that lions or other wild cats in most cases catch sick gnu or other animal which cannot escape. On this way, only the best and healthy gnu in the herd will survive, which leads the population of gnu to consist always of healthy and fast gnu. The breeding process for most species is going to happen between the best possible individuals in the population, since the survival instinct as well as other properties are transferred by transferring genes between parents and offspring. During mating, in case the parent has some sort of disease, bad genes would also be transferred, and the offspring would have those genes too, which is opposite to survival instinct. Every species has inherited instinct that produces the best offspring. So, in every healthy population, the most of individuals share the best genes. However, such population may contain individuals which are not as good as others. They have genes which are the products of some sort of mutation and disease, and therefore, they have less chance to survive and mate. This small set of individuals indicates diversity of the population. This is only small example which can be identified in the nature.

Suppose that we have a problem with a set of potential solutions. The solutions can be treated as a population of species. So, they can mate to produce offspring (i.e., new solutions) that gets genetic materials from their parents. Similar to the survival instinct mentioned above, better solutions have more chance to survive/mate. Therefore, better solutions will have born after certain number of generations. The reason behind will become clearer later in this chapter, but it is useful to keep in mind that the solution process mimics biological evolution. The following section explores the relation between biology and GP in detail.

Complete Chapter List

Search this Book:
Reset