Modern science connects many basic secrets of living matter with the genetic code. Biological organisms belong to a category of very complex natural systems, which correspond to a huge number of biological species with inherited properties. Surprisingly, molecular genetics has discovered that all organisms share the same basic molecular-genetic structures. This revolutionary discovery has brought about a great unification of all biological organisms in science. The information-genetic line of investigation has become one of the most promising not only in biology, but also in science as a whole. The basic system of genetic coding has turned out to be strikingly simple. Its simplicity and orderliness present challenges to specialists from many scientific fields. Bioinformatics considers each biological organism as an ensemble of interrelated information systems. The genetic coding system is the basic one. All other biological systems must be correlated with this system to be transmitted to the next generations of organisms.
The natural technology of genetic coding is the major and most effective technology of life on our planet. Using this natural technology, a huge biomass of living matter with unique and valuable properties is produced around the world. Bioinformatics and biotechnology have been applied to many areas such as biology, medicine, and the life sciences. Bioinformatics knowledge is used to produce biological organisms with new properties, to extend human life, to diagnose and treat disease, to clone organisms, to develop new computer technologies, to create new materials with unique characteristics, and so on. It seems that all fields of human life will be influenced in the future by progress in bioinformatics.
Modern science recognizes the key importance of information principles for the inherited self-organization of living matter. In view of this, the following statements have appeared in the recent literature.
- “Notions of “information” or “valuable information” are not utilized in physics of non-biological nature because they are not needed there. On the contrary, in biology notions “information” and especially “valuable information” are main ones; understanding and description of phenomena in biological nature are impossible without these notions. A specificity of “living substance” lies in these notions” (Chernavskiy, 2000).
- “If you want to understand life, don’t think about vibrant, throbbing gels and oozes, think about information technology” (Dawkins, 1991).
Here one should add that modern informatics is an independent branch of science, which possesses its own language and mathematical formalisms and exists together with physics, chemistry and other scientific branches. A problem of information evolution of living matter has been investigated intensively in the last decades in addition to studies of the classical problem of biochemical evolution.
One of the effective methods of studying complex natural systems, including the genetic coding system, is the investigation of symmetries. Modern science knows that deep knowledge about phenomenological relations of symmetry among separate parts of a complex natural system can tell many important things about the evolution and mechanisms of such systems. Physics and other natural sciences have a great number of successful applications of the symmetry method. Principles of symmetry have become one of the foundations of mathematical natural science. Nowadays, many physical theories, from the theory of relativity to quantum mechanics, are created as theories of invariants of mathematical groups of transformations, in other words as theories of special kinds of symmetry. The study of symmetries and asymmetries in molecular structures is one of the important branches of chemistry. For example, functional differences between the right-handed and left-handed forms of molecules in living organisms have become known due to investigations of symmetry in biological molecules. Principles of symmetry have acquired a new essential quality in modern science.
But physics and chemistry are not the only fields that deal with principles and methods of symmetry; informatics and digital signal processing also pay great attention to them. How is the theory of signal processing connected to geometry and geometrical symmetries? Signals are represented there as a sequence of the numeric values of their amplitude at reference points. The theory of signal processing is based on an interpretation of discrete signals as vectors of multi-dimensional spaces. At each time step, a signal value is interpreted as the corresponding value of one coordinate of a multi-dimensional vector space of signals. In this way the theory of discrete signals turns out to be the science of geometries of multi-dimensional spaces. The number of dimensions of such a space is equal to the number of reference points of the signal. Metric notions and all other necessary things are introduced in these multi-dimensional vector spaces to address various problems of reliability, speed, and economy of signal information. For example, the important notions of the energy and the power of a discrete signal appear in the multi-dimensional geometry of the space of signals as the square of the length of a multi-dimensional vector-signal and as the square of the length of a vector-signal divided by the number of dimensions of the appropriate space, respectively. On this geometrical basis, many methods and algorithms for the recognition of signals and images, the coding of information, the detection and correction of information errors, artificial intelligence, and the training of robots are constructed. One can also note the importance of symmetries in permutations of components for coding signals, in spectral analysis of signals, in orthogonal and other transformations of signals, and so on.
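The geometric definitions of signal energy and power given above can be illustrated with a minimal sketch. The function names and the four-sample signal are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: a discrete signal treated as a vector in an
# N-dimensional space, with one coordinate per reference point (sample).

def signal_energy(samples):
    """Energy = squared Euclidean length of the signal vector."""
    return sum(x * x for x in samples)


def signal_power(samples):
    """Power = energy divided by the number of dimensions (samples)."""
    return signal_energy(samples) / len(samples)


# Hypothetical four-sample signal (an assumption, not data from the book).
signal = [3.0, -1.0, 2.0, 0.0]
energy = signal_energy(signal)  # 9 + 1 + 4 + 0 = 14.0
power = signal_power(signal)    # 14.0 / 4 = 3.5
print(energy, power)
```

Here a signal with four reference points is a vector in a 4-dimensional space, so its power is its energy divided by four.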
An investigation of symmetrical and structural analogies between computer informatics and genetic informatics is one of the important tasks of modern science in connection to the creation of DNA-computers, DNA-robotics and to a development of bioinformatics. A significant part of this book describes the study of symmetries in matrix forms of the genetic code systems (“matrix genetics”). The results of this study are new examples of the usefulness of symmetry investigations in natural systems. In this book we first present matrix methods of presentation and the analysis of molecular ensembles of the genetic code systems. Secondly, we present special multi-dimensional matrix algebras related to the genetic code and describe the importance of phenomenological symmetries in matrix forms of presentation of the genetic code. Furthermore we present advanced patterns and applications.
A biological meaning of genetic informatics is reflected in the brief statement: "life is a partnership between genes and mathematics" (Stewart, 1999). But what kind of mathematics has partner relations with the genetic code, and what kind of mathematics is behind genetic phenomenology, which includes a great noise-immunity of the genetic code? This question is one of the main challenges in mathematical natural sciences today. A significant part of the challenge is the question of an adequate mathematics for the phenomenon of degeneracy of the genetic code. The character of this degeneracy is reflected in symmetrical patterns of black-and-white mosaics of genetic matrices of 64 triplets (for example, see the genetic matrix with a black-and-white symmetrical mosaic in Figure 2.2 in Chapter 2).
Why do genetic matrices of 64 triplets possess such symmetrical mosaics? Is degeneracy of the genetic code an accidental choice of nature? Is it provided by substantial mathematics of the genetic code? Is the construction of the genetic code non-accidental at all? The last question is essential because the famous hypothesis by F. Crick (1968) about “the frozen accident” in the origin of the genetic code supposed that the first accidental system of coding which possessed satisfactory features was reproduced in biological evolution with its further evolutionary improvements.
We are searching for scientific answers to facilitate an analysis of the genetic code phenomenology from the viewpoint of mathematics of discrete signal processing, of computer informatics, and of noise-immunity coding in digital communication. This book describes substantial answers to these questions by means of discovering deep connections of the genetic code with hypercomplex numeric systems and their matrix algebras (which can be multi-dimensional algebras of operators simultaneously). These multi-dimensional algebras and their relevant geometries are interpreted in relation to multi-dimensional vector spaces of bioinformatics (or bioinformation vector spaces). An example of such an algebra is the 8-dimensional Yin-Yang-algebra (or the bipolar algebra), which is the algebra of degeneracy of the genetic code and which is described in Chapter 7. Recent progress in the determination of genomic sequences yields many millions of gene sequences now. But what do these sequences tell us and what generalities and rules govern them? The modern situation in the theoretic field of genetic informatics can be characterized by the following citation:
“What will we have when these genomic sequences are determined? What do we have now in the 10 million nucleotide of sequence data determined to date? We are in the position of Johann Kepler when he first began looking for patterns in the volumes of data that Tycho Brahe had spent his life accumulating. We have the program that runs the cellular machinery, but we know very little about how to read it. Bench biologists, by experiment and by close association with the data, have found meaningful patterns. Theoreticians, by careful reasoning and use of collections of data, have found others, but we still understand frustratingly little” (Fickett & Burks, 1989).
Kepler is mentioned here not without reason. The history of science shows the importance of cognitive forms of presentation of phenomenological data for finding regularities or laws in this phenomenology. The work of Kepler is the classical example of the importance of a cognitive form of presentation of phenomenological data. He did not make his own astronomical observations, but he found the cognitive form of presentation in the huge collection of astronomical data gathered by Tycho Brahe. This discovered form, which was connected to the general idea of movements along ellipses, allowed him to formulate the famous Kepler’s laws of planetary motion relative to the Sun. Owing to this cognitive form, Kepler and Newton have led us to Newton's law of gravitation.
A discovery of such a cognitive form of presentation in the case of the phenomenology of genetic code systems is one more challenge, which arises from the very beginning in the course of attempts to find regularities among a huge number of genetic data and to create a relevant theory. Matrix genetics proposes a new cognitive form of presentation of phenomenological data in the field of genetic informatics. This cognitive matrix form gives new tools to analyze and to model ensembles of the genetic code as well. It paves the way for a worthy attempt at answering the mentioned challenges.
SEARCHING FOR A SOLUTION
This book presents a matrix form of presentation of the genetic code as an effective cognitive form of presentation of relevant phenomenological data. An initial choice of such a form of presentation of molecular ensembles of the genetic code is explained by the following main reasons.
- Information is usually stored in computers in the form of matrices.
- Noise-immunity codes are constructed on the basis of matrices.
- Quantum mechanics utilizes matrix operators, connections with which can be detected in matrix forms of presentation of the genetic code. The significance of the matrix approach is emphasized by the fact that quantum mechanics first arose in the form of W. Heisenberg's matrix mechanics.
- Complex and hypercomplex numbers, which are utilized in physics and mathematics, possess matrix forms of their presentation. The notion of number is the main notion of mathematics and mathematical natural sciences. In view of this, investigation of a possible connection of the genetic code to multi-dimensional numbers in their matrix presentations can lead to very significant results.
- Matrix analysis is one of the main investigation tools in mathematical natural sciences. The study of possible analogies between matrices, which are specific for the genetic code, and famous matrices from other branches of sciences can be heuristic and useful.
- Matrices, which are a kind of union of many components in a single whole, are subordinated to certain mathematical operations, which determine substantial connections between collectives of many components. Such connections can be essential for collectives of genetic elements of different levels as well.
The authors utilize a presentation of molecular ensembles of genetic multiplets in the form of a Kronecker family of genetic matrices [C A; U G](n), where C, A, U, G are the nitrogenous bases cytosine, adenine, uracil, guanine, and (n) is a Kronecker power. The genetic matrix [C A; U G](3) contains all 64 triplets in an ordered arrangement that is convenient and effective for studying the degeneracy of the genetic code. Kronecker families of square matrices are utilized in the theory of noise-immunity coding and of discrete signal processing. Applying these matrix families to genetic informatics is justified by the discrete character of the genetic code. This matrix form has allowed us to derive the following main results:
- new phenomenological rules of evolution of the genetic code;
- the connections of the genetic code structures with multi-dimensional numeric systems;
- multi-dimensional algebras for modelling and for analysing the genetic code systems;
- Hadamard matrices and matrices of a hyperbolic turn in the Kronecker family of genetic matrices;
- parallels with quantum computers;
- hidden interrelations between the golden section and parameters of genetic multiplets;
- relations between the Pythagorean musical scale and an important class of quint genetic matrices which show a molecular genetic basis with a sense of musical harmony and of aesthetics of proportions;
- cyclic algebraic principles in the structure of matrices of the genetic code;
- generalized hypercomplex numeric systems, which are new for mathematical natural sciences and which allow one to model a binary opposition of male and female beginnings on the level of genetic-molecular ensembles;
- materials for a chronocyclic conception, which connects structures of the genetic system with chrono-medicine and a problem of the internal clock of organisms;
- parallels with famous symbolic tables of the Ancient Chinese book “I Ching” which declares a cyclic principle in nature and which is very important for all Oriental medicine (acupuncture, pulse diagnostics of Tibetan medicine, and so on);
- a new answer to the fundamental questions – “why are there 4 letters in the genetic alphabet?” and “why 20 amino acids?”
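The Kronecker family [C A; U G](n) introduced before this list can be illustrated with a minimal sketch in which the "product" of two letters is simply their concatenation. The helper names are hypothetical, and the resulting row and column ordering follows the standard Kronecker product convention, which is assumed here rather than taken from the later chapters.

```python
# Illustrative sketch: symbolic Kronecker powers of the matrix [C A; U G].

def kron_symbolic(a, b):
    """Kronecker product of two square matrices of strings.

    Entry (r, c) of the result is a[r // m][c // m] followed by
    b[r % m][c % m], where m is the size of b.
    """
    m = len(b)
    size = len(a) * m
    return [[a[r // m][c // m] + b[r % m][c % m] for c in range(size)]
            for r in range(size)]


def kron_power(base, n):
    """n-th Kronecker power of a square matrix of strings (n >= 1)."""
    result = base
    for _ in range(n - 1):
        result = kron_symbolic(result, base)
    return result


BASE = [["C", "A"], ["U", "G"]]
triplets = kron_power(BASE, 3)  # (8x8)-matrix of three-letter strings

for row in triplets:
    print(" ".join(row))
```

The third power is an (8x8)-matrix in which each of the 64 triplets occurs exactly once; the second power gives the (4x4)-matrix of 16 duplets in the same way.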
One of the most important results is that degeneracy of the genetic code agrees with the 8-dimensional algebra, which is unknown in modern mathematical natural science. This algebra and the elements of its multi-dimensional geometry are presented in Chapters 7 and 11. After the discovery of non-Euclidean geometries and of Hamilton quaternions, it is known that different natural systems can possess their own geometry and their own algebra. The genetic code is connected with its own multi-dimensional numerical system or the multi-dimensional algebra. This genetic algebra can be considered as the pre-code or the mathematical model of the genetic code. This algebra allows one to reveal hidden peculiarities of the structure and evolution of the genetic code. The genetic code has its own forms of ordering. It seems that many difficulties of modern bioinformatics are connected with utilizing inadequate algebras, which were developed for completely different natural systems. Hamilton had similar difficulties in his attempts to describe 3D-space transformations by means of 3-dimensional numbers while this description needs 4-dimensional quaternions. We proposed a new algebraic system for bioinformatics and for mathematical biology. The described results are interesting from the viewpoint of many modern tasks: creating computers from DNA molecules; understanding the genetic system as a quantum computer; creating new kinds of neurocomputers and cellular automata on the basis of principles of genetic code systems.
The set of these results and the proposed matrix methods in the field of genetics form a new scientific discipline – “matrix genetics”, which is closely related to symmetrical analyses and visual patterns of bioinformatics. This book can be considered as an introduction to matrix genetics. The main intended audiences are students and scientists in the fields of genetics, bioinformatics, theoretical biology, mathematical biology, computer informatics, neurocomputing, theory of symmetries, biotechnology, mathematics, theoretical physics, medicine, physiology, psychophysics, art design, music, and cellular automata. Our mathematical approaches and results about structural peculiarities of genetic code systems increase knowledge and can support further investigations by many scientists and students. The presented genetic matrices and their ensembles are interesting not only for their beautiful mathematical properties but, first of all, for their reflection of the fundamental phenomenology of the genetic code. Therefore science will return to them in the future again and again at different levels of knowledge.
ORGANIZATION OF THE BOOK
The book is organized into twelve chapters. A brief description of each chapter follows.
Chapter 1 is devoted to symmetrical analysis for genetic code systems. Genetic coding possesses noise-immunity. Mathematical theories of noise-immunity coding and discrete signal processing are based on matrix methods of representation and analysis of information. These matrix methods, which are connected closely with relations of symmetry, are borrowed for a matrix analysis of ensembles of molecular elements of the genetic code. This chapter describes a uniform representation of ensembles of genetic multiplets in the form of matrices of a cumulative Kronecker family. The analysis of molecular peculiarities of the system of nitrogenous bases reveals the first significant relations of symmetry in these genetic matrices. It permits one to introduce a natural numbering of the multiplets in each of the genetic matrices and to give the basis for further analysis of genetic structures. The connection of the numbered genetic matrices with famous matrices of dyadic shifts is demonstrated.
Chapter 2 describes symmetries of the degeneracy of the vertebrate mitochondrial genetic code in the mosaic matrix form of its presentation. The initial black-and-white genomatrix of this code is reformed into a new mosaic matrix when internal positions in all triplets are permuted simultaneously. It is revealed unexpectedly that for all six variants of positional permutations in triplets (1-2-3, 2-3-1, 3-1-2, 1-3-2, 2-1-3, 3-2-1) the appropriate genetic matrices possess symmetrical mosaics of the code degeneracy. Moreover the six appropriate mosaic matrices in their binary presentation have the general non-trivial property of their “tetra-reproduction”, which can be utilized in particular for mathematical modeling of the phenomenon of the tetra-division of gametal cells in meiosis. Mutual interchanges of the genetic letters A, C, G, U in the genomatrices lead to new mosaic genomatrices, which possess similar symmetrical and tetra-reproduction properties as well.
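The six positional permutations listed above can be made concrete with a minimal sketch. It assumes that an order such as 2-3-1 means "take the old positions 2, 3, 1 in that order"; this reading and the function name are assumptions, and the sketch does not reproduce the degeneracy mosaics themselves.

```python
# Illustrative sketch: the six permutations of positions inside a triplet.
from itertools import product

ORDERS = [(1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 3, 2), (2, 1, 3), (3, 2, 1)]


def permute_triplet(triplet, order):
    """Reorder the letters of a triplet; order uses 1-based positions."""
    return "".join(triplet[p - 1] for p in order)


all_triplets = ["".join(t) for t in product("CAUG", repeat=3)]

# Each permutation only rearranges the set of 64 triplets: no triplet
# is lost or duplicated, so a new (8x8)-matrix of triplets results.
rearranges_all = all(
    sorted(permute_triplet(t, order) for t in all_triplets) == sorted(all_triplets)
    for order in ORDERS
)

print(permute_triplet("CAU", (2, 3, 1)))  # AUC
print(rearranges_all)                      # True
```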
Chapter 3 demonstrates results of a comparative investigation of characteristics of degeneracy of all known dialects of the genetic code. This investigation is conducted on the basis of the results of symmetrological analysis, which were described in Chapter 2, about the division of the set of the 20 amino acids into the two canonical sub-sets: the sub-set of the 8 high-degeneracy acids and the sub-set of the 12 low-degeneracy acids. The existence of numerical and structural invariants in the set of these dialects is shown. The results of the comparative investigation permit one to formulate some phenomenological rules of evolution of these dialects. These numeric invariants and parameters of code degeneracy draw attention to the formal connection of this evolution with famous facts of chrono-biology and chrono-medicine. The chronocyclic conception of the functioning of molecular-genetic systems is proposed on this basis. The biophysical basis of this conception connects the genetic code structures with mechanisms of photosynthesis, which produce living substance by utilizing solar energy, and solar energy arrives cyclically at the surface of the Earth. The revealed numeric invariants of evolution of the genetic code give new approaches to the fundamental question of why 20 amino acids exist. We will demonstrate new patterns of the genetic code systems.
Chapter 4 is devoted to a consideration of the Kronecker family of the genetic matrices, but in a new numerical form of their presentation. This numeric presentation gives opportunities to investigate ensembles of parameters of the genetic code by means of system analysis including matrix and symmetric methods. In this way new knowledge is obtained about hidden regularities of element ensembles of the genetic code and about connections of these ensembles with famous mathematical objects and theories from other branches of science. First of all, this chapter demonstrates the connection of the molecular-genetic system with the golden section and principles of musical harmony.
Chapter 5 uses the Gray code representation of the genetic code C = 00, U = 10, G = 11 and A = 01 (C pairs with G, A pairs with U) to generate a sequence of genetic code-based matrices. In connection with these code-based matrices, we use the Hamming distance to generate a sequence of numerical matrices. We then further investigate the properties of the numerical matrices and show that they are doubly stochastic and symmetric. We determine the frequency distributions of the Hamming distances, building blocks of the matrices, decomposition and iterations of matrices. We present an explicit decomposition formula for the genetic code-based matrix in terms of permutation matrices. Furthermore we establish a relation between the genetic code and a stochastic matrix based on hydrogen bonds of DNA. Using fundamental properties of the stochastic matrices, we determine explicitly the decomposition formula of genetic code-based biperiodic table. By iterating the stochastic matrix, we demonstrate the symmetrical relations between the entries of the matrix and DNA molar concentration accumulation. The evolution matrices based on genetic code were derived by using hydrogen bonds-based symmetric stochastic (2x2)-matrices as primary building blocks. The fractal structure of the genetic code and stochastic matrices were illustrated in the process of matrix decomposition, iteration and expansion corresponding to the fractal structure of the biperiodic table introduced by the authors.
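The smallest case of this construction can be sketched as follows. The Gray code assignment is the one stated above; the normalization by the common row sum is an assumption made here to obtain a doubly stochastic matrix, and Chapter 5 may construct its matrices differently.

```python
# Illustrative sketch: Hamming distances between the Gray codes of the
# four letters, arranged as a (4x4)-matrix.

GRAY = {"C": "00", "U": "10", "G": "11", "A": "01"}
LETTERS = ["C", "U", "G", "A"]


def hamming(x, y):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(x, y))


D = [[hamming(GRAY[r], GRAY[c]) for c in LETTERS] for r in LETTERS]
# D == [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]

row_sum = sum(D[0])  # every row and every column sums to 4
S = [[d / row_sum for d in row] for row in D]  # assumed normalization

is_symmetric = all(S[i][j] == S[j][i] for i in range(4) for j in range(4))
rows_ok = all(sum(row) == 1.0 for row in S)
cols_ok = all(sum(S[i][j] for i in range(4)) == 1.0 for j in range(4))
print(is_symmetric, rows_ok, cols_ok)
```

In this coding the complementary pairs C-G and A-U are exactly the pairs at Hamming distance 2.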
Chapter 6 continues an analysis of the degeneracy of the vertebrate mitochondrial genetic code in the matrix form of its presentation, which possesses the symmetrical black-and-white mosaic. Taking into account a symmetry breakdown in molecular compositions of the four letters of the genetic alphabet, the connection of this matrix form of the genetic code with a Hadamard (8x8)-matrix is discovered. Hadamard matrices are one of the most famous and most important kinds of matrices in the theory of discrete signal processing and in spectral analysis. The special U-algorithm of transformation of the symbolic genetic matrix [C A; U G](3) into the appropriate Hadamard matrix is demonstrated. This algorithm is based on the molecular parameters of the letters A, C, G, U/T of the genetic alphabet. In addition, analogous relations are shown between Hadamard matrices and other symmetrical forms of genetic matrices, which are produced from the symmetrical genomatrix [C A; U G](3) by permutations of positions inside triplets. Many new questions arise from the described connection of the genetic matrices with Hadamard matrices. Some of them are discussed here, including questions about the importance of the amino-group NH2 in molecular-genetic systems, and about possible relations with the theory of quantum computers, where Hadamard gates are utilized. A new possible answer is proposed to the fundamental question concerning reasons for the existence of four letters in the genetic alphabet. Some thoughts about cyclic codes and a principle of molecular economy in genetic informatics are presented as well.
Chapter 7 analyzes algebraic properties of the genetic code. The investigations of the genetic code on the basis of matrix approaches (“matrix genetics”) are described. The degeneracy of the vertebrate mitochondrial genetic code is reflected in the black-and-white mosaic of the (8x8)-matrix of 64 triplets, 20 amino acids and stop-signals. A special algorithm, which is based on features of genetic molecules, transforms the mosaic genomatrix into a numeric matrix, which is the matrix form of presentation of the special 8-dimensional genetic algebra. This algebra can be named the Yin-Yang-algebra or bipolar algebra. The main mathematical properties of this genetic algebra and its relations with other algebras are analyzed together with some important consequences of the adequate algebraic models of the genetic code. Elements of a new “genovector calculation” and ideas of “genetic mechanics” are discussed. The revealed relation between the genetic code and these genetic algebras, which define new multi-dimensional numeric systems, is discussed in connection with the famous idea by Pythagoras: “All things are numbers”. Simultaneously these genetic algebras can be utilized as the algebras of genetic operators in biological organisms. The described results are related to the problem of algebraization of bioinformatics. They draw attention to the question: what is life from the viewpoint of algebra?
Chapter 8 considers the octet Yin-Yang-algebra as the model of the genetic code. From the viewpoint of this algebraic model, for example, the sets of 20 amino acids and of 64 triplets consist of sub-sets of “male”, “female” and “androgynous” molecules, etc. This algebra allows one to reveal the hidden peculiarities of the structure and evolution of the genetic code and to propose the conception of “sexual” relationships among genetic molecules. The first results of the analysis of the genetic code systems from such an algebraic viewpoint point to a close connection between evolution of the genetic code and this algebra. They include 7 phenomenological rules of evolution of the dialects of the genetic code. The evolution of the genetic code appears as the struggle between male and female beginnings. A hypothesis about a new biophysical factor of “sexual” interactions among genetic molecules is proposed. The matrix forms of presentation of elements of the genetic octet Yin-Yang-algebra are connected with Hadamard matrices by means of the simple U-algorithm. Hadamard matrices play a significant role in the theory of quantum computers, in particular. This leads to new opportunities for the possible understanding of genetic code systems as quantum computer systems. The revealed algebraic properties of the genetic code allow one to put forward the problem of algebraization of bioinformatics on the basis of the algebras of the genetic code.
Chapter 9 returns to the kind of numeric genetic matrices which were discussed in Chapters 4-6. This kind of genomatrix is not connected with the degeneracy of the genetic code directly, but it is related to some other structural features of genetic code systems. The connection of the Kronecker families of such genomatrices with special categories of hypercomplex numbers and with their algebras is demonstrated. Hypercomplex numbers of these two categories are named “matrions of a hyperbolic type” and “matrions of a circular type”. These hypercomplex numbers are a generalization of complex numbers and double numbers. Mathematical properties of these additional categories of algebras are presented. A possible meaning and possible applications of these hypercomplex numbers are discussed. The investigation of these hyperbolic numbers in connection with the parameters of molecular systems of the genetic code can be considered as a continuation of the Pythagorean approach to understanding natural systems.
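For readers unfamiliar with Hadamard matrices, the following minimal sketch builds an (8x8) Hadamard matrix by the classical Sylvester construction, that is, as the third Kronecker power of [1 1; 1 -1]. This is not the U-algorithm of Chapter 6, and the matrix obtained from the genetic matrix there may differ from this one; the helper names are hypothetical.

```python
# Illustrative sketch: Sylvester construction of a Hadamard (8x8)-matrix
# and a check of its defining property H * H^T = n * I.

def kron(a, b):
    """Kronecker product of two square integer matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[r // m][c // m] * b[r % m][c % m] for c in range(size)]
            for r in range(size)]


def gram(h):
    """Matrix of pairwise dot products of the rows of h, i.e. h * h^T."""
    n = len(h)
    return [[sum(h[i][k] * h[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


H2 = [[1, 1], [1, -1]]
H8 = kron(kron(H2, H2), H2)

n = len(H8)
identity_times_n = [[n if i == j else 0 for j in range(n)] for i in range(n)]
is_hadamard = gram(H8) == identity_times_n  # rows are mutually orthogonal
print(is_hadamard)  # True
```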
Chapter 10 describes data suggesting a connection between matrix genetics and one of the most famous branches of mathematical biology: phyllotaxis laws of morphogenesis. Thousands of scientific works are devoted to this morphogenetic phenomenon, which is related to Fibonacci numbers, the golden section and beautiful symmetrical patterns. These typical patterns are realized by nature in a huge number of biological bodies on various branches and levels of biological evolution. Some matrix methods have long been known for simulating these phyllotaxis phenomena in mathematical form. This chapter describes connections of the famous Fibonacci (2x2)-matrices with genetic matrices. Some generalizations of the Fibonacci matrices for cases of (2nx2n)-matrices are proposed. Special geometrical invariants, which are connected with the golden section and Fibonacci numbers and which characterize some proportions of human and animal bodies, are described. All these data are related to matrices of the genetic code in some aspects.
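A minimal sketch of the link between a Fibonacci (2x2)-matrix, Fibonacci numbers and the golden section is given below. It assumes that the matrix meant is the standard one, [1 1; 1 0], whose n-th power contains three consecutive Fibonacci numbers; the form used in Chapter 10 may differ, and the function names are hypothetical.

```python
# Illustrative sketch: powers of the (2x2)-matrix Q = [1 1; 1 0] give
# Q^n = [F(n+1) F(n); F(n) F(n-1)], where F(k) are Fibonacci numbers.

def matmul2(a, b):
    """Product of two (2x2)-matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def matpow2(m, n):
    """n-th power of a (2x2)-matrix by repeated multiplication (n >= 0)."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, m)
    return result


Q = [[1, 1], [1, 0]]
Q10 = matpow2(Q, 10)          # [[89, 55], [55, 34]]
ratio = Q10[0][0] / Q10[0][1]  # F(11) / F(10), close to the golden section
golden = (1 + 5 ** 0.5) / 2
print(Q10, ratio, golden)
```

The ratio of neighbouring entries approaches the golden section as the power grows.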
Chapter 11 presents data about cyclic properties of the genetic code in its matrix forms of presentation. These cyclic properties concern cyclic changes of genetic Yin-Yang-matrices and their Yin-Yang-algebras at many kinds of circular permutations of genetic elements in genetic matrices. These circular permutations lead to such reorganizations of the matrix form of presentation of the initial genetic Yin-Yang-algebra that such matrices serve as matrix forms of presentations of new Yin-Yang-algebras. They are connected algorithmically with Hadamard matrices. New patterns and relations of symmetry are described. The discovered existence of a hierarchy of the cyclic changes of genetic Yin-Yang-algebras allows one to develop new algebraic models of cyclic processes in bioinformatics and in other related fields. These cycles of changes of the genetic 8-dimensional algebras and of their 8-dimensional numeric systems have many analogies with famous facts and doctrines of modern and ancient physiology, medicine, etc. This viewpoint proposes that the famous idea by Pythagoras (about organization of natural systems in accordance with harmony of numerical systems) should be combined with the idea of cyclic changes of Yin-Yang-numeric systems in considered cases. This second idea suggests the ancient idea of cyclic changes in nature. From such an algebraic-genetic viewpoint, the notion of biological time can be considered as a factor in coordinating these hierarchical ensembles of cyclic changes of the genetic multi-dimensional algebras.
Chapter 12 considers the topic of connections of the genetic code with various fields of culture and with inherited physiological properties which provide for the existence of these fields. Some examples of such physiological bases for branches of culture are described. These examples are related to linguistics, music and the physiology of color perception. Special attention is paid to connections between the genetic matrices and the system of the Ancient Chinese book “I Ching”. The conception is put forward, together with its arguments, that the famous table of 64 hexagrams of “I Ching” reflects the notions of the Ancient Chinese about musical quint harmony as a universal archetype.
Sergey Petoukhov, Russian Academy of Sciences, Russia
Matthew He, Nova Southeastern University, USA