Predicting protein secondary structure (alpha-helix, beta-sheet, coil) from the primary sequence of amino acids is a challenging task, and the problem has been approached from several angles. A protein is a sequence of amino acid residues and can thus be considered a one-dimensional chain of ‘beads’, where each bead corresponds to one of the 20 different amino acid residues known to occur in proteins. Most protein sequences range in length from 50 to about 1000 residues, but longer proteins are also known; for example, myosin, the major protein of muscle fibers, consists of 1800 residues (Altschul et al. 1997). Many researchers have used many techniques to predict protein secondary structure, but the most commonly used technique is the neural network (Qian et al. 1988). This chapter discusses a new method combining profile-based neural networks (Rost et al. 1993b), simulated annealing (SA) (Akkaladevi et al. 2005; Simons et al. 1997), the genetic algorithm (GA) (Akkaladevi et al. 2005), and decision fusion algorithms (Akkaladevi et al. 2005). Researchers combined the neural network (Hopfield 1982) with the GA and SA algorithms and then applied two decision fusion methods, the committee method and the correlation method, obtaining improved prediction accuracy (Akkaladevi et al. 2005). Sequence profiles of amino acids are fed as input to the profile-based neural network. Both decision fusion methods improved the prediction accuracy, but notably one method worked better for some input sequence profiles and the other worked better for others (Akkaladevi et al. 2005). Instead of compromising on some of the good solutions that either approach could have generated, a combination of the two approaches is used to obtain better prediction accuracy. This criterion is the basis for the Bayesian inference method (Anandalingam et al. 1989; Schmidler et al. 2000; Simons et al. 1997). The results obtained show that the prediction accuracy improves by more than 2% using the combination of the decision fusion approach and the Bayesian inference method.
Much interesting work has been done on the protein secondary structure prediction problem, and over the last 10 to 20 years the methods have gradually improved in accuracy. The most successful application of neural networks (Hopfield 1982) to secondary structure prediction was achieved by Rost and Sander (Rost et al. 1993b; Rost et al. 1993c; Rost 1996; Rost et al. 1994), and it resulted in the prediction mail server called PHD (Rost et al. 1993c). Using a profile-based neural network and a few other methods, the performance of the network is reported to be up to 67.2% (Rost et al. 1993b).
Key Terms in this Chapter
Neural Network: A Neural Network is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
Simulated Annealing Algorithm: Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space.
Profile-Based Neural Network: This type of neural network configuration results when we feed the multiple alignments in the form of a sequence profile (for each position an amino acid frequency vector is fed to the network) instead of a base sequence to a neural network.
Genetic Algorithm: Genetic Algorithms (GAs) are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. GAs are designed to simulate the processes in natural systems that are necessary for evolution, specifically those that follow the principle of survival of the fittest first laid down by Charles Darwin. As such, they represent an intelligent exploitation of a random search within a defined search space to solve a problem.
Secondary Structure: In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA).
Bayesian Inference: Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true.
Protein: A large molecule composed of one or more chains of amino acids in a specific order determined by the base sequence of nucleotides in the DNA coding for the protein.
Decision Fusion: The process of combining classifiers is called decision fusion. Results from different methods, algorithms, sources or classifiers can often be combined (fused) to give estimates of a better quality than could be obtained from any of the individual sources alone.
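As a rough illustration of decision fusion, the sketch below combines the per-residue predictions of several classifiers by simple majority vote. It is only a generic example under stated assumptions: the function name committee_vote, the H/E/C label strings, and the tie-breaking rule are hypothetical and are not taken from the committee method described in the chapter.

```python
from collections import Counter


def committee_vote(predictions):
    """Fuse equal-length per-residue label strings by majority vote.

    Assumption of this sketch: ties go to the label proposed by the
    earliest classifier in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    fused = []
    for labels in zip(*predictions):  # labels for one residue position
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        fused.append(next(lab for lab in labels if counts[lab] == best))
    return "".join(fused)


# Hypothetical outputs of three classifiers for an 8-residue segment
# (H = helix, E = sheet, C = coil).
nn_plain = "HHHHCCEE"
nn_ga = "HHHCCCEE"
nn_sa = "HHHHCEEE"

fused = committee_vote([nn_plain, nn_ga, nn_sa])
print(fused)
```

Majority voting is only one way to fuse classifier outputs; weighting each classifier by its estimated reliability is a common alternative.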