Protein-Protein Interactions (PPI) via Deep Neural Network (DNN)

Zizhe Gao, Hao Lin
DOI: 10.4018/978-1-7998-8455-2.ch006


Since the start of the 21st century, computer science and biological research have both developed rapidly. As capital has flowed into health research, many scholars and investors have turned their attention to the impact of neural network science on biometrics, especially the study of biological interactions. Advances in computer technology have also allowed scientists to improve traditional experimental methods. This chapter aims to verify the reliability of the methodology and computing algorithms developed by the project team of Satyajit Mahapatra and Ivek Raj Gupta. Three datasets are used to test the algorithms: S. cerevisiae, H. pylori, and Human-B. anthracis, with S. cerevisiae serving as the core dataset. The results show accuracies of 87%, 87.5%, and 89% and precisions of 87%, 86%, and 87% on these three datasets, respectively.
Chapter Preview


Neural network computing, as the name suggests, uses a computer to simulate the human brain's capacity for autonomous judgment, adaptation, integration of multiple tasks, and parallel processing. From a biological perspective, the human cerebral cortex contains more than 10 billion neurons connected to form a neural network. Information is collected through touch, taste, and the body's other senses. The collected data is transmitted to the central nervous system through the connection points of the neural network. There it is screened, sorted, and cleaned, and the results of the analysis are then transmitted throughout the body to coordinate the functions of its organs.

Figure 1. Neural network framework of inputs and functions to produce perceptron.


From a historical perspective, in the 1940s McCulloch and Pitts described neurons as logic devices (Palm, 1986). In the late 1950s and 1960s, Rosenblatt developed the Perceptron model (Fig. 1), which simulates learning and recognition. A perceptron has several input terminals that receive signals from external inputs or from other perceptrons. It sums these signals and passes the result through an excitation (activation) function, then outputs it or transmits it to other perceptrons. As with the dendrites and axons of nerve cells, some connections between neurons are tight and others are loose; this strength is expressed as a high or low weight on each connection, and the stored weights are the "knowledge" that a neural network acquires. However, because of its data dependence and practical limitations, this early line of research came to a halt. Only in recent years have modern high-tech approaches provided a solid foundation for research into artificial intelligence, neural networks, deep learning, and related technologies, while the consumer market's expectation of intelligent devices has fueled researchers' enthusiasm for developing them (Admin, 2021).
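
The perceptron described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the step activation, the function name `perceptron_output`, and the weights and bias (chosen so the unit behaves like a logical AND) are all assumptions made for the example.

```python
from typing import Sequence


def perceptron_output(inputs: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> int:
    """Sum the weighted input signals, then apply a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Assumed, hand-picked weights: the unit fires only when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

outputs = [perceptron_output([a, b], and_weights, and_bias)
           for a in (0, 1) for b in (0, 1)]
print(outputs)
```

In a trained network the weights are learned from data rather than set by hand; they are fixed here only to keep the example short.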

Compared with traditional computing methods, the deep neural network algorithm is capable of self-adaptation and self-organization (Mahapatra, Gupta, Sahu, & Panda, 2021). Traditional computing methods rely on the knowledge and ability that programmers write into the program. They therefore depend primarily on the original programmer's code and on the maintenance and updating of the database, which dramatically limits the computing power and intelligence of the computer. Deep neural networks take a different approach. The neural network algorithm gives machines the ability to learn and to recognize patterns on their own, and this ability can take the computer beyond the capacity of its designer. Typically, the program designer sets up a learning method for the neural network algorithm, or content to imitate, and periodically provides reference samples for the computer to replicate. The other way is to let the algorithm learn on its own. In this case, the program designer only needs to build a learning rule for the algorithm, and the algorithm learns from the type and pattern of the imported datasets in order to identify the regularities and characteristics of its environment (Oja, 1994).

From the perspective of practical demand, since the global outbreak of COVID-19 in late 2019, medical and health care has become a focus of international research institutions. Interest in understanding Protein-Protein Interactions (PPIs) is also growing around the world, because proteins are the core building blocks of viruses. A PPI involves a series of contacts between the amino acids of the interacting protein molecules. After extracting the interactions between the molecules, a deep neural network algorithm can learn to predict interactions for any given protein. That is to say, a virus relies on proteins that interact with one another to reproduce, which means that understanding these interactions could speed up efforts to defeat the virus.

Key Terms in this Chapter

Conjoint Analysis: Products are assumed to have specific attributes, and virtual products are simulated from combinations of those attributes; consumers then evaluate these virtual products according to their preferences. Statistical methods are used to separate out the utility of each attribute level, so that the importance of each attribute and attribute level can be evaluated quantitatively.

Amino-Acid Composition (AAC): Amino acid composition gives the fraction of each amino acid type within a protein.
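
As a minimal sketch of this definition, the fractions can be computed by counting residues. The function name `amino_acid_composition`, the restriction to the 20 standard one-letter codes, and the toy sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


aac = amino_acid_composition("MKKLA")  # toy sequence, not real data
print(aac["K"])
```

The result is a fixed-length vector of 20 fractions, whatever the length of the protein, which makes it convenient as input to a classifier.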

Receiver Operating Characteristic Curve (ROC): A curve obtained by varying the decision threshold of a binary classifier. For each threshold, the true positive rate (sensitivity) is plotted on the vertical axis against the false positive rate (1 - specificity) on the horizontal axis.
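
The points of such a curve can be computed directly from labels and classifier scores. The sketch below assumes binary labels (1 = positive, 0 = negative) and treats a sample as predicted positive when its score is at or above the threshold; the function name `roc_points` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs,
    one per distinct threshold, from the strictest to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


labels = [1, 1, 0, 1, 0]            # toy ground truth
scores = [0.9, 0.8, 0.7, 0.4, 0.2]  # toy classifier scores
points = roc_points(labels, scores)
print(points)
```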

Amino Acid: An organic acid containing an amino group; amino acids are the building blocks of proteins. They are obtained mainly by proteolysis and can also be synthesized chemically or produced by microbial fermentation.

Protein-Protein Interactions (PPI): A reaction in which two or more proteins are joined together by physical contact. Typically, biologists use this method to find new combinations and new.

Naive Bayes (NB): A classification model based on Bayes' theorem and the assumption that features are conditionally independent given the class. It requires few estimated parameters, is not sensitive to missing data, and is relatively simple as an algorithm. In theory, when its independence assumption holds, the NB model has the smallest error rate among classification methods.

K-Nearest Neighbor (KNN): A theoretically mature method and one of the simplest machine learning algorithms. If the majority of the k samples closest to a given sample in the feature space belong to a specific category, then that sample is assigned to the same category.
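
A minimal sketch of this majority vote follows, assuming Euclidean distance and a small in-memory training set. The function name `knn_predict`, the class labels, and the two-dimensional feature vectors are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(train: Sequence[Tuple[Sequence[float], str]],
                query: Sequence[float],
                k: int = 3) -> str:
    """Label the query with the majority class among its k nearest samples."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors, not real protein data.
train = [
    ([0.0, 0.0], "non-interacting"),
    ([0.1, 0.2], "non-interacting"),
    ([1.0, 1.0], "interacting"),
    ([0.9, 1.1], "interacting"),
    ([1.2, 0.8], "interacting"),
]
prediction = knn_predict(train, [1.0, 0.9], k=3)
print(prediction)
```

Sorting the whole training set is fine for a sketch; larger datasets usually call for a spatial index or a library implementation.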

Deep Neural Network (DNN): A technology in the field of Machine Learning that can use statistical learning methods to extract high-level features from original sensory data and obtain an adequate representation of input space in a large amount of data.

Rectified Linear Unit (ReLU): An activation function that defines a neuron's nonlinear output after its linear transformation. For an input vector x arriving from the previous layer, a neuron using ReLU outputs max(0, w·x + b), where w and b are the neuron's weights and bias. This value is passed to the next layer or serves as the output of the entire network, depending on the neuron's position in the network structure.
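
In code, linear rectification keeps positive values and replaces negative ones with zero. The sketch below applies it to a single neuron; the function names and the weights, bias, and inputs are assumptions chosen for illustration.

```python
from typing import Sequence


def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def relu_neuron(inputs: Sequence[float],
                weights: Sequence[float],
                bias: float) -> float:
    """Linear transformation of the inputs followed by ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


active = relu_neuron([1.0, 2.0], [0.5, 0.25], 0.0)     # linear part is positive
inactive = relu_neuron([1.0, 2.0], [-1.0, -1.0], 0.5)  # linear part is negative
print(active, inactive)
```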

Grid Search: An exhaustive search that evaluates every point in a grid of candidate hyperparameter values to find the best combination. The dimension of the grid is the number of hyperparameters. With k hyperparameters and m candidate values for each, m^k combinations must be evaluated.
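
The traversal can be sketched with `itertools.product`, which enumerates every combination of candidate values. The hyperparameter names, the candidate values, and the scoring function `toy_score` are hypothetical; in practice the score would come from training and validating a model.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(grid: Dict[str, List[Any]],
                score: Callable[[Dict[str, Any]], float]
                ) -> Tuple[Optional[Dict[str, Any]], float, int]:
    """Evaluate every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score, evaluated = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        evaluated += 1
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score, evaluated


# Hypothetical grid: k = 2 hyperparameters with m = 3 candidates each.
grid = {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [16, 32, 64]}


def toy_score(params: Dict[str, Any]) -> float:
    """Stand-in for a validation score; peaks at (0.01, 32) by construction."""
    return -abs(params["learning_rate"] - 0.01) - abs(params["hidden_units"] - 32) / 100


best_params, best_score, evaluated = grid_search(grid, toy_score)
print(best_params, evaluated)
```

With k = 2 and m = 3 the loop runs 3^2 = 9 times. The count grows exponentially with the number of hyperparameters, so grids are usually kept coarse.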
