Natural Language Processing and Biological Methods

Gemma Bel Enguix (Rovira i Virgili University, Spain) and M. Dolores Jiménez López (Rovira i Virgili University, Spain)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-59904-849-9.ch171

Abstract

During the 20th century, biology, and molecular biology in particular, became a pilot science, and many disciplines have formulated their theories on models taken from biology. Computer science has become almost a bio-inspired field thanks to the great development of natural computing and DNA computing. Linguistics, by contrast, interacted with biology only infrequently during the 20th century. Nevertheless, because the genetic code can be regarded as “linguistic”, molecular biology has borrowed several models from formal language theory in order to explain the structure and workings of DNA. Such attempts have focused on the design of grammar-based approaches to define a combinatorics of protein and DNA sequences (Searls, 1993). The linguistics of natural language has also contributed to this field through Collado (1989), who applied generativist approaches to the analysis of the genetic code. On the other hand, and from a strictly theoretical interest, there have been several attempts to establish structural parallelisms between DNA sequences and verbal language (Jakobson, 1973; Marcus, 1998; Ji, 2002). However, there is a lack of theory that attempts to explain the structure of human language from the results of the semiosis of the genetic code. This is probably the only arrow that remains incomplete in order to close the path between computer science, molecular biology, biosemiotics and linguistics. Natural Language Processing (NLP), a subfield of Artificial Intelligence concerned with the automated generation and understanding of natural languages, can take great advantage of the structural and “semantic” similarities between those codes. Specifically, the systemic code units and methods of combination of the genetic code can be carried over to the study of natural language.
Therefore, NLP could become another “bio-inspired” science by means of theoretical computer science, which provides the theoretical tools and formalizations necessary for such an exchange of methodology. In this way, we obtain a theoretical framework in which biology, NLP and computer science exchange methods and interact, thanks to the semiotic parallelism between the genetic code and natural language.
Chapter Preview

Background

Most current approaches to natural language exhibit several features that invite the search for new formalisms able to account for natural languages in a simpler and more natural way. Two main facts lead us to look for a more natural computational system to give a formal account of natural languages: a) natural language sentences cannot be placed in any of the families of the Chomsky hierarchy (Chomsky, 1956), on which current computational models are largely based, and b) the rewriting methods used in a large number of natural language approaches seem not to be very adequate, from a cognitive perspective, to account for the processing of language.

Now, if to these we add (1) that languages generated following a molecular computational model are placed in between the Context-Sensitive and Context-Free families; (2) that the genetic model offers simpler alternatives to rewriting rules; and (3) that genetics is a natural informational system, as natural language is, we have the ideal setting in which to propose biological models in NLP.

The idea of using biological methods in the description and processing of natural languages is backed up by a long tradition of interchanging methods in biology and natural/formal language theory:

Key Terms in this Chapter

Neural Network: Interconnected group of artificial neurons that uses a mathematical or a computational model for information processing based on a connectionist approach to computation. It involves a network of simple processing elements that can exhibit complex global behaviour.

Mutations: Several types of transformations in a single string.

Grammar Systems Theory: A consolidated and active branch in the field of formal languages that provides syntactic models for describing multi-agent systems at the symbolic level using tools from formal languages and grammars.

Splicing: Operation which consists of splitting up two strings in an arbitrary way and sticking the left side of the first one to the right side of the second one (direct splicing), and the left side of the second one to the right side of the first one (inverse splicing).

Natural Computing: Research field that deals with computational techniques inspired by nature and natural systems. This type of computing includes evolutionary algorithms, neural networks, molecular computing and quantum computing.

Multi-Agent System: A system composed of a set of computational agents that perform local problem solving and cooperatively interact to solve a single problem (or reach a goal) that is difficult for an individual agent to solve (achieve).

Membrane Systems: In a membrane system, multisets of objects are placed in the compartments defined by the membrane structure that delimits the system from its environment. Each membrane identifies a region, the space between it and all directly inner membranes. Objects evolve by means of reaction rules associated with compartments and applied in a maximally parallel, nondeterministic manner. Objects can pass through membranes, and membranes can change their permeability, dissolve and divide.
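
The splicing operation defined among the key terms above can be illustrated with a minimal Python sketch. It is only one possible reading of that definition, not an implementation taken from the chapter: the function name `splice` and the explicit cut positions `i` and `j` are assumptions introduced here to make the "arbitrary" split concrete.

```python
from typing import Tuple


def splice(x: str, y: str, i: int, j: int) -> Tuple[str, str]:
    """Split x at position i and y at position j, then recombine.

    Hypothetical helper for illustration; the cut positions i and j
    stand in for the "arbitrary" split of the definition.
    Returns (direct, inverse):
      direct  = left side of x + right side of y
      inverse = left side of y + right side of x
    """
    if not (0 <= i <= len(x) and 0 <= j <= len(y)):
        raise ValueError("cut positions must lie within the strings")
    x_left, x_right = x[:i], x[i:]
    y_left, y_right = y[:j], y[j:]
    direct = x_left + y_right
    inverse = y_left + x_right
    return direct, inverse


# Splicing two sentences after their subjects
direct, inverse = splice("the cat sleeps", "a dog barks", 7, 5)
print(direct)   # the cat barks
print(inverse)  # a dog sleeps
```

Because each string is only cut once and its pieces are reused, the two results together contain exactly the symbols of the two inputs. The same function applies unchanged to strings over a nucleotide alphabet.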
