Automatic natural language processing captures a lion’s share of the attention in open information management. In one way or another, many applications have to deal with natural language input. In this chapter the authors investigate the problem of natural language parsing from the perspective of biolinguistics. They argue that the human mind succeeds in the parsing task without the help of language-specific rules of parsing and language-specific rules of grammar. Instead, there is a universal parser incorporating a universal grammar. The main argument comes from language acquisition: children cannot learn language-specific parsing rules by rule induction, owing to the complexity of unconstrained inductive learning. The authors suggest that the universal parser offers a more manageable solution to the problem of automatic natural language processing than parsers tailored for specific purposes. A model for a completely language-independent parser is presented, taking a recent minimalist theory as its starting point.
In linguistics, as in the field of parsing technologies mentioned above, there are basically two different perspectives one can adopt. First, one can concentrate on the detailed description of a specific language. There are around 6,000 languages spoken around the world, each with its own intricate rules of construction, its own vocabulary, and its own stylistic conventions (Comrie, 2001; Greenberg, 1963). Each individual grammar can be further decomposed into several interacting levels, such as semantics, syntax, morphology, morphosyntax, phonology, and phonetics. It is thus possible to develop specialized grammatical systems, specialized scientific techniques, and nomenclature for the description of different languages and their subcomponents. The first attempts in this direction were made two thousand years ago, as in the case of Panini’s grammar for Sanskrit. Moreover, such descriptions can achieve considerable precision, because the different levels of human language consist of smaller units that are put together according to well-defined combinatorial rules.
Some fifty years ago, linguists studying natural language grammars began to pursue a different track. Instead of developing new technologies and methods for the description of individual languages and their subcomponents, they studied the method that two- to four-year-old children use to break the code of their own native language. It is well known that this happens effortlessly, without much linguistic input or cognitive sophistication, and within just a couple of years (Chomsky, 1969; Graffi, 2001; Marcus, 1993; Moro, 2008; Pinker, 1994). When the problem is framed this way, it becomes feasible to discover the cognitive representational apparatus that the child uses when acquiring and using her own language(s).