A Critical Review of the Current State of Natural Language Processing in Mexico and Chile

César Aguilar (Pontificia Universidad Católica de Chile, Chile) and Olga Acosta (Singularyta SpA, Chile)
This chapter presents a critical review of the current state of natural language processing in Chile and Mexico. It first gives a general overview of the technological evolution of these countries in this area of research and development, as well as the progress they have made so far. It then addresses the remaining problems and challenges. Two are analyzed in detail: (1) the lack of a strategic policy that helps to establish stronger links between academia and industry and (2) the lack of technological inclusion of the indigenous languages, which causes a deep digital divide between these languages and Spanish (considered the official language in Chile and Mexico).
This chapter presents a critical review of the evolution of natural language processing (NLP) in Mexico and Chile. It aims to give an idea of the status of this line of research and development in both countries, as well as the challenges and future projections that can currently be identified. These two countries were chosen because of the strategic value they now place on the relationship between academia and business, with a view to developing their own technological niche, one that has a positive impact on the economy of both countries and could eventually make them competitive internationally.

This text is justified by the shortage of academic papers on the subject. In general, the issue has been approached from a business perspective, through reports that offer a summary view of the current state of language technologies in Latin America. However, as this chapter tries to show, the panorama is much more complex, especially considering that the region is a multilingual area, where languages such as Spanish and Portuguese coexist with languages such as Nahuatl, Maya, or Mapuche, to mention just a few.

The methodology used to obtain the information presented here consists of a review of several similar reports and documents produced by consultants and government entities, as well as by some academics who have taken an interest in the subject. Without claiming to be exhaustive, this chapter describes the current state of NLP in both countries, focusing on the following points:

    A general description of the NLP-related projects carried out in both countries.

    A brief account of some collaborative initiatives between the two countries related to this topic.

    A summary of some advances made at the industrial level, emphasizing the potential that Chile and Mexico have to develop innovative technologies in the area.

Finally, some future challenges are identified, taking into account the strengths and weaknesses that currently affect investment in the development of language technologies in these two countries.


The Situation of Latin America

Latin America is currently in transition from an economy that produces raw materials to one that regards knowledge as a generator of technological transformation. Nevertheless, this transition has not been easy. Over the last decade there has been a process of fluctuation that has directly affected the ability of both government and industry to invest in science and technology. In the following graph, we can observe the variations in GDP around the world from 1963 to 2013:

Key Terms in this Chapter

Digital Divide: It refers to an unequal distribution of access to and use of information or computational technologies among communities. This inequality stems from political, economic, cultural, racial, and gender factors, among others. Initially, the term focused on the lack of means to access the Internet, whether by mobile phone or by computer. Today, however, it refers to any technological deficiency, so it can be used as a parameter to determine the distribution of wealth in a society, taking the technological divide as an indicator of the economic and social deficiencies that such communities suffer.

Computational Linguistics: It is a subfield of NLP, oriented toward developing mathematical models useful for understanding and generating human language by means of computers and other similar electronic machines. Given this orientation, computational linguistics maintains a close relationship with the cognitive sciences, in particular with cognitive psychology, as well as with the philosophy of mind and language. Likewise, computational linguistics has established close ties with statistics, which has allowed it to interact with innovative lines of research such as machine learning, with a positive impact on its projects.

Language Engineering: It is a line of research and development focused on creating electronic tools capable of processing natural language, in either oral or written form. Unlike NLP and computational linguistics, language engineering has a more applied perspective, so its priority is the design and implementation of such tools.

Artificial Intelligence: It is an interdisciplinary line of research focused on the design and construction of intelligent machines, which are seen as agents capable of simulating human reasoning in order to solve specific problems. An example is the development of machines capable of understanding and generating human language.

Knowledge Economy: It is an emerging economic area, oriented towards the creation of goods and services derived from the exploitation of specialized knowledge provided by highly qualified workers. These workers acquire such knowledge through their university studies, so universities gain strategic value as centers capable of generating innovation and disruptive knowledge. In this sense, the knowledge economy belongs to a post-industrial stage, since it is oriented towards a services market where what is exchanged, in addition to industrial and human resources, is ideas, designs, and concepts useful for improving productivity.

Linguistic Death: It refers to a linguistic process that occurs when a language loses its last native speaker, thus leading to its extinction. Beyond this loss, the most serious problem arises when there is no oral or written record of that language. Such a process differs from what happened with Latin: although this language lost its spoken use, it has extensive documentation, and its evolution gave rise to new languages. The death of a language may be due to historical attrition, or it may be caused by language policies; e.g., when a country decides to adopt one or more official languages, those that are not granted this status may tend to disappear over time.

Natural Language Processing: It is an interdisciplinary research area that integrates theories and methods from linguistics and computer science, particularly artificial intelligence, in order to create models and tools for understanding and generating human language. Natural language processing can be divided into two major sub-domains: (i) computational linguistics, focused on solving theoretical questions involved in designing computational models of human language; and (ii) language engineering, geared more towards the development of computational tools and resources capable of processing oral data (e.g., sequences of dialogues) or written data (e.g., text corpora).

