Machine Translation within Commercial Companies

Tomáš Hudík (Teradata Corporation, Czech Republic)
DOI: 10.4018/978-1-4666-8690-8.ch010

Abstract

This chapter gives a short introduction to machine translation (MT) and its use within commercial companies, with a special focus on the localization industry. Although MT is not a new field, many scientists and researchers are still interested in it and frequently come up with new challenges, discoveries and novel approaches. Commercial companies need to keep pace with these developments, and their R&D departments are making good progress with the integration of MT into their complicated workflows, as well as minor improvements in core MT, in order to gain a competitive advantage. The chapter describes the differences between research in university and commercial environments. Furthermore, it presents the main obstacles to the deployment of new technologies and a typical way in which a new technology can be deployed in a corporate environment.

Introduction

Machine translation is a part of computational linguistics. Its aim is to use machines to make translation easier, faster and cheaper. Several areas closely correlate with MT: for example, Computer-Aided Translation (CAT), which uses software tools to facilitate the translation process, such as spell checkers, terminology databases, concordancers, translation memory tools, etc. Software packages have been developed that wrap all these small applications into one big program, such as SDL Trados, Déjà Vu, MemSource and many others. The 1990s witnessed the expansion of the CAT tools market as the software became affordable to small businesses and freelance translators, although prices and resource requirements remained high. The introduction of the Internet and the possibility for translators to exchange data worldwide required adaptation and the introduction of generally accepted standards. Translation memories represented such a standard, and their adoption was soon followed by exponential growth of the market for CAT software (Gocci, 2009).

The main difference between MT and CAT tools is that while MT programs perform the translation itself, CAT tools help a human translator. In real-world projects, both are used in parallel: CAT tools prepare the translation process, MT performs the translation itself, and CAT tools are then used again for evaluation, for post-editing done by humans, and finally for conversion back to the original format.
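To make the translation-memory side of a CAT tool concrete, the following is a minimal sketch of fuzzy matching against stored segments. The segments, the 70% threshold, and the `fuzzy_matches` helper are illustrative assumptions, not taken from the chapter or from any particular product; commercial tools use more sophisticated similarity measures and persistent storage (e.g., TMX files).

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
# Real CAT tools persist these in TMX files or databases.
translation_memory = {
    "The file could not be opened.": "Soubor se nepodarilo otevrit.",
    "Click OK to continue.": "Pokracujte kliknutim na OK.",
    "The file could not be saved.": "Soubor se nepodarilo ulozit.",
}

def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples above the threshold, best first."""
    scored = []
    for src, tgt in memory.items():
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            scored.append((score, src, tgt))
    return sorted(scored, reverse=True)

# A new segment that differs only in punctuation yields a high fuzzy match,
# which the translator can accept or post-edit instead of translating from scratch.
matches = fuzzy_matches("The file could not be opened!", translation_memory)
for score, src, tgt in matches:
    print(f"{score:.0%}  {src}  ->  {tgt}")
```

In practice such matches are shown to the translator ranked by score, which is why the results are sorted best-first rather than filtered to a single hit.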

In this chapter, the focus is on machine translation of natural languages (e.g., from English to Russian); we will not describe translators for artificial languages such as programming languages or automata.

The history of machine translation can be traced back to the 17th century, when Leibniz and Descartes laid out the theoretical foundations of the first translators based on mechanical devices; however, those theories were never implemented. In the 1930s, some interesting works dealing with MT appeared, such as the Mechanical Brain patent. The first suggestion that electronic computers could be used to translate from one natural language into another was made by Andrew D. Booth and Warren Weaver in 1948. As the computer era started, two main MT paradigms appeared, and both remain in use to this day. The first is rule-based MT, where humans try to identify various language rules; the second is statistical MT, where the rules are identified by computers themselves, based on statistics computed over big training sets.

In the 1950s and early 1960s, MT became very popular, expectations were high and research was heavily funded. The constant threat of the Cold War caused euphoria in government and military circles regarding the anticipated possibilities of MT. Until 1966, great amounts of money were spent on developing MT systems, mostly for the English-Russian language pair. Rule-based MT was the preferred paradigm in those days, since there were not enough bilingual training sets for statistical MT and computers were not powerful enough to perform complex mathematical operations on large datasets. On the other hand, rule-based MT requires two professions: a linguist who creates the rules and a programmer who codes the rules into the machine. It gradually became clear that the results were not good enough, and the MT hype started to fade.
With the publication of the Automatic Language Processing Advisory Committee (ALPAC) report in 1966, commissioned by the US administration, the CIA and the National Science Foundation, funding decreased immediately, due to the prognosis that MT was neither useful nor likely to provide any considerable advance or meaningful progress. With the exception of some practically oriented teams in Europe and the USA, research and development of MT largely ceased.

Since the 1970s, MT research has been slowly revitalized, and its popularity has risen continuously since the beginning of the 1980s (Stein, 2013). Due to the rapid development of computer hardware and technologies, statistical machine translation became a more significant branch of MT from the 1990s onward. This era started with the invention of the IBM translation models in the 1980s (Brown, Pietra, Pietra, & Mercer, 1993). Nowadays, another big wave of interest in MT can be seen, with many people involved and researching in the field, along with many conferences, projects and grants, such as the MT Marathons, the International Association for Machine Translation (IAMT), The Prague Bulletin of Mathematical Linguistics (PBML) and many others. In contrast to the hype of the 1960s, this time the private sector (many translation companies) is also using and enjoying the benefits of MT.
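To give a flavor of what "rules identified by computers themselves, based on statistics" means, here is a minimal sketch of expectation-maximization training in the style of IBM Model 1, the simplest of the IBM translation models: it estimates word translation probabilities t(f|e) from sentence-aligned bilingual text. The toy corpus, the uniform initialization value, and the `train_ibm_model1` helper are illustrative assumptions; real systems train on millions of sentence pairs.

```python
from collections import defaultdict

# Toy parallel corpus of pre-tokenized (English, German) sentence pairs.
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f|e) with EM (IBM Model 1 style)."""
    # Uniform initialization over word pairs (0.25 is an arbitrary starting value).
    t = defaultdict(lambda: 0.25)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # normalizers c(e)
        # E-step: distribute each foreign word's probability mass
        # over the English words it could align to.
        for e_sent, f_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

t = train_ibm_model1(corpus)
print(round(t[("haus", "house")], 3))  # converges toward 1.0 as iterations grow
```

Even on this three-sentence corpus, the co-occurrence statistics are enough for EM to pull "haus" toward "house" and "buch" toward "book" without any hand-written rules, which is exactly the appeal of the statistical paradigm described above.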
