Syntactic Pattern Based Word Alignment for Statistical Machine Translation

Quang-Hung LE (Faculty of Information Technology, Quynhon University, Vietnam & University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam) and Anh-Cuong LE (University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam)
Copyright: © 2014 | Pages: 10
DOI: 10.4018/ijkss.2014070103

Abstract

Word alignment is the task of aligning bilingual words in a corpus of parallel sentences and determining the probabilities of these aligned bilingual word pairs. It is the most important factor affecting the quality of any Statistical Machine Translation (SMT) system. The IBM word alignment models are the best known in the SMT research community. They are purely statistical models and therefore perform poorly for some language pairs that differ in linguistic aspects (e.g. grammatical structures). This paper aims to improve the IBM models by using syntactic information. The authors first propose a new type of constraint based on bilingual syntactic patterns and then integrate it into the IBM models. Finally, they show how to estimate the models' parameters using this new type of constraint. Experiments are conducted on the English-Vietnamese language pair for evaluation.
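
The preview does not describe how the bilingual syntactic-pattern constraint is defined or how it enters parameter estimation. Purely as a point of reference, here is a minimal sketch of EM training for IBM Model 1, the simplest of the IBM models, with a hypothetical `allowed(fs, es, j, i)` predicate that removes disallowed links from the E-step. This is one generic way a hard constraint can be imposed on EM; it is an assumption for illustration, not the authors' method. The function name `train_ibm1` and the toy corpus are likewise hypothetical.

```python
from collections import defaultdict
from typing import Callable, Optional

Pair = tuple[list[str], list[str]]
# Hypothetical constraint signature: (f_words, e_words_with_null, j, i) -> bool
Constraint = Callable[[list[str], list[str], int, int], bool]


def train_ibm1(corpus: list[Pair], iterations: int = 5,
               allowed: Optional[Constraint] = None) -> dict[tuple[str, str], float]:
    """EM training of IBM Model 1 lexical probabilities t(f | e).

    Each source word f_j (j = 1..J) is linked to one target position i,
    where i = 0 is the empty (NULL) word. If `allowed` is given (an
    assumed stand-in for a constraint), links it rejects get zero
    posterior mass in the E-step.
    """
    f_vocab = {w for fs, _ in corpus for w in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es = ["NULL"] + es  # position 0 is the empty word
            for j, fw in enumerate(fs, start=1):
                links = [i for i in range(len(es))
                         if allowed is None or allowed(fs, es, j, i)]
                z = sum(t[(fw, es[i])] for i in links)
                if z == 0:
                    continue  # no permitted link carries probability mass
                for i in links:
                    p = t[(fw, es[i])] / z  # posterior P(a_j = i | f, e)
                    count[(fw, es[i])] += p
                    total[es[i]] += p
        # M-step: renormalise expected counts for each target word
        t = defaultdict(float, {(fw, ew): c / total[ew]
                                for (fw, ew), c in count.items()
                                if total[ew] > 0})
    return dict(t)


# Hypothetical toy corpus: (Vietnamese words, English words)
corpus = [(["tôi"], ["me"]), (["vượt", "tôi"], ["passed", "me"])]
t = train_ibm1(corpus)
t_no_null = train_ibm1(corpus, allowed=lambda fs, es, j, i: i > 0)
```

Here the example constraint `lambda fs, es, j, i: i > 0` simply forbids links to the NULL word; a syntactic constraint would instead inspect the two sentences (and, presumably, their syntactic analyses) to decide which links `(j, i)` are permitted.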

2. Background

2.1. Word alignment

Given a source language sentence f consisting of J words and a target language sentence e consisting of I words, the alignment a between e and f is defined as a subset of the Cartesian product of the word positions:

$$a \subseteq \{(j, i) : j = 1, \ldots, J;\ i = 1, \ldots, I\} \qquad (1)$$

The alignment a connects words in the target language sentence to words in the source language sentence. The set of possible alignments consists of all ways of connecting each word at position i in the target language sentence to one word at position j in the source language sentence. Figure 1 illustrates a word alignment for an English-Vietnamese sentence pair. The Vietnamese word tôi is aligned to the English word me because they are translations of each other; similarly, the Vietnamese word vượt is aligned to the English word passed.
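
To make the definition concrete, the following minimal sketch represents an alignment as a Python set of 1-indexed `(j, i)` position pairs and checks that it is a subset of the Cartesian product of word positions. The helper names and the two-word fragment are hypothetical, and treating Vietnamese as f and English as e is an assumption; the preview does not say which side is the source.

```python
from itertools import product


def all_links(J: int, I: int) -> set[tuple[int, int]]:
    """Cartesian product of word positions: every possible (j, i) link, 1-indexed."""
    return set(product(range(1, J + 1), range(1, I + 1)))


def is_valid_alignment(a: set[tuple[int, int]], J: int, I: int) -> bool:
    """An alignment is any subset of the Cartesian product of positions."""
    return a <= all_links(J, I)


def target_to_source(a: set[tuple[int, int]]) -> dict[int, int]:
    """Map each target position i to its linked source position j.
    Assumes every target word is linked to at most one source word."""
    return {i: j for j, i in a}


# Hypothetical two-word fragment; f is taken to be Vietnamese, e English.
f = ["vượt", "tôi"]   # source words f_1..f_J
e = ["passed", "me"]  # target words e_1..e_I
a = {(1, 1), (2, 2)}  # (j, i) links: vượt-passed, tôi-me
J, I = len(f), len(e)

print(len(all_links(J, I)))                # 4
print(is_valid_alignment(a, J, I))         # True
print(is_valid_alignment({(3, 1)}, J, I))  # False: j = 3 exceeds J
for i, j in sorted(target_to_source(a).items()):
    print(e[i - 1], "->", f[j - 1])        # passed -> vượt, then me -> tôi
```

A set of pairs directly mirrors the "subset of the Cartesian product" definition; the dictionary view is only meaningful under the one-link-per-target-word assumption stated in the docstring.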

Figure 1.

An example of a word alignment for an English-Vietnamese sentence pair. The English and Vietnamese words are listed vertically and horizontally, respectively, and the dark grey cells indicate the correspondences between words in the two languages.
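
The grid layout of Figure 1 can be reproduced in text with a short sketch: English words label the rows, Vietnamese words label the columns, and `#` stands in for a dark grey cell. The function `render_alignment` and the fragment below are hypothetical illustrations, not the sentence pair shown in the figure.

```python
def render_alignment(rows: list[str], cols: list[str],
                     links: set[tuple[int, int]]) -> str:
    """Draw an alignment grid in the layout of Figure 1.

    rows are listed vertically and cols horizontally; links holds
    1-indexed (row, col) pairs, and '#' stands in for a dark grey cell.
    """
    width = max(len(w) for w in rows)
    lines = [" " * width + " | " + " | ".join(cols)]
    for r, row_word in enumerate(rows, start=1):
        marks = ["#" if (r, c) in links else "." for c in range(1, len(cols) + 1)]
        # centre each mark under its column header
        cells = [m.center(len(cw)) for m, cw in zip(marks, cols)]
        lines.append(row_word.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


# Hypothetical fragment: English rows, Vietnamese columns
print(render_alignment(["passed", "me"], ["vượt", "tôi"], {(1, 1), (2, 2)}))
```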
