Syntactic Pattern Based Word Alignment for Statistical Machine Translation

Quang-Hung LE, Anh-Cuong LE
Copyright: © 2014 |Pages: 10
DOI: 10.4018/ijkss.2014070103

Abstract

Word alignment is the task of aligning bilingual words in a corpus of parallel sentences and determining the probabilities of the aligned word pairs. It is the most important factor affecting the quality of any Statistical Machine Translation (SMT) system. The IBM word alignment models are the best known in the SMT research community. Because these models are purely statistical, they do not work well for some language pairs that differ in linguistic aspects (e.g. grammatical structure). This paper aims to improve the IBM models by using syntactic information. The authors first propose a new type of constraint based on bilingual syntactic patterns, and then integrate it into the IBM models. Finally, they show how to estimate the models' parameters using this new type of constraint. The experiments are conducted on the English-Vietnamese language pair.

2. Background

2.1. Word alignment

Given a source language sentence $f$ consisting of $J$ words, $f = f_1, \ldots, f_J$, and a target language sentence $e$ consisting of $I$ words, $e = e_1, \ldots, e_I$, the alignment $a$ between $e$ and $f$ is defined as a subset of the Cartesian product of the word positions:

$$a \subseteq \{(j, i) : j = 1, \ldots, J;\ i = 1, \ldots, I\} \tag{1}$$

The alignment a connects words in the target language sentence to words in the source language sentence. The set of alignments is defined as the set of all possible connections between each word at position i in the target language sentence and one word at position j in the source language sentence. Figure 1 illustrates a word alignment for an English-Vietnamese sentence pair. The Vietnamese word tôi is aligned to the English word me because they are translations of one another. Similarly, the Vietnamese word vượt is aligned to the English word passed, and so on.

Figure 1. An example of a word alignment for an English-Vietnamese sentence pair. The English and Vietnamese words are listed vertically and horizontally, respectively. The dark grey cells indicate the correspondences between the words in the two languages.
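
The definition in Equation (1) and the matrix view of Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation. The sentence pair is a hypothetical toy example (it is not the sentence shown in Figure 1), treating Vietnamese as the source sentence f and English as the target sentence e is an assumption made for the example, and the helper names `is_valid_alignment`, `alignment_matrix`, and `render` are invented here.

```python
from itertools import product

# Hypothetical toy sentence pair (an assumption, NOT the pair in Figure 1).
f = ["anh", "ấy", "vượt", "tôi"]  # source sentence, J = 4 words
e = ["he", "passed", "me"]         # target sentence, I = 3 words
J, I = len(f), len(e)

# Cartesian product of word positions (1-based): every possible link (j, i).
all_links = set(product(range(1, J + 1), range(1, I + 1)))

# One alignment a: a subset of all_links, stored as (j, i) pairs.
# Here "vượt" (j=3) links to "passed" (i=2) and "tôi" (j=4) to "me" (i=3).
a = {(1, 1), (2, 1), (3, 2), (4, 3)}


def is_valid_alignment(a, J, I):
    """Check that every link lies inside the J x I grid of positions."""
    return all(1 <= j <= J and 1 <= i <= I for j, i in a)


def alignment_matrix(a, J, I):
    """Boolean matrix: rows are target positions i, columns source positions j."""
    return [[(j, i) in a for j in range(1, J + 1)] for i in range(1, I + 1)]


def render(a, f, e):
    """Text version of an alignment grid: '#' marks an aligned cell."""
    matrix = alignment_matrix(a, len(f), len(e))
    lines = []
    for word, row in zip(e, matrix):
        cells = " ".join("#" if cell else "." for cell in row)
        lines.append(f"{word:>8} {cells}")
    return "\n".join(lines)


print(render(a, f, e))
```

Storing the alignment as a set of position pairs keeps the subset relation of Equation (1) explicit (`a <= all_links`), while the boolean matrix mirrors the grid layout of Figure 1, with target words as rows and source words as columns.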
