Handling of Infinitives in English to Sanskrit Machine Translation


Vimal Mishra (Banaras Hindu University, India) and R. B. Mishra (Banaras Hindu University, India)
Copyright: © 2010 | Pages: 16
DOI: 10.4018/jalr.2010070101

Abstract

The development of a Machine Translation (MT) system for an ancient language like Sanskrit is a fascinating and challenging task. In this paper, the authors handle infinitive-type English sentences in the English to Sanskrit machine translation (EST) system. The EST system integrates a rule-based approach to machine translation with an Artificial Neural Network (ANN) model to translate an English sentence (source sentence) into the equivalent Sanskrit sentence (target sentence). The authors use a feed forward ANN to select Sanskrit words, such as nouns, verbs, objects, and adjectives, from the English to Sanskrit User Data Vector (UDV). Owing to the morphological richness of Sanskrit, the system uses only morphological markings to identify the Subject, Object, Verb, Preposition, Adjective, Adverb, and Conjunctive, as well as infinitive types of sentence. The performance of the EST system under different MT evaluation methods is shown in a table.

1. Introduction

India is a multilingual country with eighteen constitutionally recognized languages (Sinha & Jain, 2003). Sanskrit, however, is understood by only 0.01% of the population (49,736 people), as per the 1991 Census of India. Machine translation therefore offers a way to break the language barrier within the country. Correct karaka assignment poses the greatest problem in this regard (Samantaray, 2004; Bharti & Kulkarni, 2007). No existing machine translation system works on English to Sanskrit translation. Some earlier work on Sanskrit parsers and morphological analyzers is briefly described below.

Ramanujan (1992) states that morphological analysis of Sanskrit is the basic requirement for computer processing of Sanskrit. Nyaya (Logic), Vyakarana (Grammar) and Mimamsa (Vedic interpretation) are suitable frameworks that cover the syntactic, semantic and contextual analysis of a Sanskrit sentence. Ramanujan has developed a Sanskrit parser, ‘DESIKA’, which is a Paninian grammar based analysis program. DESIKA includes Vedic processing and shabda-bodha.

Briggs (1985) uses semantic nets (a knowledge representation scheme) to analyze sentences unambiguously. He examines the similarity between English and Sanskrit and discusses the theoretical implications of their equivalence.

Huet (2003) has developed a Grammatical Analyzer System, which tags noun phrases (NPs) by analyzing sandhi, samasa and sup affixation.

Work on Sanskrit processing tools and a Sanskrit authoring system has been carried out at Jawaharlal Nehru University, New Delhi, India. The group is currently working on a karaka analyzer, a sandhi splitter and analyzer, a verb analyzer, NP gender agreement, POS tagging of Sanskrit, an online multilingual amarakosa, online Mahabharata indexing and a model of a Sanskrit Analysis System (SAS) (Jha et al., 2006).

Morphological analyzers for Sanskrit have been developed by the Akshara Bharathi Group at the Indian Institute of Technology, Kanpur, India, and the University of Hyderabad.

We have developed a prototype English to Sanskrit machine translation (EST) system that combines an ANN model with a rule based approach. The ANN model matches each English word to its equivalent Sanskrit word and handles nouns and verbs. It gives faster matching of an English noun (subject or object) or verb to the appropriate Sanskrit noun (subject or object) or dhaatu. The rule based model generates the Sanskrit translation of the input English sentence using rules that generate the Sanskrit verb and noun forms. Rule based approaches mostly use hand-written transfer rules to translate substructures from the source language (English) to the target language (Sanskrit). Their main advantages are easy implementation and a small memory requirement (Jain et al., 2001).

The rest of the paper is organized as follows. Section 2 describes the linguistic features of Sanskrit, their equivalents in English, and a comparative view of English and Sanskrit. Section 3 presents infinitives in English and Sanskrit and describes the rules for forming infinitive words in Sanskrit, which are based on Panini's grammar. Section 4 describes the system model of our EST system. Section 5 presents the implementation, illustrated with examples, along with the translation results in GUI form. Section 6 shows the performance evaluation results of our EST system under different MT evaluation methods, namely BLEU (BiLingual Evaluation Understudy), unigram Precision, unigram Recall, F-measure and METEOR score, using a table and a column chart. Section 7 gives the conclusions and the scope for future work.
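Of the evaluation methods named for Section 6, the unigram measures are simple enough to illustrate here. The following is a minimal sketch, not the authors' evaluation code: it assumes whitespace tokenization, clipped unigram matching, and a balanced F-measure (the harmonic mean of precision and recall). The function name `unigram_scores` and the transliterated example sentences are hypothetical and are used only for illustration.

```python
from collections import Counter
from typing import Tuple


def unigram_scores(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) over whitespace-separated tokens."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()

    # Clipped matching: a token counts as matched at most as many times
    # as it occurs in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0

    # Balanced F-measure (assumption: equal weight on precision and recall).
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative tokens only (ASCII transliteration, not system output).
system_output = "sah pathitum gacchati"
reference_translation = "sah pathitum vidyalayam gacchati"
p, r, f = unigram_scores(system_output, reference_translation)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

In this example every token of the system output appears in the reference, so precision is 1.0, while one reference token is missing from the output, so recall is 0.75. Because Sanskrit is morphologically rich, exact token matching of this kind treats a wrongly inflected word as a complete miss.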
