Dependency Parsing in Bangla

Utpal Garain (Indian Statistical Institute, India) and Sankar De (Gupta College of Technological Sciences, India)
DOI: 10.4018/978-1-4666-3970-6.ch008

Abstract

Grammar-driven dependency parsing has been attempted for Bangla (Bengali). The free word order of the language makes developing an accurate parser very difficult, so the Paninian grammatical model is used to handle it. The approach first simplifies complex and compound sentences, then parses the resulting simple sentences by satisfying the Karaka demands of the Demand Groups (Verb Groups). Finally, the parsed structures are rejoined with appropriate links and Karaka labels. The parser was trained on a treebank of 1,000 annotated sentences and evaluated on 150 unannotated test sentences. The evaluation shows that the proposed approach achieves 90.32% unlabeled and 79.81% labeled attachment accuracy.

2. Paninian Grammar

The Paninian framework was originally designed more than two millennia ago for writing a grammar of Sanskrit. It is now being adapted for analyzing modern Indian Languages (ILs), many of which descend from Sanskrit. Paninian grammar is particularly suited to morphologically rich languages with free word order, which describes most ILs, including Bangla.

In the syntactico-semantic model of Paninian Grammar, every verbal root (dhaatu) denotes an action consisting of (1) an activity and (2) a result. The result is the state that, once reached, completes the action. The activity consists of actions carried out by the different participants, or Karakas (mostly noun groups), involved in the action. The Karakas are directly related to the verb. The Paninian model uses only six such Karakas: K1, K2, K3, K4, K5, and K7. Some additional relations are described in (Bharati, 2009c), and the complete tag set is given in Appendix A. In this approach the verb demands certain Karakas to carry out the activity. Verb groups are therefore known as Demand Groups, and Karakas as Source Groups or arguments. For a very simple sentence with a single Demand Group, such as S1, the verb group is the root of the dependency tree and connects the noun groups with appropriate Karaka labels (Bharati, 1993). Consider the sentence in Box 1. The parsed output is shown in Figure 1.

Table 1. Karaka frame for verb ‘gon’ (to count)

Arc_label | Necessity | Vibhakti | Lexical Type
k1 | m | Φ | Noun
k2 | m | Φ | Noun
k3 | d | দিয়ে/এ | Noun
k7t | d | Φ/কে/এ/তে/এতে/য় | Noun
k7p | d | তে/এতে/য় | Noun
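
To make the idea of a Demand Group consulting its Karaka frame more concrete, the following is a minimal Python sketch that encodes Table 1 as data and looks up which arc labels a noun group could fill, given its vibhakti. It is an illustration only, not the chapter's parser. The names Demand, GON_FRAME, candidate_labels and unmet_mandatory are hypothetical, the empty string stands in for the null vibhakti Φ, and reading the necessity codes 'm' and 'd' as "mandatory" and "desirable" is an assumption that this excerpt does not spell out.

```python
import unicodedata
from dataclasses import dataclass


def nfc(text: str) -> str:
    """Normalise to NFC so identical Bengali strings compare equal
    regardless of how their characters were encoded."""
    return unicodedata.normalize("NFC", text)


@dataclass(frozen=True)
class Demand:
    """One row of a Karaka frame (hypothetical structure for illustration)."""
    label: str            # arc label, e.g. "k1"
    necessity: str        # "m" or "d", copied from Table 1
    vibhaktis: frozenset  # allowed case markers; "" stands for the null marker
    lexical_type: str = "Noun"


def make_demand(label, necessity, vibhaktis):
    return Demand(label, necessity, frozenset(nfc(v) for v in vibhaktis))


# Karaka frame for the verb 'gon' (to count), transcribed from Table 1.
GON_FRAME = [
    make_demand("k1", "m", [""]),
    make_demand("k2", "m", [""]),
    make_demand("k3", "d", ["দিয়ে", "এ"]),
    make_demand("k7t", "d", ["", "কে", "এ", "তে", "এতে", "য়"]),
    make_demand("k7p", "d", ["তে", "এতে", "য়"]),
]


def candidate_labels(vibhakti, lexical_type="Noun", frame=GON_FRAME):
    """Return every arc label whose frame row accepts this vibhakti and type."""
    marker = nfc(vibhakti)
    return [
        d.label
        for d in frame
        if d.lexical_type == lexical_type and marker in d.vibhaktis
    ]


def unmet_mandatory(assigned_labels, frame=GON_FRAME):
    """List rows marked 'm' that no noun group has been assigned to.
    Assumption: 'm' means the Karaka must be present."""
    assigned = set(assigned_labels)
    return [d.label for d in frame if d.necessity == "m" and d.label not in assigned]
```

With this encoding, candidate_labels("") returns ['k1', 'k2', 'k7t'], since three rows of Table 1 accept the null vibhakti. A lookup by vibhakti therefore only narrows the options; choosing among the remaining candidates needs further constraints, which this sketch does not attempt to model.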
