Dependency Parsing in Bangla

Utpal Garain (Indian Statistical Institute, India) and Sankar De (Gupta College of Technological Sciences, India)
DOI: 10.4018/978-1-4666-6042-7.ch076
Grammar-driven dependency parsing has been attempted for Bangla (Bengali). The free word order of the language makes the development of an accurate parser very difficult, and the Paninian grammatical model has been used to tackle this problem. The approach is to simplify complex and compound sentences and then to parse the resulting simple sentences by satisfying the Karaka demands of the Demand Groups (Verb Groups). Finally, the parsed structures are rejoined with appropriate links and Karaka labels. The parser has been trained on a Treebank of 1000 annotated sentences and evaluated on unannotated test data of 150 sentences. The evaluation shows that the proposed approach achieves 90.32% unlabeled and 79.81% labeled attachment accuracy.
2. Paninian Grammar

The Paninian framework was originally designed more than two millennia ago for writing a grammar of Sanskrit. It is now being adapted for analyzing modern Indian Languages (ILs), many of which derive from Sanskrit. Paninian grammar is particularly suited to morphologically rich, free word order languages, which include most ILs and Bangla in particular.

In the syntactico-semantic model of Paninian Grammar, every verbal root (dhaatu) denotes an action consisting of (1) an activity and (2) a result. The result is the state that, once reached, marks the action as complete. The activity consists of actions carried out by the different participants, or Karakas (mostly noun groups), involved in the action. The Karakas bear a direct relation to the verb. The Paninian model uses only six such Karakas: K1, K2, K3, K4, K5, and K7. Some additional relations are described in (Bharati, 2009c), and the complete tag set is given in Appendix A. In this approach the verb demands certain Karakas to carry out the activity. Verb groups are therefore known as Demand Groups, and Karakas as Source Groups or arguments. For a very simple sentence (a single Demand Group) such as S1, the verb group is the root of the dependency tree and connects the noun groups with appropriate Karaka labels (Bharati, 1993). Consider the sentence in Box 1. The parsed output is shown in Figure 1.

Figure 1. Parsed output of S1
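The structure described above, a verb group at the root with noun groups attached under Karaka labels, can be illustrated with a minimal Python sketch. It is not the authors' parser: the class name DemandGroup, its methods, and the placeholder group names VG, NG1, and NG2 are hypothetical, and the rule that each Karaka is filled at most once per Demand Group is a simplifying assumption made here. The sketch does not reproduce S1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The six Karaka labels named in the text. The fuller tag set
# (Appendix A of the chapter) is not modelled here.
KARAKAS = ("K1", "K2", "K3", "K4", "K5", "K7")


@dataclass
class DemandGroup:
    """Hypothetical container: a verb group and the source groups it governs."""
    head: str
    # Maps a Karaka label to the source (noun) group that satisfies it.
    arguments: Dict[str, str] = field(default_factory=dict)

    def attach(self, label: str, source_group: str) -> None:
        """Attach a source group to the verb under the given Karaka label."""
        if label not in KARAKAS:
            raise ValueError(f"unknown Karaka label: {label}")
        # Assumption for this sketch: a Karaka is filled at most once.
        if label in self.arguments:
            raise ValueError(f"{label} is already filled for {self.head}")
        self.arguments[label] = source_group

    def arcs(self) -> List[Tuple[str, str, str]]:
        """Return labelled dependency arcs as (head, label, dependent)."""
        return [(self.head, label, dep) for label, dep in self.arguments.items()]


# Placeholder groups, not taken from the chapter's example sentence.
tree = DemandGroup(head="VG")
tree.attach("K1", "NG1")
tree.attach("K2", "NG2")
for head, label, dependent in tree.arcs():
    print(f"{head} --{label}--> {dependent}")
```

Keeping the arguments in a label-keyed mapping makes the "demand" reading explicit: parsing a simple sentence amounts to deciding which noun group, if any, fills each slot of the verb group.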

