Automatic Generation of Synsets for Wordnet of Hindi Language

Priyank Pandey, Manju Khari, Raghavendra Kumar, Dac-Nhuong Le
Copyright: © 2018 |Pages: 17
DOI: 10.4018/IJNCR.2018040103

Abstract

India is a land of 122 languages and numerous dialects. The lack of adequate lexical resources for Indian languages is widespread, and it hinders the development of NLP tools for these languages. Recent efforts such as the IndoWordNet project have contributed significantly to reducing the scarcity of lexicons, but their progress and coverage remain debatable. Bottlenecks of cost, time, and the need for skilled lexicographers slow progress further. In this article, the authors propose a technique that automates the generation of lexical entries using a machine learning approach, which noticeably speeds up the construction of lexicons such as WordNet. Reluctance to adopt automated approaches is mainly attributed to their lack of accuracy, their inability to capture the regional character of a language, incorrect back-translation, and similar problems. To address these issues, the authors use Wikipedia to validate the generated synsets.

2. Conventional Approaches to Automatic Wordnet Generation

Before presenting our approach to constructing wordnets and translating the senses of each term into multiple languages, this section introduces the conventional models for creating wordnets that were used by earlier researchers.

The first approach is the merge model, in which an existing thesaurus is converted into a wordnet, and this wordnet is then linked semi-automatically to another wordnet. The drawback of this model is that it is confined to a small number of languages: unless a suitable lexical resource is already available for each language, the model cannot be applied across a wide range of languages.
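The merge model can be illustrated with a minimal sketch. It assumes a toy target-language thesaurus, a bilingual dictionary, and a tiny stand-in for an existing English wordnet; all data (romanized Hindi for readability) and the function names `thesaurus_to_synsets` and `propose_links` are hypothetical, not taken from the article. The "semi-automatic" linking is modelled here as proposing candidate links for a lexicographer to confirm.

```python
from collections import defaultdict

# Hypothetical toy thesaurus for the target language: headword -> synonyms.
THESAURUS = {
    "pani": ["jal", "neer"],
    "jal": ["pani", "neer"],
    "ghar": ["makaan", "griha"],
}

# Hypothetical bilingual dictionary (target word -> English words) and a
# tiny stand-in for an existing English wordnet (synset id -> lemmas).
BILINGUAL = {"pani": ["water"], "jal": ["water"], "ghar": ["house", "home"]}
SOURCE_WORDNET = {"water.n.01": {"water", "H2O"}, "house.n.01": {"house"}}


def thesaurus_to_synsets(thesaurus):
    """Convert thesaurus entries into candidate synsets.

    Each headword plus its synonyms becomes one set of words; entries that
    yield the same set of words collapse into a single synset.
    """
    synsets = set()
    for head, synonyms in thesaurus.items():
        synsets.add(frozenset([head, *synonyms]))
    return synsets


def propose_links(synsets, bilingual, source_wordnet):
    """Propose links from new synsets to source synsets (semi-automatic step).

    A link is proposed when any member of the new synset translates to a
    lemma of a source synset. Proposals are returned for manual review
    rather than being accepted automatically.
    """
    proposals = defaultdict(set)
    for synset in synsets:
        translations = {e for w in synset for e in bilingual.get(w, [])}
        for sid, lemmas in source_wordnet.items():
            if translations & lemmas:
                proposals[synset].add(sid)
    return dict(proposals)


if __name__ == "__main__":
    synsets = thesaurus_to_synsets(THESAURUS)
    for synset, links in propose_links(synsets, BILINGUAL, SOURCE_WORDNET).items():
        print(sorted(synset), "->", sorted(links))
```

The sketch makes the dependency visible: nothing can be built for a language until a thesaurus like `THESAURUS` exists for it.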

The second approach, the expand model, improves on the merge model because it requires fewer pre-existing resources. It follows this strategy:

1. Take an existing wordnet for some source language LO, usually the English Princeton WordNet;
2. For every sense s in that wordnet, use a translation dictionary to translate the words linked through s from LO into the target language LN;
3. Preserve all semantic relations among the senses of the existing wordnet, so that the result is a new wordnet for LN.
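The three steps of the expand model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the miniature English source wordnet, the relation triples, the `EN_TO_HI` dictionary (romanized Hindi), and the function `expand_wordnet` are all hypothetical. Senses are identified by reusing the source synset ids, which is what lets relations carry over unchanged.

```python
# Step 1 (assumed data): a miniature source wordnet for LO = English,
# given as synset id -> lemmas, plus typed semantic relations.
SOURCE_SYNSETS = {
    "water.n.01": ["water"],
    "liquid.n.01": ["liquid"],
    "house.n.01": ["house", "home"],
    "building.n.01": ["building"],
}
SOURCE_RELATIONS = [
    ("water.n.01", "hypernym", "liquid.n.01"),
    ("house.n.01", "hypernym", "building.n.01"),
]

# Hypothetical LO -> LN translation dictionary (LN = Hindi, romanized).
EN_TO_HI = {
    "water": ["pani", "jal"],
    "liquid": ["drav"],
    "house": ["ghar", "makaan"],
    "home": ["ghar"],
    "building": ["imarat"],
}


def expand_wordnet(synsets, relations, dictionary):
    """Build an LN wordnet by translating each sense and copying relations."""
    new_synsets = {}
    for sense_id, lemmas in synsets.items():
        # Step 2: translate every lemma linked through sense s, dropping
        # duplicate translations while keeping first-seen order.
        translated = []
        for lemma in lemmas:
            for word in dictionary.get(lemma, []):
                if word not in translated:
                    translated.append(word)
        if translated:  # senses with no translation are left out
            new_synsets[sense_id] = translated
    # Step 3: keep a relation only if both endpoints survived translation.
    new_relations = [
        (src, rel, dst)
        for src, rel, dst in relations
        if src in new_synsets and dst in new_synsets
    ]
    return new_synsets, new_relations


if __name__ == "__main__":
    hi_synsets, hi_relations = expand_wordnet(SOURCE_SYNSETS, SOURCE_RELATIONS, EN_TO_HI)
    print(hi_synsets)
    print(hi_relations)
```

Because every LN synset inherits its id from the LO synset, step 3 reduces to filtering the original relation list. The sketch deliberately ignores ambiguous translations (for example, one Hindi word mapping to several English senses), which is one source of the accuracy problems the abstract mentions.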
