Introduction
Language is essential for communication. To ease human-computer interaction, machines should be able to comprehend natural languages. Several Natural Language Processing (NLP) techniques are used to give machines this natural language understanding capability. NLP techniques describe how machines can be used to process and analyze natural languages. Over the years, NLP applications such as text summarization, question answering, and machine translation have been developed for many natural languages. During the development of such applications, a task called Named Entity Recognition (NER) is often performed as a preprocessing step. NER is a two-step process that identifies proper nouns in text and classifies them into predefined categories such as person, location, measure, organization, and time. Integrating NER into NLP applications improves their accuracy. For example, a question-answering system (Greenwood & Gaizauskas, 2003), a machine-translation system (Babych & Hartley, 2003), and text clustering (Toda & Kataoka, 2005) all performed well after NER was integrated. These applications are now also available for the Hindi language, so a state-of-the-art Hindi NER tool is essential to improve their performance. Most of the available systems for Hindi NER use traditional approaches such as handcrafted rules with gazetteers and machine learning based models. Handcrafting rules is time-consuming and requires either intensive knowledge of grammar or language experts to write the rules. Several machine learning based methods have also been used to implement NER, including hidden Markov models (Chopra, Joshi, & Mathur, 2016; Morwal, Jahan, & Chopra, 2012), support vector machines (Ekbal & Bandyopadhyay, 2010), conditional random fields (Ekbal & Bandyopadhyay, 2009), and combinations of these methods.
Again, these methods rely on intensive knowledge of grammar and on handcrafted features. Handcrafting features is difficult because linguistic resources for Hindi are scarce. Additionally, the Hindi language poses several challenges, such as free word order, no capitalization, lack of labeled data, and ambiguity in proper nouns, which need to be addressed when developing an NER tool for Hindi.
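As a rough illustration of the two NER steps described above (identifying candidate names, then assigning each a category), the following minimal sketch uses a plain gazetteer lookup. The gazetteer entries, the category labels, and the function name `recognize_entities` are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
# Toy gazetteer: maps known Hindi names to a category.
# The entries are illustrative assumptions, not data from the article.
GAZETTEER = {
    "राम": "person",
    "दिल्ली": "location",
    "मुंबई": "location",
}

# Punctuation to trim from token edges, including the Hindi danda.
PUNCTUATION = "।,."


def recognize_entities(sentence):
    """Return (token, category) pairs for tokens found in the gazetteer."""
    entities = []
    for raw_token in sentence.split():
        token = raw_token.strip(PUNCTUATION)
        # Step 1: identification. Hindi has no capitalization, so the only
        # cue available in this sketch is membership in the gazetteer.
        category = GAZETTEER.get(token)
        # Step 2: classification. The category is read from the gazetteer.
        if category is not None:
            entities.append((token, category))
    return entities


if __name__ == "__main__":
    print(recognize_entities("राम दिल्ली में रहता है।"))
```

A lookup of this kind cannot resolve ambiguous proper nouns and recognizes nothing outside its list, which is consistent with the limitations of gazetteer-based approaches noted above.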