Tamil Question Answering System Using Machine Learning

Ashok Kumar L., Karthika Renuka D., Shunmugapriya M. C.
DOI: 10.4018/978-1-6684-6001-6.ch010

Abstract

The Tamil question answering system (QAS) aims to find relevant answers in the native language. The system will help farmers get agriculture-related information in Tamil. Tamil is a morphologically rich language, which makes developing systems that process Tamil words difficult. The list of Tamil stop words has to be collected manually. Part-of-speech (POS) tagging identifies a suitable POS tag for each word in a sequence of Tamil words. The system employs the hidden Markov model (HMM)-based Viterbi algorithm, a machine learning technique, for POS tagging of Tamil words. The analyzed question is submitted to Google search to obtain relevant documents. On top of Google search, locality-sensitive hashing (LSH) is used to retrieve five relevant items for the input Tamil question. Jaccard similarity is used to obtain the response from the retrieved items. The proposed system is modelled using a dataset of 1000 sentences in the agriculture domain.
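The HMM-based Viterbi tagging mentioned in the abstract can be illustrated with a minimal sketch. The tag set, the two Tamil words, and every probability below are hypothetical toy values chosen for illustration; they are not taken from the chapter's 1000-sentence dataset, and the `unk` smoothing constant for unseen words is likewise an assumption.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under an HMM.

    Probabilities are combined in log space; `unk` is an assumed
    fallback emission probability for words a tag has never emitted.
    """
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], unk))
            column[t] = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]) + emit, p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Hypothetical toy model: two tags and two Tamil words.
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.7, "VERB": 0.3}
TRANS_P = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT_P = {"NOUN": {"பயிர்": 0.8}, "VERB": {"வளரும்": 0.8}}

tagged = viterbi(["பயிர்", "வளரும்"], TAGS, START_P, TRANS_P, EMIT_P)
```

In a real tagger the start, transition, and emission probabilities would be estimated from a tagged Tamil corpus rather than written by hand.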

Introduction

The Internet is a rich source of data, but English is its dominant language. Although people get information from search engines, it may or may not be useful or relevant, and they spend much time finding the relevant answer. The key idea of this work is to find and present the exact answer to the user's query. The difficulty of finding and validating the right answer makes a QAS more complex than a general information retrieval task.

The system processes Tamil questions and provides answers in Tamil to users in the agriculture domain. The Uzhavan mobile application, a Tamil-language app for the agriculture domain, provides information on agricultural schemes and subsidies, but it does not give concise answers to farmers' queries. To address this problem, the Tamil QAS gives exact answers to user queries.

Table 1 shows the different types of questions in Tamil. In this work, user queries in the agriculture domain are processed by the proposed model to return answers in Tamil. The key idea is to build an agriculture-domain question answering system for the Tamil language. Stop words in Tamil are collected manually, and the Tamil words are POS tagged. Locality-sensitive hashing is used to retrieve relevant documents. Finally, the answers are ranked using Jaccard similarity.
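The retrieval and ranking steps described above (stop-word removal, locality-sensitive hashing, Jaccard ranking) might be sketched as follows. This is an illustrative sketch, not the chapter's implementation: it assumes MinHash signatures with banding as the LSH scheme, whitespace tokenization, and a one-word stop list, all of which are hypothetical choices. The two documents reuse phrases from Table 1.

```python
import hashlib
from collections import defaultdict

def tokens(text, stop_words=frozenset()):
    """Whitespace-tokenize and drop stop words."""
    return {w for w in text.split() if w not in stop_words}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def _hash(seed, token):
    # Seeded 64-bit hash derived from MD5 (deterministic across runs).
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash(token_set, num_hashes=32):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return tuple(min((_hash(s, t) for t in token_set), default=0)
                 for s in range(num_hashes))

def band_keys(signature, bands):
    """Split a signature into `bands` slices; each slice is a bucket key."""
    rows = len(signature) // bands
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]

def build_index(doc_tokens, num_hashes=32, bands=16):
    """Map every band key to the ids of the documents that share it."""
    index = defaultdict(set)
    for doc_id, toks in enumerate(doc_tokens):
        for key in band_keys(minhash(toks, num_hashes), bands):
            index[key].add(doc_id)
    return index

def answer(question, docs, stop_words, top_k=5, num_hashes=32, bands=16):
    """Fetch LSH candidates for the question, then rank them by Jaccard."""
    doc_tokens = [tokens(d, stop_words) for d in docs]
    index = build_index(doc_tokens, num_hashes, bands)
    q = tokens(question, stop_words)
    candidates = set()
    for key in band_keys(minhash(q, num_hashes), bands):
        candidates |= index.get(key, set())
    ranked = sorted(((jaccard(q, doc_tokens[i]), i) for i in candidates),
                    reverse=True)
    return [(docs[i], score) for score, i in ranked[:top_k]]

STOP_WORDS = {"எவை"}  # hypothetical one-word stop list
DOCS = [
    "நெற்பயிரைத் தாக்கும் பூச்சிகள்",
    "கிழங்கு சாகுபடிக்கு 20 செ.மீ இடைவெளி",
]
results = answer("நெற்பயிரைத் தாக்கும் பூச்சிகள் எவை", DOCS, STOP_WORDS)
```

The index is rebuilt on every call only to keep the sketch short; a real system would build it once. The default `top_k=5` mirrors the five retrieved items mentioned in the abstract.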

Table 2 shows the different types of QAS. QASs are classified by application domain, the analysis performed, and the technique used. By application domain, a QAS is either open domain or closed domain. By analysis, a QAS is semantic, syntactic, or morphological. By technique, it is web based or ontology based.

Table 1.
Types of Question
Question Type | Description | Sample Question (In Tamil)
Factoid Question | Description and definition type of questions | வறண்ட நிலத்திற்கு எவ்வாறான பழப்பயிர்களை தேர்ந்தெடுக்க வேண்டும்?
Listing Question | Types and list of questions | நெற்பயிரைத் தாக்கும் பூச்சிகள்?
Affirmative Question | Yes or No questions | கிழங்கு சாகுபடிக்கு 20 செ.மீ இடைவெளி இருப்பது சரியா?

Table 2.
Types of Question Answering System
Types of QAS | Description | Example
Open Domain QAS | Users can ask questions in any domain and get relevant answers from the system. | IBM Watson
Closed Domain QAS | Users can ask questions in one particular domain, such as the medical or science domain. | QAPD (physics domain)
Ontology Based QAS | The system uses an ontology, such as the DBpedia Ontology, as a knowledge source and SPARQL queries to answer users' questions. | AquaLog and DeepQA IBM Watson System
Web Based QAS | The system exploits web resources such as Google and Wikipedia as a knowledge repository to answer users. | WEBQA
Semantic Analysis based QAS | The system conceptually analyzes the user's questions and provides exact answers. | QUERIX and PANTO
Syntactic Analysis based QAS | The system performs keyword-based analysis to answer queries. | ASKME
Morphological Analysis based QAS | The morphological analyzer breaks the word (பயிர்கள்) into the root word (பயிர்) and its associated morpheme features (கள்). | TamilQAS
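
The morphological analysis row of Table 2 splits பயிர்கள் into the root பயிர் and the morpheme கள். A minimal sketch of that split, assuming plain suffix stripping against a hypothetical one-entry suffix list, could look like this. Real Tamil morphology involves sandhi changes and stacked suffixes that this sketch does not handle.

```python
# Hypothetical suffix inventory: only the plural marker shown in Table 2.
SUFFIXES = ("கள்",)

def split_suffix(word, suffixes=SUFFIXES):
    """Return (root, suffix); suffix is '' when no known suffix matches."""
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""

root, morpheme = split_suffix("பயிர்கள்")
```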
