A Tool to Extract Name Entity Recognition From Big Data in Banking Sectors

C. Janarish Saju (Anna University, Tamil Nadu, India) and S. Ravimaran (M.A.M. College of Engineering, Tamil Nadu, India)
Copyright: © 2020 | Pages: 22
DOI: 10.4018/IJWSR.2020040102

Abstract

The Internet is a global system of interconnected computer networks that connects millions of computers and people, and it generates a massive quantity of information every day. Extracting the necessary information from this volume calls for information filtering (IF) in several domains. In our implementation, the named entity recognition (NER) technique is employed to automatically extract valuable data from unstructured natural language text. Because many works have addressed the detection of named entities, a wide variety of NER tools exists for different domains. However, NER remains a major challenge, and to address it we propose a novel framework that combines three efficient classifiers. This article proposes a three-layered neural network approach with a conditional random field (CRF), the Pachinko allocation model (PAM), and the Adaptive Neuro-Fuzzy Inference System (ANFIS) for detecting named entities in three steps. First, a CRF-based classifier is trained on the input file. Second, PAM is applied to the output produced by the CRF to improve the label annotation. Third, the ANFIS captures the deep features of the information on its own from the pre-trained information to attain accurate predictions. Experimental results show that, implemented on the R platform, the learned model achieves a recall of 92%, a precision of 95%, and an F-measure of 92% on the banking domain.

1. Introduction

The Internet is a revolution in information technology and, because of its services and features, plays an important role in daily life. In today's networked world, people, goods, data, and information move widely, quickly, and independently across boundaries. This new environment has brought tremendous opportunities as well as challenges for businesses, governments, and scientific communities (Lee, 2015), and for people, politics, and other fields of knowledge (Lee & Olson, 2010). Advances in information and communication technology (ICT) have brought an enormous increase in the quantity of data created and shared (Big Data). Data analytics is used for a variety of purposes (business, security and safety, scientific discovery, etc.), in many domains (biology, medicine, education, etc.), and by many stakeholders (businesses, governments, scientists, and consumers) to understand the impact of knowledge management (KM), since a large amount of data is handled on a daily basis (Kim, Trimi, & Chung, 2014). Data extraction is therefore essential for sectors such as academia, trade, banking, and government.

Numerous organizations continuously gather data (drawn from different sources and languages) and analyze much of it in textual form. Gathering and analyzing such data has become increasingly difficult because the data are typically imperfect, fragmented, and unstructured. The emerging areas of text analytics are: (1) information extraction (IE), which automatically extracts structured information from documents; (2) topic modeling (TM), which uses algorithms to find the main topics in a very large, unstructured collection of documents; (3) opinion mining, which accesses, extracts, classifies, and interprets the opinions expressed in several sources, including social networks (sentiment analysis is also used for opinion mining); and (4) question answering (Q&A), which answers factual questions (e.g., IBM's Watson, Apple's Siri, Amazon's Alexa) using techniques from statistical natural language processing (NLP), information retrieval (IR), and human-computer interaction (HCI) (Chen, Chiang, & Storey, 2012).

Named entity recognition (NER) is the task of recognizing named entities such as persons, locations, organizations, times, and quantities in text. NER systems are often used as the first step in question answering, information retrieval, co-reference resolution, topic modeling, and similar tasks (Abraham, Liu, Lin, & Sun, 2012). The web is a major source of information in the modern world, and handling such a large stream of data requires information filtering (IF) (Mamat, Mansouri, & Suriani, 2008). The clarified extraction technique improves the extraction quality of the system by increasing the precision of entity extraction (McIntosh, Murphy, & Curran, 2006), because in our approach the first extracted entities are used as seeds for training other machine learning (ML) systems (Kanimozhi & Manjula, 2016) across domains and languages. In this way, eliminating the need for manual training and expanding the seeds through learning will continue to increase the accuracy and use of entity extraction across domains and languages (Ahmed & Sathyaraj, 2015). Because training is driven by the seed entities we generate, which are then used by machine learning and deep learning (Powley & Dale, 2007), the effort that experts must invest in training the system is reduced. Furthermore, this framework takes less time for fixing faulty entities and removing relations, which supports better data-driven decisions (Yoshioka & Thaer, 2015).
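
To make the task concrete, the sketch below shows what NER output typically looks like, using the common BIO labeling scheme (B- marks the first token of an entity, I- a continuation, O a token outside any entity) on a made-up banking sentence. It is a toy dictionary lookup written in Python purely for illustration; it is not the CRF, PAM, and ANFIS pipeline proposed in this article, and the gazetteer, the function names (`tag_bio`, `extract_entities`), and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer (assumption, not from the article):
# lowercase token sequence -> entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("state", "bank", "of", "india"): "ORG",
    ("ravi", "kumar"): "PER",
    ("chennai",): "LOC",
}


def tag_bio(tokens: List[str]) -> List[str]:
    """Assign BIO labels by greedy longest-match lookup in GAZETTEER."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = GAZETTEER.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                labels[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


def extract_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO labels back into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = ""
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], ""
    if current:
        entities.append((" ".join(current), current_type))
    return entities


if __name__ == "__main__":
    sentence = "Ravi Kumar opened an account at State Bank of India in Chennai"
    tokens = sentence.split()
    labels = tag_bio(tokens)
    for token, label in zip(tokens, labels):
        print(f"{token}\t{label}")
    print(extract_entities(tokens, labels))
```

A statistical tagger such as a CRF learns to predict the same kind of per-token label sequence from annotated examples instead of looking entities up in a fixed list, which is what allows it to recognize names it has not seen before.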
