Text Clustering Using PSO Based Dynamic Adaptive SOM for Detecting Emergent Trends

Chandrakala D (Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India), Sumathi S (Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore, India), Saran Kumar A (Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, India) and Sathish J (Senior Software Engineer, Capgemini, India)
Copyright: © 2019 | Pages: 15
DOI: 10.4018/IJIIT.2019070104


Emergent Trend Detection (ETD), a principal application of text mining, is used to detect and recognize new trends in a corpus. This article discusses the influence of Particle Swarm Optimization (PSO) on Dynamic Adaptive Self Organizing Maps (DASOM) in the design of an efficient ETD scheme, in which PSO optimizes the neural parameters of the network. This hybrid machine learning scheme is designed to achieve maximum accuracy with minimum computational time. The efficiency and scalability of the proposed scheme are analyzed and compared with standard algorithms such as SOM, DASOM and linear regression analysis. The system is trained and tested on the DBLP database (University of Trier, Germany). The article establishes the superiority of the hybrid DASOM algorithm over these well-known algorithms in handling high-dimensional, large-scale data to detect emergent trends from the corpus.

1. Introduction

In recent years, the development of information systems in business, academia and medicine has increased the amount of stored data year by year. The vast majority of business data is stored in documents that are virtually unstructured. This is where text mining fits into the picture. Text mining technology helps people process large volumes of information (Adeva, 2005). It involves imposing structure upon text so that relevant information can be extracted from it (Miller, 2005). For example, in academia, many scholarly conferences take place every year. To extend a conference beyond its current focus, organizers often wish to offer additional workshops. In many cases, these additional events are intended to introduce participants to significant streams of research in related fields of study and to identify emergent technologies in terms of research interests and focus. Identification of reasonable candidate technologies for such workshops is often subjective rather than objectively derived from existing and emerging research (Romero, 2007). An emergent trend is a topic area that is growing in interest and utility over time. The detection of new phrases and emerging technical terms has therefore become very important (Abe, 2009; Amarasiri et al., 2005).

Clustering by document concepts is a powerful way of retrieving information from a large number of documents (Amarasiri, 2005). In general, this task makes no assumption about the data distribution. Popular automatic pattern clustering methods such as K-means clustering (Bekkerman, 2001) and minimal spanning trees (Solka Jeffery, 2005) are available to support the text categorization process. Mei and Zhai (2005) suggested a method for finding emergent theme patterns on the basis of a finite state machine, using a Hidden Markov Model (HMM) as one of the advanced theme detection schemes. Kostoff et al. (2007) provided a brief introduction and background for assessing India’s and China’s science and technology literature using a text mining approach, with good results. Text mining techniques such as morphological analysis (Ohsumi, 2009), syntax analysis (Sato, 2007), co-occurrence relation (Iwashita Motoi, 2011) and multi-grain hierarchical topic extraction (Kontostathis, 2004) have been reported as effective methods that achieve good performance. Abe and Tsumoto (2009) suggested an ETD method that detects emergent trends from the corpus using a linear regression algorithm. In detecting trends, Term Frequency–Inverse Document Frequency (TF-IDF), Jaccard’s Similarity Coefficient (JSC), Simple Appearance Ratio or Odds parameter were used as feature vectors for each technical term. However, classifying emerging and subsiding trends requires calculating the degree and intercept of a linear regression for each technical term, and for voluminous data the time this takes is relatively high.
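The regression-based idea described above can be illustrated with a minimal sketch. It assumes that each technical term already has one importance score per year (for example a TF-IDF value), fits an ordinary least-squares line to those scores, and labels the term by the sign of the fitted slope. The function names `fit_line` and `classify_trend`, the labels, and the slope-sign rule are assumptions made for illustration; they are not taken from Abe and Tsumoto (2009) or from this article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify_trend(yearly_scores, tolerance=1e-9):
    """Label a term from its per-year scores (dict: year -> score).

    Assumed rule for illustration: a positive slope means "emergent",
    a negative slope means "subsiding", anything else is "stable".
    """
    years = sorted(yearly_scores)
    scores = [yearly_scores[year] for year in years]
    slope, _ = fit_line(years, scores)
    if slope > tolerance:
        return "emergent"
    if slope < -tolerance:
        return "subsiding"
    return "stable"


if __name__ == "__main__":
    # Hypothetical scores for two made-up terms, not data from the article.
    terms = {
        "term_a": {2005: 0.1, 2006: 0.2, 2007: 0.4},
        "term_b": {2005: 0.4, 2006: 0.2, 2007: 0.1},
    }
    for term, scores in terms.items():
        print(term, classify_trend(scores))
```

Because one such fit is needed for every technical term, the total cost grows with the size of the vocabulary, which is consistent with the running-time concern raised above.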
