Incorporating LDA With Word Embedding for Web Service Clustering


Yi Zhao (School of Computer Science, Wuhan University, Wuhan, China), Chong Wang (School of Computer Science, Wuhan University, Wuhan, China), Jian Wang (School of Computer Science, Wuhan University, Wuhan, China) and Keqing He (School of Computer Science, Wuhan University, Wuhan, China)
Copyright: © 2018 | Pages: 16
DOI: 10.4018/IJWSR.2018100102


With the rapid growth of web services on the internet, web service discovery has become a hot topic in services computing. Because service descriptions are heterogeneous and unstructured, many service clustering approaches have been proposed to facilitate web service discovery, and many other approaches have leveraged auxiliary features to enhance the classical LDA model and achieve better clustering performance. However, these extended LDA approaches still have limitations in handling data sparsity and noise words. This article proposes a novel web service clustering approach that incorporates LDA with word embedding, leveraging relevant words obtained from word embeddings to improve the performance of web service clustering. Specifically, word embeddings are trained with Word2vec, and the words semantically relevant to service keywords are then incorporated into the LDA training process. Finally, experiments conducted on a real-world dataset published on ProgrammableWeb show that the authors' proposed approach can achieve better clustering performance than several classical approaches.
Article Preview

1. Introduction

With the rapid development of SOA (Service Oriented Architecture), the number of available Web service resources on the Internet is increasing rapidly. As of 30 Dec. 2017, for example, over 18,000 Web services had been published on ProgrammableWeb (PW), one of the most popular service registries. These Web services follow various protocols, including SOAP (Simple Object Access Protocol) (Li, 2008), XML-RPC (XML Remote Procedure Call) (Cerami, 2002), REST (Representational State Transfer) (Wagh, 2012), and so on. They also have diverse service description formats, such as WSDL (Web Service Description Language), WADL (Web Application Description Language), and natural language text. Currently, almost all the Web services in PW are described in unstructured short texts. These heterogeneous and unstructured service descriptions make service discovery difficult. Therefore, how to accurately discover appropriate Web services for users has become an important issue in services computing.

Web service search engines and online Web service directories are the major sources of service discovery. However, search engines based on keyword matching usually handle synonyms or variations of predefined keywords poorly, and thus return inaccurate results (Al-Masri, 2008). To address this problem, many semantic Web service discovery approaches (e.g., Mier, 2016) have been proposed to improve service discovery by semantically annotating attributes of Web services with domain ontologies. However, semantic annotation is still a time-consuming task, which makes it difficult to apply these approaches in practice. In recent years, Web service clustering has shown its advantages in improving the performance of Web service discovery (Chen, 2011; Richi, 2007) by grouping Web service descriptions into clusters based on their functionalities. Among the existing service clustering approaches, LDA (Latent Dirichlet Allocation) (Blei, 2003) is the most widely adopted model, since it can extract unobserved groups that explain why some parts of the service descriptions are similar and can capture the underlying domain semantics. However, the word distributions in Web service descriptions are usually very sparse, so the latent topics learned by LDA and other topic models are still inaccurate. As a result, clustering performance may be unsatisfactory.
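To make the sparsity problem concrete, the following minimal Python sketch shows one way that words relevant to a service keyword could be looked up from word embeddings and used to enrich a short service description before topic modeling. It is an illustration under stated assumptions, not the authors' method: the embedding values are hand-made toy vectors rather than Word2vec output, the names `TOY_EMBEDDINGS`, `relevant_words`, and `expand_description` are hypothetical, the similarity threshold is arbitrary, and the way the article actually incorporates relevant words into LDA training is not described in this preview.

```python
import math

# Hypothetical toy embeddings. Real vectors would come from a trained
# Word2vec model; these values are invented purely for illustration.
TOY_EMBEDDINGS = {
    "map":      [0.90, 0.10, 0.00],
    "location": [0.80, 0.20, 0.10],
    "geocode":  [0.85, 0.15, 0.05],
    "payment":  [0.00, 0.90, 0.20],
    "invoice":  [0.10, 0.80, 0.30],
    "photo":    [0.10, 0.10, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevant_words(word, embeddings, threshold=0.95):
    """Return the vocabulary words whose similarity to `word` meets the threshold."""
    if word not in embeddings:
        return []  # out-of-vocabulary words have no known neighbours
    target = embeddings[word]
    return sorted(
        other
        for other, vec in embeddings.items()
        if other != word and cosine(target, vec) >= threshold
    )


def expand_description(tokens, embeddings, threshold=0.95):
    """Append relevant words to a tokenized description, without duplicates."""
    expanded = list(tokens)
    for token in tokens:
        for rel in relevant_words(token, embeddings, threshold):
            if rel not in expanded:
                expanded.append(rel)
    return expanded


if __name__ == "__main__":
    # A very short description gains semantically related context words.
    print(expand_description(["map", "api"], TOY_EMBEDDINGS))
```

Enriching the text before training is only one possible design, chosen here because it keeps the sketch independent of any particular LDA implementation; the threshold controls how many neighbours are added and therefore how much noise the expansion may introduce.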
