1. Introduction
With the rapid development of SOA (Service-Oriented Architecture), the number of Web services available on the Internet is increasing rapidly. As of 30 Dec. 2017, for example, over 18,000 Web services had been published on ProgrammableWeb (PW), one of the most popular service registries. These Web services follow various protocols, including SOAP (Simple Object Access Protocol) (Li, 2008), XML-RPC (XML Remote Procedure Call) (Cerami, 2002), and REST (Representational State Transfer) (Wagh, 2012). They also use diverse service description formats, such as WSDL (Web Services Description Language), WADL (Web Application Description Language), and natural language text. Currently, almost all the Web services in PW are described in unstructured short texts. These heterogeneous and unstructured service descriptions make service discovery difficult. Therefore, how to accurately discover appropriate Web services for users has become an important issue in services computing.
Web service search engines and online Web service directories are the major means of service discovery. However, search engines based on keyword matching usually cannot handle synonyms or variations of the predefined keywords, and thus return inaccurate results (Al-Masri, 2008). To address this problem, many semantic Web service discovery approaches (e.g., Mier, 2016) have been proposed to improve service discovery by semantically annotating attributes of Web services with domain ontologies. However, such annotation is a time-consuming task, which makes it difficult to apply these approaches in practice. In recent years, Web service clustering has shown its advantages in improving the performance of Web service discovery (Chen, 2011; Richi, 2007) by grouping Web service descriptions into clusters based on their functionalities. Among the existing service clustering approaches, LDA (Latent Dirichlet Allocation) (Blei, 2003) is the most widely adopted model, since it can extract unobserved groups (latent topics) that explain why some parts of the service descriptions are similar and that capture the underlying domain semantics. However, the word distributions in Web service descriptions are usually very sparse, so the latent topics learned by LDA and other topic models are often inaccurate, which may lead to unsatisfactory clustering performance.
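The keyword-matching limitation noted above can be illustrated with a minimal sketch. It is not the search method of any particular registry: the toy registry `SERVICES` and the helpers `tokenize` and `keyword_search` are hypothetical names introduced here, and the matching rule (a service is returned if its description shares at least one token with the query) is a simplifying assumption.

```python
import re

# Hypothetical toy registry: service name -> short free-text description.
SERVICES = {
    "GeoLocate": "Find the geographic position of an IP address.",
    "CarHire": "Book a rental automobile in major cities.",
    "WeatherNow": "Get the current weather forecast for a city.",
}


def tokenize(text):
    """Lowercase a text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, services):
    """Return the sorted names of services whose description shares
    at least one token with the query (exact token matching only)."""
    query_tokens = tokenize(query)
    return sorted(
        name
        for name, description in services.items()
        if query_tokens & tokenize(description)
    )


# The literal keyword appears in a description, so the service is found.
hits_literal = keyword_search("automobile", SERVICES)

# A synonym of that keyword appears in no description, so nothing is found.
hits_synonym = keyword_search("car", SERVICES)
```

In this sketch the query "automobile" retrieves the car rental service, while the synonymous query "car" retrieves nothing, because exact token matching has no notion of the two words being related. Semantic annotation and topic-based clustering both aim to close this gap.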