1. Introduction
With the rapid advancement of Web 2.0 technologies and service-oriented computing, more and more service providers publish their services on the internet, mainly in the form of web APIs. These services can be organized and manipulated in a loosely coupled style to create service mashups that fulfill comprehensive functional requirements and offer value-added integrated software systems with complex business processes. The rapid increase in the number and diversity of web services accelerates interoperable machine-to-machine interaction and greatly promotes service discovery, optimal selection, automatic composition and recommendation (Xia, Luo, Li, & Zhu, 2013; Xia, Liu, Liu, & Zhu, 2012; Li, Luo, Xia, Han, & Zhu, 2015). However, because published web services cover an overwhelming range of functional characteristics, an online RESTful service repository often contains hundreds of categories. As a result, searching for an appropriate category among the registered ones is a labor-intensive and challenging task for service providers when they publish their API services on a service management platform. For example, ProgrammableWeb.com, the largest online RESTful service repository (APIs and mashups), collects over 19,000 APIs and 7,000 mashups in more than 400 diverse categories. When registering an API service on ProgrammableWeb, a service provider must supply basic registration information and also manually choose at least one category from the more than 400 available, so that the category matches the service's functional description. Therefore, designing an effective approach that can classify web services and recommend an accurate category has become a critical research issue (Ames & Naaman, 2007).
In recent years, related research efforts have focused on web service classification (Tsoumakas, Katakis, & Taniar, 2008). Existing approaches perform web service classification and service tag recommendation by training a traditional supervised learning model (e.g., SVM) (Lopez & Maldonado, 2016; Wang, Shy, Zhou, & Bouguettaya, 2010), an active learning-based supervised learning model (Tong & Koller, 2001; Liu, Agarwal, Ding, & Yu, 2016; Shi, Liu, & Yu, 2017), or a combined supervised learning model in which an unsupervised probabilistic topic model (e.g., LDA) (Krestel, Fankauser, & Nejdl, 2009) is applied to extract semantic features of web services. Some of these works learn a classification model from an existing labeled service repository, while others use active learning to boost the learned service classifier: at each iteration, the most informative services are intelligently selected and manually labeled to improve the quality of small-scale training data. Although these approaches take advantage of an existing service repository as training data to derive a service classifier that can be easily deployed and applied, they still fall short of service providers’ demands for highly accurate web service classification.
The essential reason is that existing approaches are deficient in both effectiveness and efficiency. More specifically, the current paradigm for web service classification has two disadvantages. (1) On one hand, existing approaches mainly rely on the original service descriptions to learn a service classifier, yet the functional description of a RESTful web service consists of only a short text (e.g., 10 to 20 words), which is too little to fully characterize the corresponding category. Furthermore, some words recur frequently across different service descriptions, which blurs the distinction between service categories. Both factors harm classification accuracy. (2) On the other hand, most existing approaches leverage a traditional classification algorithm (e.g., SVM), in which multiple base models must be trained together to perform web service classification, because each of them is a binary classifier that cannot directly solve a multi-class problem. As a result, training a service classifier on a large-scale web service repository incurs high complexity, in terms of both large space consumption and slow convergence.
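To give a rough sense of the scale behind point (2), the sketch below counts how many binary models a multi-class problem requires under two standard decomposition schemes, one-vs-rest and one-vs-one. The text above does not say which scheme the cited approaches use, so both are shown as assumptions; the function name `binary_models_needed` is hypothetical, and 400 is used only as a round stand-in for the "more than 400" ProgrammableWeb categories.

```python
def binary_models_needed(num_classes: int) -> dict:
    """Count the binary classifiers needed to cover `num_classes` categories.

    Assumption: the multi-class task is decomposed with one of two
    standard schemes; the source text does not name the scheme used.
    """
    if num_classes < 2:
        raise ValueError("a classification task needs at least two classes")
    return {
        # One model per category: "this category" vs. "everything else".
        "one_vs_rest": num_classes,
        # One model per unordered pair of categories.
        "one_vs_one": num_classes * (num_classes - 1) // 2,
    }


# Illustrative figure only: roughly the number of ProgrammableWeb categories.
counts = binary_models_needed(400)
print(counts)
```

Under these assumptions, a repository with about 400 categories needs hundreds of binary models with one-vs-rest and tens of thousands with one-vs-one, which is consistent with the space and training-time concerns raised above.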