Introduction
Web services, which share business logic, data and processes through a programmatic interface, represent an important way for businesses to communicate with each other and with clients. The concept of Web services has therefore become a widely applied paradigm in research and industry, with the number of services published on the Internet increasing rapidly over the last few years (Al-Masri & Mahmoud, 2008). With this proliferation of Web services, service discovery is becoming a challenging and time-consuming task. Efficient service discovery matters because other major problems in service-oriented computing, such as service selection and composition, also depend on it. Reducing the query space by clustering services avoids many unnecessary similarity calculations in the matching process and is an efficient approach to improving the performance of service discovery. Web services can be clustered into functionally similar clusters by considering functional attributes such as input, output, precondition and effect (Dasgupta, Bhat et al., 2011). Alternatively, services can be clustered in terms of quality of service by considering their nonfunctional properties, such as cost and reliability (Xia, Chen et al., 2011). Some recent studies have proposed clustering services in terms of social properties (Chen, Paik et al., 2013). Here, we consider functional clustering.
One main issue in clustering is calculating the similarity or affinity between services. Recent studies have proposed several approaches to calculating functional similarity. Simple approaches include checking the one-to-one matching of features such as the service name and checking the matching of service signatures such as the messages (Elgazzar, Hassan et al., 2010). In some studies, information retrieval (IR) techniques are used (Platzer, Rosenberg et al., 2009). These include similarity-measuring methods such as search-engine-based (SEB) methods (Liu & Wong, 2009) and cosine similarity (Chen, Yang et al., 2010; Ma, Zhang et al., 2008). Some researchers have used logical relationships such as exact and plug-in (Wagner, Ishikawa et al., 2011) or edge-counting-based techniques (Xie, Chen et al., 2011; Sun, 2010) to bring more semantics into the similarity calculations via ontologies. However, one-to-one matching, structure matching or a vector-space model may not accurately identify the semantic similarity among terms because of the heterogeneity and independence of service sources. These methods consider terms only at the syntactic level, whereas different service providers may use the same term to represent different concepts or may use different terms for the same concept. Furthermore, IR techniques such as cosine similarity usually focus on plain text, whereas Web services contain much more complex structures, often with very little textual description. Relying on IR techniques is therefore problematic. Moreover, the machine-interpretable semantics found in service descriptions can be lost when the data in those descriptions are converted into vectors for IR techniques. In SEB similarity-measuring methods such as normalized Google distance (NGD), there is no guarantee that all the information needed to measure the semantic similarity between a given pair of words is contained in the top-ranking snippets.
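The syntactic limitation of vector-space matching can be illustrated with a minimal sketch. The code below computes cosine similarity over raw term-frequency vectors; the function name and the tokenized service names are hypothetical and chosen only for illustration, and this is not the method of any of the cited studies.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of terms, using raw term counts."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    # Only terms that appear in both bags contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description shares nothing with any other
    return dot / (norm_a * norm_b)


# Hypothetical tokenized service names (assumed for illustration only).
car_price = ["car", "price"]
automobile_cost = ["automobile", "cost"]  # same concept, different terms
car_rental = ["car", "rental"]            # different concept, shared term

print(cosine_similarity(car_price, automobile_cost))
print(cosine_similarity(car_price, car_rental))
```

In this sketch the two services that describe the same concept with different terms score zero, whereas the pair that merely shares the token "car" scores about 0.5, which is the kind of mismatch that motivates semantic similarity measures.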
On the other hand, although ontologies help to improve semantic similarity, defining high-quality ontologies is a major challenge. Several methods have been used to develop ontologies in current approaches, including obtaining assistance from domain experts, using resources such as WordNet (http://wordnet.princeton.edu/, n.d.) and using ontologies already available via the Internet (Xie, Chen et al., 2011). Developing an ontology with the assistance of domain experts is a time-consuming task that requires considerable human effort. In addition, a resource that lacks up-to-date information might fail to capture the latest concepts and relationships in a domain. Further, the lack of standards for integrating and reusing existing ontologies also hampers ontology-based (OB) semantic matching.