Leveraging Incrementally Enriched Domain Knowledge to Enhance Service Categorization


Jia Zhang (Carnegie Mellon University, Silicon Valley, USA), Jian Wang (State Key Lab of Software Engineering, Computer School, Wuhan University, China), Patrick Hung (University of Ontario Institute of Technology, Canada), Zheng Li (State Key Lab of Software Engineering, Computer School, Wuhan University, China), Neng Zhang (State Key Lab of Software Engineering, Computer School, Wuhan University, China) and Keqing He (State Key Lab of Software Engineering, Computer School, Wuhan University, China)
Copyright: © 2012 | Pages: 24
DOI: 10.4018/jwsr.2012070103

Abstract

This paper reports the authors’ study of an open service and mashup repository, ProgrammableWeb, which groups stored services into predefined categories. Leveraging this unique structural feature and the hidden domain knowledge of the service repository, the authors extend the Support Vector Machine (SVM)-based text classification technique to enhance service-oriented categorization. An iterative approach is presented to automatically verify and adjust service categorization; it incrementally enriches the domain ontology, which in turn improves the accuracy of service categorization.
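The iterative verify-and-adjust idea can be illustrated with a minimal Python sketch. It is not the authors’ implementation: it assumes plain bag-of-words features, trains one binary linear SVM per domain with the Pegasos subgradient method (no bias term, with arbitrarily chosen defaults `lam=0.01` and `epochs=50`), and treats a service as miscategorized whenever the highest-scoring domain differs from its assigned one. The function names (`train_one_vs_rest`, `refine_categorization`, `top_terms`) and the rule of promoting a domain’s most positively weighted terms as candidate ontology concepts are hypothetical.

```python
import random
from collections import Counter


def features(text):
    # Bag-of-words term counts; a real pipeline would also stem and drop stop words.
    return Counter(text.lower().split())


def dot(w, x):
    # Sparse dot product between a weight dict and a term-count dict.
    return sum(w.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (no bias term) trained with the Pegasos method; ys are +1/-1."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1  # hinge loss is active
            for term in w:  # regularization: shrink every weight
                w[term] *= 1.0 - eta * lam
            if violated:  # subgradient step toward the correct side
                for term, count in xs[i].items():
                    w[term] = w.get(term, 0.0) + eta * ys[i] * count
    return w


def train_one_vs_rest(texts, domains):
    # One binary SVM per domain: services of that domain versus all other services.
    xs = [features(t) for t in texts]
    return {d: train_linear_svm(xs, [1 if y == d else -1 for y in domains])
            for d in sorted(set(domains))}


def predict(models, text):
    # Pick the domain whose classifier gives the highest decision score.
    x = features(text)
    return max(models, key=lambda d: dot(models[d], x))


def top_terms(models, domain, k=5):
    # Hypothetical enrichment step: the most positively weighted terms of a
    # domain are treated as candidate concepts for that domain's ontology.
    w = models[domain]
    positive = [term for term in w if w[term] > 0]
    return sorted(positive, key=lambda term: -w[term])[:k]


def refine_categorization(texts, domains, max_rounds=5):
    """Hypothetical verify-and-adjust loop: relabel disagreements, retrain, repeat."""
    domains = list(domains)
    models = train_one_vs_rest(texts, domains)
    for _ in range(max_rounds):
        predicted = [predict(models, t) for t in texts]
        if predicted == domains:
            break  # categorization is stable
        domains = predicted  # adjust: move every disagreeing service
        models = train_one_vs_rest(texts, domains)  # retrain on adjusted labels
    return domains, models
```

For example, with a few toy services whose “Mapping” and “Internet” descriptions share no words (and every “Internet” description mentions “ip”), `predict(models, "show my ip address internet")` returns `"Internet"`, echoing the ShowMyIP case. Relabeling every disagreement at once is the simplest adjustment rule; a more cautious variant would move only high-margin disagreements in each round.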

Introduction

The ultimate goal of cloud computing is to enable everything as a service (XaaS) (NIST, 2011), of which Software as a Service (SaaS) is a core objective. As software is published as universally accessible Web services, users can leverage existing services to quickly compose new value-added business processes and services. However, although the cloud has become an unprecedented driving force encouraging people to publish and share software as services, how to effectively and efficiently discover services of interest from a “cloud” of resources remains a major challenge.

One major technique is to establish service registries (Zhang et al., 2007) as centralized “service yellow pages” that help users find services of interest. However, earlier Universal Description, Discovery, and Integration (UDDI) registries are becoming obsolete, for two major reasons: their tight binding to SOAP/WSDL services and their over-standardization. In recent years, REpresentational State Transfer (REST), a lightweight service style based on HTTP requests and responses, has rapidly emerged and gained significant momentum (Pautasso et al., 2008). As a result, many non-UDDI service registries have been developed. Among them, ProgrammableWeb (PW, http://www.programmableweb.com, acquired by Alcatel-Lucent in 2010) has become a popular one.

Without adopting the heavyweight UDDI standard, ProgrammableWeb provides a repository that allows people to publish reusable Web services in various formats (protocols including REST and SOAP), called Web APIs. PW also allows people to publish API-based applications, called mashups. A mashup represents a value-added business process that leverages one or more existing APIs published in PW. This lightweight service repository has attracted extensive attention. Since its inception in late 2005, the number of services published at PW has increased rapidly. As of September 7, 2012, 7,190 services and 6,763 mashups had been published at PW. Among the published services, 70% are REST services, 21% are SOAP services, 5% are JavaScript services, and 2% are XML-RPC services. Since APIs at PW represent reusable service components, we use the terms API and service interchangeably throughout this paper.

As the number of services at PW grows, it is important to help users query and find services of interest (Gomadam et al., 2008). However, the current querying power of PW is limited. At publishing time, service providers may attach user-defined tags. Unlike UDDI, which aims to regulate a comprehensive ontology system, ProgrammableWeb adopts a straightforward strategy: every service is manually categorized into one of a preset list of domains (68 domains as of September 7, 2012) (Arabshian et al., 2012). The assigned domain name and the provider-defined tags associated with the service are combined to support keyword-based search.
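The search behavior described above can be approximated by a small Python sketch, assuming that a query matches a service only when every query keyword appears among the words of its assigned domain name or among its provider-defined tags. The `Api` record, the AND semantics, and the sample entries in the usage note are hypothetical; this is not PW’s actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Api:
    name: str
    domain: str  # the single preset domain assigned at publishing time
    tags: list = field(default_factory=list)  # optional provider-defined tags


def searchable_terms(api):
    # Only the assigned domain name and the provider tags are indexed;
    # descriptions and summaries are ignored, as in the mechanism described.
    return set(api.domain.lower().split()) | {tag.lower() for tag in api.tags}


def keyword_search(apis, query):
    """Return names of APIs whose domain or tags contain every query keyword."""
    keywords = query.lower().split()
    return [api.name for api in apis
            if all(k in searchable_terms(api) for k in keywords)]
```

With hypothetical entries such as `Api("NetB", "Internet", ["ip", "dns"])` and an untagged `Api("Untagged", "Other")`, `keyword_search(apis, "ip")` returns `["NetB"]`, while the untagged service is reachable only through the keyword “other.” This is exactly the weakness of relying on one manually assigned domain plus optional tags.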

Such an API search mechanism may cause confusion and decrease search accuracy. First, manual service categorization may be inaccurate. For example, the API “ShowMyIP” was originally classified in the domain “Mapping” and was later moved to the domain “Internet.” Its description, summary, and tags contain representative keywords of the domain “Internet,” such as “IP” and “Internet.” Second, it may be difficult to choose a single domain for some APIs, because some predefined domains overlap conceptually. For example, the domains Travel, Transportation, and Weather share many common concepts. Likewise, the aforementioned API “ShowMyIP” does relate to the category “Mapping” in addition to the category “Internet.” Third, PW presets a special domain named “Other,” and a significant number of services are left in that category. Currently, 199 services are listed in “Other,” which ranks 14th by number of services among the 68 preset domains. Fourth, user-defined tags may be ad hoc, inconsistent, or missing altogether (Gomadam et al., 2008), and thus cannot effectively help users find the services they are interested in.
