A grid system is usually composed of thousands of nodes that are broadly distributed across different virtual organizations. Because of the geographical boundaries among these organizations, system administrators are under great pressure to coordinate when the grid system undergoes maintenance. Furthermore, the runtime dynamicity of service state adds to the complexity of maintenance tasks. Consequently, building an efficient and reliable maintenance model has become an urgent challenge in ensuring the correctness and consistency of grid nodes. In our experiment with ChinaGrid, a Dynamic Maintenance mechanism has been adopted in the fundamental grid middleware, the ChinaGrid Support Platform. By addressing the above problems in terms of system infrastructure, service dependency, and service consistency, the availability of the system can be improved even when the scope of maintenance extends to a wider region.
Dynamic maintenance of large-scale resources in a grid environment is a major challenge owing to the complexity of grid services and the demanding requirements of grid users. Inappropriate maintenance processes lead to unpredictable failures over a wide area. Because computing and data resources are geographically distributed across different administrative regions, a reliable maintenance mechanism is urgently needed to coordinate different hosts and ensure the efficiency of maintenance tasks.
For grid administrators, maintenance tasks run through the whole lifecycle of service components. As shown in Figure 1, Jin and Qi (2007) define the lifecycle of each service component in a grid as: released, deployed, initialized, activated, and destroyed. Corresponding to these stages, the maintenance tasks include publish, deploy, undeploy, redeploy, configure, activate, and deactivate. In particular, these tasks must cope with the challenges of distribution in a grid environment.
Figure 1. Lifecycle of service component
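The lifecycle stages and maintenance tasks named above can be pictured as a small state machine. The following Python sketch is illustrative only: the stage and task names come from the text, but the transition table is an assumption made for this example (Figure 1 defines the actual transitions), and the names `Stage`, `TRANSITIONS`, `apply_task`, and `run_tasks` are hypothetical. The publish task is assumed to create a component in the released stage, so it is modeled as the starting point rather than as a transition.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages of a service component, as listed in the text."""
    RELEASED = "released"
    DEPLOYED = "deployed"
    INITIALIZED = "initialized"
    ACTIVATED = "activated"
    DESTROYED = "destroyed"


# ASSUMPTION: this transition table is illustrative and is not taken from
# Figure 1. It maps (current stage, maintenance task) to the next stage.
TRANSITIONS = {
    (Stage.RELEASED, "deploy"): Stage.DEPLOYED,
    (Stage.DEPLOYED, "configure"): Stage.INITIALIZED,
    (Stage.INITIALIZED, "activate"): Stage.ACTIVATED,
    (Stage.ACTIVATED, "deactivate"): Stage.INITIALIZED,
    (Stage.DEPLOYED, "redeploy"): Stage.DEPLOYED,
    (Stage.INITIALIZED, "redeploy"): Stage.DEPLOYED,
    (Stage.DEPLOYED, "undeploy"): Stage.DESTROYED,
    (Stage.INITIALIZED, "undeploy"): Stage.DESTROYED,
}


def apply_task(stage, task):
    """Return the stage reached by applying one maintenance task.

    Raises ValueError if the task is not allowed in the current stage.
    """
    try:
        return TRANSITIONS[(stage, task)]
    except KeyError:
        raise ValueError(
            f"task {task!r} is not allowed in stage {stage.value!r}"
        ) from None


def run_tasks(tasks, start=Stage.RELEASED):
    """Apply a sequence of maintenance tasks, starting from a published
    (released) component, and return the final stage."""
    stage = start
    for task in tasks:
        stage = apply_task(stage, task)
    return stage


if __name__ == "__main__":
    final = run_tasks(["deploy", "configure", "activate"])
    print(final.value)
```

Rejecting tasks that are invalid for the current stage is one way to keep maintenance requests from leaving a component in an undefined state. A real grid container would also need to coordinate these transitions across hosts.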
A number of earlier investigations have addressed providing and standardizing maintenance for distributed resources. The Configuration, Description, Deployment and Lifecycle Management (CDDLM) specification, proposed by the Open Grid Forum (2006), aims to standardize distributed software deployment and configuration within a validated lifecycle. Another deployment infrastructure specification, the Installable Unit Deployment Descriptor (IUDD) released by W3C (2004), also provides a solution for supporting dynamic maintenance in a run-time execution environment. Web Services Distributed Management (2006), proposed by the Organization for the Advancement of Structured Information Standards (OASIS), discusses how the management of any resource can be accessed via web services protocols and how web services resources themselves can be managed in the same way. Talwar and Milojicic (2005) discussed approaches to service deployment and defined Quality of Manageability to measure the quality and efficiency of maintenance for service components.
Today’s domain consumers demand maintenance without shutting down the system, but the existing specifications and solutions cannot efficiently reduce the downtime caused by maintenance. Therefore, the performance and availability of grid services during maintenance need further attention when addressing the maintenance of resources.
On the infrastructure side, researchers believe that dynamic deployment in the grid container can achieve higher availability. Weissman (2005) proposed an architecture based on Apache Tomcat’s dynamic deployment functionality, which allows services to be updated and reconfigured without taking down the whole site. Smith and Friese (2005) introduced a similar solution to support dynamic deployment. Liu and Lewis (2005) designed an intermediate language, X#, to support dynamic deployment across heterogeneous implementations of grid containers.
Key Terms in this Chapter
Availability of Maintenance: The proportion of the watching period during which a system is in a functioning condition. More specifically, availability during maintenance is the ratio of the system’s available time to the longest maintaining time (i.e., the watching period).
Dynamic Maintenance: Dynamic maintenance comprises the operations (e.g., deploy, undeploy, and so forth) applied to large-scale service components at runtime. The dynamicity of maintenance means that the maintenance does not affect the execution of existing components and keeps downtime as short as possible. Normally, maintenance requests are issued by administrators and provisioning modules.
Consistency of Maintenance: Owing to the complexity of a grid system, the maintenance of a particular service is always propagated to many replicas. Consistency is a guarantee that the maintenance is finished within a valid period or in the correct order.
Service-/Container-/Global-level of Maintenance: The maintenance of any new service component involves reloading (reinitializing and reconfiguring) the service, the container, or the whole grid, respectively.
Service Dependency: The correct execution of a service component always depends on the hosting environment, on the services it calls, and on the services it depends on for deployment.
Quality of Manageability: It is a measure of the ability to manage a system component. QoM measures include number of lines of configuration code (LOC) for deployment, number of steps involved in deployment, LOC to express configuration changes, and time to develop, deploy, and make a change.
Grid Container: It hosts web or grid services and executes user requests issued by clients that invoke operations defined by those services.
Dynamic Deployment: It denotes the ability for remote clients to request the upload and deployment of new services into, or the undeployment of existing services from, existing grid containers. It is a special case of dynamic maintenance.
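The Availability of Maintenance definition above reduces to a simple ratio, which the following Python sketch works through. It rests on stated assumptions: the watching period is taken as the longest maintaining time among the nodes under maintenance, and the system's available time is taken as the watching period minus the observed downtime. The function name `maintenance_availability` and the sample figures are hypothetical and do not come from the ChinaGrid experiment.

```python
def maintenance_availability(maintaining_times, downtime):
    """Estimate availability during one maintenance round.

    maintaining_times: per-node maintaining times (same unit, e.g. seconds).
    downtime: time within the watching period when the system was unavailable.

    ASSUMPTION: available time = watching period - downtime, where the
    watching period is the longest maintaining time among the nodes.
    """
    if not maintaining_times:
        raise ValueError("at least one maintaining time is required")
    watching_period = max(maintaining_times)
    if watching_period <= 0:
        raise ValueError("the watching period must be positive")
    if not 0 <= downtime <= watching_period:
        raise ValueError("downtime must lie within the watching period")
    return (watching_period - downtime) / watching_period


if __name__ == "__main__":
    # Hypothetical figures: three nodes take 120 s, 300 s, and 180 s to
    # maintain, and the system is unavailable for 45 s in total.
    ratio = maintenance_availability([120.0, 300.0, 180.0], downtime=45.0)
    print(f"availability during maintenance: {ratio:.2f}")
```

With these sample figures the watching period is 300 s, so the ratio is (300 - 45) / 300 = 0.85. Under this reading, shortening the downtime raises availability even when the slowest node still determines the length of the watching period.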