Communications
Volume 3, Issue 5, September 2015, Pages: 109-114

Rejuvenation in Virtualized Servers

Manel Sanheji1, Alia Maaloul1, Ridha Azizi2

1Higher Institute of Technological Studies of Gabes, Gabes, Tunisia

2Higher Institute of Technological Studies of Sousse, Sousse, Tunisia


To cite this article:

Manel Sanheji, Alia Maaloul, Ridha Azizi. Rejuvenation in Virtualized Servers. Communications. Vol. 3, No. 5, 2015, pp. 109-114. doi: 10.11648/j.com.20150305.15


Abstract: Cloud computing and virtualization are current topics that involve not only industry but also academia. Since server virtualization is an essential software infrastructure of virtualized environments, the availability of virtualized servers in the cloud remains an open question and the subject of several research projects. Without an adequate solution for software aging, virtual servers and resources are at risk and the reliability of cloud services degrades. To counteract software aging, a technique named rejuvenation has been proposed in order to remove aging-related failures and their effects from virtual machines. In this paper we present our research work, which aims to expose the different concepts inherent in virtualization and data centers as well as the state of the art of rejuvenation, with a comparison between its different techniques.

Keywords: Cloud Computing, Virtualization, Availability, Rejuvenation


1. Introduction

In the CDWG reference guide on virtualization and infrastructure optimization, virtualization is defined as "a method of decoupling an application and the resources required to run it — processor, memory, operating system, storage and network access — from the underlying hardware host." The important takeaway is that multiple resources can be accessed from a single server, resulting in fewer servers, less energy consumption and less maintenance.

Otherwise "Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. [...]." NIST [1]

Cloud computing and virtualization are emerging research areas that involve not only industry but also academia.

Indeed, virtualization is increasingly gaining ground in server farms, storage systems and network infrastructures.

Virtualization improves the efficiency and availability of IT resources and applications. The old "one server, one application" model is abandoned in favor of running multiple virtual machines on each physical machine. This relieves pressure on IT administrators, who otherwise spend more time managing servers than innovating: in a non-virtualized data center, almost 70% of the IT budget is spent simply maintaining the existing infrastructure, leaving little for innovation.

An automated data center built on a production-proven virtualization platform such as VMware can respond to market changes more effectively and more quickly. [2]

The crucial question is that of service reliability in cloud environments. One of the key attributes of cloud service reliability is service availability, that is, the availability perceived by a cloud user. It is a composite attribute that depends on cloud availability, network availability, cloud performance, network performance and cloud security, and it is one of the most important reliability requirements of most major cloud functions. [3]

In cloud computing, service unavailability should therefore be understood not only as unavailability of the network or of the cloud itself, but also as degraded performance of either, or as security breaches in the cloud. From the user's point of view, poor cloud or network performance, or a security violation such as a Distributed Denial of Service (DDoS) attack, results in an unavailable service.

This generalization, however, remains quite idealistic, and the reliability and performance attributes should be treated separately, as proposed in [4]. Currently, some providers offer cloud availability as high as 99.95% [5], which corresponds to roughly 4.5 hours of downtime per year; this is still too much for many services, for example switching or control systems [6].
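As a quick check on this figure (a back-of-the-envelope calculation, not taken from [5]), an availability of 99.95% over a year of 8760 hours gives

$$ (1 - 0.9995) \times 8760\ \text{h} \approx 4.4\ \text{h of downtime per year.} $$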

It is in this context that our work is situated. The aim is to present the main facets of virtualization in the data center and to study the state of the art of rejuvenation in virtualized servers, with a comparison between its different techniques.

In this paper we first introduce virtualization and its main characteristics. We then present the concept of virtual data centers, the migration of virtual servers and the main types of failures.

Finally, we review the state of the art of rejuvenation and compare its different techniques.

2. Overview on Virtualization

2.1. What is Virtualization

Virtualization covers all hardware and/or software techniques that allow a single machine to run several operating systems, several separate instances of the same system, or several applications, partitioned and isolated from one another as if they were running on separate physical machines.

Each virtualization tool implements one or more of these concepts:

A hardware and/or software abstraction layer,

Host operating system (installed directly on the hardware)

"Virtualized" or "guest" operating systems (or applications, or sets of applications),

Partitioning, isolation and/or sharing of physical and/or software resources,

Manipulation of machine images: start, stop, freeze, cloning, backup and restore, context saving, migration from one physical machine to another; virtual networking: software-only networks internal to the host machine, between host and guests.

Figure 1. Virtualized architecture against traditional architecture.

Virtualization technology allows a single physical computer to function as multiple virtual computers. With virtualization, you can run multiple operating systems concurrently on a single physical server, each of them behaving as a standalone computer. A virtual machine can easily be created and configured so that it can be restored in the event of unexpected downtime. [5]

2.2. The Interests of Virtualization

The interests of virtualization are:

Optimal use of a pool of machines (distribution of virtual machines across physical machines according to their respective loads),

Easy installation, deployment and migration of virtual machines from one physical machine to another, in particular when moving from a qualification or pre-production environment into production, which simplifies delivery,

Savings through pooled hardware (power consumption, physical maintenance, monitoring, support, hardware compatibility, etc.),

Installation, testing and development environments that can be restarted from scratch without affecting the host system,

Securing and/or isolation of a network (an attack can bring down the virtual operating systems but not the host operating system, which remains invisible to the attacker; testing of application and network architectures),

Isolation of different simultaneous users of the same machine (mainframe-style central-site usage),

The dynamic allocation of computing power according to the needs of each application at a given time,

Reduced risk when sizing servers during the definition of an application's architecture, since adding capacity (a new server, etc.) is then transparent.

2.3. History

Much of the work on virtualization was carried out at the IBM research center in Grenoble, France (now closed), which developed the experimental CP/CMS system, later turned into the product VM/CMS (then called a hypervisor) offered in the catalog in 1972. Mainframes were subsequently able to virtualize their operating systems with specific, proprietary technologies, both software and hardware.

Large Unix systems followed, with the NUMA architectures of the HP Superdome (PA-RISC and IA-64) and of Sun's E10000/E15000 (UltraSPARC).

In the second half of the 1990s, x86 emulators of older 1980s machines were a huge success, including emulators of Atari, Amiga and Amstrad computers and of the NES, SNES and Neo Geo consoles. The company VMware then developed and popularized, at the beginning of the 2000s, a proprietary software virtualization system for x86 architectures, followed by free software such as Xen, QEMU and Bochs.

Today, the leading providers of proprietary virtualization solutions are VMware, publisher of the eponymous market-leading software; Microsoft, with Virtual PC and Virtual Server (both specific to the Microsoft environment); Avanquest, which publishes Parallels (dedicated to Mac OS X on Intel), the only product on the market allowing 3D acceleration in the guest system; Citrix, which acquired Xen; SWsoft, publisher of Virtuozzo; and Innotek GmbH with VirtualBox.

2.4. Techniques of Virtualization

There are several virtualization techniques, listed here in order of increasing level of abstraction:

Isolation

Paravirtualization

Full virtualization (virtual machines)

Hardware partitioning

Isolation consists in establishing, on top of the same kernel, a strong separation between different software environments. It is the most lightweight virtualization technique in existence.

Paravirtualization presents the operating systems with a special generic machine, which therefore requires dedicated interfaces integrated into the guest systems in the form of drivers or kernel modifications. It is a compromise between a high level of abstraction and a satisfactory level of performance.

In full virtualization, the hypervisor transparently intercepts all the calls that the operating system makes to hardware resources, and can therefore run unmodified guest systems.

Hardware partitioning, finally, is the historical technique used on large systems. It separates the hardware resources at the level of the machine's motherboard.

This technique is most common on high-end servers, such as Sun's Logical Domains. It is quite rare in the x86 world; blade servers are one example, but they do not offer features as advanced as those found on other architectures such as SPARC.

3. Virtual Data Center

3.1. What Is a Data Center?

A data center (sometimes called a server farm) is a centralized repository for the storage, management, and dissemination of data and information. Typically, a data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. Often times, there are redundant or backup power supplies, redundant data communications connections, environmental controls, and security devices.

With cloud-based computing, the applications run on servers in the data center, not the local laptop or desktop computer the user is operating. The user’s computer provides a window into the application, but does not actually run the application; in other words, it runs a user interface. This procedure reduces the need for big processing power and memory on the end user’s computer and centralizes it in the data center.

Data centers have a major role in the world of the web:

Search engines store data about web pages from all over the world there.

The websites themselves are often housed in data centers.

Companies store their employees' data in data centers to make it available worldwide.

In general, the emergence of the cloud is tied to the use of data centers, which store information and make it available to users through an Internet connection.

When we talk about the "cloud", it is really just a move from a hard disk or physical memory located on your own machine (computer, tablet, smartphone) to remote storage located in a data center.

The appealing term "cloud" masks the physical reality behind it: a massive need for floor space and energy. Your local hard drive uses almost no energy, but placing the same data on a server to make it available via the Internet is quite another matter.

Indeed, data centers are highly energy-intensive infrastructures: a server has a catastrophic energy efficiency (almost all of its energy is turned into heat). In addition to the electricity needed to run the machines, still more is needed for cooling.

3.2. Overview of the Main Failures in a Data Center

Failures at a data center can be of two types: operational failures due to operational or configuration errors and environmental failures caused by environmental disasters. Both types of failures will be detailed in this section.

Regarding operational failures, data provided by Google show that approximately 30% of failures in data centers arise from operational and configuration faults. These are accidental mistakes made by a human operator or in the system configuration, for instance during a system upgrade or a repair. The extent to which this type of failure affects the cloud system depends on the level at which the fault occurs. It might affect only a single VM if the fault happens in the virtual system software; it can affect a physical server, and thereby all the VMs running on it, if the fault reaches the virtualization layer.

An entire cluster, or even the whole data center, may be affected if the software of a network node is misconfigured. The worst case, however, remains a misconfiguration of the cloud management software, which can bring down the entire cloud at once.

As for environmental failures, environmental disasters also play a part in the reliability of a system. Factors such as floods, power cuts, etc. are outside the cloud service provider's control, but can still interrupt service availability. These factors affect an entire data center, so their consequences can be very disruptive to large-scale services.

Server operation also depends on the thermal conditions of the location where the servers are installed. Hence, any failure in the air conditioning system of the premises where the servers are housed also causes a loss of service availability for those servers, which can therefore be considered unavailable.

The severity of these failures also varies: it might be the fan of a single server that stops working, affecting only that server, or it might be a failure of the cooling system of a cluster room, which will eventually make all the servers in that cluster unavailable.

In addition, power disruptions also have the potential to affect the service provision of a data center. Given the structure of the power distribution infrastructure, there are a number of failure points that affect service provision at different scales. [7]

4. State of the Art of Rejuvenation

4.1. Software Aging

System failures due to imperfect software behavior are usually more frequent than failures caused by hardware component faults [8]. These failures result either from inherent design defects in the software or from improper usage by clients [9]. In this section we focus on software faults. For long-running software such as operating systems and virtualization layers, aging-related bugs are one of the major causes of software failures.

After running for a long time, software ages much like a human being. It exhibits errors such as memory leaks, memory fragmentation, performance degradation, crash or hang failures, unreleased file locks, data corruption, storage space fragmentation and accumulated round-off errors. If not healed, this eventually leads to a system failure. Both system availability and reliability are severely degraded by software aging. As a consequence, system performance decreases and clients become dissatisfied with the service, which may result in huge economic losses for companies, especially in cloud computing. Aging can be considered at different levels: OS, application process, middleware, virtual machine and VM monitor. In this paper we focus only on virtual machines and their monitor in cloud IaaS.

Definition of reliability: the probability that the software will not cause a failure of the system for a specified time under specified conditions. System reliability is a measure of the continuity of correct service, whereas availability measures refer to the system's readiness for correct service, as stated by the following definition from [IEEE90]:

Definition of availability: the ability of a system to perform its required function at a stated instant or over a stated period of time. It is usually expressed as the availability ratio, i.e., the proportion of time that the service is actually available for use by the customers within the agreed service hours. [10]
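In steady state this ratio is commonly written (a standard formulation, added here for clarity and not spelled out in the text above) as

$$ A \;=\; \frac{\text{uptime}}{\text{uptime} + \text{downtime}} \;=\; \frac{MTTF}{MTTF + MTTR}, $$

where MTTF is the mean time to failure and MTTR the mean time to repair (or, in the case of rejuvenation, the duration of the planned restart).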

4.2. Software Rejuvenation

In order to heal software aging, Huang et al. proposed a complementary technique that is preventive in nature. It involves periodic maintenance of the software so as to prevent crash failures. They call it software rejuvenation [11] and define it as the periodic reset of the system to prevent failures. Software rejuvenation consists in stopping the running software and "cleaning" its internal state so that it recovers its robust state.

Software rejuvenation can be triggered at intervals derived from analytical system models, or based on monitored aging indicators. Discovering an efficient and effective set of system variables that are the best aging indicators is a variable selection problem. The quality of the aging indicators directly influences the accuracy of the timing with which rejuvenation is triggered, and thus determines the costs (e.g., downtime during rejuvenation) and benefits (e.g., downtime avoided by preventing unexpected failures) of the rejuvenation mechanism. [8]
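As an illustration of indicator-based triggering, the following minimal sketch monitors two commonly used indicators, free memory and response time, and requests a planned restart when either crosses its threshold. The probe functions, thresholds and simulated aging trends are hypothetical illustrations, not taken from this paper or from [8].

```python
# Minimal sketch of indicator-based rejuvenation triggering.
# Probes, thresholds and aging trends are hypothetical illustration values.
import random

FREE_MEM_THRESHOLD_MB = 512     # trigger when free memory drops below this
RESP_TIME_THRESHOLD_MS = 800    # or when mean response time exceeds this

def read_free_memory_mb(uptime_h):
    # Placeholder probe: simulates a slow memory leak as uptime grows.
    return max(0.0, 4096 - 25.0 * uptime_h + random.gauss(0, 30))

def read_response_time_ms(uptime_h):
    # Placeholder probe: simulates response-time degradation as uptime grows.
    return 150 + 4.0 * uptime_h + random.gauss(0, 10)

def should_rejuvenate(uptime_h):
    # Request a planned restart as soon as either aging indicator crosses its threshold.
    return (read_free_memory_mb(uptime_h) < FREE_MEM_THRESHOLD_MB
            or read_response_time_ms(uptime_h) > RESP_TIME_THRESHOLD_MS)

if __name__ == "__main__":
    for hour in range(0, 240, 6):          # scan a ten-day window of uptime
        if should_rejuvenate(hour):
            print(f"rejuvenation requested after {hour} h of uptime")
            break
```

The better the chosen indicators track the real aging process, the closer such a trigger can get to rejuvenating just before a failure would have occurred, which is exactly the cost-benefit trade-off discussed above.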

Figure 2. Classification of rejuvenation strategies.

The use of rejuvenation in cloud systems is today one of the hottest research topics. Cloud computing offers an infrastructure of shared hardware and software resources that virtualization has made possible; hence the availability of resources in the cloud should always match users' demand. Software rejuvenation can be successfully applied to cloud systems to mitigate the software aging that may affect the virtual machines.

4.3. Software Aging and Rejuvenation in Server Virtualized System

Figure 3. VMM (virtual machine monitor).

A virtualized data center is built on server virtualization technology: multiple virtual machines (VMs) share the same physical resources, and a software virtualization layer called the virtual machine monitor (VMM) is used to implement the VMs. Since both VMs and VMMs are built on software technology, they both face the risk of software aging. VMM software aging is the more critical, as it affects all the VMs on top of it: while the VMM rejuvenates, the hosted VMs are unavailable. Because software reliability in server virtualization has a significant impact on the availability of a virtualized data center, the VMM has to implement different strategies when rejuvenating. [13]

4.3.1. VMM Rejuvenation

We assume that the VMM is in one of three states, as shown in Figure 4:

UP: the system is in its robust state, running and processing jobs.

Failure Probable (FP): the server may degrade and enter this state for many reasons, such as failures of the hardware, the network, the shared storage system or the management server; in this paper we only consider software aging. A degraded VMM rejuvenates and returns to the UP state.

DOWN: a failure-probable VMM can enter the down state, where the failure is detected by the monitoring tool at a certain monitoring interval and is manually recovered by a system administrator [14].

Rejuvenation means a planned restart of the system even though no failure has occurred. Since it is a planned restart, the jobs currently in the queue are saved and processed after the system has resumed service [15]. The rejuvenation trigger can be activated according to a time policy, a load policy, or a combined time-and-load policy, as presented in [16]. A simple state-machine sketch of this behavior is given after Figure 4.

Figure 4. State transition diagram for the host (VMM).
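The three-state behavior and the trigger policies can be summarized in a small state machine. The sketch below is only an illustration: the aging transition and the threshold values are hypothetical and do not reproduce the models of [14-16].

```python
# Three-state VMM model (UP, FP, DOWN) with a combined time-and-load
# rejuvenation trigger. Threshold values are hypothetical illustrations.
from enum import Enum

class State(Enum):
    UP = "up"                  # robust state, processing jobs
    FP = "failure_probable"    # aged/degraded, still serving
    DOWN = "down"              # failed, awaiting manual recovery

class VMM:
    def __init__(self, max_uptime_h=168, max_queue_len=100):
        self.state = State.UP
        self.uptime_h = 0
        self.queue = []                      # pending jobs survive planned restarts
        self.max_uptime_h = max_uptime_h     # time-policy threshold
        self.max_queue_len = max_queue_len   # load-policy threshold

    def tick(self, hours=1):
        self.uptime_h += hours
        if self.state is State.UP and self.uptime_h > self.max_uptime_h // 2:
            self.state = State.FP            # aging gradually degrades the VMM
        if self.state is not State.DOWN and self.should_rejuvenate():
            self.rejuvenate()

    def should_rejuvenate(self):
        # Combined time-and-load policy: either condition triggers a planned restart.
        return (self.uptime_h >= self.max_uptime_h
                or len(self.queue) >= self.max_queue_len)

    def rejuvenate(self):
        # Planned restart: queued jobs are preserved, the aging state is cleared.
        self.state = State.UP
        self.uptime_h = 0

    def fail(self):
        # Unplanned failure detected by monitoring; requires manual recovery.
        self.state = State.DOWN

    def repair(self):
        # Manual recovery by the system administrator.
        self.state = State.UP
        self.uptime_h = 0
```

Calling tick() once per simulated hour reproduces the UP to FP to UP rejuvenation cycle of Figure 4, while fail() and repair() model the unplanned DOWN branch.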

4.3.2. VM Rejuvenation

Like VMMs, VMs also face software aging, so they too need to rejuvenate. In addition, hosted VMs cannot operate while the VMM is down. Figure 5 shows the different states of a VM.

The VMM Down (VD) state is entered when the underlying host server goes down. While the VMM is down, due either to failure or to software rejuvenation, all the hosted VMs must remain shut down until the host server becomes available again. In this paper, VMs can migrate to another host if possible, or else save their current jobs and resume them later once the host server is up.

The Sleep state represents low energy consumption, since the VM is not busy serving any clients; there are no jobs running on the VM.

Figure 5. State transition diagram for VM.

Cold-VM rejuvenation: before triggering the VMM rejuvenation, all the hosted VMs are shut down. The VMs are then restarted in a robust state, so both the VMM and the VM aging states are cleared.

Warm-VM rejuvenation: instead of shutting down the hosted VMs, the VMs are suspended while the VMM rejuvenation is performed, and their execution is resumed once the VMM rejuvenation completes.

Migrate-VM rejuvenation: before triggering the VMM rejuvenation, the running VMs are moved to another host. Migrate-VM rejuvenation therefore depends on the target server having the capacity to accept additional VMs. Live VM migration is already supported by most modern VMM implementations, such as Xen, and using it improves system availability.
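The three strategies can be contrasted as simple procedures. The Host and VM classes below are minimal hypothetical stand-ins, not the API of Xen or of any real hypervisor; they serve only to make the differences explicit.

```python
# Contrast of Cold-, Warm- and Migrate-VM rejuvenation as procedural steps.
# Host/VM are hypothetical illustration classes, not a real hypervisor API.

class VM:
    def __init__(self, name):
        self.name = name
        self.state = "running"

    def shutdown(self):
        self.state = "stopped"      # in-flight jobs lost, VM aging cleared

    def boot(self):
        self.state = "running"

    def suspend(self):
        self.state = "suspended"    # execution state preserved, aging kept

    def resume(self):
        self.state = "running"

class Host:
    def __init__(self, vms=(), capacity=8):
        self.vms = list(vms)
        self.capacity = capacity

    def has_capacity(self):
        return len(self.vms) < self.capacity

    def rejuvenate_vmm(self):
        pass  # planned VMM restart; hosted VMs are unavailable meanwhile

def cold_vm_rejuvenation(host):
    for vm in host.vms:
        vm.shutdown()
    host.rejuvenate_vmm()
    for vm in host.vms:
        vm.boot()                   # VMs restart in a robust, fresh state

def warm_vm_rejuvenation(host):
    for vm in host.vms:
        vm.suspend()
    host.rejuvenate_vmm()
    for vm in host.vms:
        vm.resume()                 # VM aging state is NOT cleared

def migrate_vm_rejuvenation(host, target):
    for vm in list(host.vms):
        if target.has_capacity():
            host.vms.remove(vm)     # live migration: the VM keeps running
            target.vms.append(vm)
        else:
            vm.suspend()            # fall back when the target is full
    host.rejuvenate_vmm()
    for vm in host.vms:
        vm.resume()                 # any VMs left behind resume afterwards
```

Only Cold-VM rejuvenation clears the aging state of the VMs themselves; Warm- and Migrate-VM rejuvenation preserve the VMs' execution state, which is exactly what the comparison in the next section measures in terms of lost transactions.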

5. Comparison Between the Techniques of Rejuvenation

Warm-VM rejuvenation and Migrate-VM rejuvenation have the advantage over Cold-VM rejuvenation of preserving the execution states of the VMs. Transactions being processed by the application on a VM are lost when the VM goes down without its execution state being preserved. The expected numbers of transactions lost due to VM restart, VM rejuvenation and VM repair are computed from the throughputs of the corresponding transitions, and the results are summarized in Figure 6. In the Cold-VM rejuvenation case the VM is forcibly shut down at each VMM rejuvenation, so the number of transactions lost due to VM restart is larger than in the other two cases. Although Cold-VM rejuvenation reduces the number of VM rejuvenations, the total number of transactions lost in a year is higher than in the other two cases.
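The computation is not spelled out here, but in Markov-type availability models of this kind the expected annual loss attributed to a given transition is typically obtained from the throughput of that transition. As a hedged sketch of that reasoning:

$$ E[\text{lost}_{ij}] \;\approx\; \underbrace{\pi_i \, q_{ij}}_{\text{throughput of transition } i \to j} \;\times\; \bar{n}_i \;\times\; 8760, $$

where $\pi_i$ is the steady-state probability of state $i$, $q_{ij}$ is the transition rate per hour, $\bar{n}_i$ is the mean number of in-flight transactions discarded when the transition fires, and 8760 is the number of hours in a year. Cold-VM rejuvenation performs the VM-restart transition far more often, which is why its annual loss dominates even though it needs fewer VM rejuvenations.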

There is not much difference in the expected number of lost transactions between Warm-VM rejuvenation and Migrate-VM rejuvenation.

Figure 6. Expected number of transactions lost in a year.

6. Conclusion

This paper describes a study on the rejuvenation in virtualized servers in the context of cloud computing.

We presented the main features of virtualization and the state of the art of rejuvenation, together with a comparison and discussion of its techniques.

Future research should look at optimizing rejuvenation techniques. Several challenges remain to be addressed by industry and academia to ensure the lasting success of data centers and cloud computing.


References

  1. P. Mell and T. Grance, "The NIST Definition of Cloud Computing," NIST Special Publication 800-145, Jan. 2011.
  2. Aldevar.free.fr/data/VeilleTechno/VeilleTechno-Virtualisation.pdf
  3. A. Chilwan, "Dependability Differentiation in Cloud Services," Master of Telematics - Communication Networks and Networked Services (2 year), Department of Telematics, Norwegian University of Science and Technology, July 2011.
  4. D. A. Menascé, "Performance and Availability of Internet Data Centers," IEEE Internet Computing, Vol. 8, No. 3, pp. 94-96, 2004.
  5. Amazon EC2 Service Level Agreement. Amazon Web Services. [Online] October 23, 2008. [Cited: December 5, 2010.] http://aws.amazon.com/ec2-sla/.
  6. B. E. Helvik, Dependable Computing Systems and Communication Networks: Design and Evaluation. Trondheim, Norway: Tapir Akademisk Forlag, 2009.
  7. A. Chilwan, "Dependability Differentiation in Cloud Services," Master of Telematics - Communication Networks and Networked Services (2 year), Norwegian University of Science and Technology, July 2011.
  8. R. Chillarege, S. Biyani and J. Rosenthal, "Measurements of failure rate in commercial software," in Proc. of the 25th Symposium on Fault Tolerant Computing, June 1995.
  9. P. A. Lee, "Software faults: The remaining problem in fault tolerant systems?," in M. Banatre and P. A. Lee (Eds.), Hardware and Software Architectures for Fault Tolerance: Experiences and Perspectives, LNCS Vol. 774, Springer Verlag, pp. 171-181, 1994.
  10. G. Bolch, S. Greiner, H. de Meer and K. S. Trivedi, Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, Second Edition.
  11. Y. Huang, C. Kintala, N. Kolettis and N. D. Fulton, "Software rejuvenation: Analysis, module and applications," in Proc. of the 25th Symposium on Fault Tolerant Computing, June 1995.
  12. M. Grottke, R. Matias Jr. and K. S. Trivedi, "The fundamentals of software aging," in Proc. of the 1st International Workshop on Software Aging and Rejuvenation / IEEE 19th International Symposium on Software Reliability Engineering, 2008.
  13. F. Machida, J. Xiang, K. Tadano and Y. Maeno, "Combined Server Rejuvenation in a Virtualized Data Center."
  14. F. Machida, D. S. Kim and K. S. Trivedi, "Modeling and Analysis of Software Rejuvenation in a Server Virtualized System," in Software Aging and Rejuvenation (WoSAR), IEEE Second International Workshop, San Jose, CA, 2010.
  15. F. Salfner and K. Wolter, "A Queuing Model for Service Availability of Systems with Rejuvenation," in Software Reliability Engineering Workshops (ISSRE Wksp 2008), IEEE International Conference, Seattle, WA, 2008.
  16. K. S. Trivedi, "Modeling and Analysis of Load and Time Dependent Software," The Third International Workshop on Performability Modeling of Computer and Communication Systems, Illinois, USA, September 6-8, 1996.
