Towards improving the reliability of live migration operations in OpenStack clouds

Jan 1, 2017·
Armstrong Tita Foundjem
Abstract
Cloud computing has become commonplace with the help of virtualization as an enabling technology. Virtualization abstracts pools of compute resources and represents them as instances of virtual machines (VMs). End users can consume the resources of these VMs as if they were on a physical machine. Moreover, running VMs can be migrated from one node (the source node, usually a data center) to another node (the destination node, another data center) without disrupting services, a process known as live VM migration. Live migration is a powerful tool that system administrators can leverage to, for example, balance the load in a data center or relocate an application to improve its performance and/or reliability. However, if not planned carefully, a live migration can fail, which can lead to a service outage or significant performance degradation. Hence, it is very important to be able to assess and forecast the performance of live migration operations before they are executed. The research community has proposed models and mechanisms to improve the reliability of live migration. Yet, because of the scale, complexity, and dynamic nature of cloud environments, live migration operations still fail. In this thesis, we rely on predictions made by a Random Forest model and scheduling policies generated by a Markov Decision Process (MDP) to decide on the migration time and destination node of a VM during a live migration operation in OpenStack. We conduct a case study to assess the effectiveness of our approach, using the fault injection framework DestroyStack. Results show that our proposed approach can predict live migration failures with an accuracy of 95%. By identifying the best time for live migration with MDP models, we can, on average, reduce the live migration time by 74% and the downtime by 21%.
Type
Publication
Polytechnique Montreal – Thesis