By now, high availability for the OpenStack control plane is well understood and, to a large extent, a solved problem which the community continues to refine.
In stark contrast, most solutions for compute node HA (i.e. where VMs are automatically restarted on a different compute node if there is a failure in the hypervisor or its underlying hardware) are still relatively immature, experimental, or in the design phase. This is despite high demand for the feature, which calls into question the older belief, held by some, that OpenStack should only accommodate "cattle" VMs which have resilience built in at the application layer.
In this talk, presented by members of the OpenStack HA community who have been collaborating on this topic since the Tokyo summit, we summarise and compare all known current approaches from vendors including Red Hat, SUSE, Intel, NTT, AWcloud, ChinaMobile, and ZeroStack, and explain the current thinking, the challenges involved, and future directions.
Attendees will gain an understanding of all existing approaches to compute node HA in OpenStack, including ones which are still in the early stages of implementation. They will learn the differences between these approaches, how the OpenStack community is collaborating on this topic, and what future directions are likely.