eBay Classifieds Group runs a private cloud distributed across two geographical regions (with plans for a third zone), with around 1,000 hypervisors and a capacity of 80k cores.
In light of the publicly disclosed Spectre and Meltdown security vulnerabilities, we needed to patch the hypervisors in the four availability zones of each region with the latest kernel, KVM version and BIOS updates. During these updates the zones were unavailable and all instances were restarted automatically.
The whole process was automated with internally developed Ansible playbooks that drive the operations through the OpenStack API.
We will present the work done to shut down, update and successfully boot a fully patched infrastructure without data loss. We will also talk about the OpenStack challenges we faced, the features we missed and how we worked around them.
Finally, we will discuss how we managed our SDN (Juniper Contrail) and LBaaS (Avi Networks) while restarting this massive number of cores.
- How to build an Ansible framework to restart an OpenStack infrastructure
- How to work around the challenges of OpenStack when restarting compute nodes
- How to interact with SDN and LBaaS in an OpenStack environment
- How to manage the patching of kernel, KVM and BIOS updates on OpenStack
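
To make the zone-by-zone approach concrete, here is a minimal planning sketch in Python. It is not the playbooks from the talk: the `Hypervisor` record, the step names in `STEPS` and the function names are all hypothetical, and the sketch only orders the work per availability zone without calling Ansible or the OpenStack API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hypervisor(NamedTuple):
    name: str    # hypothetical host name
    region: str  # geographical region
    zone: str    # availability zone within the region


# Hypothetical step names; the actual playbook stages are not given in the abstract.
STEPS = ["stop_instances", "patch_and_reboot_hosts", "verify_hosts", "start_instances"]


def group_by_zone(hosts: Iterable[Hypervisor]) -> Dict[Tuple[str, str], List[str]]:
    """Group hypervisor names by (region, zone): one maintenance batch per zone."""
    batches: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for host in hosts:
        batches[(host.region, host.zone)].append(host.name)
    return dict(batches)


def maintenance_plan(hosts: Iterable[Hypervisor]) -> List[Tuple[str, str, str, List[str]]]:
    """Return ordered (region, zone, step, hosts) entries, finishing one zone before the next."""
    plan = []
    for (region, zone), names in sorted(group_by_zone(hosts).items()):
        for step in STEPS:
            plan.append((region, zone, step, sorted(names)))
    return plan


if __name__ == "__main__":
    inventory = [
        Hypervisor("hv2", "region1", "az2"),
        Hypervisor("hv1", "region1", "az1"),
        Hypervisor("hv3", "region1", "az1"),
    ]
    for entry in maintenance_plan(inventory):
        print(entry)
```

Completing every step for one zone before touching the next mirrors the constraint described above, where a whole zone is unavailable during its update while the other zones keep serving traffic.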