Spectre/Meltdown at eBay Classifieds Group: Rebooting 80k cores

Private & Hybrid Cloud

eBay Classifieds Group runs a private cloud distributed across two geographical regions (with plans for a third zone), with around 1,000 hypervisors and a capacity of 80k cores.

In light of the publicly disclosed security vulnerabilities Spectre and Meltdown, we needed to patch the hypervisors in the four availability zones of each region with the latest kernel, KVM version and BIOS updates. During these updates each zone was unavailable, and all of its instances were restarted automatically.

The whole process was automated with internally developed Ansible playbooks, which use the OpenStack API to carry out the operations.

We will present the work done to shut down, update and successfully boot a fully patched infrastructure without data loss. We will also talk about the OpenStack challenges we faced, the features we missed and how we worked around them.
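The shut down, update and boot sequence described above can be sketched as an ordered plan for a single availability zone. This is a minimal illustration in plain Python, not the Ansible playbooks from the talk: the `Hypervisor` class, the `plan_zone_maintenance` function and the action names are all hypothetical, and a real run would map each step to an OpenStack API call or an Ansible task.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hypervisor:
    """Hypothetical stand-in for a compute node and the instances it hosts."""
    name: str
    zone: str
    running_instances: List[str] = field(default_factory=list)


def plan_zone_maintenance(hypervisors: List[Hypervisor], zone: str) -> List[Tuple[str, str]]:
    """Return the ordered (action, target) steps for patching one zone.

    Action names are illustrative labels, not real OpenStack or Ansible calls.
    """
    hosts = [h for h in hypervisors if h.zone == zone]
    steps: List[Tuple[str, str]] = []

    # 1. Keep the scheduler from placing new instances on the zone.
    for h in hosts:
        steps.append(("disable_compute_service", h.name))

    # 2. Shut instances down cleanly before touching the hosts.
    for h in hosts:
        for instance in h.running_instances:
            steps.append(("stop_instance", instance))

    # 3. Apply kernel, KVM and BIOS updates, then reboot each host.
    for h in hosts:
        steps.append(("patch_kernel_kvm_bios", h.name))
        steps.append(("reboot", h.name))

    # 4. Bring each host back and restart only what was running before.
    for h in hosts:
        steps.append(("enable_compute_service", h.name))
        for instance in h.running_instances:
            steps.append(("start_instance", instance))

    return steps


if __name__ == "__main__":
    fleet = [
        Hypervisor("hv1", "az1", ["vm-a", "vm-b"]),
        Hypervisor("hv2", "az2", ["vm-c"]),
    ]
    for step in plan_zone_maintenance(fleet, "az1"):
        print(step)
```

Recording which instances were running before the shutdown is what lets the final phase restart exactly that set. Hosts in other zones are left untouched, so one zone can be patched at a time.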

Finally, we will discuss how we managed our SDN (Juniper Contrail) and LBaaS (Avi Networks) while restarting this many cores.

What can I expect to learn?

How to build an Ansible framework to restart an OpenStack infrastructure
How to work around the challenges of OpenStack when restarting compute nodes
How to interact with SDN and LBaaS in an OpenStack environment
How to manage kernel, KVM and BIOS updates on OpenStack

Wednesday, November 14, 11:00am-11:40am (10:00am - 10:40am UTC)

Hall 7 - Level 2 - 7.2a / Helsinki 2



Difficulty Level: Intermediate

Tags: Ansible Tungsten Fabric Puppet KVM Nova Horizon Keystone Neutron
