OpenStack scales, right? Just plug it in, crank it to 11, and your cloud can take anything you throw at it. Any Nova core will laugh at this, but what's the truth?
It's common knowledge that OpenStack regions can't really grow beyond 1000 hypervisors without using "cells". That's great, but cells disable some features, don't work with all Neutron configurations, and aren't fully tested in the OpenStack gate. We found this most disturbing, especially since we wanted to build many regions of well over 1000 hypervisors each, with most of the features on.
In this talk we'll discuss our methodology for testing at scale, the results of our tests (including lots of delicious graphs!), and some of our strategies for reaching our desired scale.
Spoiler alert: here are some things we'll discuss:
- Docker Docker Docker Docker
- RabbitMQ vs. ZeroMQ
- MySQL/MariaDB and Galera clustering
- Nova's Scheduler
- Neutron OVN
- CI/CD
- Rally
We'll answer some of these questions:
- At what scale does OpenStack break (hypervisors, networks, VMs, etc.)?
- What strategies are there to push beyond those scale limits?
- What is coming in the future to OpenStack to make this simpler?
- Where are the pitfalls and traps that should be avoided?
- How can we make sure OpenStack continues to scale?
- How can we isolate failures at scale to avoid cascading failure?
There will be discussion of several technologies, including:
- Docker
- RabbitMQ
- ZeroMQ
- MySQL/MariaDB
- Galera clustering
- Nova's Scheduler
- Neutron OVN
- Zuul
- Gerrit
- CI/CD
- Rally