At LINE, we have been building and operating multiple large OpenStack clusters for two years.
Recently, the largest cluster in our cloud exceeded 700 hypervisors, and we started to experience RabbitMQ-related outages that taught us what we should have done from the start. We believe that operating RabbitMQ for OpenStack is one of the biggest difficulties and pain points for OpenStack operators. In this talk, we will share the problems we faced and how we solved them, based on our own failure stories.
* Introduce the configuration, metrics, and architecture that need to be considered for a large-scale OpenStack cluster
* Introduce oslo.messaging patches that help large clusters
* Introduce our work on RPC statistics, which helps users identify problems and tune their clusters