A large OpenStack cloud consists of many moving parts that all need to be operating correctly to ensure a working service for end users.
By starting with existing Open Source tools like Tempest, Nagios, Jenkins, Puppet and Ganglia, and tying them together with some custom tools, we can get a holistic view of the health of our systems. From the load balancer at the top of the stack, through the control plane, down to the underlying hardware, and across each of the individual services we provide for our users, we can be confident that our systems are operating correctly. Most importantly, we can quickly identify where to look when things go wrong.
In this presentation, we look at the techniques, tools and infrastructure used to monitor the Nectar Research Cloud and explore some of the custom code we use to make this work for us.
Attendees will learn about various Open Source monitoring tools, how they work, and how they can be made to work together for monitoring at all layers of an OpenStack cloud.