While troubleshooting issues in an OpenStack cloud, an admin runs through various checkpoints to find the root cause, and often spends a lot of time repeating the same troubleshooting steps for every occurrence of the same issue. We have captured the common troubleshooting checkpoints in Ansible playbooks that run through these steps quickly and help narrow down the problem.
For example, one of the most common issues in the cloud is a VM not getting an IP address. To triage it, an admin has to go through multiple manual steps, such as checking whether the qdhcp namespace for the network has been created or whether the ports are active. Running the triage playbook, which already contains these steps, validates the common checkpoints and identifies the root cause quickly.
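The checkpoint logic for the "VM not getting an IP" case can be sketched in a few lines. This is not the authors' Ansible playbook: it is a minimal Python illustration, assuming the admin has already captured the output of `ip netns list` on the node running the DHCP agent and of `openstack port list --network <id> -f json`. The function names are hypothetical, and the JSON keys `"ID"` and `"Status"` are assumed column names. The `qdhcp-<network-id>` naming follows Neutron's convention for DHCP namespaces.

```python
"""Hypothetical triage checkpoints for "VM not getting an IP" (sketch only)."""
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_qdhcp_namespace(network_id: str, netns_output: str) -> CheckResult:
    """Checkpoint 1: does a qdhcp-<network_id> namespace exist?

    `netns_output` is assumed to be the text printed by `ip netns list`.
    Newer iproute2 versions append "(id: N)", so only the first token of
    each non-empty line is kept.
    """
    expected = f"qdhcp-{network_id}"
    namespaces = {line.split()[0] for line in netns_output.splitlines() if line.strip()}
    found = expected in namespaces
    detail = "found" if found else f"{expected} missing; check the DHCP agent"
    return CheckResult("qdhcp namespace", found, detail)


def check_ports_active(ports_json: str) -> CheckResult:
    """Checkpoint 2: are all ports on the network ACTIVE?

    `ports_json` is assumed to be the output of
    `openstack port list --network <id> -f json`, with rows carrying
    "ID" and "Status" keys (assumed column names).
    """
    ports = json.loads(ports_json)
    down = [p["ID"] for p in ports if p.get("Status") != "ACTIVE"]
    detail = "all ports ACTIVE" if not down else "not ACTIVE: " + ", ".join(down)
    return CheckResult("ports active", not down, detail)


def triage(network_id: str, netns_output: str, ports_json: str) -> List[CheckResult]:
    """Run every checkpoint in order and return all results."""
    return [
        check_qdhcp_namespace(network_id, netns_output),
        check_ports_active(ports_json),
    ]


if __name__ == "__main__":
    # Made-up sample outputs, for illustration only.
    netns = "qrouter-1111 (id: 1)\nqdhcp-abcd (id: 0)\n"
    ports = json.dumps([
        {"ID": "p1", "Status": "ACTIVE"},
        {"ID": "p2", "Status": "DOWN"},
    ])
    for r in triage("abcd", netns, ports):
        print(f"[{'PASS' if r.passed else 'FAIL'}] {r.name}: {r.detail}")
```

Each checkpoint is a pure function over captured command output, so adding a new checkpoint means writing one more function and appending it to the list in `triage`. This mirrors, in spirit, adding a task to a playbook.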
Agenda:
1. How the checkpoints are developed into Ansible playbooks
2. How to add more checkpoints to the playbooks
3. Illustrate with examples
How to use this tool to triage OpenStack issues
How to add more troubleshooting playbooks