In telecom operations, fault management is essential for achieving high service availability. Before the introduction of NFV, it was typically realized by a monitoring system specific to the hardware and application, which were tightly coupled. To achieve the same network service availability with NFV, the cloud platform must provide the same level of monitoring functionality. The OPNFV Doctor project focuses on this requirement and has proposed the missing features to the OpenStack community.
We present our implementation of a fault management framework for NFV, realized with different OpenStack services and covering exemplary use cases of compute, network, and storage resource faults. This talk also provides insights into how to adapt this framework to various deployment models and operational policies using Congress and Vitrage. We will share the results of multiple PoCs to show the performance of this framework, in particular with respect to immediate notification.