At @WalmartLabs, we were tasked with building clouds that run production workloads and handle both daily site traffic and peak holiday traffic. To give you an idea of the scale, we get about XXXXX billion hits over the holiday week, run over XXXX K applications and XXXXX K nodes, and operate 20+ production and non-production clouds in total. With scale came the challenges of operating these clouds and keeping them available. We redesigned the way we monitor our clouds and the way we do trend analysis, and we built self-healing and automation. We leveraged Rally to performance-test the clouds. We use OneOps as the PaaS layer that manages the VM life cycle, VM auto-repair/replace, and code deployment. We manage and operate these clouds by keeping it simple.
P.S. All XXXX numbers will be filled in later for the presentation.