Workday is a leader in enterprise human resources software-as-a-service (SaaS) solutions and has been active in the OpenStack community for many years. Driven by rapid customer growth and increased security needs, Workday’s OpenStack cloud has grown from a 600-server fleet in 2016 to 4,600 servers by the end of 2018. We will have 45 OpenStack clusters hosting more than 22, virtual machines, spread across 5 data centers in different geographical regions and using 2 PB of memory.
This talk presents:
- The architectural evolution that we went through to support business and scalability demands.
- The operational challenges of supporting many clusters and the need for federated identity and image services.
- Operational metrics collected from the production environment and future plans to keep improving operational excellence.
- How we met the performance demands of booting thousands of virtual machines within minutes by extending the Nova APIs to cache images on the compute nodes.
From this talk, attendees can learn:
- The operational challenges of managing multiple clusters.
- Best practices for improving OpenStack API performance.
- How to scale an OpenStack cluster without compromising customer SLAs (service level agreements).