Using public cloud resources to cover peak workloads is a practical and economical alternative to overprovisioning on-premises resources. This is the case at CERN, where the large internal computing infrastructure is usually sufficient, but where the periods before major international conferences or large event reconstruction campaigns see a significant spike in the number of workloads submitted.
We describe early experiences relying on Kubernetes federations, deployed and managed by OpenStack Magnum, to expand the available capacity to external clouds such as GKE, AKS, Amazon and others, while still offering our users a single entrypoint. We will cover the internals of the federation integration in Magnum, as well as the issues we encountered, mainly in networking, and how we solved them. We will present how these deployments simplify the integration with our main batch system, and how workloads running on external resources access their corresponding datasets.
Additional use cases where Kubernetes federations are being considered will also be covered: easing the integration of new hardware deliveries, by setting them up as a separate cluster that joins the federation only once fully set up and validated; merging heterogeneous resources behind a single entrypoint; and offering a single entrypoint to the service while provisioning resources in different tenants, which is useful for accounting.
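
As an illustration of the hardware-delivery use case, the following is a minimal sketch of the gating rule: a new cluster is admitted to the federation only once it is both provisioned and validated. The `Cluster` and `Federation` classes and their fields are hypothetical names introduced for this example; they do not correspond to the Magnum or Kubernetes federation APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical record for a cluster built from a new hardware delivery."""
    name: str
    provisioned: bool = False  # assumption: set once the cluster is fully set up
    validated: bool = False    # assumption: set once acceptance tests have passed


@dataclass
class Federation:
    """Hypothetical stand-in for the federation behind the single entrypoint."""
    members: list = field(default_factory=list)

    def join(self, cluster: Cluster) -> bool:
        """Admit the cluster only if it is both provisioned and validated."""
        if not (cluster.provisioned and cluster.validated):
            return False
        if cluster.name not in self.members:
            self.members.append(cluster.name)
        return True


federation = Federation()
delivery = Cluster(name="new-delivery")

# Not yet validated: the cluster stays outside the federation.
delivery.provisioned = True
first_attempt = federation.join(delivery)

# Once validation has passed, the cluster can join and receive workloads.
delivery.validated = True
second_attempt = federation.join(delivery)
```

Keeping the new hardware in its own cluster means that a failed validation never affects the members already serving users through the shared entrypoint.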