In recent years, the OpenStack private cloud (65k+ VMs, 25k vCPUs, 350 TB of memory) that supports Yahoo! JAPAN web services has come to require 6x the network performance per server of a traditional server farm, as a result of VM aggregation and higher density. In addition, we often receive large bursts of traffic from the internet (5 to 10 times the ordinary level), and internal backend traffic (DNS, RDB, MQ, etc.) increases accordingly. Under these conditions, we faced a network performance problem in our cloud.
To overcome this, we adopted Open vSwitch with DPDK as the software L2 switch on the hypervisor in place of LinuxBridge, built a new Mitaka cluster (on the scale of 8,000+ VMs) on OCP (Open Compute Project) servers, and have now entered the operation phase.
Because we encountered many problems on the way from the OvS/DPDK PoC phase to the operation phase, we will present the limitations on combinations of OvS, DPDK, and NIC drivers, along with network architecture and performance, L7 SLA, the operational perspective, and future work.