The big data analytics data lake architecture aims to meet scalability requirements while adopting all-flash Ceph to address high I/O demands. Because of the flexibility of Ceph RGW, users can run multiple clusters with different workloads concurrently on a single back-end storage cluster, which makes this architecture suitable for big data analytics. The big data query engines Hadoop Hive and Presto are used to represent user scenarios. The disaggregated and hyper-converged architectures show comparable performance, so users can gain cluster flexibility without giving up performance. Tests evaluating NVMe performance are also conducted. Replacing spinning drives with all-flash disks improves both performance and resource utilization.
In this session, we will share a performance analysis of the disaggregated architecture with all-flash Ceph. You will learn how parameter tuning affects performance, along with suggested practices for configuring Ceph for big data.