
Building Big Data Analytics Data Lake with All-Flash Ceph

This big data analytics data lake architecture is designed to meet scalability requirements while using all-flash Ceph to address high I/O demands. Thanks to the flexibility of Ceph RGW, users can run multiple clusters with different workloads concurrently against a single storage back end, which makes it well suited to big data analytics. The big data query engines Hadoop Hive and Presto are used to represent user scenarios. Comparable performance is observed between the disaggregated and hyper-converged architectures, giving users an option that provides cluster flexibility as well as optimal performance. Tests evaluating NVMe performance are also conducted. Compared with spinning drives, all-flash disks show improvements in both performance and resource utilization.

In this session, we will share a performance analysis of the disaggregated architecture with all-flash Ceph. You will learn how parameter tuning affects performance, along with a suggested practice for configuring Ceph for big data.