Customers are eager to move big data services into the cloud. Many projects on the market, such as Sahara, Cloudera Director, and Cloudbreak, help customers deploy Hadoop in the cloud. Yet performance remains the most critical concern when running big data on OpenStack. In this presentation, we will show you how to configure Hadoop/Spark on OpenStack. Using real customer cases, we will highlight the issues you are likely to encounter when running production big data workloads on OpenStack. We have run extensive performance tests and will present the results, including the performance gap between bare-metal and virtualized deployments. We also propose several effective tuning approaches, covering both OpenStack and Hadoop/Spark configuration, that improve performance, narrow this gap, and make it easier to integrate big data services into OpenStack.
How to optimize big data workloads in a cloud environment?