Using Big Data environments as analytics engines to gain useful insight from massive amounts of data in a manageable and timely way is a complex challenge. Sahara, as an orchestration tool, helps cloud architects provide high-performance production Hadoop and Spark environments, enabling private cloud deployments that are easy for data scientists to manage and consume. We will present a reference architecture for a Hadoop cluster built on Intel-based systems and Sahara, focused on performance and ease of use, and answer questions about security, storage, and supporting future releases without re-installing.
Attendees will learn:
- Hadoop configurations to best utilize CPU and I/O resources
- Storage best practices
- New ways to orchestrate a cluster with various Hadoop and Spark workloads
- Published performance benchmark results
- Life cycle management
Attendees are expected to have an understanding of analytics workloads, some familiarity with the Hadoop framework, and knowledge of storage solutions for OpenStack. Attendees with only this background will still benefit from the material, particularly the discussion of return on investment, but some understanding of OpenStack internals is recommended.
Attendees will receive a complete overview of:
- Hadoop settings and configuration to best utilize CPU and I/O resources
- Best practices for selecting storage settings
- New ways to submit jobs and orchestrate clusters with various Hadoop distributions
- Orchestration of Spark workloads
- Published results on different benchmarks used to validate cluster performance
- Life cycle management of deployments
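As a taste of the orchestration workflow the session covers, below is a minimal sketch of a Sahara node group template for Hadoop worker nodes. The specific flavor, plugin version, and template name are illustrative assumptions, not values from the reference architecture:

```json
{
    "name": "worker-datanode",
    "plugin_name": "vanilla",
    "hadoop_version": "2.7.1",
    "flavor_id": "m1.large",
    "node_processes": ["datanode", "nodemanager"],
    "auto_security_group": true
}
```

A template like this is registered with Sahara (for example via the `openstack dataprocessing` CLI plugin, whose exact command names vary by release) and then combined with a master node group into a cluster template, from which clusters can be launched and scaled repeatedly.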