Hadoop was built with bare metal in mind: get your commodity hardware, install Hadoop on it, and let YARN do the hard work of managing resources. However, Big Data software deployments on other substrates such as AWS (EC2 and EMR), Azure, and GCE are gaining popularity. We look at the challenges of deploying big data software in an OpenStack cloud and at a solution that addresses them. Perhaps most interestingly, we discuss and demonstrate what it looks like to run a machine learning job with Nova-LXD in that cloud, both to address data locality issues in virtualized environments and to show that hypervisor overhead does not necessarily hinder Big/Fast Data processing.
How to quickly and easily deploy a big data stack in an OpenStack cloud.
How we can use Big Data tools to run a machine learning job on OpenStack logs to detect anomalies, such as an unusual user login location, and how to scale that job to handle increased traffic.
How Nova-LXD mitigates technical concerns about data locality and hypervisor overhead in a virtualized environment.
Spark anomaly detection.
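To give a feel for the "unusual user login location" idea, here is a minimal sketch in plain Python rather than Spark. It is not the job shown in the talk: the event format, the function name `find_unusual_logins`, and the `min_share` and `min_history` thresholds are all assumptions made for illustration. It flags a login when its location accounts for only a small share of that user's login history.

```python
from collections import Counter, defaultdict


def find_unusual_logins(events, min_share=0.1, min_history=5):
    """Flag (user, location) logins whose location is rare for that user.

    events: iterable of (user, location) pairs (hypothetical log format).
    min_share: locations below this fraction of a user's logins are flagged.
    min_history: users with fewer logins than this are skipped entirely.
    """
    events = list(events)

    # Count how often each user logs in from each location.
    per_user = defaultdict(Counter)
    for user, location in events:
        per_user[user][location] += 1

    flagged = []
    for user, location in events:
        total = sum(per_user[user].values())
        if total < min_history:
            continue  # not enough history to judge this user
        if per_user[user][location] / total < min_share:
            flagged.append((user, location))
    return flagged


# Synthetic example data, not real OpenStack logs.
events = [("alice", "London")] * 19 + [("alice", "Lagos")] + [("bob", "Austin")] * 3
unusual = find_unusual_logins(events)
print(unusual)
```

The same per-user counting could be expressed as a grouped aggregation in a distributed engine once log volume grows; the simple frequency threshold here stands in for whatever model a real job would use.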