Deep Neural Networks (DNNs), a class of Machine Learning models whose structure is loosely inspired by the brain, allow us to solve complex problems, such as image recognition, that have been very difficult to solve using standard programming paradigms.
DNN concepts are not new. Until recently, however, their high computational demands made them impractical to apply. With recent developments in parallel computing, especially GPU acceleration and fast, efficient networking, DNNs have become a reality in modern data centers.
In this talk we will describe the system requirements for effectively running a machine learning cluster with popular frameworks such as TensorFlow. We will discuss how such a system can be deployed in an OpenStack-based cloud without compromises, combining a high-performance DNN programming paradigm with the benefits of cloud and software-defined data centers, and we will present a case study from Monash University.
We will discuss key concepts of Deep Learning, with a focus on distributed training and the technologies needed to realize it in an OpenStack cloud, such as GPU pass-through, SR-IOV, and RDMA.