Since the first bacterial genome was published 20 years ago, there has been an explosion in the production of sequence data, fuelled by next-generation sequencing, catapulting biology to the forefront of data-driven science. As a consequence, there is now huge demand for the physical infrastructure to produce, analyse and share software and datasets. The CLoud Infrastructure for Microbial Bioinformatics (CLIMB) is a national e-infrastructure, distributed over 4 sites, that uses OpenStack to provide bioinformatics infrastructure as a service to the UK microbiological community. The development of this cutting-edge cloud infrastructure has been technically challenging. Here we provide an in-depth case study covering the design and implementation of our system over the last 18 months, highlighting key pitfalls, and their solutions, that are relevant to anyone interested in commissioning or implementing a cloud infrastructure of their own.
The session will cover a case study of the implementation of a single OpenStack system, built from hardware that is distributed across 4 separate research organisations in the UK.
The session will introduce some of the problems faced by genome biologists who work on research questions focused on bacterial pathogens, before moving on to outline how OpenStack and cloud computing have the potential to empower biological researchers and enable better data sharing and bioinformatics reproducibility.
Attendees will then be introduced to the CLIMB system and to some of our experiences of implementing and operating it.
Attendees should leave with:
- An understanding of several of the key challenges for researchers and for those who support data-intensive biology
- An understanding of how we have addressed these challenges within CLIMB
- An overview of some of the major pitfalls that we have encountered while implementing our system