The Collaboratory is a highly available OpenStack environment that scales up to 3000 cores and 15 PB of object storage. This talk will discuss the design goals and trade-offs considered. The Collaboratory currently stores 500 TB of genomic data from the International Cancer Genome Consortium, and the dataset is expected to grow to 5 PB by 2018. Software optimized for Ceph storage was designed to authenticate users and provide data access to authorized users only. One early user of the Collaboratory is the PanCancer Analysis of Whole Genomes (PCAWG) project, one of the world's largest cancer data analysis initiatives, which explores the whole genomes of over 2800 patients across 20 tumor types. This use case further drove the development of Dockstore for sharing workflows as Docker containers. The presentation will cover PCAWG and other use cases on the Collaboratory, performance results and optimization, and lessons learnt from enabling large-scale cancer genomic research on OpenStack.