The Hadoop file system driver for Swift is increasingly important as Hadoop ecosystem deployments on OpenStack grow in scale and demand higher performance. After several months of using the OpenStack Sahara-extra file system driver with Hive, Spark, and Presto, we realized that a clean-slate redesign would let us address threading and data management issues that significantly affect performance.
We developed a new Swift file system driver, “Swifta”, featuring thread pools, lazy seeks, caching of identical requests, and object listing improvements, among other changes. We tested our implementation against on-premises Ceph object storage, successfully running large queries that previously failed outright and running other queries with substantially better performance.
This presentation will discuss the design, implementation, deployment, and performance characteristics of Swifta, which we plan to open source this year, and its use in our cloud-based big data environment.
Attendees will learn about our months-long experience with the currently available Swift driver, our motivations for revisiting the overall architecture rather than continuing to make incremental changes to the existing implementation, and the performance characteristics of our implementation.