Erasure coded (EC) storage has achieved remarkable gains over current RAID arrays in fault-tolerance and storage efficiency, but the knock against it is performance. Sure, it’s highly available and cheap, but it’s slo-o-w.
Advanced erasure codes – those beyond traditional RAID5 and RAID6 – require a lot more compute cycles to work their magic than the parity calculations RAID uses. With the slowdown in CPU performance gains, waiting for Moore’s Law to rescue us will take years.
But in a recent paper, "Joint Latency and Cost Optimization for Erasure-coded Data Center Storage," researchers Yu Xiang and Tian Lan of George Washington University and Vaneet Aggarwal and Yih-Farn R. Chen of Bell Labs tackle the problem with promising results.
3 faces of storage
The paper focuses on understanding the tradeoffs through a joint optimization of erasure coding, chunk placement and scheduling policy.
They built a test bed using the Tahoe open-source distributed filesystem, which is based on the zfec erasure coding library. Twelve storage nodes were deployed as virtual machines in an OpenStack environment distributed across 3 states.
Taking a set of files, they divide each file i into k_i fixed-size chunks and then encode it using an (n_i, k_i) MDS erasure code, so any k_i of the n_i chunks suffice to reconstruct the file. A subproblem is placing the chunks across the infrastructure to provide maximum availability and minimum latency.
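To make the MDS property concrete, here is a minimal sketch of a (3, 2) code built from XOR parity. This is not the zfec code the researchers used — just an illustration that any k of the n chunks are enough to rebuild the file; the function names are my own.

```python
# Minimal sketch of an (n, k) = (3, 2) MDS erasure code using XOR parity.
# Illustrative only -- NOT the zfec library used in the paper.

def encode(data: bytes) -> list[bytes]:
    """Split data into k=2 data chunks plus one XOR parity chunk (n=3)."""
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")  # pad to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(chunks: dict[int, bytes], length: int) -> bytes:
    """Rebuild the file from any 2 of the 3 chunks (indexed 0, 1, 2)."""
    if 0 in chunks and 1 in chunks:
        a, b = chunks[0], chunks[1]
    elif 0 in chunks:   # chunk 1 lost: b = a XOR parity
        a = chunks[0]
        b = bytes(x ^ y for x, y in zip(a, chunks[2]))
    else:               # chunk 0 lost: a = b XOR parity
        b = chunks[1]
        a = bytes(x ^ y for x, y in zip(b, chunks[2]))
    return (a + b)[:length]

data = b"hello, erasure-coded world"
chunks = encode(data)
# Any two surviving chunks recover the original:
assert decode({0: chunks[0], 2: chunks[2]}, len(data)) == data
assert decode({1: chunks[1], 2: chunks[2]}, len(data)) == data
```

Real codes like those in zfec generalize this to arbitrary (n, k) using Reed-Solomon arithmetic — that generality is where the extra compute cycles go.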
The researchers then modeled various probabilistic scheduling schemes and their impact on queue length and the upper bound on latency.
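The key intuition: a read of an (n, k)-coded file completes only when the slowest of its k chunk fetches finishes, so which nodes the scheduler favors directly shapes the latency tail. A hedged Monte Carlo sketch — the per-node speeds and exponential service times below are my assumptions, not the paper's exact queueing model:

```python
# Hedged sketch of why scheduling probabilities matter: read latency is the
# k-th order statistic (the slowest) of the k chunk fetches. Node speeds and
# the exponential service model are illustrative assumptions.
import random

random.seed(1)
NODES = 6                        # storage nodes, one chunk each
K = 3                            # chunks needed to reconstruct a file
MEAN_DELAY = [1, 1, 1, 4, 4, 4]  # hypothetical per-node mean service times

def read_latency(probs):
    """Pick K distinct nodes weighted by probs; latency = slowest fetch."""
    picked = set()
    while len(picked) < K:
        picked.add(random.choices(range(NODES), weights=probs)[0])
    return max(random.expovariate(1 / MEAN_DELAY[i]) for i in picked)

def mean_latency(probs, trials=20000):
    return sum(read_latency(probs) for _ in range(trials)) / trials

uniform = [1] * NODES       # naive: every node equally likely
tuned = [4, 4, 4, 1, 1, 1]  # favor the fast nodes
print(f"uniform: {mean_latency(uniform):.2f}")
print(f"tuned:   {mean_latency(tuned):.2f}")
```

Skewing requests toward fast nodes cuts mean latency in this toy model; the paper's contribution is doing that tuning jointly with code choice and placement, with provable bounds.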
Joint latency – cost minimization
The 3 key control variables are the erasure coding scheme, chunk placement, and scheduling probabilities. However, optimizing these without considering cost is a ticket to irrelevance.
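A toy version of that tradeoff can be sketched in a few lines. This is not the paper's JLCM algorithm — just a grid search over candidate (n, k) codes, where storage overhead is n/k and the latency proxy is the expected k-th fastest of n parallel fetches under assumed unit-mean exponential delays:

```python
# Toy sketch of a joint latency-cost objective (NOT the paper's JLCM):
# sweep candidate (n, k) codes, trading a latency proxy against storage
# overhead n/k. Exponential fetch times are an illustrative assumption.
from math import fsum

def expected_kth_of_n(k, n):
    """E[k-th order statistic] of n i.i.d. Exp(1) delays: sum of 1/i
    for i from n-k+1 to n."""
    return fsum(1 / i for i in range(n - k + 1, n + 1))

def objective(n, k, theta=1.0):
    """Weighted sum of the latency proxy and storage cost; theta sets
    how much cost matters relative to latency."""
    return expected_kth_of_n(k, n) + theta * (n / k)

candidates = [(n, k) for k in (2, 3, 4, 6) for n in range(k + 1, 13)]
best = min(candidates, key=lambda nk: objective(*nk))
print("best (n, k):", best)
```

The real problem is much harder — placement and scheduling probabilities are optimized jointly with the code, per file — but even this sketch shows the tension: more parity chunks cut latency (more nodes to race) while inflating cost.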
It’s not an easy problem, as the paper’s pages of math attest. But one graph shows what is possible with their JLCM (joint latency-cost minimization) algorithm:
The StorageMojo take
If CPUs were getting faster the way they used to, we could wait a few years for high-performance erasure-coded storage. But unless Intel puts optimized EC co-processors on its chips – similar to its integrated GPUs – we’ll have to do something else.
EC storage faces a higher bar than earlier innovations. Even pathetic RAID controllers could outperform a single disk, and early flash controllers could too, thanks to flash’s raw performance.
But EC storage is slower than even disk-based arrays. Still, the financial and availability benefits of cracking this particular nut are huge.
The paper offers a valuable perspective on moving EC storage forward. Let’s hope someone takes this opportunity and runs with it.
Courteous comments welcome, of course. What would it take to accelerate EC performance?