Hyperconvergence – aka aggregation – is pushing scale-out architectures in one direction. But Rack Scale Design (RSD) – aka disaggregation – is pushing scale-out in another direction. And Composable Infrastructure is hoping to split the difference, with the power to define aggregations in software, rather than hardware.
But this continuum is not symmetrical on each end. We have a pretty good idea of what can be done with hyperconvergence – check out the growing vendors – but disaggregation is still mostly in the theory stage.
That’s why the recent paper, Understanding Rack-Scale Disaggregated Storage, by Sergey Legtchenko, Hugh Williams, Kaveh Razavi, Austin Donnelly, Richard Black, Andrew Douglas, Nathanaël Cheriere, Daniel Fryer, Kai Mast,
Angela Demke Brown, Ana Klimovic, Andy Slowey, and Antony Rowstron, of Microsoft Research, is so useful.
For the research, the authors developed an experimental research fabric, dubbed the Flexible Fabric to test four levels of disaggregation based on how often a reconfiguration is needed.
They levels are:
- Complete disaggregation. Assumes any drive can be connected to any server on a per I/O basis. Most frequent reconfig.
- Dynamic elastic disaggregation. Assumes drives will connect to servers for multiple I/Os, but that the number drives connected to any one server will vary over time.
- Failure disaggregation. Reconfigure only on drive or server failures.
- Configuration disaggregation. Reconfigure only during deployment, or if a rack is repurposed. Least frequent reconfiguration.
The team needed a fabric that could reconfigure in a millisecond to even get close to testing the complete disaggregation model. With SSDs capable of hundreds of thousands of IOPS, even a millisecond is much too long, but who can do better?
The paper describes the Flexible Fabric:
The core of the Flexible Fabric is a 160-port switch, which implements a circuit switch abstraction. The switch allows any port to be connected to any other port. When any two ports are connected, we refer to them as being mapped . . . . The switch supports both SAS and SATA PHYs and is transparent to all components connected to it.
The authors take pains to point out that the Flexible Fabric is a research tool and is not intended for production use. They would recommend against even attempting to use the architecture in any kind of production environment.
It’s a research tool, not a stalking horse for a new kind of fabric product.
In their research the team found some anomalies. They couldn’t use a modern PHY like SAS 3.0 because it does link quality scanning – a good thing – which makes set up time last as much as a second – a bad thing.
They also discovered that rapid and frequent drive switching crashed some host bus adapters. For the SATA configuration, they finally selected the Highpoint Rocket 640 Lite 4-port SATA 2.0 PCIe 2.0 controller.
- Complete disaggregation was killed by the overhead of rapid switching. Not a huge surprise.
- Dynamic elastic disaggregation, where drives are connected to servers for minutes to hours at a time, proved to be technically viable, and potentially a boon for variable workloads.
- Failure disaggregation also proved to be technically viable, and its use case – migrating drives from a failed server to minimize the network overhead of rebuilds – is definitely interesting.
- Configuration disaggregation, where configurations are set at deployment, turned out to be a bust, because the flexibility and cost of the fabric didn’t provide a commensurate benefit.
The StorageMojo take
So the extremes aren’t interesting, at least given the issues with current technology. But that leaves a wide swath of possibilities for system architects to explore as RSD/disaggregation/composable infrastructure ideas gain steam.
Of course, now and always, reliability trumps flexibility. And there are, no doubt, many gremlins in dynamic disaggregation scenarios.
But greater disaggregation seems to be a secular trend due to the dissimilar rates of technological change in the underlying CPU, network, and storage technologies. Work like this paper helps sort out the issues.
Courteous comments welcome, of course.