Lots of energy around the concept of Rack Scale Design (Intel’s nomenclature) in systems design these days. Instead of placing CPU, memory, I/O, and storage on a single motherboard, why not have a rack of each, interconnected over a high-bandwidth, low-latency network – PCIe is favored today – and use software to define bundles of resources as “servers” on which the usual software runs?

The economic payoff comes in two flavors. First, it should be possible to raise resource utilization, so more work can be done for a given investment. Second, it should be possible to invest more frequently in the specific technologies – GPU, for instance – where progress is most rapid, without forcing new investment in longer-lived tech.

Broken assumptions
But the RSD concept breaks a lot of assumptions that are baked into system architectures. Storage, for example, assumes that the storage stack – software and physical I/O paths – is static. But in a networked rack scale datacenter, why would it be?

That’s the problem tackled in a new paper from Microsoft Research, Treating the Storage Stack Like a Network, by Ioan Stefanovici (Microsoft Research), Bianca Schroeder (University of Toronto), Greg O’Shea (Microsoft Research), and Eno Thereska (Confluent and Imperial College London).

The paper’s

. . . main contribution is experimenting with applying a well known networking primitive, routing, to the storage stack. IO routing provides the ability to dynamically change the path and destination of an IO, like a read or write, at runtime. Control plane applications use IO routing to provide customized data plane functionality for tenants and data center services.
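In networking terms, that’s a forwarding table for I/Os: the control plane installs rules, and every read or write is matched against them on its way down the stack. Here is a minimal sketch of the idea, with a made-up rule format (RouteRule and route_io are my names, not the paper’s API):

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class RouteRule:
    # Match fields: which tenant and which files/operations the rule applies to.
    tenant: str
    file_pattern: str
    op: str            # "read", "write", or "*"
    # Action: where matching I/Os should be sent instead of the default path.
    destination: str   # e.g. a different server, a replica set, or a cache stage

def route_io(rules, tenant, path, op, default_destination):
    """Return the destination for one I/O, applying the first matching rule."""
    for rule in rules:
        if (rule.tenant == tenant
                and rule.op in (op, "*")
                and fnmatch(path, rule.file_pattern)):
            return rule.destination
    return default_destination

# The control plane installs rules at runtime; the data plane just consults them.
rules = [RouteRule("tenant-A", "/data/hot/*", "write", "ssd-server-7")]
print(route_io(rules, "tenant-A", "/data/hot/table.db", "write", "hdd-server-2"))
# -> ssd-server-7
```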

Areas where this could prove valuable include (a rough sketch of the first two follows the list):

  • Load balancing writes to less-busy storage.
  • Ensuring reads always come from the latest data.
  • Supporting SLAs in multi-tenant systems.
  • Supporting per-tenant cache isolation.
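To make the first two bullets concrete, here’s a rough sketch of the kind of policy a controller could express with such routing: send new writes to the least-loaded store, then point subsequent reads at wherever the writes landed. The load numbers and names below are my own illustration, not anything from the paper:

```python
def least_loaded(queue_depths):
    """Pick the storage server with the shortest outstanding I/O queue."""
    return min(queue_depths, key=queue_depths.get)

# Hypothetical per-server queue depths reported up to the controller.
queue_depths = {"store-1": 48, "store-2": 5, "store-3": 17}

# The controller installs a write rule pointing new writes at the winner.
# Read rules for the same data must then follow the writes, which is exactly
# the "reads always come from the latest data" requirement above.
target = least_loaded(queue_depths)
print(f"route new writes to {target}")   # -> route new writes to store-2
```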

Now the hard part
We’ve been routing networks for decades, so how hard could this be? Pretty hard.

The basic problem is that data networks deal with copies of data, while storage deals with originals. Another wrinkle: applications have varying expectations of storage that need to be respected.

So the question of where routing switches are placed in the storage stack has important implications for data integrity, storage latency, and system performance.

What they did
Building on the earlier IOFlow storage architecture, they

. . . designed and implemented sRoute, a system that enables IO routing in the storage stack. sRoute’s approach builds on the IOFlow storage architecture. IOFlow already provides a separate control plane for storage traffic and a logically centralized controller with global visibility over the data center topology.
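The division of labor is the familiar software-defined-networking one: a logically centralized controller computes routes, and per-server switches sitting in the I/O path apply them. A minimal sketch of that split, with invented names (Controller, SSwitch) standing in for the paper’s components:

```python
class SSwitch:
    """Data-plane element in one server's storage stack; forwards I/Os per its rules."""
    def __init__(self):
        self.rules = {}

    def install(self, match, destination):
        self.rules[match] = destination

    def forward(self, match, default):
        return self.rules.get(match, default)


class Controller:
    """Control plane with global visibility; pushes rules to every switch it manages."""
    def __init__(self, switches):
        self.switches = switches

    def install_everywhere(self, match, destination):
        for switch in self.switches.values():
            switch.install(match, destination)


# Redirect one tenant's traffic on every server from a single point of control.
switches = {"server-1": SSwitch(), "server-2": SSwitch()}
controller = Controller(switches)
controller.install_everywhere(("tenant-A", "/data/*"), "cache-stage-3")
print(switches["server-2"].forward(("tenant-A", "/data/*"), "local-disk"))
# -> cache-stage-3
```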

After examining storage semantics across a number of systems, the team concluded that I/Os can be classified into three types:

  • Endpoint – I/O goes to a specific file.
  • Waypoint – I/O goes to an intermediate destination, such as a cache or specialized processor.
  • Scatter – I/O goes to multiple sites, for replication or erasure coding purposes.

Sounds like data networking, no?
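The three cases also map onto code easily enough. A toy dispatch, with made-up rule shapes rather than the paper’s actual types:

```python
from enum import Enum, auto

class RouteKind(Enum):
    ENDPOINT = auto()   # deliver straight to a specific file or store
    WAYPOINT = auto()   # detour through an intermediate stage (e.g. a cache) first
    SCATTER = auto()    # fan out to several destinations (replication, erasure coding)

def destinations(kind, primary, extras=()):
    """Return the list of places one I/O should visit, in order."""
    if kind is RouteKind.ENDPOINT:
        return [primary]
    if kind is RouteKind.WAYPOINT:
        return list(extras) + [primary]   # visit the waypoint(s), then the endpoint
    if kind is RouteKind.SCATTER:
        return [primary, *extras]         # every destination gets a copy or fragment
    raise ValueError(kind)

print(destinations(RouteKind.SCATTER, "replica-1", ["replica-2", "replica-3"]))
# -> ['replica-1', 'replica-2', 'replica-3']
```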

The StorageMojo take
I won’t try to summarize the entire 26-page paper. Suffice it to say that the authors demonstrate important use cases, such as tail latency control, replication, file cache control, and performance debugging – vital in a fully distributed infrastructure.

They also show that routing I/O can offer significant advantages. Nonetheless, a number of open issues remain.

One of the most interesting, from an architectural view, is that this could force the integration of network and storage management. If the storage stack wants to route an I/O from A to B, but the network management layer won’t allow that path, you can see the problem.

But that could drive the second coming of the SAN. Take that, hyperconvergence!

Courteous comments welcome, of course.