Tired of Fibre Channel? Bored with Ethernet? InfiniBand not enough? EMC has a new idea: a PCIe SAN.
A PCIe SAN is part of EMC's DSSD plan. A good question is: will customers buy it? A better question is: what is EMC talking about?
EMC's head of product operations, Jeremy Burton, called Barron's to discuss the DSSD array with Tiernan Ray. Burton said the DSSD vision is to replace server PCIe flash cards – think Fusion-io – with a pool of shared flash storage, accessed through PCIe.
“The DSSD guys said, Why not have hundreds of terabytes, why not have shared storage, and serve it up in the API, the format the application expects. That’s the general problem domain they’ve been working in. You can have a shared storage array and then you are going to wire it to the service, rather than cracking open the box. It’s just a big shared block of storage to servers using a PCIe connection. The thing that’s different here is that it looks like something the application would receive natively. If you’re using Hadoop, you’re using resources based on something called HDFS. The DSSD machine will serve up to Hadoop an HDFS interface natively, with none of the intermediary translation layer.”
The DSSD machine is rumored to run Linux on its flash controllers, which squares with Burton’s claim that it “. . . serve[s] up to Hadoop an HDFS interface natively . . . .”
But there’s more. After taking a swipe at Pure CEO Scott Dietzen’s tech smarts – Scott is a Carnegie Mellon PhD – Burton (formerly EMC’s CMO) said:
If you just go get NAND flash and want to write to it, takes 60 microseconds. Which is a relatively short period of time. The best all-flash arrays, and we think we have the best, will give you a response time of a millisecond. Why does it take that long to respond? It’s got to talk certain protocols. When you move things to the server side, these guys want to drive latency down below 100 microseconds. They write just super efficient software to make that happen.
Vendor spec sheets tell a different story: page writes are typically in the 200-300µs range, not 60µs. With enough buffering anything is possible, but then it’s not a write to flash.
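Take Burton’s own numbers at face value and most of a 1ms array response is protocol and software overhead, not NAND program time. A back-of-the-envelope sketch – the figures are the ones quoted above plus an assumed 250µs mid-point for a spec-sheet page program; nothing here is measured or vendor-confirmed:

```python
# Back-of-the-envelope latency budget using the figures quoted above.
# Illustrative assumptions, not measured or vendor-confirmed numbers.

nand_write_us = {
    "Burton's claim": 60,        # raw NAND program time per Burton
    "typical spec sheet": 250,   # mid-point of the 200-300us page-program range
}

array_response_us = 1000         # "best all-flash array" ~1ms response time
server_target_us = 100           # DSSD's stated goal: under 100us

for label, media_us in nand_write_us.items():
    overhead_us = array_response_us - media_us
    print(f"{label}: media {media_us}us, overhead {overhead_us}us "
          f"({overhead_us / array_response_us:.0%} of the 1ms total)")

# If a page program really takes 200-300us, a sub-100us "write" can only be
# an acknowledgment from buffer, not a completed write to flash.
print(f"Headroom at the {server_target_us}us target vs a 250us page program: "
      f"{server_target_us - 250}us")
```

Either way, the arithmetic says the win DSSD is chasing is in the software stack, not the media.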
PCIe switches
Let’s go with this DSSD-for-Hadoop idea for a moment. How do you scale to hundreds or thousands of PCIe interconnects? Switches, anyone?
It happens that PLX Technology makes PCIe switches. They’ve been doing it for years, are profitable, and your PC may have one of their PCIe chips. PLX is building a PCIe ExpressFabric switch as well.
Creating a switch that is transparent to applications is nontrivial: PLX has been working on it for three years. The great benefit of a PCIe switch is that the system already knows how to use it – if it is a transparent extension of the onboard PCIe bus. But there’s a small matter of programming, since today’s VMAX directors support Fibre Channel and Ethernet, not PCIe links to hosts.
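That transparency claim is worth unpacking: to the host OS, a transparent PCIe switch’s ports are just PCI-to-PCI bridges and the devices behind them are ordinary endpoints, so the existing enumeration and driver stack carry on unchanged. A minimal Linux sketch of what the host already sees – it only reads the standard sysfs PCI tree; nothing DSSD- or PLX-specific is assumed:

```python
#!/usr/bin/env python3
# List PCI devices from Linux sysfs and flag PCI-to-PCI bridges (class 0x0604),
# which is how a transparent PCIe switch's ports appear to the host.
# Illustration only; needs a Linux host with /sys mounted.

import os

PCI_SYSFS = "/sys/bus/pci/devices"

def read_hex(path):
    with open(path) as f:
        return int(f.read().strip(), 16)

for addr in sorted(os.listdir(PCI_SYSFS)):
    dev = os.path.join(PCI_SYSFS, addr)
    class_code = read_hex(os.path.join(dev, "class"))   # e.g. 0x060400
    vendor = read_hex(os.path.join(dev, "vendor"))
    device = read_hex(os.path.join(dev, "device"))
    bridge = (class_code >> 8) == 0x0604                 # base class 06, sub-class 04
    kind = "PCI-PCI bridge (switch/root port)" if bridge else "endpoint/other"
    print(f"{addr}  {vendor:04x}:{device:04x}  class {class_code:06x}  {kind}")
```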
The StorageMojo take
Mr. Burton’s comments are pseudo-tech FUD. They are not designed to illuminate but to reassure a restive customer base – and, given the Barron’s audience, restive investors too – that EMC will indeed, someday, have a competitive flash storage solution. Pure, Violin and Fusion-io must be taking a chunk out of EMC’s sales.
One sign of the FUD: Hadoop is designed to scale on shared-nothing architectures. EMC proposes to change that because. . . why, exactly?
While reducing latency is good, inquiring minds might wonder how much of Hadoop’s latency is due to storage and how much to everything else. Some Berkeley benchmarks found, roughly, that moving from disk to in-memory storage – faster than PCIe flash – cut run time in half across several workloads. Good, but not likely to justify a substantial hike in storage cost.
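A quick Amdahl’s law check makes the point: if eliminating storage wait entirely (going to memory) only halves run time, storage is roughly half the job, and that caps what any faster storage tier can deliver. A rough sketch – only the ~50% fraction comes from the benchmarks above; the speedup factors are illustrative:

```python
# Amdahl's-law style estimate of Hadoop speedup from faster storage alone.
# Only the ~50% storage fraction comes from the benchmarks above (disk ->
# memory halved run time); the candidate speedup factors are illustrative.

storage_fraction = 0.5   # share of run time attributable to storage I/O

def overall_speedup(storage_speedup):
    """Overall job speedup when only the storage portion gets faster."""
    return 1.0 / ((1 - storage_fraction) + storage_fraction / storage_speedup)

for s in (2, 5, 10, float("inf")):
    label = "infinitely" if s == float("inf") else f"{s}x"
    print(f"storage {label} faster -> job {overall_speedup(s):.2f}x faster")

# Even infinitely fast storage tops out at ~2x for the whole job, which is
# the point about justifying a big hike in storage cost.
```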
What Mr. Burton didn’t say is most revealing: that the DSSD box could be attached through PCIe as a backend to VMAX directors. As EMC’s Chuck Hollis – a long-time StorageMojo reader – wrote back in 2011:
The vast majority of [VMAX] hardware is standard stuff (Intel CPUs, etc.) — with very little use of custom ASICs, for example. As a result, as standardized components get better/faster/cheaper, we can easily incorporate them into the VMAX architecture.
Since “standard stuff” uses PCIe, EMC could use DSSD as a VMAX backend, probably with thrilling performance. So why not mention that? You get all the wonderful software features of VMAX – and a big performance boost!
Why? Because DSSD isn’t ready – and won’t be for at least 18 months – and EMC is desperate to keep selling disk-based VMAX in the meantime. DSSD will likely be a fine product when it finally arrives, but if VMAX sales tank first, not many will care.
Courteous comments welcome, of course.
Wait. Doesn’t EMC have XtremIO? I’m missing something here…
XtremIO was DOA from the start. EMC bought the wrong company. Project Thunder crashed. Haven’t you heard?
Thing is, a slightly massaged Linux box running an iSCSI target (e.g. SCST), or even a flash JBOD using PCIe interconnects (a couple of years ago Dell? was offering a solution from a now-defunct UK company that used SR-IOV etc. to share FC/10Gb/InfiniBand at top of rack), is probably two orders of magnitude cheaper than anything EMC would dare sell. People looking for speed above all don’t give a hoot about expensive SAN features. Anyone with any smarts treats all storage as non-persistent, so failure of the NVRAM device(s) is no big deal. Processing moves to another node. The comet has already hit EMC and the mushroom cloud that will extinguish all such dinosaurs is spreading ever wider. EMC is already dead. I would argue ALL centralized storage vendors (disk or NVRAM) are doomed, and quickly. 10 years max. EMC/NetApp/IBM et al. are under the delusion that their desperate attempts at NVRAM will allow them to outrun their inescapable calamity.