Direct attached storage may catch on
PCI-e DAS is getting traction in the media world. At least a dozen vendors – all smaller – were showing it, and customers were responding.
JMR’s BlueStor is promising over 4 GB/sec with PCI-e attach. In a world where a single 4K frame is almost 50 MB, that is the kind of speed production companies need.
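To put that in perspective, a quick back-of-the-envelope calculation (using the ~50 MB-per-frame figure above; the rest is simple arithmetic):

# Rough check: how many ~50 MB 4K frames fit through a 4 GB/sec pipe?
frame_size_mb = 50          # approximate uncompressed 4K frame, per the post
pipe_mb_per_sec = 4000      # "over 4 GB/sec" of PCIe-attached bandwidth
frames_per_sec = pipe_mb_per_sec / frame_size_mb
print(f"~{frames_per_sec:.0f} uncompressed 4K frames per second")   # ~80 fps, faster than real time for 24 fps material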
More on NAB later.
The StorageMojo take
Beth Pariseau noted the DAS movement at SNW earlier this month. This isn’t just a Hollywood moment. There’s more to this nascent DAS resurgence than the need for speed.
- Multi-core systems. Multi-core, multi-thread systems are like a cluster in a box – only cheaper. DAS looks like a SAN to an 8-core system.
- Management. When you can easily attach several dozen TB of cheap SATA to a physical machine, who needs a SAN? Not to mention the optical PCI-e extension cables.
- Cost. There’s something that looks a lot like a worldwide depression going down. DAS is cheap(er), and as long as systems scale inside the box a SAN offers few advantages.
A DAS resurgence. Will wonders never cease.
Courteous comments welcome, of course.
You forgot to mention SSDs. When devices like the Intel X25-E have latencies in the low tens of microseconds, the latency induced by Fibre Channel becomes unacceptably high.
Systems like the Sun Thumper (especially when running ZFS) are another incarnation of this trend. The SAN promise of cost reduction through consolidation and management has been mostly offset by price-gouging from FC vendors and the morass of incompatible standards hampering standardized management.
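To make the latency argument concrete, here is a rough, purely illustrative budget; the device figure is the one cited above, while the FC-stack figure is an assumption, not a measurement:

# Illustrative latency budget for a single small read (assumed numbers)
ssd_device_us = 25        # "low tens of microseconds" device latency cited above
fc_stack_us = 150         # assumed HBA + switch + array-controller overhead on an FC SAN
das_total = ssd_device_us
san_total = ssd_device_us + fc_stack_us
print(f"DAS ~{das_total} us, SAN ~{san_total} us; the protocol stack is {fc_stack_us / san_total:.0%} of the total")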
Fazal, good catch! With SSDs providing fast I/O we can use cheap bulk DAS SATA while still getting good performance across a large range of apps. Why worry about utilization for cheap storage?
Robin
I wouldn’t “really” call this a DAS resurgence, just an architecture change. It’s only a matter of time before the big enterprise vendors are forced to move to e-PCIe (external PCIe) in order to get the throughput from storage trays to their controllers. As Fazal points out, we will soon have more bandwidth in a tray than SAS buses can handle. This is a manifestation of technological pressure; no surprise it’s happening in the DAS market first, which is more of a “userland” market than enterprise storage.
I’m also very glad to see this technology emerge, and likewise not surprised. DAS reduces complexity (in my assessment). Oracle Exadata Storage Server is, in essence, a DAS offering. The plumbing from the database grid to the storage grid is not really a storage protocol. Instead, Exadata uses a database-centric communications protocol to make storage *and* other generically database-centric requests of the storage grid. Once the storage grid has a request in hand, it operates against DAS. To that end, the architecture isn’t really “Networked Storage” so much as networked services, and today most of those services happen to be storage-related. An example of this is an Exadata “Smart Scan.” Smart Scan is a database-centric request of the storage grid to scan data and return filtered data (or perhaps none at all) to the database grid using database-centric messaging. That messaging looks a lot more like the messages sent between instances of Oracle Real Application Clusters than like blocks of data sent between traditional storage and a traditional Oracle instance running on a server attached to traditional networked storage.
The views expressed in this comment are my own and do not necessarily reflect the views of Oracle. The views and opinions expressed by others on this comment thread are theirs, not mine.
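For readers unfamiliar with the idea, here is a toy sketch of predicate offload in that spirit; the names and messaging are hypothetical illustrations, not Oracle’s actual protocol or API:

# Toy illustration of "smart scan"-style predicate offload (hypothetical names, not Oracle code).
# The storage node scans its local DAS and returns only rows matching the request's filter,
# so the database grid receives filtered results instead of raw blocks.

def storage_node_scan(local_rows, predicate, columns):
    # Runs on the storage node against its direct-attached disks.
    for row in local_rows:
        if predicate(row):
            yield {c: row[c] for c in columns}   # project only the requested columns

# Database-grid side: describe the filter, receive filtered rows back.
rows_on_disk = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.0},
]
filtered = list(storage_node_scan(rows_on_disk,
                                  predicate=lambda r: r["region"] == "EMEA",
                                  columns=["order_id", "amount"]))
print(filtered)   # only the matching, projected rows cross the network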
I must admit that I’m surprised to see hard disks being put straight onto PCIe, as I would have thought there was limited benefit compared to SATA. However, SSDs are a different matter. As has been pointed out, if you want to optimise the latency offered by these devices you have to get rid of the latency introduced by storage protocols. In addition, I would expect that a considerable saving in CPU used by I/O drivers could be realised through something that “understands” flash behaviour more directly.
However, there will always be a need for network-addressable storage for shared access. If PCIe DAS is available for internal use in servers, then it will also be available for use in storage arrays. It’s very likely that such a storage array will look very different to some of today’s monoliths – a modular system with a fast interconnect, using blade-type technology, could work very well with “open storage” software. Maybe see what Oracle come up with following the purchase of Sun, and how companies like Dell, HP and the like will exploit it (although HP have PolyServe to worry about, of course, and that is architected around shared storage access from nodes, not federated storage).
Steve,
Can you explain your differentiation between “shared storage access” and “federated” in the context of HP/PolyServe?
Robin, apologies if it seems I’m diverting the thread from the OT. I’m just confused by this distinction being drawn by Steve and, as you know, I have a “bit” of a PolyServe background.
The notion of a multi-core machine using DAS as its SAN has some cuteness appeal to it, but the reality is that you’ve just created a single failure domain across all of the moving parts. Now if you aren’t looking for enterprise availability that is a reasonable bet, but it is not as useful as you might hope.
The issue with direct attached is, and always will be, that when the thing you’ve attached it to is ‘dead’ your data is inaccessible. Dual-port Fibre Channel and some of the monstrous SATA hacks along these lines try to avoid that without the full expense of a SAN by at least shrinking the window to two ‘hosts’; a SAN, of course, allows you to make the window arbitrarily small, which is important in some applications.
Now cheap and dense gives you other options, such as replication, but that still raises the question of what you are trying to achieve and which cost you are trying to address (cost of acquisition? maintenance cost? power costs?).
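A crude way to see the failure-domain point is to multiply availabilities; the figures below are illustrative assumptions, not vendor numbers:

# Crude availability model (assumed numbers, purely illustrative)
host_avail = 0.99          # a single server
storage_avail = 0.999      # the disk shelf itself
das = host_avail * storage_avail                            # data reachable only through one host
dual_attach = (1 - (1 - host_avail) ** 2) * storage_avail   # either of two hosts can reach the shelf
print(f"single-host DAS ~{das:.4f}, dual-attach ~{dual_attach:.4f}")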
A typical cable junction box will have 50 – 75 houses connected to it (an NAB data point from about 4 years ago), and if each of those homes is pulling 2 HD streams from the network then you are looking at 100 – 150 streams, or roughly 5 – 7.5 GB/sec (50 – 75 gigabits/sec) of throughput. That is hard to put out in the wild, as it were, even in a DAS world.
–Chuck
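Working backwards from those junction-box figures (taken at face value), the implied per-stream rate is roughly 50 MB/sec:

# Back-of-the-envelope check of the junction-box numbers above
houses = (50, 75)
streams_per_house = 2
stream_mb_per_sec = 50     # implied by 100-150 streams adding up to 5-7.5 GB/sec
for h in houses:
    total_mb = h * streams_per_house * stream_mb_per_sec
    print(f"{h} houses -> {h * streams_per_house} streams -> ~{total_mb / 1000:.1f} GB/sec")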
The comment about a single point of failure is valid, but since full clustering is the only way to eliminate it (= $$$), JMR’s PCIe-attached storage is targeted at high bandwidth and reasonable cost, where performance and 99% uptime are the goals. Such systems aren’t intended for national defense, landing jets or keeping the telephone system alive; instead they’re mostly used in non-linear editing systems and the like. Data is on and off the drives in hours or days, and the occasional system failure is a (sometimes costly) inconvenience but not life-threatening. Most post and DI systems are not 100% fault tolerant, as that can be cost-prohibitive.
The real upside to PCIe-attached (DAS) is pipeline bandwidth and low-cost scalability. Using sixteen-drive enclosures of 7200 rpm rotating disks, we can achieve 4 GB/sec sustained data throughput (real RAID 5) with only about 80 disk drives; with 15K rpm SAS devices, only about 48 drives are required to fill this very wide pipe. That’s blinding performance for low-cost storage.
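Those drive counts line up with typical sustained per-drive rates; the per-drive figures below are rough assumptions, not JMR specifications:

# Rough check of the drive counts quoted above (per-drive rates are assumptions)
per_drive_mb = {"7200 rpm SATA": 50, "15K rpm SAS": 85}   # assumed sustained MB/sec per drive
target_mb = 4000                                          # the 4 GB/sec pipe
for drive, rate in per_drive_mb.items():
    print(f"{drive}: ~{target_mb / rate:.0f} drives to sustain 4 GB/sec")
# ~80 SATA drives or ~47 SAS drives, close to the 80 and 48 quoted above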
To address the SAN environment, JMR has launched its new FibreStream product, which bridges 4 Gbps or 8 Gbps FC-AL I/O to SAS or SATA disk drives (internal hardware RAID controller, internal SAS expansion) and is also a fairly low-cost, scalable solution.
All these may be found at http://www.jmr.com
-Steve
Chuck, Andy Leventhal of Sun told me the main benefit of SATA-attached SSDs like the Intel X25-E (as opposed to PCIe-attached ones like FusionIO) is backwards compatibility with dual-attach arrays.
I am not sure I agree with your stance on redundancy. In my experience with FC drives, all too often the failure of a single drive paralyzes an entire loop. This was in fact the only thing that caused an entire system failure – the controllers and HBAs themselves were more reliable than the supposedly passive backplane and FC arbitration, despite running in some cases 24/7 without interruption for over 4 years.
The extra circuitry required for dual attachment is necessarily more complex and thus more prone to failure than a simpler single-point attachment like PCIe. I wouldn’t be surprised at all if real-world reliability of dual-port FC, SAS and SCSI solutions is overstated. With components like disk drives where mechanical failure of a drive is orders of magnitude more likely than loop failure, this may not be as visible, but SSDs will shine a harsh light on other subsystems that can’t keep up.
Federated is probably the wrong term. What I mean by it is a configuration where the storage is provided by multiple nodes, each node providing part of the whole in a seamless manner. To provide redundancy in such a setup you need replication between nodes so that there is no single point of failure. If you do it that way, then you can use internal storage alone, provided that you can deal with the issues of synchronising across nodes, data consistency and so on. It is possible for applications to do this explicitly of course, but I have in mind something that does it transparently and fully synchronously using industry-standard components.
Polyserve is, as far as I’m aware, a “classic” cluster file system with each node having access to the storage devices. That means you need a SAN of some sort and it wouldn’t work with internal storage.
Just to add to the previous comment with a link: this link refers to EMC’s new V-Max architecture as “federated storage” – that is, it comprises a lot of nodes with their own storage which are linked together to make a virtual data store. Of course EMC will have a lot of their own proprietary hardware built into this, but they are using something closer to commodity processors in the shape of Intel Xeon multi-core machines.
http://virtualization.com/featured/2009/04/15/new-emc-virtual-matrix-architecture-good-news-for-virtual-data-center-storage-scalability/
There are a number of suppliers that could go down this route. Sun (and now, of course, Oracle) have many of the basic software building blocks. However, having seen a presentation on the Sun Unified Storage devices, which use commodity hardware and standard software, it’s clear they have a long way to go before they have what is needed for a fully scalable storage offering. At the moment they have a two-node HA (with failover) cluster using ZFS, offering iSCSI, NFS, CIFS, snapshots, replication and the like, clearly aimed at undercutting NetApp and its peers. There is a lot of work Oracle will have to put in before a fully scalable, highly available, fault-tolerant open storage infrastructure can be put together. Unless there’s an awful lot more going on in the background, that must be a couple of years away, although you can see the possibilities. In the meantime, you can expect EMC and the like to be pushing on with product deliveries.
For a multi-core system, DAS indeed would look like a localized SAN. And with virtualization becoming an integral part of such server environments, direct PCIe-attached SSDs would offer the best random read and write performance across all VMs (without being bottlenecked by rotating HDDs or by SSDs using legacy storage protocols). Until now, direct-attached PCIe SSD has been restricted to the confines of the PCIe slots in the host server and did not support an external storage solution. Steve’s prediction in an earlier post has come true: external direct-attached PCI Express storage is now available from Dolphin.
Dolphin has introduced a direct-attached PCI Express storage appliance called StorExpress (http://www.dolphinics.com/products/storexpress.html). This appliance uses PCI Express for native communication to the storage subsystem rather than using PCIe as a bridging interface to storage. It can also be located at distances of up to 300m from the server. A single appliance can be shared by multiple hosts, with each host zoning up the PCIe storage within the appliance. At this time it supports up to 4TB, and from a performance perspective we have seen random read/write IOPS of over 270K and bandwidths of over 2700MB/s.
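As a rough way to relate those two figures, assuming 4 KB random transfers (an assumption; the transfer size isn’t stated here):

# Relating the quoted IOPS and bandwidth figures (4 KB I/O size is an assumption)
iops = 270_000
io_size_kb = 4
random_mb_per_sec = iops * io_size_kb / 1024
print(f"~{random_mb_per_sec:.0f} MB/sec of random 4 KB traffic vs ~2700 MB/sec sequential")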
For Dolphin’s PCI-Express storage appliance are they using their own storage card or someone else’s?