Got interoperability testing?

by Robin Harris on Saturday, 9 August, 2014

At the Flash Memory Summit StorageMojo spoke to David Woolf and Kerry Munson of the University of New Hampshire InterOperability Lab (IOL). The lab has been around for decades and is still going strong.

It is primarily staffed by college students – cheap labor – managed by senior engineers.

Protocol testing is its primary function. The IOL tests many protocols – Ethernet, Fibre Channel, iSCSI, SAS, SATA, PCIe and more. Since most protocol “standards” are festooned with options, this isn’t as simple as it sounds.

For example, NVMe (non-volatile memory express) testing. One of the options in the NVMe spec is reservations for storage resources – pinning a resource to a server or app.

As an option it may or may not be implemented. But if the vendor says it is implemented, then the IOL will test it for correctness.
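
To make the “claimed option” testing concrete, here’s a minimal sketch of how a harness might decide whether to exercise the reservations option at all, by reading the controller’s ONCS (Optional NVM Command Support) field. This is not IOL’s code; it assumes nvme-cli is installed, that its JSON output exposes an oncs field, and that bit 5 is the reservations bit – verify all of that against the spec revision you’re testing.

```python
import json
import subprocess

def claims_reservations(dev="/dev/nvme0"):
    """Return True if the controller advertises the optional Reservations
    feature in its ONCS field. Assumes nvme-cli; the bit position (5) and
    the JSON field name should be checked against your NVMe spec revision."""
    out = subprocess.check_output(["nvme", "id-ctrl", dev, "-o", "json"])
    oncs = json.loads(out)["oncs"]
    return bool(oncs & (1 << 5))

if __name__ == "__main__":
    if claims_reservations():
        print("Vendor claims reservations: run the reservation conformance tests")
    else:
        print("Reservations not advertised: skip those tests")
```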

The IOL maintains integrators’ lists that show who has been tested, and when. But the IOL doesn’t make its testing reports – which list the features tested and the results – public.

The reports are the property of the vendor, so they aren’t available unless the vendor makes them public or releases them to you under NDA. Even then, you’d have to match up the tested features with whatever you want to interoperate with.

The StorageMojo take
The IOL is funded by industry – and UNH student labor – so the limited distribution of its results isn’t surprising, though it is sad. This reflects the fact that interoperability – or lack thereof – is a competitive weapon almost as much as it is a useful feature.

This was evident in Fibre Channel’s heyday, when frequent plug fests convinced buyers that FC was open and interoperable. But niggling details made multi-vendor FC networks rare in production.

Despite the limitations, the IOL is a useful tool for ensuring that protocol implementations meet the letter, if not the spirit, of industry standards.

Courteous comments welcome, of course.


Seagate’s Kinetic vision shipping – but not from Seagate

by Robin Harris on Monday, 4 August, 2014

Remember Seagate’s Kinetic open storage vision? Turns out there is a shipping product embodying the same ideas – but not from Seagate.

Surprised?

Huawei’s UDS – Universal Distributed Storage – system launched two years ago with a home-grown smart disk. Each UDS drive has a daughter board with an ARM processor, memory, two Ethernet ports and software.

The software implements a distributed hash table, a key-value store and two logical clusters: object-storage service controller nodes (OSCN) and universal distributed storage nodes (UDSN).

The OSCN provides the access interface for object-based storage services: it processes and controls client access requests, establishes the object transmission channel and manages metadata. The UDSN stores and replicates both data and metadata and guarantees data consistency.
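
Huawei hasn’t published the UDS internals, but the distributed hash table idea is easy to sketch: hash each object key onto a ring of smart-drive nodes and replicate to the next few distinct drives. A toy version – all names are illustrative, not Huawei’s:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring that maps object keys to smart-drive nodes.
    Purely illustrative; the real UDS placement and replication logic is
    not public."""
    def __init__(self, nodes, vnodes=64):
        self.ring = sorted((self._h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def nodes_for(self, key, replicas=3):
        """Return the distinct drives that should hold `key`."""
        i = bisect_right(self.keys, self._h(key)) % len(self.ring)
        picked = []
        while len(picked) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in picked:
                picked.append(node)
            i += 1
        return picked

drives = [f"udsn-{n}" for n in range(75)]   # one node per smart drive in a box
print(HashRing(drives).nodes_for("bucket/object-42"))
```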

Huawei can pack 75 of these drives into a 4U, top-loading box with two switches, so each drive can talk to two fabrics and to both logical clusters. Power consumption is about 3.9W per terabyte.
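
For scale, here’s what that 3.9W per terabyte implies for a single box – the per-drive capacity below is my assumption, not a Huawei spec:

```python
# Back-of-the-envelope power for one 75-drive, 4U UDS enclosure.
drives_per_box = 75
tb_per_drive = 4            # assumed capacity; adjust to the actual drive
watts_per_tb = 3.9          # Huawei's figure

capacity_tb = drives_per_box * tb_per_drive
box_watts = capacity_tb * watts_per_tb
print(f"{capacity_tb} TB per 4U at roughly {box_watts:.0f} W "
      f"({box_watts / drives_per_box:.1f} W per drive slot, electronics included)")
```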

Besides Seagate, HGST is reportedly experimenting with smart disks as well, running Linux directly on the disk’s controller. But there’s a tough trade-off between software richness and predictable latency.

The StorageMojo take
Globally routable storage. Advanced erasure coding across drives. A standardized object storage interface. Organic rolling upgrades without forklifts. All on a mass-produced, low-cost brick: the smart disk drive.

As StorageMojo noted last year:

Getting RAID controllers out of the stack reduces latency and eliminates a major cost and bug pool – a Very Good Thing. It also allows drive vendors to reclaim margin lost with falling enterprise drive sales. . . .

Distributing the needed intelligence to the lowest possible level – the drive – should be more scalable and economic than current DAS-based server models. The tipping point is the value of the aggregation, caching and low-cost disk connectivity – network bandwidth is way more costly than internal DAS bandwidth – of storage servers versus the advantages of removing the storage server tier.

Instead of tens of thousands of RAID controllers, hundreds of millions of smart disks, with all the advantages of object stores. That would go far in reducing operating costs and bugs.
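
What does erasure coding across drives – with no RAID controller in the stack – look like? Here’s a toy single-parity sketch. Production object stores use Reed-Solomon or local reconstruction codes, but the placement idea is the same: chunks and parity land on different smart disks, and a lost drive’s chunk is rebuilt from the survivors.

```python
def stripe_with_parity(obj: bytes, data_drives: int):
    """Split an object into equal chunks plus one XOR parity chunk, each
    destined for a different smart disk. Single parity only; real systems
    use stronger codes."""
    chunk_len = -(-len(obj) // data_drives)              # ceiling division
    chunks = [obj[i * chunk_len:(i + 1) * chunk_len].ljust(chunk_len, b"\0")
              for i in range(data_drives)]
    parity = bytearray(chunk_len)
    for c in chunks:
        for i, b in enumerate(c):
            parity[i] ^= b
    return chunks, bytes(parity)

def rebuild(chunks, parity, lost):
    """Recover the chunk on a failed drive from the survivors plus parity."""
    out = bytearray(parity)
    for j, c in enumerate(chunks):
        if j != lost:
            for i, b in enumerate(c):
                out[i] ^= b
    return bytes(out)

chunks, parity = stripe_with_parity(b"smart disks hold the chunks", 4)
assert rebuild(chunks, parity, 2) == chunks[2]
```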

The key is a systems-level – not drive level – approach to the architecture, which drive vendors aren’t well-equipped to provide. Huawei’s approach is promising and shows what can be done – if drive vendors can summon the courage and the smarts to do it.

Courteous comments welcome, of course.


Friday hike blogging

by Robin Harris on Friday, 1 August, 2014

Looking northwest from Hangover Trail on Wednesday. Hangover is a new trail that I’d never been on before. An incredible hike.

[Photo: Munds-Hangover_7-30-14-4183]

I’m going to limit hike blogging to Fridays. But next week I’ll be at the Flash Memory Summit and won’t be hiking – at least not at home.


High performance SSDs: hot, hungry & sometimes slow

by Robin Harris on Friday, 25 July, 2014

Anyone looking at how flash SSDs have revolutionized power constrained mobile computing could be forgiven for thinking that all SSDs are power-efficient. But they’re not.

In a recent Usenix HotStorage ’14 paper Power, Energy and Thermal Considerations in SSD-Based I/O Acceleration researchers Jie Zhang, Mustafa Shihab and Myoungsoo Jung of UT Dallas examine what they call “many-resource” SSDs, those with multiple channels, cores and flash chips.

Their SSD taxonomy divides SSDs in terms of interfaces, cores, channels, flash chips and DRAM size. No single metric defines a many-resource SSD; it’s the entire gestalt of the device. Here’s their breakdown:

[Table: SSD specs]

The price of performance
Each flash die has limited bandwidth. Writes are slow. Wear must be leveled. ECC is required. DRAM buffers smooth out data flows. Controllers run code to manage all the tricks required to make an EEPROM look like a disk, only faster.

So the number of chips and channels in high performance SSDs has risen to achieve high bandwidth and low latency. Which takes power and creates heat.
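
For readers who haven’t peeked inside an SSD controller: the central trick is the flash translation layer. Writes never overwrite a page in place; they go to a fresh page, the old copy becomes garbage, and garbage collection cleans up later. A toy sketch – real firmware adds wear leveling, ECC and per-channel scheduling on top of this:

```python
class ToyFTL:
    """Toy flash translation layer: a logical-to-physical map plus
    out-of-place writes. Real controllers add wear leveling, ECC and
    parallel channel scheduling."""
    def __init__(self, pages):
        self.l2p = {}                      # logical page -> physical page
        self.free = list(range(pages))     # NAND pages are never overwritten in place
        self.stale = set()                 # old copies awaiting garbage collection
        self.flash = {}

    def write(self, lpn, data):
        if lpn in self.l2p:
            self.stale.add(self.l2p[lpn])  # old location becomes garbage
        ppn = self.free.pop(0)             # GC must reclaim pages before this runs dry
        self.flash[ppn] = data
        self.l2p[lpn] = ppn

    def read(self, lpn):
        return self.flash[self.l2p[lpn]]

ftl = ToyFTL(pages=8)
ftl.write(0, b"v1")
ftl.write(0, b"v2")                        # rewrite uses a new page, old one marked stale
assert ftl.read(0) == b"v2" and len(ftl.stale) == 1
```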

Testing
They ran 3 different real SSD testbeds on a quad-core i7 system, with an in-house power monitor and an application to capture detailed SSD info such as temperature. They tested both pristine and aged SSDs, running sequential and random I/O workloads with request sizes from 4KB to 4MB.
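
Their power monitor was in-house, but the thermal side of this experiment is easy to approximate on a Linux box: fio drives the workload while smartctl samples drive temperature. A rough sketch – the device name and smartctl output format are assumptions, and it writes to the raw device, so point it at a scratch drive:

```python
import re
import subprocess
import time

DEV = "/dev/nvme0n1"        # illustrative device; this workload destroys its contents

def drive_temp_c(dev=DEV):
    """Scrape a Celsius temperature from smartctl output; field names vary
    by drive, so treat this as a starting point, not a universal parser."""
    out = subprocess.check_output(["smartctl", "-A", dev], text=True)
    m = re.search(r"Temperature:\s+(\d+)\s+Celsius", out)
    return int(m.group(1)) if m else None

# Random writes for five minutes while logging temperature every 10 seconds.
fio = subprocess.Popen([
    "fio", "--name=randwrite", f"--filename={DEV}", "--rw=randwrite",
    "--bs=64k", "--ioengine=libaio", "--iodepth=32", "--direct=1",
    "--runtime=300", "--time_based",
])
while fio.poll() is None:
    print(time.strftime("%H:%M:%S"), drive_temp_c(), "C")
    time.sleep(10)
```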

Key findings
The many-resource SSD exhibits several characteristics not usually associated with SSDs.

  • High temperatures. 150-210% higher than conventional SSDs, up to 182F.
  • High power. 2-7x the power, 282% higher for reads, up to 18W total.
  • Performance throttling. At 180F the many-resource SSD throttles performance by 16%, equivalent to hitting the write cliff.
  • Large write penalty. Writes at 64KB and above in aged devices caused the highest temperatures, presumably due to extra overhead for garbage collection and wear leveling.

Performance throttling was not limited to the high-end SSDs. The mid-range many-core drive slowed down at 170F, probably due to thermally-induced malfunction as the drive had no autonomic power adjustment.

The StorageMojo take
This appears to be the first in-depth analysis of the power, temperature and performance of a modern high-end SSD. The news should be cautionary for system architects.

For example, one new datacenter PCIe SSD is spec’d at 25W – higher than the paper found on slightly older drives. That’s twice what a 15k Seagate requires.

The slowdown seen for large writes suggests caution when configuring SSDs for write-intensive apps. Almost by definition the performance hit will come at the worst possible time.

StorageMojo commends the researchers for their work. It’s important that we have good data on how today’s SSDs actually behave, instead of impressions gained years ago from simpler and slower devices. If high-performance SSDs loom large in your planning the paper is well worth a read.

Courteous comments welcome, of course. What surprises you the most about this research?


Performance: IOPS or latency or data services?

by Robin Harris on Wednesday, 23 July, 2014

Unpacking the data services vs performance metric debate. Why we should stop the IOPS wars and focus on latency.

IOPS is not that important for most data centers today because flash arrays are so much faster than the storage they replace. That’s why the first post was titled IOPS is not the key number.

The point of that post was that in the context of all flash arrays the greater benefit comes from lower latency, not more IOPS. Everyone agrees more IOPS aren’t much use once the needed threshold value is crossed. But lower latency is a lasting benefit.
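
One way to see why: for a fixed amount of application concurrency, Little’s Law says IOPS is just outstanding I/Os divided by latency – and every individual request still waits the full latency. A toy illustration with made-up numbers:

```python
def iops_from_littles_law(outstanding_ios, latency_ms):
    """Little's Law: throughput = concurrency / latency. With concurrency
    fixed by the application, lower latency is the only lever left."""
    return outstanding_ios / (latency_ms / 1000.0)

for latency_ms in (10.0, 1.0, 0.2):        # disk array, early flash, modern flash
    print(f"{latency_ms:>5} ms latency, 32 outstanding IOs -> "
          f"{iops_from_littles_law(32, latency_ms):,.0f} IOPS")
# Past the point where the array can deliver more IOPS than the app can issue,
# the extra IOPS go unused -- but every transaction still feels the latency.
```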

The second post Data services more important than latency? Not! was more controversial. I was responding to a Twitter thread where an integrator CTO first asserted that customers don’t care about latency (true, but they should) and then questioned the datacenter savings due to flash performance.

My response: where has this guy been for the last 10 years? Hasn’t he noticed what flash has done to the market? Could he not wonder why?

What his tweets underscored is that we as an industry have done a poor job of giving customers the tools to understand latency in data center performance and economics. We clearly don’t understand it well ourselves.

Safety doesn’t sell
Compare this to auto safety. 50 years ago Detroit argued that “safety doesn’t sell” because consumers didn’t care about it. They fought seatbelt laws, eye level brake lights, head restraints, airbags and more because, they said, consumers don’t want to pay for them.

Today, of course, safety does sell. There are easily understood (and sometimes controversial) benchmarks for crash safety that make it easy for concerned consumers to make safety-related choices. Not all do, but clearly safety is a constant in mass-market car ads today, showing how much market sentiment has shifted as consumers understood it meant keeping their children, family and friends safer.

When it comes to latency, the storage industry is where Detroit was 50 years ago. People like that CTO, who should know better, don’t.

The VMware lesson
VMware offers a more recent lesson. Their value proposition was simple: use VMware and get rid of 80% of your servers.

That wasn’t entirely true, but it encapsulated an important point: you can save a lot of money. Oh, and there are some other neat features that come with VMs, like vMotion.

Give people a simple and compelling economic justification and they will change. But it has to be simple and verifiable.

Data services platform?
The rapid rise of the “data services platform” meme is a tribute to EMC’s marketing. Google it and you’ll see that until EMC’s SVP of VMAX, Fidelma Russo, wrote about it a couple of weeks ago, it wasn’t even a thing. Now we’re debating it.

Likewise, asserting that data services are more important than performance contravenes 30+ years of experience with customers. Yes, data services are important – mostly because today’s storage is so failure prone – but give a customer a choice between fast enough, and not fast enough with data services, and you’ll quickly see their place in the pecking order.

EMC is changing the subject because the VMAX is an overpriced and underperforming dinosaur. Until they get the DSSD array integrated into the VMAX backend, it will remain underperforming.

The StorageMojo take
Is performance – thanks to flash arrays – a solved problem? Those who argue that flash arrays are fast enough for most data centers seem to think so. And they may be correct for a few years.

It’s easy to forget that we’ve had similar leaps in performance before, most notably when RAID arrays entered the scene almost 25 years ago. It took a few years for customers to start demanding more RAID performance.

What happened is what always happens: the rest of the architecture caught up with RAID performance. CPUs and networks got faster; applications more demanding; expectations higher.

Storage is still the long pole in the tent and will remain so for years, if not decades, to come. In the meantime we need to refocus customers from IOPS to latency.

How? A topic for future discussion.

Courteous comments welcome, of course.


Hike blogging

by Robin Harris on Tuesday, 22 July, 2014

Sunday morning I took the Brins Mesa trail loop.

[Photo: Brin_loop_07-20-2014-1266]

Sometimes people wonder if I ever get bored with the scenery. Not yet!


Flash Memory Summit 2014

by Robin Harris on Sunday, 20 July, 2014

The entire StorageMojo analyst team will be saddling up and leaving the bone-dry high desert of Arizona to see the fleshpots of Santa Clara for the 2014 Flash Memory Summit. StorageMojo’s Chief Analyst will be chairing Session U-2: Annual Update on Enterprise Flash Storage. That’s on Tuesday, August 5th, at 9:45.

Who knows, maybe there will be discussion of the latency vs data services controversy.

Looking forward to meeting the attendee from the Republic of San Marino, and finding out what they’ve been doing with flash.

Don’t be shy. Feel free to sidle up and say howdy. Since California’s gun laws are stricter than Arizona’s – any gun law would be stricter – some of the boys may be feeling a bit naked. But as long as you don’t look like a rattlesnake they’ll get over it.

The StorageMojo take
If you want product announcements and demos, go to VMworld. But if you want to know what’s happening behind the scenes in the industry, Flash Memory Summit is the place to be.

Courteous comments welcome, of course.


The new storage industry rave

by Robin Harris on Thursday, 17 July, 2014

It used to be so simple: EMC, NetApp, Hitachi and the captive storage businesses of systems companies. Add in some fast running startups, such as today’s Nimble, Nutanix and Avere, to keep things interesting.

But no more. While the startups will require several more years before they make a dent in the big boys’ businesses, the cloud storage vendors are taking the joy out of high-end storage.

But wait, there’s more!
A group of new entrants are moving into the enterprise storage business and they promise to be even more exciting. Why? Because they are already large businesses with other revenue streams that can support an attack on entrenched competitors.

It’s called competition, grasshopper.
SanDisk, Western Digital and Seagate are all moving into the enterprise storage business. Then of course, there is the dark horse: Cisco.

To get a sense of the scale of the struggle it’s useful to compare EMC and NetApp to the newcomers.

EMC and their federated stovepipes have a combined market capitalization of $55 billion. And annual revenue of $23 billion.

NetApp has a current market cap of almost $12 billion based on revenue a little over $6 billion.

EMC’s growth has flatlined lately while NetApp is shrinking – and flailing.

Handicapping the race.
Western Digital and Seagate. Both have discovered the joys of high-margin storage systems. Neither is a marketing company but they do have strong brands.

10 years ago it would’ve been heresy for either company to compete with its major customers. But since those customers are abandoning hard drives for SSDs – and have no one else to buy disks from – Seagate and WD rightly figure they have nothing to lose. And they know how to play the commodity game much better than EMC.

Seagate has a $19 billion market cap on revenue of almost $14 billion. Western Digital has a market cap of $23 billion on revenue of $15 billion.

Due to the falloff in disk drive sales both companies are looking for new revenue sources, and have already found success in low-end storage systems. It won’t take them long to realize they can do even better if they move up the food chain. But what to buy, since they aren’t going to build? Exablox? Panasas? Promise?

SanDisk’s market cap is $24 billion on revenue of $6.3 billion. Clearly, investors expect great things from SanDisk and their latest quarterly year-over-year revenue growth of almost 13% is part of the reason.

With their joint venture with Toshiba and their acquisition of Fusion-io, they are well-positioned to continue to ride a market they helped invent: flash storage. Expect them to purchase an all-flash array company soon.

The Dark Horse rises
It seems inevitable that Cisco (market cap $133B; revenues $47B) will make a major move into storage: they already have servers; storage margins are excellent; and their revenue growth is slowing. They need to do something.

But what? Whiptail is a feature and not a game changer. NetApp is getting cheaper by the day, but retooling their failing product strategy would take years.

Besides, EMC has signaled, through the DSSD/Bechtolsheim acquisition, that they will take the fight to Cisco’s networking business if need be. Yet EMC holds the weaker hand: there’s more turmoil in storage than in networking; EMC is the smaller company; and Cisco’s market penetration is higher.

Cisco can also choose the time and place to start the fight. EMC has bigger issues.

The StorageMojo take
The fundamental dynamic is simple: the advent of commodity-based scale-out storage is killing the current high-gross margin storage business. Think the server business from 1985 to 1995.

Squeeze the gross margin dollars from 60% to 35% and something large has to give. In the mini-computer space it was Data General, Wang, Prime and DEC.

In storage?

Courteous comments welcome, of course.
