High performance SSDs: hot, hungry & sometimes slow

by Robin Harris on Friday, 25 July, 2014

Anyone looking at how flash SSDs have revolutionized power-constrained mobile computing could be forgiven for thinking that all SSDs are power-efficient. But they’re not.

In a recent Usenix HotStorage ’14 paper Power, Energy and Thermal Considerations in SSD-Based I/O Acceleration researchers Jie Zhang, Mustafa Shihab and Myoungsoo Jung of UT Dallas examine what they call “many-resource” SSDs, those with multiple channels, cores and flash chips.

Their taxonomy divides SSDs in terms of interfaces, cores, channels, flash chips and DRAM size. No single metric defines a many-resource SSD; it’s the entire gestalt of the device. Here’s their breakdown:

[Table: SSD specs – interface, cores, channels, flash chips, DRAM]

The price of performance
Each flash die has limited bandwidth. Writes are slow. Wear must be leveled. ECC is required. DRAM buffers smooth out data flows. Controllers run code to manage all the tricks required to make an EEPROM look like a disk, only faster.

So the number of chips and channels in high-performance SSDs has risen to deliver high bandwidth and low latency. Which takes power and creates heat.
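
A back-of-envelope model makes the tradeoff concrete. Every constant below is an illustrative assumption, not a figure from the paper:

    # Sketch: aggregate bandwidth and power both scale with the number of
    # active flash dies. All constants are assumed, illustrative values.
    CHANNELS = 16             # assumed channel count for a many-resource design
    CHIPS_PER_CHANNEL = 4
    CHIP_READ_MBPS = 40       # assumed per-die streaming read bandwidth
    CHIP_ACTIVE_W = 0.15      # assumed per-die active power
    CONTROLLER_DRAM_W = 3.0   # assumed controller + DRAM buffer overhead

    dies = CHANNELS * CHIPS_PER_CHANNEL
    bandwidth_mbps = dies * CHIP_READ_MBPS            # if all dies stream at once
    power_w = dies * CHIP_ACTIVE_W + CONTROLLER_DRAM_W

    print(f"{dies} dies -> ~{bandwidth_mbps} MB/s at ~{power_w:.1f} W")
    # 64 dies -> ~2560 MB/s at ~12.6 W: bandwidth and heat rise together.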

Testing
They ran 3 different real SSD testbeds on a quad-core i7 system with an in-house power monitor and an application to capture detailed SSD info such as temperature. They tested both pristine and aged SSDs, running sequential and random I/O workloads with request sizes from 4KB to 4MB.
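
Their rig was custom, but the shape of the experiment is easy to sketch. Here’s a minimal Python version that sweeps request sizes while sampling drive temperature; the device path, test file and smartctl output parsing are assumptions, and a real run would bypass the page cache with O_DIRECT:

    import os, time, random, subprocess

    DEV = "/dev/sda"             # assumed device under test
    PATH = "/mnt/ssd/test.bin"   # assumed test file, larger than 4MB
    SIZES = [4 << 10 << i for i in range(0, 11, 2)]   # 4KB .. 4MB

    def drive_temp_c(dev):
        """Pull Temperature_Celsius from `smartctl -A` (SMART attribute 194)."""
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "Temperature_Celsius" in line:
                return int(line.split()[9])   # raw-value column
        return None

    def read_bw(path, bs, total, sequential):
        """Read `total` bytes in `bs`-sized requests; return bytes/second."""
        with open(path, "rb", buffering=0) as f:
            size = os.fstat(f.fileno()).st_size
            t0 = time.perf_counter()
            for i in range(total // bs):
                off = (i * bs) % size if sequential \
                      else random.randrange(0, size - bs)
                f.seek(off)
                f.read(bs)
            return total / (time.perf_counter() - t0)

    for bs in SIZES:
        for seq in (True, False):
            bw = read_bw(PATH, bs, total=256 << 20, sequential=seq)
            print(f"bs={bs >> 10}KB {'seq' if seq else 'rand'} "
                  f"{bw / 1e6:.0f} MB/s temp={drive_temp_c(DEV)}C")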

Key findings
The many-resource SSD exhibits several characteristics not usually associated with SSDs.

  • High temperatures. 150-210% higher than conventional SSDs, up to 182F (83C).
  • High power. 2-7x the power of conventional SSDs, 282% higher for reads, up to 18W total.
  • Performance throttling. At 180F the many-resource SSD throttles performance by 16%, equivalent to hitting the write cliff.
  • Large write penalty. Writes of 64KB and above on aged devices caused the highest temperatures, presumably due to the extra overhead of garbage collection and wear leveling.

Performance throttling was not limited to the high-end SSDs. The mid-range many-core drive slowed down at 170F, probably due to thermally induced malfunction, as the drive had no autonomic power adjustment.
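
The throttling they observed is easy to express as a toy governor. The threshold and penalty below are the paper’s figures; the control logic itself is purely illustrative:

    # Toy model of autonomic thermal throttling: past ~180F the high-end
    # drive sheds ~16% of its performance until it cools back down.
    def f_to_c(f):
        return (f - 32) * 5 / 9

    THROTTLE_C = f_to_c(180)   # ~82.2C, the observed throttle point
    PENALTY = 0.16             # the 16% performance hit reported

    def effective_iops(base_iops, temp_c):
        """IOPS budget once thermal throttling kicks in."""
        return base_iops * (1 - PENALTY) if temp_c >= THROTTLE_C else base_iops

    print(effective_iops(100_000, 85))   # 84000.0 - throttled
    print(effective_iops(100_000, 70))   # 100000  - full speed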

The StorageMojo take
This appears to be the first in-depth analysis of the power, temperature and performance of a modern high-end SSD. The news should be cautionary for system architects.

For example, one new datacenter PCIe SSD is spec’d at 25W – higher than the paper found on slightly older drives. That’s twice what a 15K RPM Seagate drive requires.

The slowdown seen for large writes suggests caution when configuring SSDs for write-intensive apps. Almost by definition the performance hit will come at the worst possible time.

StorageMojo commends the researchers for their work. It’s important that we have good data on how today’s SSDs actually behave instead of impressions gained years ago with simpler and slower devices. If high-performance SSDs loom large in your planning, the paper is well worth a read.

Courteous comments welcome, of course. What surprises you the most about this research?


Performance: IOPS or latency or data services?

by Robin Harris on Wednesday, 23 July, 2014

Unpacking the data services vs performance metric debate. Why we should stop the IOPS wars and focus on latency.

IOPS is not that important for most data centers today because flash arrays are so much faster than the storage they replace. That’s why the first post was titled IOPS is not the key number.

The point of that post was that in the context of all-flash arrays the greater benefit comes from lower latency, not more IOPS. Everyone agrees more IOPS aren’t much use once the needed threshold is crossed. But lower latency is a lasting benefit.
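
Little’s Law ties the two metrics together: delivered IOPS equals outstanding I/Os divided by latency. A quick illustration, with made-up but plausible numbers:

    # IOPS and latency are two views of the same queue (Little's Law).
    # All numbers are illustrative, not from any vendor.
    def iops(outstanding_ios, latency_seconds):
        return outstanding_ios / latency_seconds

    print(iops(32, 0.005))    # disk array at 5 ms:     6,400 IOPS
    print(iops(32, 0.0005))   # flash array at 500 us: 64,000 IOPS
    # Once an app's (say) 10,000 IOPS need is met, the extra headroom sits
    # idle - but every I/O still completes 10x sooner at the lower latency.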

The second post Data services more important than latency? Not! was more controversial. I was responding to a Twitter thread where an integrator CTO first asserted that customers don’t care about latency (true, but they should) and then questioned the datacenter savings due to flash performance.

My response: where has this guy been for the last 10 years? Hasn’t he noticed what flash has done to the market? Could he not wonder why?

What his tweets underscored is that we as an industry have done a poor job of giving customers the tools to understand latency in data center performance and economics. We clearly don’t understand it well ourselves.

Safety doesn’t sell
Compare this to auto safety. 50 years ago Detroit argued that “safety doesn’t sell” because consumers didn’t care about it. They fought seatbelt laws, eye-level brake lights, head restraints, airbags and more because, they said, consumers don’t want to pay for them.

Today, of course, safety does sell. There are easily understood (and sometimes controversial) crash-safety benchmarks that make it simple for concerned consumers to choose on safety. Not all do, but safety is a constant in mass-market car ads today – a mark of how far sentiment has shifted as consumers came to understand it meant keeping their children, families and friends safer.

In regards to latency, the storage industry is where Detroit was 50 years ago. People like the CTO, who should know better, don’t.

The VMware lesson
VMware offers a more recent lesson. Its value proposition was simple: use VMware and get rid of 80% of your servers.

That wasn’t entirely true, but it encapsulated an important point: you can save a lot of money. Oh, and there are some other neat features that come with VMs, like vMotion.

Give people a simple and compelling economic justification and they will change. But it has to be simple and verifiable.

Data services platform?
The rapid rise of the “data services platform” meme is a tribute to EMC’s marketing. Google it and you’ll see that until Fidelma Russo, EMC’s VMAX SVP, wrote about it a couple of weeks ago, it wasn’t even a thing. Now we’re debating it.

Likewise, asserting that data services are more important than performance contravenes 30+ years of experience with customers. Yes, data services are important – mostly because today’s storage is so failure prone – but give a customer a choice between fast enough and not fast enough with data services and you’ll quickly see their place in the pecking order.

EMC is changing the subject because the VMAX is an overpriced and underperforming dinosaur. Until they get the DSSD array integrated into the VMAX backend, it will remain underperforming.

The StorageMojo take
Is performance – thanks to flash arrays – a solved problem? Those who argue that flash arrays are fast enough for most data centers seem to think so. And they may be correct for a few years.

It’s easy to forget that we’ve had similar leaps in performance before, most notably when RAID arrays entered the scene almost 25 years ago. It took a few years for customers to start demanding more RAID performance.

What happened is what always happens: the rest of the architecture caught up with RAID performance. CPUs and networks got faster; applications more demanding; expectations higher.

Storage is still the long pole in the tent and will remain so for years, if not decades, to come. In the meantime we need to refocus customers from IOPS to latency.

How? A topic for future discussion.

Courteous comments welcome, of course.


Hike blogging

by Robin Harris on Tuesday, 22 July, 2014

Sunday morning took the Brins Mesa trail loop.

[Photo: Brins Mesa trail loop]

Sometimes people wonder if I ever get bored with the scenery. Not yet!


Flash Memory Summit 2014

by Robin Harris on Sunday, 20 July, 2014

The entire StorageMojo analyst team will be saddling up and leaving the bone-dry high desert of Arizona to see the fleshpots of Santa Clara for the 2014 Flash Memory Summit. StorageMojo’s Chief Analyst will be chairing Session U-2: Annual Update on Enterprise Flash Storage. That’s on Tuesday, August 5th, at 9:45.

Who knows, maybe there will be discussion of the latency vs data services controversy.

Looking forward to meeting the attendee from the Republic of San Marino, and finding out what they’ve been doing with flash.

Don’t be shy. Feel free to sidle up and say howdy. Since California’s gun laws are stricter than Arizona’s – any gun law would be stricter – some of the boys may be feeling a bit naked. But as long as you don’t look like a rattlesnake they’ll get over it.

The StorageMojo take
If you want product announcements and demos, go to VMworld. But if you want to know what’s happening behind the scenes in the industry, Flash Memory Summit is the place to be.

Courteous comments welcome, of course.


The new storage industry rave

by Robin Harris on Thursday, 17 July, 2014

It used to be so simple: EMC, NetApp, Hitachi and the captive storage businesses of systems companies. Add in some fast-running startups, such as today’s Nimble, Nutanix and Avere, to keep things interesting.

But no more. While the startups will require several more years before they make a dent in the big boys’ businesses, the cloud storage vendors are taking the joy out of high-end storage.

But wait, there’s more!
A group of new entrants is moving into the enterprise storage business, and they promise to be even more exciting. Why? Because they are already large businesses with other revenue streams that can support an attack on entrenched competitors.

It’s called competition, grasshopper.
SanDisk, Western Digital and Seagate are all moving into the enterprise storage business. Then of course, there is the dark horse: Cisco.

To get a sense of the scale of the struggle it’s useful to compare EMC and NetApp to the newcomers.

EMC and their federated stovepipes have a combined market capitalization of $55 billion and annual revenue of $23 billion.

NetApp has a current market cap of almost $12 billion based on revenue a little over $6 billion.

EMC’s growth has flatlined lately while NetApp is shrinking – and flailing.

Handicapping the race.
Western Digital and Seagate. Both have discovered the joys of high-margin storage systems. Neither is a marketing company but they do have strong brands.

10 years ago it would’ve been heresy for either company to compete with its major customers. But since those customers are abandoning hard drives for SSDs – and have no one else to buy disks from – Seagate and WD rightly figure they have nothing to lose. And they know how to play the commodity game much better than EMC.

Seagate has a $19 billion market cap on revenue of almost $14 billion. Western Digital has a market cap of $23 billion on revenue of $15 billion.

Due to the falloff in disk drive sales both companies are looking for new revenue sources and have already found success in low-end storage systems. It won’t take them long to realize they can do even better if they move up the food chain. But what to buy, since they aren’t going to build? Exablox? Panasas? Promise?

SanDisk’s market cap is $24 billion on revenue of $6.3 billion. Clearly, investors expect great things from SanDisk and their latest quarterly year-over-year revenue growth of almost 13% is part of the reason.

With their joint venture with Toshiba and their acquisition of Fusion-io, they are well positioned to continue to ride a market they helped invent: flash storage. Expect them to purchase an all-flash array company soon.

The Dark Horse rises
It seems inevitable that Cisco (market cap $133B; revenues $47B) will make a major move into storage: they already have servers; storage margins are excellent; and their revenue growth is slowing. They need to do something.

But what? Whiptail is a feature and not a game changer. NetApp is getting cheaper by the day, but retooling their failing product strategy would take years.

Besides, EMC has signaled, through the DSSD/Bechtolsheim acquisition, that they will take the fight to Cisco’s networking business if need be. Yet EMC holds the weaker hand: turmoil in storage is greater than in networking; EMC is the smaller company; and Cisco’s market penetration is higher.

Cisco can also choose the time and place to start the fight. EMC has bigger issues.

The StorageMojo take
The fundamental dynamic is simple: the advent of commodity-based scale-out storage is killing the current high-gross-margin storage business. Think of the server business from 1985 to 1995.

Squeeze the gross margin dollars from 60% to 35% and something large has to give. In the mini-computer space it was Data General, Wang, Prime and DEC.

In storage?

Courteous comments welcome, of course.


Hike blog: Airport loop

by Robin Harris on Tuesday, 15 July, 2014

Walked over to the Sunset trail and thence up to Airport Mesa. This view is looking south from the east side of the mesa.

[Photo: looking south from the east side of Airport Mesa]

This is my favorite time of year due to the monsoon – rainy season – clouds. Probably have a thunderstorm later today. And yes, there’s an airport on top of the mesa.


Quasi-NVRAM using built-in battery backup

by Robin Harris on Monday, 14 July, 2014

Note: a version of this post appeared this morning on ZDNet.

Given that roughly a billion battery-powered computers are sold each year, you’d think that the engineers would be busy rearchitecting the storage stack to take advantage of built-in – and usually non-removable – battery backup. But no-o-o!

Recently three researchers – Hao Luo, Lei Tian and Hong Jiang of the University of Nebraska – asked why we don’t treat our mobile device DRAM as if it were nonvolatile. They designed, implemented and tested a prototype on an Android phone and, not surprisingly, got promising results.

Their paper, qNVRAM: quasi Non-Volatile RAM for Low Overhead Persistency Enforcement in Smartphones, was presented at last month’s Usenix HotStorage conference.

Background.
Typically, Android apps rely on SQLite, the shared-preferences key-value store or file system APIs to save persistent data to local flash. These employ journaling or file-level double writes to ensure persistency.

The motivation behind these techniques is that memory – DRAM – is volatile. Hence the multiple writes to storage, which incur substantial overhead in devices that are already performance- and power-constrained.
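
One common file-level scheme looks like the sketch below: write a complete shadow copy, force it to flash, then atomically swap it in. SQLite’s rollback journal differs in detail but pays a similar double-write tax. The naming here is illustrative:

    import os

    def atomic_write(path, data):
        """Crash-safe update: the whole file is rewritten as a shadow copy
        (even for a one-byte change), fsync'd, then committed by rename."""
        tmp = path + ".tmp"              # illustrative shadow-copy name
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())         # first cost: shadow copy hits flash
        os.replace(tmp, path)            # atomic commit via rename
        dfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)  # POSIX
        try:
            os.fsync(dfd)                # second cost: persist the rename
        finally:
            os.close(dfd)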

For example, they found that more than 75% of the data Twitter wrote was for persistency reasons. Across a group of common mobile apps, anywhere from 37% to 78% of data writes were for atomicity.

Here’s a graph of what they found:

[Graph: the cost of persistency]

Of course, battery-backed hardware does no good if the system software crashes often. They found that Android kernel reliability is quite good, based on bug fixes and user support calls.

They analyzed Android issue reports and found that only 10 – 0.05% of the 19,670 reported Android defects – involved unexpected or random power-off. The chance of an unexpected power failure, in other words, is very small.

Proto design.
The researchers constructed a prototype test system with several innovations; a code sketch of the caching ideas follows the list.

  • Quasi-NVRAM. They set aside a portion of system DRAM to act as battery-backed nonvolatile DRAM.
  • Device driver. A new device driver and library manage I/O between the qNVRAM and system flash.
  • Persistent Page Cache. A new data structure in SQLite that uses qNVRAM to perform in-place updates to the database files.
  • Relaxed data flushing. LazyFlush absorbs repeated writes to table files to further reduce I/O.
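
Here’s a minimal sketch of how the Persistent Page Cache and LazyFlush fit together, assuming a battery-backed DRAM region. The class and method names are mine, not the paper’s API:

    # Updates land in place in battery-backed DRAM and count as durable at
    # once; repeated writes to a page are absorbed, and only the final
    # version is written back to flash later.
    class QNVRAMPageCache:
        def __init__(self, flush_to_flash):
            self.pages = {}          # page_no -> bytes, lives in qNVRAM
            self.dirty = set()       # pages not yet written back to flash
            self.flush_to_flash = flush_to_flash

        def update(self, page_no, data):
            """In-place update: durable on arrival, no journal or double write."""
            self.pages[page_no] = data
            self.dirty.add(page_no)  # LazyFlush absorbs rewrites until flush

        def lazy_flush(self):
            """Write each dirty page back once, however often it was updated."""
            for page_no in sorted(self.dirty):
                self.flush_to_flash(page_no, self.pages[page_no])
            self.dirty.clear()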

Here’s the architecture:

qnvram_architecture

Results.
Implemented on an Android smartphone, they found that with transactions performed entirely in memory,

. . . qNVRAM speeds up the insert, update and delete transactions by up to 16.33x, 15.86x and 15.76x respectively.

Furthermore, the amount of data committed to flash was reduced by about 40%. Given how common constant feed updates are on mobile devices, this is a significant result.

The StorageMojo take
Given the many complaints about smartphones that don’t have removable batteries, you’d think someone would have looked at this before. This research shows the upside of non-removable batteries: all DRAM can be treated as NVRAM.

Of course, qNVRAM can’t replace flash. DRAM is more power-hungry and costly than flash.

But by reducing I/O overhead, qNVRAM shows that significant gains in performance – and presumably battery life – can be achieved at little cost. It also simplifies the problem of extending flash endurance, which may have important knock-on effects – such as enabling wider use of three-level cells.

I wish the researchers had also documented the overall impact of qNVRAM on system performance and battery life. Hopefully that’s next on the list.

It was obvious five years ago that the advent of non-removable batteries on phones and notebooks suggested a clean-sheet approach to persistency mechanisms. Congratulations to the researchers for taking a rigorous look at the problem.

Courteous comments welcome, of course.


EMC buying TwinStrata to put on VMAX

The StorageMojo take
First of all I hope Nicos and John made out like bandits. Not likely without a bidding war – I haven’t heard of one – but they did a good job building TS.

One of the attractions for EMC is that TS supports many clouds. The master of lock-in gives customers a choice when it won’t get the business anyway.

The deeper message: we can’t compete with cloud storage, so we’ve decided to make money supporting it. Expect value-added features that only work on VMAX, probably in cooperation with Google and Microsoft.

But what about the rest of the industry? It’s a good thing. EMC validation means that customers who wouldn’t consider the idea will now have to go out and look at competitors.

Six months from now cloud gateways will be on every checklist. Why not?

Thus companies that already offer gateways, like Avere, stand to gain. Their value proposition is so much more than cheaper cloud storage. If they can get a prospect to sit still for 15 minutes they have an opportunity.

No doubt about it, cloud storage is in the mainstream. In case you hadn’t heard.

Courteous comments welcome, of course. I’ve done work for TwinStrata in the past and am working on a project for them now. At least I hope I still am.
