StorageMojo





Robin Harris    


The top storage stories of 2008

The world of data storage is changing faster than it has since the mid-90’s amid the rise of hardware arrays and storage networks. Looking back, 2008 will be seen as a pivotal year. The big news, in rough ascending order:

FCoE
Though production-ready products are still in the future, the broad vendor embrace of Fibre Channel over Ethernet signaled the beginning of the end for the Fibre Channel physical layer. The storage companies who profited from a decade of Balkanizing the storage network market will have Cisco calling the shots.

Brocade, in particular, needs good strategy advice. Maybe one of these days they’ll get it.

Blu-ray tanks
Call me old-fashioned, but I have a soft spot for removable media. So I’m sorry to see Sony screw the pooch with Blu-ray’s big-studio-friendly licensing and hate-the-customer DRM. And sinking the PS3 as well.

The good news: you can put HD content on a standard DVD - just not as much; and there’s an upscaling DVD player - the Oppo Digital DV-983H - that upscales ordinary DVDs to near Blu-ray quality. Yes, even better than the upscaling on a Blu-ray player. One less reason to pay the Blu-ray tax.

2.5″ drives
Rumor has it that Seagate is designing its last generation of 3.5″ drives, which augurs the switch to SFF in desktop and enterprise systems. Three years ago 2.5″ drives had 1/5th the capacity of 3.5″ drives; today they have 1/3 the capacity and a much smaller price differential.

At some point it will occur to Seagate’s top management that 1.8″ and 2.5″ drives are the disk industry’s best answer to flash. Now, if Seagate were in the I/O business, it would be a different story.

Zero-maintenance storage
Xiotech and Atrato introduced storage boxes that guarantee capacity, performance and uptime with no maintenance for 5 and 3 years respectively. These are storage game-changers.

That Seagate sold ISE to Xiotech after spending years developing it has to be one of their biggest blunders ever, several notches above buying Xiotech in the first place. The ISE is, in effect, a super disk that Seagate could have sold to all its enterprise disk customers.

Flash
2008 is the year that every major vendor - with the laudable exception of laser-focused WD - announced alliances and/or plans to enter the flash drive market. High-end SSDs will displace 15k high-end disks in the next 3 years.

But flash-in-disk-clothing is the near/medium-term solution. Fusion-io and Violin are on the winning architectural track. Flash belongs between the CPU and disk layers: that’s where we’ll get the most benefit for the added cost.

Hey, disk vendors: want to stick it to Intel, Micron and Samsung? Buy one of them. You are in the I/O business, not the disk business.

Commodity-based cluster storage
EMC’s Atmos, HP’s Extreme Storage 9100 and IBM’s XIV are commodity-based cluster storage. The important thing is the storage mainstream has embraced storage clusters based on commodity hardware and mostly open-source software. That’s what Google did years ago and soon many companies will.

Yes, commodity hardware saves real money, as I and Bill Mottram of Data Mobility Group found out when we ran the numbers on HP’s 9100 vs Isilon, NetApp and Sun. We’ll see if Atmos is on the latest EMC price list when I do the updates later this month.

The StorageMojo take
2009 will be a great year for the hungry and flexible. The ongoing financial train wreck is trouble for Big Iron fans in the data center.

Fortunately, help is on the way. Look for my 2009 forecast before the end of 2008.

Courteous comments welcome, of course. Of the companies mentioned I’ve done work for HP and Fusion-io.

Flash and the new storage pyramid

December 4th, 2008 by Robin Harris in Architecture, Enterprise, Future Tech, SSD/Flash Disk

I got a note from David Flynn, co-founder and CTO of Fusion-io (disclosure: I’ve done work for them) in response to The new storage pyramid. He makes several points about the nature of the array model that I wish I’d made.

Well worth the read.

David Flynn’s note:
Great analysis, Robin.

And, great comments.

My $.02 ….

I think it’s not just the proprietary nature, the somewhat better performance and features, and the high markups that differentiate “storage arrays” from “clustered storage”.

It’s actually more to do with the vertically integrated nature of the business model of the companies in the array building business. This leads to proprietary architectures, higher margins and, true, somewhat better performance and features.

Let me explain through an analogy…

We used to get graphics workstations from SGI, Apollo, and other vertically integrated vendors, who sold everything end-to-end, down to the monitors and their own proprietary OS’s. These guys commanded HUGE margins - partly to reward their risky investment in solving a worthy, complex problem.

Similarly, the military (and the few others who could afford a million dollar price-tag) used to get flight simulators from Evans & Sutherland, who were also vertically integrated and insanely expensive. You even had niche vendors like Intergraph doing 3D graphics information systems who could justify their own proprietary architectures.

At least for a while.

They were all doing 3D graphics in one form or another. And, now, they are all GONE - thanks to the emergence of a component, the 3D graphics card.

With enough capability to be applicable across all of these different verticals, the 3D graphics accelerator has now shattered the benefit of running a vertically integrated business.

Today, there are myriads of “integrators” who make graphics workstations, flight simulators, GIS systems, etc. at very low margin by comparison. And, they do it by pulling together off-the-shelf components - all commoditized down to the software that provides even the high-value features.

They might have been inferior to the proprietary solutions at first, but not anymore.

Now, what happens when you introduce to the storage industry a component that commoditizes and trivializes the linchpin reason for expensive proprietary disk arrays, namely the caching tier - using NAND flash?

Once anyone can easily get the performance across any use case (OLTP, OLAP, Data Warehousing, BI, VOD, content caching, etc. etc.) you no longer need vertical specific, highly tuned, proprietary solutions from vertically integrated companies.

Every capability that doesn’t migrate into the component itself becomes nothing but commoditized software to be layered on top by any number of interchangeable integrators. Things like replication, disaster recovery, backup, dedup, and so on just become commoditized software that can run anywhere.

This is a classic Adam Smithian market evolution. What used to be a single, vertically integrated provider becomes a layered market where some people build the components, others integrate them (with some bit of value add), and you go to having many players competing on many levels.

And prices go down.

But, thankfully, (for those of us in the business of creating this componentized building-block) volume, productivity, and efficiencies all go up.

So, actually everyone wins. Including society as a whole.

Well, almost everyone wins. Everyone, that is, except for the proprietary array vendors who get caught by the innovator’s dilemma and a business model that used to be the correct one, but no longer is.

This generally makes them the slowest to simplify their proprietary infrastructures around the commoditized component - to help justify their investment into their heroic proprietary solutions.

In an effort to protect their margins, they endeavor to make things seem as complicated as possible. They do this, say, by preferring that NAND be forced to pretend to be an HDD and be put into HDD drive bays behind HDD protocols, where it has little ability to simplify things or get much additional performance.

They are the last to come out and say it can be simplified. Instead they’ll tell you you must have features X, Y, Z. And, see, those aren’t as good as with our proven architecture.

Let’s take high availability as an example. They aren’t going to tell you about a “shared nothing” strategy - where two separate RDBMS servers, each with terabytes of direct attached NAND inside, use off-the-shelf log-shipping for asynchronous replication (or query replication to do it synchronously) to get fault tolerance.

No, they aren’t going to tell you that it’s actually simpler, more cost effective, and, here’s the real kicker… more fault tolerant to share nothing, than to use shared storage - no matter how fault tolerant they claim their monolithic storage array is, it’s still shared.

I’m not saying this market transformation is going to happen by tomorrow. But, given the geometric growth of the performance gap between processors and storage, and the geometric decline in cost of NAND flash - leading to a “Moore’s Law Squared” effect in the benefit to cost ratio - it is going to happen faster than people would think. Even considering the “stodgy” nature of storage folks who are in the business of obsessively caring for precious bits.

It doesn’t hurt that in this global recession companies are looking for ways to reduce costs while still needing to grow throughput. So, there’s more of a willingness to look at different, innovative ways to skin the cat.

I agree with you Robin. It will be a fait accompli by 2015.

David Flynn
CTO, Fusion-io

The StorageMojo take
Technology diffusion is a complex mashup of secular trends, technology development, individual creativity and happenstance. But the current direction of the high-end storage market points to the greatest change we’ve seen since the early 90’s and the advent of arrays.

The “Moore’s Law Squared” effect is particularly intriguing. Humans are terrible at estimating the impact of power functions, so this one is likely to be even more surprising than we dream.
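To see why two geometric trends compound so quickly, here is a rough sketch. The yearly rates are illustrative assumptions of mine, not numbers from David's note: `gap_growth` stands in for the widening processor/storage performance gap and `cost_decline` for the falling cost of NAND.

```python
def benefit_cost_ratio(years, gap_growth=1.5, cost_decline=0.6):
    """Relative benefit-to-cost ratio after `years`, normalized to 1.0 at year 0.

    gap_growth:   assumed yearly multiplier on the processor/storage
                  performance gap (the benefit of closing it)
    cost_decline: assumed yearly multiplier on NAND cost
    Both rates are hypothetical placeholders.
    """
    benefit = gap_growth ** years
    cost = cost_decline ** years
    return benefit / cost


for year in range(8):
    print(year, round(benefit_cost_ratio(year), 1))
```

With these assumed rates the ratio multiplies by 2.5 every year, because the growth of one trend is divided by the shrinkage of the other. Change either rate and the curve still bends upward faster than either trend alone.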

Courteous comments welcome, of course.

Stupid storage failures

November 25th, 2008 by Robin Harris in Architecture, Disk, SSD/Flash Disk

Valiant but doomed
The ZFS discussion thread had an interesting comment from Sun’s Jeff Bonwick, architect of ZFS, on storage device failure modes. How do you know a disk or a tape has failed?

You don’t. You wait, while the milliseconds stretch into seconds and maybe even minutes. Jeff states the problem - and Sun’s solution - this way:

. . . we’re trying to provide increasingly optimal behavior given a collection of devices whose failure modes are largely ill-defined. (Is the disk dead or just slow? Gone or just temporarily disconnected? Does this burst of bad sectors indicate catastrophic failure, or just localized media errors?) . . . there’s a lot of work underway to model the physical topology of the hardware, gather telemetry from the devices, the enclosures, the environmental sensors etc, so that we can generate an accurate FMA [Fault Management Architecture] fault diagnosis and then tell ZFS to take appropriate action.

With all due respect to Jeff, that solution seems iffy: how will you ever keep up with all the devices and firmware levels needed to make that work?

A community of prima donnas
There are lots of messy failure modes in computer systems. The literature around the Byzantine Generals Problem (Wikipedia - for a rigorous treatment download The Byzantine Generals Problem by L. Lamport et al.) tackles the problem of the malicious server in a community of network servers. That is a hard problem.

Knowing whether a storage device is alive, dead or only sleeping shouldn’t be so hard. They have powerful 32-bit processors - more powerful than a VAX 780 - and lots of statistics on what the drive is doing.

It seems like a disk could give a modulated heartbeat signal to drivers - “ready” “reboot” “caught in retry hell” “dead” - to decrease uncertainty.
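As a thought experiment, such a heartbeat might look something like the sketch below. Nothing here is a real drive or driver interface: the state names come from the list above, while `DriveState`, `ACTIONS`, `handle_heartbeat` and the 2 second timeout are all made up for illustration.

```python
from enum import Enum


class DriveState(Enum):
    READY = "ready"
    REBOOTING = "reboot"
    RETRYING = "caught in retry hell"
    DEAD = "dead"


# Hypothetical driver policy: what to do for each reported state.
ACTIONS = {
    DriveState.READY: "issue I/O normally",
    DriveState.REBOOTING: "queue I/O and wait for READY",
    DriveState.RETRYING: "read from redundant copy, keep watching",
    DriveState.DEAD: "fail the drive and start rebuild",
}


def handle_heartbeat(state, seconds_since_last, timeout=2.0):
    """Pick a driver action from the most recent heartbeat."""
    if seconds_since_last > timeout:
        # No recent heartbeat: the drive's own report is stale,
        # so this sketch assumes the worst case.
        return ACTIONS[DriveState.DEAD]
    return ACTIONS[state]


print(handle_heartbeat(DriveState.RETRYING, 0.5))
```

The point of the sketch is that the driver stops guessing: it waits only as long as the heartbeat says is reasonable instead of letting milliseconds stretch into minutes.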

The StorageMojo take
Drive vendors may think that non-standards for drive condition reporting are a form of lock-in, but that misses the bigger picture: the quality and timeliness of condition reports - even with a standard format - would be a competitive differentiator.

At the margin it would help slow the move to commodity-based cluster storage by enabling array vendors to improve their error handling and perceived reliability. It would also help disks versus flash SSDs, whose perceived reliability is partly due to the gap between user-judged drive “failures” and vendor “no trouble found” test results.

Storage systems all know how to deal with disk failures - they have to. So drive vendors, how about getting together to help make knowing a drive’s status a lot easier? Hey, IDEMA, make yourself useful!

Courteous comments welcome, of course.

Flash isn’t tier zero

August 13th, 2008 by Robin Harris in Architecture, Disk, SSD/Flash Disk

A panel discussion on enterprise SSDs at the Flash Memory Summit came to an almost unanimous conclusion: NAND flash is best seen as an extension to DRAM and a layer between DRAM and disk - not as the guts of a disk drive replacement.

I don’t think the guy from Seagate agreed.

Since I was on the panel, my recollections have to be taken with a grain of salt. But I was trying to resist the group think that too many panels fall prey to. Yet I agreed with the result.

Price changes everything
StorageMojo has reported at length on the problems of making a big, quirky EEPROM look like a disk. Flash doesn’t look much like DRAM either, but the two are cousins.

In the last few years price has altered the landscape. On today’s spot market a Gbit of DRAM costs 7-10x as much as a Gbit of MLC NAND.

That wasn’t the case 3 years ago, so substituting flash for DRAM made no sense.

The market resistance to flash drives is because flash costs more than disk. Not a problem when augmenting DRAM.

The performance fit
Disks are millisecond devices; DRAM DIMMs are nanosecond devices; and NAND chips are microsecond devices.
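A back-of-the-envelope sketch shows what a microsecond layer buys you. The latencies are order-of-magnitude assumptions (100 ns, 50 microseconds, 10 ms), and the hit fractions are hypothetical; only the millisecond/microsecond/nanosecond ordering comes from the panel.

```python
DRAM_NS = 100            # assumed: ~100 nanoseconds
NAND_NS = 50_000         # assumed: ~50 microseconds
DISK_NS = 10_000_000     # assumed: ~10 milliseconds


def average_latency_ns(dram_hit, flash_hit):
    """Expected latency when a request is served by DRAM, else flash, else disk.

    dram_hit and flash_hit are fractions of all requests;
    whatever is left over goes to disk.
    """
    disk_frac = 1.0 - dram_hit - flash_hit
    return dram_hit * DRAM_NS + flash_hit * NAND_NS + disk_frac * DISK_NS


no_flash = average_latency_ns(0.90, 0.0)     # 10% of requests hit disk
with_flash = average_latency_ns(0.90, 0.09)  # flash absorbs most of the misses
print(f"{no_flash / 1e6:.2f} ms vs {with_flash / 1e6:.2f} ms")
```

Under these assumptions the average drops from about 1 ms to about 0.1 ms, because disk accesses dominate the average and flash soaks up nine out of ten of them.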

More than once it was suggested that maybe it is time to bring back the 3600 RPM drive. Optimized for capacity, power and long life, it would be a good complement to servers with several hundred GB of flash.

The StorageMojo take
Flash as a new storage layer between DRAM and disk just sounds more logical than flash-as-a-disk-like product. Let disks be disks!

And flash be flash.

Courteous comments welcome, of course. More on this topic later. Stay tuned.

StorageMojo at Flash Memory Summit

August 9th, 2008 by Robin Harris in SSD/Flash Disk

If you are attending the Flash Memory Summit in Santa Clara on Tuesday and Wednesday please say hello. Tuesday morning I will be sprinting between my two concurrent sessions.

In the Forum F1B: Laptop Design session I’ll be giving a 25 minute presentation titled “Can The Flash Consumer SSD Be Saved?” In “Flash in Enterprise Storage Systems” a panel will hold forth on the promise of enterprise/solid state disks.

For reasons regular readers will appreciate, the latter should be more interesting.

The StorageMojo take
The summit will also have vendors showing their wares. I’m hoping to see some creative work.

The first thought with new technologies is to replicate what we already have. The real benefits from flash will come as we rethink the old architectures.

Courteous comments and questions welcome, of course.

Samsung follows StorageMojo’s lead - finally!

July 24th, 2008 by Robin Harris in SSD/Flash Disk

Samsung announced a 500,000 R/W cycle rating for their server-grade NAND flash. I thought that was pretty smart - even though the “several month” project didn’t sound like it involved a lot of engineering.

Then Flash analyst extraordinaire Jim Handy, who runs Objective Analysis, saw my post on ZDnet where I talked about the announcement. He reminded me that in October ‘06 I’d written

. . . the cells are actually good for closer to a million read/write cycles. If true, Samsung is silly not to adjust their spec upwards, even to 250k. Engineers can be their own worst enemies sometimes when it comes to promoting a cool new product.

A mere 21 months later Samsung got with the StorageMojo program.

The StorageMojo take
Better late than never.

Samsung knew sooner than I did that flash has some serious deficits as a storage medium. The 500k “server-grade” moniker is a way of attacking one of those deficits - longevity - in a way that should reassure customers and increase margins.

What Samsung has lost is 2 years in building customer awareness of server-grade flash. Now it looks more reactive than proactive. Not a bad move, but sooner would have been better.

Comments welcome, of course.

Design Tradeoffs for SSD Performance

July 15th, 2008 by Robin Harris in Architecture, Future Tech, SSD/Flash Disk

A new Usenix paper looks at NAND flash SSD performance. It comes from a team at Microsoft Research and the University of Wisconsin, including Ted Wobber, who worked on last year’s A Design for High-Performance Flash Disks [see Flash chance for the StorageMojo take on that excellent paper - a post Ted was kind enough to review and comment on].

Design Tradeoffs for SSD Performance (by Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse and Rina Panigrahy) takes a deep dive into flash translation layer (FTL) issues. As the authors note, flash vendors keep their FTL designs secret, so the team developed a NAND flash simulator to look at how design choices affected performance.

What they found
They ran several workloads on their trace-based simulator, including TPC-C, Exchange and some file system benchmarks. They found several critical issues in SSD design.

  • Data placement: needed for wear leveling and load balancing.
  • Parallelism: single flash chips aren’t very fast so they need to work together.
  • Write ordering: small random writes are a killer.
  • Workload management: you can optimize for sequential or random workloads, but managing both well is hard.

Canonical part
The paper’s discussion of flash memory is based on the spec for Samsung’s K9XXG08UXM 4 GB Single Level Cell (SLC) package. Other parts may differ, but NAND physics are the basic challenge.

The Samsung part has two 2 GB dies (chips) in the package. Each die has 8192 blocks - a block is 64 pages of 4 KB each - organized into 4 planes of 2048 blocks. The dies can be addressed independently, while cross-plane operations are limited to planes 0 & 1 or 2 & 3. Each page has 128 bytes for metadata.

Cross-plane operations are a form of parallelism. The Samsung part also provides a copy-back operation so one page can be copied to another without transporting the data off of the die. Copy-back is limited to copies within the same flash plane of 2048 blocks.
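The geometry is easy to sanity-check with a few lines of arithmetic. The constants are the figures quoted above; the `same_plane` helper is a sketch of the copy-back restriction and assumes, purely for illustration, that blocks are numbered in consecutive runs of 2048 per plane (the real part may number them differently).

```python
PAGE_KB = 4
PAGES_PER_BLOCK = 64
BLOCKS_PER_PLANE = 2048
PLANES_PER_DIE = 4
DIES_PER_PACKAGE = 2
METADATA_BYTES_PER_PAGE = 128

blocks_per_die = BLOCKS_PER_PLANE * PLANES_PER_DIE            # blocks on one die
block_kb = PAGE_KB * PAGES_PER_BLOCK                          # erase unit in KB
die_gb = blocks_per_die * block_kb / (1024 * 1024)            # capacity per die
package_gb = die_gb * DIES_PER_PACKAGE                        # capacity per package
metadata_per_block = METADATA_BYTES_PER_PAGE * PAGES_PER_BLOCK


def same_plane(block_a, block_b):
    """True if copy-back between the two blocks would be allowed.

    Assumes blocks 0-2047 sit in plane 0, 2048-4095 in plane 1, and so on.
    That numbering is a simplification for this sketch.
    """
    return block_a // BLOCKS_PER_PLANE == block_b // BLOCKS_PER_PLANE


print(blocks_per_die, block_kb, die_gb, package_gb)
```

The numbers line up: 8192 blocks of 256 KB give a 2 GB die, and two dies give the 4 GB package. Note that the erase unit is 256 KB, 64 times larger than the 4 KB page you read or program.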

Expensive writes
NAND flash is a type of EEPROM. About the only characteristics it shares with disks are block structure and persistence. To write - or as the flash guys say program - it must first be erased. And you can’t just erase a 4 KB page - you have to erase an entire block.

An erase operation takes 1.5ms, making it considerably more expensive than a read or a write. To maintain a supply of empty blocks a cleaning process - garbage collection - runs when the free block supply gets low.

SLC flash is good for about 100,000 writes, so not only do you have to manage the full block erasure problem, but you also have to manage the life span of each block - the wear-leveling problem.

[Wear-leveling will become even more acute with next-gen 3 and 4 level cells. Speculation is that the write spec could drop as low as 1,000 per cell.]
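A toy model makes the two constraints concrete: a block must be erased before it is programmed, and erases should be spread evenly. This is a naive least-worn-first policy written for illustration, not the scheme from the paper or from any vendor's FTL; `Block`, `pick_block`, `program` and `erase` are hypothetical names.

```python
ERASE_LIMIT = 100_000   # approximate SLC endurance quoted above


class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.erase_count = 0
        self.erased = True      # only an erased block can be programmed


def pick_block(blocks):
    """Naive wear leveling: choose the least-worn erased block."""
    candidates = [b for b in blocks if b.erased and b.erase_count < ERASE_LIMIT]
    if not candidates:
        raise RuntimeError("no erased blocks left; cleaning must run first")
    return min(candidates, key=lambda b: b.erase_count)


def program(block):
    if not block.erased:
        raise ValueError("block must be erased before programming")
    block.erased = False


def erase(block):
    block.erase_count += 1
    block.erased = True


blocks = [Block(i) for i in range(4)]
for _ in range(8):
    target = pick_block(blocks)
    program(target)
    erase(target)   # reclaimed immediately, just to exercise the counters

print([b.erase_count for b in blocks])
```

Eight writes across four blocks leave every block with the same erase count, which is the whole idea. A real controller also has to move live data out of a block before erasing it, which is where the cleaning overhead comes from.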

Here is a table of the operational flash parameters for the Samsung part from the paper:

SSD controller architecture
The flash packages of course are only the building blocks of an SSD. Much of the magic comes from the architecture and optimizations of the SSD controller logic. This is a generalized block diagram for an SSD controller:

Key elements:

  • Host interconnect: SATA, USB, FC, PCI-e.
  • Buffer management: for pending and satisfied requests.
  • Multiplexer: to manage instruction and data transport along the serial connections to the flash packages.
  • Processor: to manage request flow and mappings from the logical block address to physical flash locations.
  • RAM: for the processor.

On a cheap USB thumb drive all these elements may be integrated into a single chip. On a high-performance Fibre Channel SSD these elements may be separated on their own PC board.

The size of the flash packages also has an impact on cost and architecture. A 32 GB SSD built with the Samsung parts would require 136 pins at the controller. Larger SSDs may not have enough pins for full interconnection between the controller and the flash packages, requiring additional engineering trade-offs.

Faking it
Borrowing a simulator, DiskSim from Garth Gibson’s Parallel Data Lab at CMU, the team modified it to reflect SSD latency and architecture. Features unique to SSDs, such as multiple request queues, logical block maps, cleaning and wear-leveling states were added.

Workloads
They used a collection of workload traces they named TPC-C, Exchange, IOzone and Postmark, as well as a group of microbenchmarks generated by DiskSim.

The TPC-C trace came from a large-scale configuration comprising 14 HP MSA1500 FC controllers, each supporting 28 36 GB disks. Exemplifying the current high-end OLTP problem, each controller had over a terabyte of disk, but the benchmark used only 160 GB of that capacity.

The Exchange server was similarly over-configured, with 6 RAID controllers each running 1 TB of capacity, while the 15 minute trace used only 250 GB of that with a workload of 3 reads for every 2 writes.

Microbenchmarks
These were run using 4 KB I/Os. With cleaning enabled the write operations include the extra overhead. Sequential I/Os have less cleaning overhead. Note that cleaning imposes a ~30% hit on the random write rate.

Trade-off summary
The researchers looked at several design techniques:

  • large allocation pool
  • large page size
  • over provisioning
  • ganging
  • striping

These deserve some explanation.

A large allocation pool is convenient for achieving performance, but there is a cost. If the page size is small, there is more overhead of managing the pages.

If the page size is large, it is easier to manage the pages, but writes smaller than the page size require a read-modify-write operation, which kills performance.
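Here is a rough way to put a number on that read-modify-write penalty. The sketch assumes an aligned write and that every page the write touches is rewritten whole; it ignores cleaning entirely. The 16 KB page size in the usage line is a hypothetical example, not a figure from the paper.

```python
def flash_bytes_written(write_bytes, page_bytes):
    """Bytes the flash must program to absorb one aligned host write,
    assuming each touched page is read, modified and rewritten whole."""
    pages_touched = -(-write_bytes // page_bytes)   # ceiling division
    return pages_touched * page_bytes


def write_amplification(write_bytes, page_bytes):
    """Ratio of bytes programmed to bytes the host asked to write."""
    return flash_bytes_written(write_bytes, page_bytes) / write_bytes


print(write_amplification(512, 4096))    # small write, 4 KB page
print(write_amplification(512, 16384))   # same write, hypothetical 16 KB page
print(write_amplification(4096, 4096))   # write matches the page size
```

A 512-byte write costs 8x on a 4 KB page and 32x on a 16 KB page, while a write that matches the page size costs nothing extra. That is the trade: bigger pages mean less bookkeeping but a bigger penalty for small writes.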

Over provisioning reduces the cleaning overhead, at the cost of more expensive storage.

Ganging requires more explanation. A flash package is made of one or more dies or chips. The serial interface to the flash packages is a primary bottleneck for SSD performance. Spreading a write across multiple serial interfaces is an obvious way to improve performance. The cost comes in the interconnect density between the packages and the dies.

If a write can be interleaved across multiple flash packages, read or write bandwidth can be substantially improved. The ability to place multiple packages in an SSD, and to interleave operations across those packages, is key to the performance improvements that SSD vendors have been advertising.
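The simplest form of interleaving is round-robin striping of logical pages across packages. The sketch below is my own illustration rather than the paper's mapping, and the 100 microsecond per-page transfer time is a made-up placeholder; it assumes each package's serial interface moves one page at a time while the packages work in parallel.

```python
def locate(logical_page, num_packages):
    """Round-robin striping: map a logical page to
    (package index, page index within that package)."""
    return logical_page % num_packages, logical_page // num_packages


def transfer_time_us(num_pages, num_packages, page_time_us=100.0):
    """Time to move num_pages when packages transfer in parallel.

    page_time_us is a hypothetical per-page time on one serial interface.
    The busiest package determines the total.
    """
    pages_on_busiest = -(-num_pages // num_packages)   # ceiling division
    return pages_on_busiest * page_time_us


print(locate(5, 4))
print(transfer_time_us(8, 1), transfer_time_us(8, 4))
```

With four packages, eight consecutive pages land two per package, so the transfer takes a quarter of the single-package time. The gain only shows up when requests are large or numerous enough to keep every package busy.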

The StorageMojo take
This paper is too rich in detail to summarize well. If understanding SSD controller design is important there is no substitute for a careful read.

The net is that engineers have many options in configuring and managing flash devices inside a solid state disk. The interaction of these design choices with applications is likely to remain a fruitful area of study for years to come.

Expect to see many performance oddities as new solid state disk designs are released. This is a different world than disk drives. There is much innovation and much to learn.

A macro longer-term trade-off is the extent to which SSD vendors should attempt to alter operating system behavior to better match SSDs. In the short term designers must conform to today’s disk I/O oriented operating systems. In the long term however, there must be major opportunities to tweak operating systems to enhance solid-state disk performance.

For this reason SSDs may find their best short term market to be inside storage arrays where array vendors have complete control over the interface to the array software. This will be no small advantage as array vendors struggle to remain relevant in a world where high performance solid state disks have the potential to replace midsize arrays.

Comments welcome, of course.

Update:
Ted Wobber kindly wrote in with a comment I’m reproducing in full, since he does a better job of getting to the heart of the matter than I did:

I think the bottom line is that flash devices are a lot more complicated than you might think they would be. At first glance, the conventional wisdom is that something constructed out of solid-state circuitry should be fundamentally simpler than a device with very small parts moving at high speed. However, you have to remember that NAND-flash is built on quantum tunneling, and while the software layers that build up from there don’t involve advanced physics, the properties of the medium create complexities and tradeoffs that might not be expected.

We don’t talk with SSD vendors at a great level of detail since we’d prefer not to be under NDA unless there is a good reason. However, informal discussions and other materials I’ve seen have convinced me that our evaluation of the state of affairs isn’t far from the truth. It’s my opinion that most manufacturers are well aware of these sorts of tradeoffs, and they carefully consider them along with the requirements of their target markets and cost structures. The point of our article was to talk about these tradeoffs in an academic forum unconstrained by IP issues, and to begin to tease apart the tangle of related issues.

In sum, SSDs constitute a marvelous step forward and are really useful in many applications. However, they are not a panacea, at least not yet.

/Ted

Thank you, Ted.

Testing, testing, 1 2 3 . . .

July 7th, 2008 by Robin Harris in Architecture, Disk, SSD/Flash Disk

George Ou weighs in
Many good points have been made about the problems with the Tom’s Hardware flash SSD tests. My former colleague George Ou, late of ZDnet, weighed in with an excellent summary of the TH testing problems:

The tests are very flawed. If you read the results, the SSDs with the worst power consumption aren’t the ones getting the worst battery life. The ones with great performance and above average power consumption turn out to be the worst on battery life WITH THE TEST THEY RAN.

What this says is that Tomshardware’s measurements weren’t wrong, but what they were measuring was wrong.

The load test was not well controlled. The SSDs with great performance allowed the benchmark to run faster which cranked the CPU more. The difference in the CPU state is what explains the discrepancy in their data.

A proper measurement would have done a fixed amount of CPU work and a fixed amount of storage work and then you can see how long the battery lasts. They could have simply played a movie off the storage system and let it play until the battery died. Videos are great because they’re fixed computational workload and fixed storage workload.

This is yet another example of bad science by Tomshardware.

I don’t buy the “play a movie” test - that only tests playing a movie - but I do accept that Tom’s Hardware didn’t do a great job of testing. So what?

I’ll be returning to the testing issues shortly - after pausing for this disclosure.

Disclosure: I’m biased towards notebook flash drives
Unlike, AFAIK, any of the commenters - pro or con - I used a flash-based Windows notebook every day for 5 years and loved it. It had a 10 hour battery life, a full-size keyboard and a sleep mode that really worked. Bliss!

I also paid an extra 20% - $400 back when the dollar was worth something - for the dinky 10 MB CF card it used. It was worth every penny.

Based on my sample size of 1 (me) here’s WHY it was worth an extra 20%:

  • Battery life. The Omnibook 300 went from 5 hours to 10 hours of battery life with flash.

Factors that didn’t matter:

  • Performance: I never compared the disk to the flash, but the performance was “good enough” with either.
  • Durability: nobody gets 5 years out of a notebook drive, but crashing wasn’t a liability since all docs were copied to an external system.
  • Boot up time: sleep mode worked perfectly, so I’d reboot once a month at most. I did not care about boot time.
  • Multi-media workloads: while I agree with George that a video provides a good fixed workload, notebook SSDs are aimed at business travelers whose workloads commonly allow drives to spin down. But this is a topic that deserves a deeper look.
  • Capacity. The Omnibook had a compression utility that effectively doubled capacity to 20 MB. But it was easy to copy stuff off the ‘book - Laplink - so it never felt cramped.

Those are my biases. They may or may not be the biases of Mr. Road Warrior - but I suspect they are close. End disclosure.

Testing, testing, testing
Performance testing is a black art. That’s why test driving applications remains popular: there are so many variables that predictions based on benchmarks are close to useless.

Because of that I prefer to look at the preponderance of evidence rather than a single benchmark or set of tests. More data points paint a clearer picture.

For example, the single most positive SSD test I’ve found is Anandtech’s MacBook Air SSD. Similar results from another test are here.

Battery Life Test (H:MM) | 80GB 4200RPM HDD | 64GB SSD | % Improvement
Wireless Internet + MP3 | 4:16 | 4:59 | 16.8%
DVD Playback | 3:25 | 3:56 | 15.1%
Heavy Downloading + XviD + Web Browsing | 2:26 | 2:42 | 11.0%

Bottom line best case: a 17% improvement. Not zero but not, as most reviewers concluded, enough to justify the price.
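If you want to check the arithmetic, the percentages follow directly from the H:MM figures in the Anandtech table above. The helper names are mine; the run times are the ones quoted.

```python
def minutes(hmm):
    """Convert an 'H:MM' string to minutes."""
    hours, mins = hmm.split(":")
    return int(hours) * 60 + int(mins)


def improvement_pct(hdd, ssd):
    """Percent battery-life gain of the SSD run over the HDD run."""
    return round((minutes(ssd) / minutes(hdd) - 1) * 100, 1)


results = {
    "Wireless Internet + MP3": ("4:16", "4:59"),
    "DVD Playback": ("3:25", "3:56"),
    "Heavy Downloading + XviD + Web Browsing": ("2:26", "2:42"),
}

for test, (hdd, ssd) in results.items():
    gained = minutes(ssd) - minutes(hdd)
    print(f"{test}: +{gained} min ({improvement_pct(hdd, ssd)}%)")
```

In absolute terms the gains are 43, 31 and 16 minutes, which is a more useful way to think about it when you are deciding whether the SSD premium is worth paying.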

Ars Technica also reviewed the MBA SSD and had mixed results. They concluded:

. . . I had high hopes for the battery life on the SSD model. Unfortunately, I was met with only moderate gains when there were any at all.

More Anandtech
Anandtech also tested a high-end Memoright SSD in a high-end MacBook Pro. Here are their results:

Battery Life in Hours (Higher is Better) | MacBook Pro (Hitachi 5400RPM) | MacBook Pro (Memoright SSD)
Wireless Internet Browsing + MP3 Playback | 5.13 hours | 5.0 hours
DVD Playback | 3.88 hours | 3.58 hours
Heavy Downloading + XviD Playback + Web Browsing | 3.38 hours | 3.37 hours

The StorageMojo take
All workload testing is a compromise - but the preponderance of the evidence is clear: significant - i.e. 40% or better - notebook power advantages just aren’t there. UMPCs that can’t afford a disk - flash will win. Notebooks? Hasta la vista, baby.

The one SSD advantage that is yet to be debunked is durability. Someone made a case that just the maintenance advantages alone justify SSDs for enterprise notebooks. And it may be that simple.

Yet even there, the issues of hard CapEx dollars against softer expense dollars will work against SSDs.

Maybe the next gen of flash controllers will solve all the problems and usher in the age of flash storage everywhere. But piddly 20-30 minute gains for an extra $300 won’t do it.

Comments welcome, of course. Just so everyone knows: I haven’t done any work in the last few years for either flash drive or disk drive vendors. I wish them both the best.

Notebook SSDs are dead

July 2nd, 2008 by Robin Harris in Disk, Future Tech, SSD/Flash Disk

It’s all over but the shouting
The scoop: the gap between notebook SSD promise and performance has been growing steadily. Now a review in Tom’s Hardware puts the final nail in the coffin. The title says it all:

The SSD Power Consumption Hoax : Flash SSDs Don’t Improve Your Notebook Battery Runtime – they Reduce It

By as much as an hour. A winner with the stupid high-end notebook demographic. The Paris Hilton market.

Ouch. Oops. Who knew?

Or who should have known?

Details
There’s a longer piece with some detail at Storage Bits but here’s the summary:

  • A Crucial SSD - costing $25/GB - used more power - 1.6 W at idle - than any 2.5″ notebook drive requires.
  • A Memoright 32 GB drive used a full 2 W at idle
  • An Mtron 32 GB flash drive reduced battery life by almost an hour.
  • The slowest drive - a year-old SanDisk SSD 5000 - almost equaled the Hitachi 7200 RPM Travelstar’s energy use. But the SSD offers fewer IOPS than the hard drive!
  • They tested against a 200 GB Hitachi Travelstar 7k200, but other 2.5″ 7200 RPM drives have similar power envelopes.

And, of course, a 5400 RPM drive is more efficient. And a 160 GB 1.8″ drive is even more efficient, roomier and cheaper than any of the SSDs TH tested.

My guess on the not-easily-or-quickly-fixed culprit? The flash control logic - disk translation layer - needs cycles for wear leveling, garbage collection, buffer and cache management, flash mux/demux and the SATA interface - with frequent background operations even when the drive is idle.

And don’t forget the 20 volts required to write a cell.

Tom’s singles out Crucial for special mention:

Users who purchase this drive because of Crucial’s statements such as “low power consumption” and the product being ideal for “users who want longer battery life” will most likely be disappointed. While the total battery runtime certainly depends on the workload — we used Mobilemark 07 — the minimum and maximum power consumption measurements prove that Crucial’s statements of low power consumption are in fact wrong: 1.6 W idle power is more than any 2.5” notebook hard drive requires.

Did anyone even think to check the facts? At least one engineer had to know - and he told someone.

What’s the dynamic?
Some will say I’m premature, like when I said HD DVD was dead a year ago. But think about the market dynamic:

  • Cool but costly new technology needs early adopters
  • Based on the marketing, hip high-end adopters spring for a costly status symbol with claimed road-warrior features
  • But the supposed advantages don’t exist, so the early adopters feel like chumps
  • Word of mouth stops. Who wants to admit they were suckered?
  • Notebook SSDs slip into obscurity as enterprise and very low-end SSDs move into the spotlight

Making early investors/adopters look stupid is not a winning strategy.

The StorageMojo take
The notebook SSD vendors have dug themselves a very deep hole. How to fix?

  1. Stop digging. A month in detox would help. Some encounter group time with the HD DVD folks.
  2. Form a serious performance consortium and get real about performance, power and longevity.
  3. Do the hard work of getting notebook operating systems better optimized for flash. Use Linux and OS X to beat Microsoft into some semblance of cooperation. Do the engineering for Apple - they’re open source, right? If Apple does it, it’s cool - and you need cool.

What the SSD guys will do:

  • Deny and obfuscate. “Not representative. Slanted. Unfair. Conspiracy.”
  • Claim next gen will fix all problems.
  • Performance, performance, performance. Which is a weak reed as well.
  • Point to cost curves showing that, without a doubt, flash overtakes disk in 5 years.

And then hope the smart, techy, affluent road warrior demographic has a short memory. Good luck with that.

Comments welcome, of course.

EMC: flash replaces high-end disks in 2010

May 19th, 2008 by Robin Harris in Disk, SSD/Flash Disk

Greetings from Las Vegas
And EMC World 2008.

Dave Donatelli, president of EMC’s storage business, presented to the press room this morning. His most interesting statement was that flash drives will have cost-parity with, and therefore replace, high-end rotating magnetic disks, by the end of 2010.

Let’s run some numbers
Dave said that EMC has measured STEC’s flash drives at 30x the IOPS of a high-end disk with sub-millisecond access times. That alone would justify a premium over existing drives. He also said that the performance of the flash drive was better under load. A double win.

A 15k 74 GB Seagate SAS drive is about $175 or roughly $2/GB. A 2 GB Single Level Cell (SLC) flash chip is currently about $8/GB on the flash spot market. If flash keeps dropping at 50% a year they’ll be where the current disk price is in mid-2010.
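The arithmetic behind that date can be sketched in a few lines. This is a rough sketch only: it assumes the disk price stays frozen at today's level (the comparison the paragraph makes) and that flash falls by a constant 50% a year. The function name is made up for illustration.

```python
import math

def years_to_parity(flash_per_gb, disk_per_gb, annual_decline=0.5):
    """Years until flash $/GB falls to a fixed disk $/GB.

    Assumes a constant yearly decline and a disk price that does not move.
    """
    return math.log(disk_per_gb / flash_per_gb) / math.log(1 - annual_decline)

disk_exact = 175 / 74   # the 15k SAS drive, about $2.36/GB
flash_spot = 8.0        # SLC spot price per GB from the post

print(round(disk_exact, 2))                                # exact disk $/GB
print(round(years_to_parity(flash_spot, 2.0), 2))          # using the rounded $2/GB
print(round(years_to_parity(flash_spot, disk_exact), 2))   # using the exact figure
```

Two halvings take $8/GB to $2/GB, so two years from May 2008 lands in mid-2010. Using the unrounded disk price pulls the crossover in by a few months.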

But that’s raw chip vs finished disk
The remaining question is how much do the controller chip and other infrastructure cost? STEC isn’t selling its 74 GB flash drives for $8/GB - $80/GB is closer to the mark. Volume should amortize their engineering costs. PC boards are cheap.

That leaves the flash translation layer. That should fit nicely on an FPGA and, once the bugs are out, on an ASIC. The 1st ASIC is expensive; the 100,000th is cheap.

The StorageMojo take
Flash drives don’t need absolute price parity to win against high-end FC drives. Getting within 30% should do it for most people. Their performance advantages are worth at least that.

Of course the drive vendors aren’t going to sit still. They can pull several levers before breaking the glass for the big red one labeled “margin.” Many have claimed disks are dead - and those claimants are all gone.

But this looks serious. High-end drives are a small piece by units, but their high margins would be sorely missed.

Comments welcome. This isn’t about notebook disks which are currently less than $0.40/GB and headed down much faster than FC and SAS drives.

This is StorageMojo’s 500th post! Thank you, thank you.

NAND - an engineer’s perspective, pt zwei

May 12th, 2008 by Robin Harris in Architecture, SSD/Flash Disk

Herewith continues NAND - an engineer’s perspective.

And you thought marketing guys were wordy! The quoted bits are from the earlier StorageMojo post Notebook flash SSD market: fantasy or mirage?. Teil eins ist hier.

Begin part zwei

. . . tested application performance hardly changes either . . . .

Actually, this makes sense.  If you are accessing 4k of data, then HDD and SSD are both fast enough and you don’t care.  If you are accessing a 1MB file, then that is 256 x 4k sector accesses, and the sectors will be laid out one after the other, which is where HDDs perform well.  SSDs will shine when you need to do 256 x 4k sector accesses, and the sectors you are accessing are scattered across the disk, but as far as I know this access pattern is not common except on servers.

And what about the 4-bit MLC that Toshiba is counting on to drive costs down?

I’m a NAND flash fan, but this is scary stuff for me.  To store 1 bit in a bit cell, you need to distinguish between two voltage levels.  To store 2 bits, you need to distinguish 4 levels.  For 3 bits, 8 levels.  For 4 bits, 16 levels.  I think at the 4 bit/16 level point, we’re down to where 10-20 individual electrons can make the difference in the bits read out.
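The level counts above are just powers of two. A minimal sketch follows; the even spacing of levels across a fixed voltage window is an illustrative assumption, not a description of any real part.

```python
def levels_needed(bits_per_cell):
    """Distinct voltage levels a cell must resolve to store this many bits."""
    return 2 ** bits_per_cell

def relative_spacing(bits_per_cell):
    """Gap between adjacent levels as a fraction of the voltage window.

    Assumes levels are evenly spaced across a fixed window (an
    illustrative simplification).
    """
    return 1 / (levels_needed(bits_per_cell) - 1)

for bits in (1, 2, 3, 4):
    print(bits, levels_needed(bits), round(relative_spacing(bits), 3))
```

Under that assumption each added bit roughly halves the margin between levels, which is why 4 bits per cell leaves so little room for error.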

This will be less durable than current SLC. How do you explain that to consumers?

The answer is easy, but doing it is hard.  You have to make it so that the issues are completely invisible to consumers.

Note that this has been done successfully with flash for years.  Most of the memory cards (SD, MMC, etc) that people have been buying for years use MLC flash.

Flash has read errors - that’s why vendors implement error detection.

NAND chips are generally organized in write pages, with a spare area for each page - typically 2kB page, with 64B of spare area.  The spare area is used to store ECC parity data, and meta data (more about this shortly).

HDDs have read errors as well. They also write their data to the platter using ECC and other algorithms that make it easier to recover the bit clock and align the heads when reading the data back.

But flash has a problem disks don’t: flash drives move your data around a lot more often than disks do. Every time a flash drive writes a page, it has to erase the entire block that page is in.

Not quite right.  Generally, a page can only be written once, and has to be erased before it can be written again.  And unfortunately, erases can only be done on an erase block, which is usually 64 write pages.  If you have to erase a page, then you might have to move 63 other pages to free up the erase block - yuck!  It happens sometimes, but the FTL (flash translation layer) software that manages all of this is usually optimized to avoid this situation as much as possible.

The normal scenario is that you write a page, and the FTL just puts the new data in a new page somewhere, and marks the old page as obsolete.  Once the FTL runs low on space, it needs to do garbage collection, but if you put a little extra NAND in your system so that even a full filesystem has some empty pages, you can make that pretty rare.
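The out-of-place write scheme described above can be sketched as a toy model. This is not any vendor's FTL. The class name, the greedy victim choice and the trick of buffering surviving pages in RAM during an erase are all assumptions made to keep the example short. The demo uses 4 erase blocks of 64 pages and exposes only 192 logical pages, so a quarter of the NAND is the "little extra".

```python
import random

FREE, VALID, OBSOLETE = "free", "valid", "obsolete"

class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus garbage collection."""

    def __init__(self, num_blocks=4, pages_per_block=64):
        self.pages_per_block = pages_per_block
        # state[b][p] is the state of physical page p in erase block b
        self.state = [[FREE] * pages_per_block for _ in range(num_blocks)]
        self.owner = {}     # (block, page) -> logical page held there
        self.mapping = {}   # logical page -> (block, page)
        self.erases = 0
        self.copies = 0     # valid pages rewritten by garbage collection

    def _find_free(self):
        for b, block in enumerate(self.state):
            for p, s in enumerate(block):
                if s == FREE:
                    return b, p
        return None

    def _program(self, logical, loc):
        self.state[loc[0]][loc[1]] = VALID
        self.owner[loc] = logical
        self.mapping[logical] = loc

    def _garbage_collect(self):
        # Greedy choice: reclaim the block holding the most obsolete pages.
        victim = max(range(len(self.state)),
                     key=lambda b: self.state[b].count(OBSOLETE))
        if self.state[victim].count(OBSOLETE) == 0:
            raise RuntimeError("no obsolete pages to reclaim")
        # Hold the still-valid pages in RAM, erase the block, write them back.
        survivors = [self.owner.pop((victim, p))
                     for p, s in enumerate(self.state[victim]) if s == VALID]
        self.state[victim] = [FREE] * self.pages_per_block
        self.erases += 1
        for p, logical in enumerate(survivors):
            self._program(logical, (victim, p))
            self.copies += 1

    def write(self, logical):
        old = self.mapping.get(logical)
        if old is not None:
            # Never overwrite in place: just mark the old copy obsolete.
            self.state[old[0]][old[1]] = OBSOLETE
            del self.owner[old]
        loc = self._find_free()
        if loc is None:
            self._garbage_collect()
            loc = self._find_free()
        self._program(logical, loc)

ftl = ToyFTL()
for page in range(192):          # fill the logical space once
    ftl.write(page)
rng = random.Random(0)
for _ in range(10000):           # then rewrite random pages
    ftl.write(rng.randrange(192))
print("erases:", ftl.erases, "pages copied by GC:", ftl.copies)
```

Shrinking the spare area (more logical pages on the same NAND) should make garbage collection copy more pages per erase, which is the case the paragraph says extra NAND helps avoid.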

No hard numbers from the vendors - depends on how good their signal processing algorithms are - but it could easily be 5,000 writes - down from 10,000 today.

Actually, some of the NAND vendors are already at 5k erase/write cycles today.  This, and slow write speeds are definitely the weak links for MLC NAND.

I believe that it is possible to do a good enough job with caches in the computer DRAM, and in the FTL to make a system built from 5k endurance work for a very long time.

Note that the 5k number is a statistical thing - this is the number of cycles at which about x% of the blocks will have failed (I think x% = 50%, but I didn’t look it up).  This means that some blocks might fail when the part is new, and some might last a lot longer.  If the software is done right, then the amount of available storage space will gradually shrink as blocks fail, and the entire drive won’t suddenly fail.
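A back-of-envelope estimate shows why 5k cycles can still last a long time. The 32 GB capacity, the 10 GB/day of host writes and the write amplification values are made-up illustrative inputs, and the formula assumes perfect wear leveling, so treat the output as an upper bound rather than a spec.

```python
def lifetime_years(capacity_gb, cycles, host_gb_per_day, write_amplification=1.0):
    """Rough endurance estimate, assuming perfectly even wear across all blocks.

    write_amplification is the ratio of flash writes to host writes
    (1.0 means no extra writes from garbage collection).
    """
    total_host_gb = capacity_gb * cycles / write_amplification
    return total_host_gb / host_gb_per_day / 365

# Hypothetical 32 GB drive, 5k cycles, 10 GB of host writes per day.
print(round(lifetime_years(32, 5000, 10), 1))
print(round(lifetime_years(32, 5000, 10, write_amplification=10), 1))
```

Even a tenfold write amplification leaves several years under these assumptions. The statistical spread in block endurance means the real number is a distribution, not a point.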

The map that keeps track of where your data is rapidly gets very complex - and itself is regularly read and rewritten. How well protected is this critical data structure? If it isn’t bulletproof you can kiss your data good bye.

All true.  But you can also write metadata information in the spare area, to allow you to rebuild the FTL map if something goes horribly wrong.

Also, HDDs have the same problem with their FAT tables, or the modern equivalent.  This is normally stored on the disk, and in the computer’s RAM, with the disk copy being a little out of date.  Lose power at the wrong moment, and bad things can happen.

The StorageMojo take
Many thanks to the anonymous contributor. Net/net this points again to the suitability of flash drives for servers - and not so much for notebooks - the original subject.

The larger issue is the lack of transparency on the part of NAND SSD vendors. Until their architectures can be independently reviewed, we all have to rely upon marketing assurances - not! - and the useful but skimpy testing provided by sites like Anandtech.

The server-side SSD market can work with those limits. After all, the vendor of the complete system has to stand behind it.

But that is a tiny fraction of the total available market. The big win is on the consumer side: 100+ million units - if the product delivers.

Samsung, Toshiba: your current strategy is doomed. You need to engage at the consumer’s level instead of relying on the usual marketing hype. Your product is too costly, now and 3 years from now, to succeed without delivering real benefits.

You aren’t there yet.

Comments welcome, of course.

Notebook flash SSD market: fantasy or mirage?

April 27th, 2008 by Robin Harris in Architecture, SSD/Flash Disk

Fresh off the HD-DVD fiasco, Toshiba execs are stepping up to pursue another expensive flop: notebook SSDs. Memo to Toshiba: people won’t pay huge SSD premiums for nothing. And almost nothing is what flash SSDs provide today - and for the foreseeable future.

Please sir, may I have another!
Given the multi-billion dollar cost of semiconductor fabs, getting the notebook SSD market wrong would make Toshiba’s $250 million HD-DVD loss look cheap. The president of Toshiba semi, Shozo Saito, recently opined that flash drives will be in 25% of notebooks by the beginning of 2011.

He is so-o-o wrong.

Hand me the back of the envelope, please
Guessing 200M notebook sales in 2011, 50 million flash drives of, say 250 GB, for total sales of 12.5 million TB of flash. Assuming a cost reduction curve of 50% annually from today’s spot market MLC $2500/TB to ~$320/TB in 2011 . . . hmm-m . . . $4 billion in chip sales.

Give or take. Yummy!

If Toshiba projects winning 20% of the market, $800 million in sales would justify over $1 billion in flash factory capacity. And if the market doesn’t appear, a billion dollar write off.
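The envelope math above, spelled out. All inputs are the post's own guesses. The only additions are the variable names and the use of 1 TB = 1000 GB, which is what the 12.5 million TB figure implies.

```python
notebooks_2011 = 200e6        # guessed notebook unit sales in 2011
ssd_share = 0.25              # Saito's 25% of notebooks
drive_gb = 250                # assumed capacity per flash drive
price_per_tb_2008 = 2500.0    # today's spot MLC price, $/TB
annual_decline = 0.5          # 50% cost reduction per year
years = 3                     # 2008 -> 2011
toshiba_share = 0.20

drives = notebooks_2011 * ssd_share
total_tb = drives * drive_gb / 1000
price_per_tb_2011 = price_per_tb_2008 * (1 - annual_decline) ** years
chip_sales = total_tb * price_per_tb_2011
toshiba_sales = chip_sales * toshiba_share

print(f"{drives / 1e6:.1f}M drives, {total_tb / 1e6:.1f}M TB")
print(f"${price_per_tb_2011:.2f}/TB in 2011")
print(f"${chip_sales / 1e9:.2f}B chip sales, ${toshiba_sales / 1e9:.2f}B at 20% share")
```

Three straight halvings give $312.50/TB, slightly under the rounded ~$320/TB, so the totals come out a hair below $4 billion and $800 million. Give or take.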

Same power, same performance and way more costly - I’m sold!
If flash drives delivered what proponents claim there would be no problem. But they don’t and they won’t.

Power: no SSD notebook has gained more than 10 minutes battery life over disks. Since flash is already power-efficient that won’t change. Disks have multiple opportunities to improve power use - and with over $1 billion a year in R&D behind them - they will.

Performance: tested application performance hardly changes either - even with a $3,800 flash drive. Notebook I/O doesn’t favor flash drives - and the engineering contortions needed to fix flash aren’t cheap.

The one big win for flash performance: boot and app load times. It makes the system feel a lot snappier - if you often reboot. Sleep mode makes that much less important.

Reliability/durability: flash vendors tout 2 million hour MTBFs and superior shock & vibe specs. Yet Dell reports that their SSD infant failure rates are about the same as disks. And the return rates are higher.

So where, exactly, is the flash advantage? Plus, it is only conjecture that flash drives will prove to be more reliable in actual notebook use. Only time will tell.

And what about the 4-bit MLC that Toshiba is counting on to drive costs down at 40-50% per year? This will be less durable than current SLC. No hard numbers from the vendors - depends on how good their signal processing algorithms are - but it could easily be 5,000 writes - down from 10,000 today.

How do you explain that to consumers?

Data integrity: the unasked question
Of all the questions about flash drives, this is the biggest. I have yet to see an SSD read error spec.

Flash has read errors - that’s why vendors implement error detection.

But flash has a problem disks don’t: flash drives move your data around a lot more often than disks do. Every time a flash drive writes a page, it has to erase the entire block that page is in.

So what happens to the data in the block? It gets read - almost always correctly - and rewritten along with the new page. The new location must be tracked by the drive.

The map that keeps track of where your data is rapidly gets very complex - and itself is regularly read and rewritten. How well protected is this critical data structure? If it isn’t bulletproof you can kiss your data good bye.

If FTL’s are like every other storage product, catastrophic failure modes are hiding in the statistical weeds. Enterprise IT is rightly suspicious of storage that “auto-magically” moves data around. Consumers have no idea. SSD vendors better have their act together or the class action suits could be as big a problem as the empty fabs.

The StorageMojo take
The further I wade into flash issues, the worse it gets. My sense is that the flash industry is close to creating a multi-billion dollar fiasco. Why?

  • Over-promising on performance, reliability, battery life and data integrity. Take a systems level perspective, folks. Consumers do.
  • Over-broad positioning of flash drives as a general replacement for notebook hard drives - when pricing clearly says they aren’t.
  • Relying on system OEMs like Dell to market SSDs to consumers is a freeway to failure. They don’t have the bandwidth. The flash vendors need to market flash SSDs directly to consumers. Not sell them - market them.

The flash guys are caught in a vise: big expensive fabs that need to run all year; and seasonal demand that whipsaws their pricing all year.

Notebook flash drives can help even out demand - but only if consumers accept them for the right reasons. Otherwise Toshiba’s new fabs will build chips for a non-existent market.

Update: Flash has a place in one notebook niche: below the $40-$50 minimum cost of a disk. As we’re already seeing with the Asus Eee, replacing $50 of disk with $10 of flash makes a big price difference. But those units won’t solve the seasonality problem and may even make it worse. End update.

Comments welcome, of course.

Flash futures

March 11th, 2008 by Robin Harris in Enterprise, Future Tech, SSD/Flash Disk

How flash is really going to affect the storage industry is becoming clear. The short take: not as big a deal as flash vendors hoped. The longer take: There won’t be much of a mid-range flash market; instead we’ll see either costly fast flash or cheap slow flash.

There are lots of theories about how flash will alter the mass storage landscape. This is mine.

The flash write problem
The fundamental flash problem is the slow writes. There are 3 elements to the slow write problem.

  • Flash has to be erased before it can be written. Every write operation is really 2 write operations.
  • The writes are large. Typical block sizes are 128KB to 256KB. Writing a single page requires writing - after erasing it first - the entire block.
  • The write bandwidth to a single block is less than a slow disk. High-bandwidth writes require parallel paths to multiple blocks.
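To put a number on the second point: if a drive naively erased and rewrote the whole block for every page written, the extra write traffic is simply block size over page size. The sketch below uses the block sizes quoted above and assumes a 2 KB page, a common size for NAND of this era. The function name is made up for illustration.

```python
def naive_write_amplification(block_kb, page_kb=2):
    """Flash KB written per host KB when every page write rewrites its whole block.

    page_kb=2 is an assumed page size, not a figure from this post.
    """
    return block_kb // page_kb

for block_kb in (128, 256):
    print(block_kb, "KB block ->", naive_write_amplification(block_kb), "x")
```

That multiplier is the worst case. The workarounds listed next exist to keep real drives far away from it.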

These problems can all be engineered around.

  • Garbage collection-like algorithms can be extended to enable a supply of erased blocks
  • RAM backed by a small battery or capacitor can buffer writes for later re-writing to flash
  • Controller chips can be built in high volume with multiple data paths

But at what cost? The first two require well-engineered software and some sort of CPU to run it. Since it is software it will have bugs. Can it be any more reliable than current drive firmware?

The dilemma
For enterprise use, flash-based SSDs need to be rock-solid, which implies a lot of careful and costly engineering. For consumer use, they need to be very high volume, which means low-cost.

It is a similar problem to RAID controllers: very low-end RAID controllers aren’t reliable enough for enterprise use. They also aren’t cheap enough - or easy enough - for consumers to buy in volume. RAID controllers have engineering problems similar to flash translation layers.

Making flash drives look like disks makes them easy to integrate, but if you really need performance it also makes them costly - like the $10k for the flash drive EMC is using in the Sym.

Flash in the disk controller?
As I’m writing this a NetApp exec says that flash will be disruptive because by placing flash in a disk controller they will reduce the need for the costly and highly profitable fibre channel disks. That could be correct. It sounds smarter than sticking flash on a disk.

The StorageMojo take
Despite the miracles of cost-reduction and integration the industry regularly performs, some things, like power provisioning, don’t get cheaper. High-quality software engineering doesn’t either. That is what high-performance flash drives require.

The high-performance consumer flash drive seems to be a mirage. I’d like to be proven wrong, but today’s notebook SSDs don’t offer superior application performance and cost 10x as much. Hardly a recipe for success.

Update: Intel is planning to offer “high-performance” flash drives with partner Micron. I saw an impressive demo - is there any other kind? - at the Storage Visions conference. But with the early marketing missteps of Samsung, it looks like the consumer flash drive may fall off the hype cycle into a deep ditch. Flash drive marketers: now is the time for precision marketing if you ever hope to establish a mass market. Consumers remember unkept promises. Until you are cheaper. End update.

Comments welcome, as always. Also check out BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage by two Samsung researchers, Hyojun Kim and Seongjun Ahn for a nice intro to flash issues.

Flash talking - and a wee DRAM - with Texas Memory Systems

March 7th, 2008 by Robin Harris in Enterprise, SSD/Flash Disk

I ran into Woody Hutsell, EVP at Texas Memory Systems, last week. He graciously agreed to a talk on camera about their experience with flash and DRAM-based solid state storage.

TMS sells both: a DRAM-based SSD with multiple FC and Infiniband ports; and a 2 TB flash box with 128 GB of DRAM cache. Woody offered some interesting insights. For example, workloads with a large number of writes - even if they are a small percentage of the total workload - may not be suitable for flash-based storage.

Here’s the video:

Blame me for the shaky camera work.

Disclosure: I taped and edited this gratis.

Comments welcome, as always. BTW, Google now accepts files up to 1 GB. Seagate and WD should be happy.

Set phasers to “change”

February 5th, 2008 by Robin Harris in Future Tech, SSD/Flash Disk

Flash may be getting all the attention, but the boffins are working hard to ensure we have options to flash. We need those options because flash has some serious limitations, like random write performance and density, that we may not be able to overcome.

On the other hand it is easy to underestimate the power of sustained investment in R&D. Disk drives have successfully fended off numerous would-be usurpers thanks to their incredible areal density growth.

Now flash is facing a challenger
Intel and STMicroelectronics’ new phase-change memory threatens flash, but not any time soon. Unveiled at ISSCC, the thermal phase-change device can store 2 bits per cell, like MLC flash.

I asked Jim Handy, of Objective Analysis, a semiconductor market research firm, for his take on the Intel-STM announcement. He responded:

The big question is “When does Phase Change stand a chance?” I doubt that it could be competitive today because a wafer with a new material is bound to be much more costly than a pure silicon wafer until volumes get up, and the volume won’t get up until the price is competitive with flash. The number of chips per wafer is about the same for PCM as for NOR flash, so there’s no cost advantage from die size.

Once flash reaches its scaling limit brick wall (which was expected to be 2006 at 65nm, then at 25nm in 2012, now looking like 10nm around 2016) then PCM will zip right past it. Trouble is - that brick wall has legs and keeps zipping ahead of us.

Until then PCM should have trouble competing on cost, and cost is everything in the semiconductor memory markets.

One advantage PCM has is that it has a fast write, so in many cases a PCM chip can replace a flash and a RAM. This means that the cost target is something higher than simply matching a NOR price.

The significance of an MLC PCM is that it puts PCM on the same footing as MLC NOR. Had that not happened, then there would have been a longer delay before PCM replaced NOR, since a PCM die size would have been twice as large as an MLC NOR of the same density on the same process.

As for PCM replacing NAND, wellllll…… that’ll take longer since NAND’s about 1/3 the cost of NOR. The same brick wall impacts both technologies, though, so it will happen!

Thanks, Jim. I hadn’t realized that PCM could replace a NOR and a RAM chip.

The StorageMojo take
Jim’s take is better informed than mine. My takeaway is that the growth of PCM will depend on how broad a market niche it can build for itself over time. That won’t be easy.

Comments welcome, of course.


