From the category archives:

SSD/Flash Disk

TMS announces 450 GB PCI-e SSD

by Robin Harris on Tuesday, 10 March, 2009

Fusion-io had the PCI-e flash card market all to themselves for the longest time – but no more. Texas Memory Systems, a stalwart in the DRAM-based SSD market, has announced a new product, the RAMSan-20, a 450 GB SSD on a full height, full length PCI-e card.

With 450 GB of usable single level cell (SLC) flash onboard, the company is aiming at enterprise-class users who put a premium on reliability and availability. SLC is typically spec’d at 10x the writes of cheaper multi-level cell (MLC) flash. Like Fusion-io, TMS provides a thin block-device driver.

With an ~$18k list price, the RAMSan card isn’t for gamers. But the large capacity offers a lot of go for servers large enough to drive it at 120k random reads per second. That should appeal to the TMS customer base and reseller channel.

The StorageMojo take
This won’t be the last PCI-e flash card announcement this year – expect another tomorrow. And maybe next month. The obvious rightness of putting flash on PCI-e has several vendors panting to get to market.

Vibrant competition will develop this year. Given the economic tide, the ability to make a paid-for server act like a new and bigger server will be very attractive. In 3 years we’ll all be wondering “how did we manage before flash card SSDs?”

Courteous comments welcome, of course. I shot a complimentary video of TMS president Woody Hutsell discussing flash and DRAM SSD last year. I’ve done some work for Fusion-io, including this video 4 months ago. Both worth a look.

{ 5 comments }

TransFlash

by Robin Harris on Saturday, 17 January, 2009

Truism: flash is not the same as disk. So why don’t we take advantage of that – rather than hiding it?

Partly it is the human SOP: first build the old thing out of the new stuff. Not to mention the commercial allure of hundreds of millions of SATA interfaces in the wild.

Helping us move on is a paper, Transactional Flash (pdf), by researchers Vijayan Prabhakaran, Thomas L. Rodeheffer and Lidong Zhou of Microsoft Research. Vijayan has also co-authored flash papers with Ted Wobber et al. noted elsewhere on StorageMojo.

Flash is a good fit.
The authors note that the essence of all transactional constructs is to avoid in-place data modification – enabling roll back to a known state. Since flash SSDs can’t re-write data in place, TransFlash makes a virtue of flash necessity.

Flash SSD architectures also have much parallelism, due to the use of many flash chips, each including multiple planes and blocks, with multiple I/O paths to support garbage collection and wear-leveling – and now, WriteAtomic.

Finally, the data scattering caused by avoiding in-place data rewrites – typically through copy-on-write strategies – is not the problem for flash that it is for disks: flash excels at fast random reads.

What is TransFlash?
TransFlash is a flash SSD with 3 important enhancements:

  • It exports a transactional interface WriteAtomic.
  • The flash controller implements a cyclic commit that uses flash’s per-page metadata storage – typically 128 bytes – instead of the common independent commit record.
  • Both of these features are implemented in the flash translation layer controller firmware – no hardware engineering required.

The authors named their invention TxFlash, but I like TransFlash better since Tx also abbreviates transmit. It also sounds sexier, a rare quality in computer science naming. Really guys, it will help commercial adoption.

WriteAtomic model
The key API construct is described thusly:

TxFlash exports a new interface, WriteAtomic (p1 . . . pn), which allows an application to specify a transaction with a set of page writes, p1 to pn. TxFlash ensures atomicity, i.e., either all the pages are written or none are modified. TxFlash further provides isolation among multiple WriteAtomic calls. Before it is committed, a WriteAtomic operation can be aborted by calling an Abort. By ensuring atomicity, isolation, and durability, TxFlash guarantees consistency for transactions with WriteAtomic calls.
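To make the all-or-nothing behavior concrete, here is a minimal sketch. It is not the paper's implementation: the class `ToyTxFlash`, the method names and the `fail_after` crash switch are all hypothetical, and isolation between concurrent calls is not modeled. It only shows why never rewriting in place makes atomicity cheap: new data lands on fresh pages, and the logical-to-physical mapping switches over only once every page is written.

```python
class ToyTxFlash:
    """Toy SSD that never rewrites pages in place (hypothetical, not the paper's code)."""

    def __init__(self):
        self.flash = []      # append-only list of physical pages
        self.mapping = {}    # logical page number -> index into self.flash

    def read(self, lpn):
        """Return the committed contents of a logical page, or None if never written."""
        idx = self.mapping.get(lpn)
        return None if idx is None else self.flash[idx]

    def write_atomic(self, pages, fail_after=None):
        """Write {logical page: data} so that all pages become visible or none do.

        fail_after simulates a crash or abort after that many physical page writes.
        """
        staged = {}
        for count, (lpn, data) in enumerate(pages.items()):
            if fail_after is not None and count >= fail_after:
                return False                 # mapping untouched: old data still visible
            self.flash.append(data)          # new copy goes to a fresh physical page
            staged[lpn] = len(self.flash) - 1
        self.mapping.update(staged)          # commit point: switch all mappings together
        return True


ssd = ToyTxFlash()
ssd.write_atomic({1: "a1", 2: "b1"})
ok = ssd.write_atomic({1: "a2", 2: "b2"}, fail_after=1)   # interrupted transaction
print(ok, ssd.read(1), ssd.read(2))   # False a1 b1
```

The interrupted second call leaves an orphaned physical page behind, which a real flash translation layer would reclaim during garbage collection.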

The authors compare 3 commit protocols – traditional commit, simple cyclic commit, and back pointer cyclic commit – and evaluate their resource requirements. The table shows that the new commit protocols reduce I/O overhead, differing in their treatment of aborted transactions.
[Table: TransFlash commit protocols compared]
The simple cyclic commit has to erase aborted transactions before any new writes can be written to the same page. This could slow response times if aborted transactions are common.

Compared to traditional commits, the new protocols double transaction throughput because they don’t require additional commit writes and write ordering. This is most important with small transactions, as transfer times dominate large transactions.
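A simplified reading of the cyclic commit idea: rather than writing a separate commit record, each page's metadata links to the next page of the same transaction, and the last page links back to the first. Recovery treats the transaction as committed only if the whole cycle made it to flash. The sketch below assumes that reading; the page ids and the dictionary layout are illustrative, and the version numbers and aborted-transaction handling that distinguish the two cyclic protocols are left out.

```python
def is_committed(start, metadata):
    """Return True if following next-page links from `start` leads back to `start`.

    metadata maps a written page id -> id of the next page in its transaction.
    A missing key means that page never reached flash.
    """
    seen = set()
    page = start
    while page not in seen:
        if page not in metadata:
            return False     # link points at an unwritten page: the cycle is broken
        seen.add(page)
        page = metadata[page]
    return page == start


complete = {"p1": "p2", "p2": "p3", "p3": "p1"}
interrupted = {"p1": "p2", "p2": "p3"}     # crash before p3 was written

print(is_committed("p1", complete), is_committed("p1", interrupted))   # True False
```

No extra commit write appears anywhere in this check, which is where the throughput gain for small transactions comes from.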

End-to-end benefit
The authors’ simulations with a pseudo-device driver under various workloads found that TransFlash adds minimal overhead. The big win is in file system complexity, which:

. . . can be reduced by using the transactional primitives from the storage system. For example, the journaling module of TxExt3 contains about 3300 LOC when compared to 7900 LOC in Ext3. Most of the reduction were due to the absence of recovery and revoke features and journal-specific abstraction.

The StorageMojo take
TransFlash works on multiple levels:

  • It simplifies a longstanding problem with little required device investment.
  • It creates a high-value storage interface – with its attendant margin enhancement opportunities – for an industry whose current margin cows will soon die.
  • It reduces file system complexity – an under-appreciated issue – while improving performance for small write transactions.

History will favor back pointer cyclic commit (BPCC) as Moore’s Law drives flash translation layer controller performance up and flash storage costs down. Unless someone comes up with something even better.

Whether or not TransFlash ever sees the light of day, the paper is a welcome reminder of the benefits of pushing the envelope. With all the new storage technologies coming online we’ll have many opportunities to change the I/O landscape in coming years.

Courteous comments welcome, of course.

{ 4 comments }

Watkins walks – can Seagate fly?

by Robin Harris on Tuesday, 13 January, 2009

He was pushed
Seagate’s Bill Watkins, CEO since 2004, apparently lost the confidence of the Seagate board and was replaced by former CEO and current Chairman Stephen Luczo. Why?

Not Watkins’ sometimes bizarre public statements: that would’ve gotten him the boot years ago. A slow reaction to the worst recession since the 1930s forced the issue.

The bigger problem
In the short term, global demand for disk drives is down. But managing production cuts is easy compared to Seagate’s real problem: creating new product demand.

That is Seagate’s long-term challenge. The turmoil created by the rapid rise of flash and cluster-based storage means there will be more changes in storage in the next 5 years than in the last 15.

Suggestions for the new CEO
Every company faces tension between the “business as usual” guys and the forces of change. At Seagate inertia has overwhelmed innovation for too long.

The confusion around flash strategy is one symptom. The inability to sell the Integrated Storage Element, now a Xiotech product, is another.

Seagate needs to climb down from the minimum 1 million unit order mentality and open itself to investing in a variety of small businesses that may grow large. Even those that don’t will be educational.

A few ideas:

  • 10K SATA drives. WD has built a successful consumer franchise with their Velociraptor 10K SATA drives. But system vendors won’t touch it without a second source of supply. How hard is it to put a SATA interface on the Savvio 10K SAS drives?
  • Cold archive disk. Disk is the new tape – so why not make a business out of it? In less than a year Seagate could have an archive drive that would meet consumer needs for long-term personal data storage. Long-life lubricant, reduced RPM, a nifty USB docking station and a 7 year archive life and within three years you’d be selling tens of millions of drives into a new segment.
  • Get serious about flash. This has two elements: reducing disk’s disadvantages in speed and durability – see 10K SATA drives above – while continuing capacity growth; and rethinking flash-based storage. Use flash’s unique capabilities. Seagate needs a flash skunk works/biz dev team with the charter to incubate new concepts.
  • Get serious about the home. Seagate may never become a system supplier to the home — although it could — but it certainly can become a leading supplier of storage components for the home. The cold archive disk is just one idea.
  • Do a root cause analysis on the ISE failure. Xiotech is turning the ISE into a real business where Seagate failed. That’s just wrong. Why couldn’t Seagate market a Superdisk that doesn’t compete with existing customers? Understand that problem and you will position Seagate for more decades of success.

The StorageMojo take
The nimble, innovative Seagate of years past has lost its mojo. Focused on million drive orders the company has ignored tectonic shifts in the storage market.

The good news is that digital storage demand is on a long-term growth trend. Disk drives aren’t buggy whips – the company’s best days are ahead of it.

Investing in new products while cutting production of old ones is painful. But the alternative is worse.

I hope Seagate’s new management will rise to the occasion. Seagate people are looking for leaders who will challenge their creativity and unmatched technological depth.

Courteous comments welcome, of course. Especially from Seagaters, of course!

{ 11 comments }

The top storage stories of 2008

by Robin Harris on Sunday, 4 January, 2009

The world of data storage is changing faster than it has since the mid-90’s amid the rise of hardware arrays and storage networks. Looking back, 2008 will be seen as a pivotal year. The big news, in rough ascending order:

FCoE
Though production-ready products are still in the future, the broad vendor embrace of Fibre Channel over Ethernet signaled the beginning of the end for the Fibre Channel physical layer. The storage companies who profited from a decade of Balkanizing the storage network market will have Cisco calling the shots.

Brocade, in particular, needs good strategy advice. Maybe one of these days they’ll get it.

Blu-ray tanks
Call me old-fashioned, but I have a soft spot for removable media. So I’m sorry to see Sony screw the pooch with Blu-ray’s big-studio-friendly licensing and hate-the-customer DRM. And sinking the PS3 as well.

The good news: you can put HD content on a standard DVD – just not as much; and there’s an upscaling DVD player – the Oppo Digital DV-983H – that upscales ordinary DVDs to near Blu-ray quality. Yes, even better than the upscaling on a Blu-ray player. One less reason to pay the Blu-ray tax.

2.5″ drives
Rumor has it that Seagate is designing its last generation of 3.5″ drives, which augurs the switch to SFF in desktop and enterprise systems. 3 years ago 2.5″ drives had 1/5th the capacity of 3.5″ drives; today they have 1/3 the capacity and a much smaller price differential.

At some point it will occur to Seagate’s top management that 1.8″ and 2.5″ drives are the disk industry’s best answer to flash. Now, if Seagate were in the I/O business, it would be a different story.

Zero-maintenance storage
Xiotech and Atrato introduced storage boxes that guarantee capacity, performance and uptime with no maintenance for 5 and 3 years respectively. These are storage game-changers.

That Seagate sold ISE to Xiotech after spending years developing it has to be one of their biggest blunders ever, several notches above buying Xiotech in the first place. The ISE is, in effect, a super disk that Seagate could have sold to all its enterprise disk customers.

Flash
2008 is the year that every major vendor – with the laudable exception of laser-focused WD – announced alliances and/or plans to enter the flash drive market. High-end SSDs will displace 15k high-end disks in the next 3 years.

But flash-in-disk-clothing is the near/medium-term solution. Fusion-io and Violin are on the winning architectural track. Flash belongs between the CPU and disk layers: that’s where we’ll get the most benefit for the added cost.

Hey, disk vendors: want to stick it to Intel, Micron and Samsung? Buy one of them. You are in the I/O business, not the disk business.

Commodity-based cluster storage
EMC’s Atmos, HP’s Extreme Storage 9100 and IBM’s XIV are commodity-based cluster storage. The important thing is the storage mainstream has embraced storage clusters based on commodity hardware and mostly open-source software. That’s what Google did years ago and soon many companies will.

Yes, commodity hardware saves real money, as Bill Mottram of Data Mobility Group and I found out when we ran the numbers on HP’s 9100 vs Isilon, NetApp and Sun. We’ll see if Atmos is on the latest EMC price list when I do the updates later this month.

The StorageMojo take
2009 will be a great year for the hungry and flexible. The ongoing financial train wreck is trouble for Big Iron fans in the data center.

Fortunately, help is on the way. Look for my 2009 forecast before the end of 2009.

Courteous comments welcome, of course. Of the companies mentioned I’ve done work for HP and Fusion-io.

{ 14 comments }

Flash and the new storage pyramid

by Robin Harris on Thursday, 4 December, 2008

I got a note from David Flynn, co-founder and CTO of Fusion-io (disclosure: I’ve done work for them) in response to The new storage pyramid. He makes several points about the nature of the array model that I wish I’d made.

Well worth the read.

David Flynn’s note:
Great analysis, Robin.

And, great comments.

My $.02 ….

I think it’s not just about the proprietary nature, the somewhat better performance and features, and the high markups that differentiates “storage arrays” from “clustered storage”.

It’s actually more to do with the vertically integrated nature of the business model of the companies in the array building business. This leads to proprietary architectures, higher margins and, true, somewhat better performance and features.

Let me explain through an analogy…

We used to get graphics workstations from SGI, Apollo, and other vertically integrated vendors, who sold everything end-to-end, down to the monitors and their own proprietary OS’s. These guys commanded HUGE margins – partly to reward their risky investment in solving a worthy, complex problem.

Similarly, the military (and a few others who could afford a million dollar price-tag) used to get flight simulators from Evans & Sutherland, who were also vertically integrated and insanely expensive. You even had niche vendors like Intergraph doing 3D graphics information systems who could justify their own proprietary architectures.

At least for a while.

They were all doing 3D graphics in one form or another. And, now, they are all GONE – thanks to the emergence of a component, the 3D graphics card.

With enough capability to be applicable across all of these different verticals, the 3D graphics accelerator has now shattered the benefit of running a vertically integrated business.

Today, there are myriads of “integrators” who make graphics workstations, flight simulators, GIS systems, etc. at very low margin by comparison. And, they do it by pulling together off-the-shelf components – all commoditized down to the software that provides even the high-value features.

They might have been inferior to the proprietary solutions at first, but not anymore.

Now, what happens when you introduce to the storage industry a component that commoditizes and trivializes the linchpin reason for expensive proprietary disk arrays, namely the caching tier – using NAND flash?

Once anyone can easily get the performance across any use case (OLTP, OLAP, Data Warehousing, BI, VOD, content caching, etc. etc.) you no longer need vertical specific, highly tuned, proprietary solutions from vertically integrated companies.

Every capability that doesn’t migrate into the component itself becomes nothing but commoditized software to be layered on top by any number of interchangeable integrators. Things like replication, disaster recovery, backup, dedup, and so on just become commoditized software that can run anywhere.

This is a classic Adam Smithian market evolution. What used to be a single, vertically integrated provider becomes a layered market where some people build the components, others integrate them (with some bit of value add), and you go to having many players competing on many levels.

And prices go down.

But, thankfully, (for those of us in the business of creating this componentized building-block) volume, productivity, and efficiencies all go up.

So, actually everyone wins. Including society as a whole.

Well, almost everyone wins. Everyone, that is, except for the proprietary array vendors who get caught by the innovator’s dilemma and a business model that used to be the correct one, but no longer is.

This generally makes them the slowest to simplify their proprietary infrastructures around the commoditized component – to help justify their investment into their heroic proprietary solutions.

In an effort to protect their margins, they endeavor to make things seem as complicated as possible. They do this, say, by preferring that NAND be forced to pretend to be an HDD and be put into HDD drive bays behind HDD protocols, where it has little ability to simplify things or get much additional performance.

They are the last to come out and say it can be simplified. Instead they’ll tell you you must have features X, Y, Z. And, see, those aren’t as good as with our proven architecture.

Let’s take high availability as an example. They aren’t going to tell you about a “shared nothing” strategy – where two separate RDBMS servers with terabytes of direct attached NAND inside of each use off-the-shelf log-shipping for asynchronous replication (or query replication to do it synchronously) to get fault tolerance.

No, they aren’t going to tell you that it’s actually simpler, more cost effective, and, here’s the real kicker… more fault tolerant to share nothing, than to use shared storage – no matter how fault tolerant they claim their monolithic storage array is, it’s still shared.

I’m not saying this market transformation is going to happen by tomorrow. But, given the geometric growth of the performance gap between processors and storage, and the geometric decline in cost of NAND flash – leading to a “Moore’s Law Squared” effect in the benefit to cost ratio – it is going to happen faster than people would think. Even considering the “stodgy” nature of storage folks who are in the business of obsessively caring for precious bits.

It doesn’t hurt that in this global recession companies are looking for ways to reduce costs while still needing to grow throughput. So, there’s more of a willingness to look at different, innovative ways to skin the cat.

I agree with you Robin. It will be a fait accompli by 2015.

David Flynn
CTO, Fusion-io

The StorageMojo take
Technology diffusion is a complex mashup of secular trends, technology development, individual creativity and happenstance. But the current direction of the high-end storage market points to the greatest change we’ve seen since the early 90’s and the advent of arrays.

The “Moore’s Law Squared” effect is particularly intriguing. Humans are terrible at estimating the impact of power functions, so this one is likely to be even more surprising than we dream.
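A quick way to see why compounding on both sides of the ratio surprises people: if the benefit grows by a fixed factor each year while the cost shrinks by a fixed factor, the benefit-to-cost ratio grows by the product of the two. The sketch below uses made-up yearly rates (`gap_growth` and `cost_decline` are assumptions for illustration, not figures from David's note).

```python
def benefit_to_cost(years, gap_growth=1.5, cost_decline=0.6):
    """Relative benefit/cost ratio after `years`, normalized to 1.0 at year 0.

    gap_growth:   assumed yearly multiplier on the CPU-vs-storage performance gap
    cost_decline: assumed yearly multiplier on flash cost per GB
    """
    benefit = gap_growth ** years
    cost = cost_decline ** years
    return benefit / cost


for year in (0, 1, 3, 5):
    print(year, round(benefit_to_cost(year), 1))
```

With these assumed rates the ratio grows 2.5x per year, so it is nearly 100x after five years, even though neither trend alone looks dramatic.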

Courteous comments welcome, of course.

{ 6 comments }

Stupid storage failures

by Robin Harris on Tuesday, 25 November, 2008

Valiant but doomed
The ZFS discussion thread had an interesting comment from Sun’s Jeff Bonwick, architect of ZFS, on storage device failure modes. How do you know a disk or a tape has failed?

You don’t. You wait, while the milliseconds stretch into seconds and maybe even minutes. Jeff states the problem – and Sun’s solution – this way:

. . . we’re trying to provide increasingly optimal behavior given a collection of devices whose failure modes are largely ill-defined. (Is the disk dead or just slow? Gone or just temporarily disconnected? Does this burst of bad sectors indicate catastrophic failure, or just localized media errors?) . . . there’s a lot of work underway to model the physical topology of the hardware, gather telemetry from the devices, the enclosures, the environmental sensors etc, so that we can generate an accurate FMA [Fault Management Architecture] fault diagnosis and then tell ZFS to take appropriate action.

With all due respect to Jeff, that solution seems iffy: how will you ever keep up with all the devices and firmware levels needed to make that work?

A community of prima donnas
There are lots of messy failure modes in computer systems. The literature around the Byzantine Generals Problem (Wikipedia – for a rigorous treatment download The Byzantine Generals Problem by L. Lamport et al.) tackles the problem of the malicious server in a community of network servers. That is a hard problem.

Knowing whether a storage device is alive, dead or only sleeping shouldn’t be so hard. They have powerful 32-bit processors – more powerful than a VAX 780 – and lots of statistics on what the drive is doing.

It seems like a disk could give a modulated heartbeat signal to drivers – “ready” “reboot” “caught in retry hell” “dead” – to decrease uncertainty.
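As a sketch of what a driver could do with such a signal: the four states below are the ones named above, while the `DriveStatus` enum, the `ACTIONS` policy table and `next_action` are hypothetical. No such standard interface exists; this only illustrates how an explicit status could replace guessing from timeouts.

```python
from enum import Enum


class DriveStatus(Enum):
    READY = "ready"
    REBOOT = "reboot"
    RETRY_HELL = "caught in retry hell"
    DEAD = "dead"


# Hypothetical policy table: what a driver might do for each reported status.
ACTIONS = {
    DriveStatus.READY: "issue I/O normally",
    DriveStatus.REBOOT: "queue I/O and wait for the drive to return",
    DriveStatus.RETRY_HELL: "read from a redundant copy instead of waiting",
    DriveStatus.DEAD: "fail the drive and start a rebuild",
}


def next_action(heartbeat):
    """Map a heartbeat string to a driver action; silence falls back to timeouts."""
    if heartbeat is None:
        return "no heartbeat: fall back to timeouts"
    return ACTIONS[DriveStatus(heartbeat)]


print(next_action("caught in retry hell"))
```

Only the silent case leaves the driver where it is today, waiting while milliseconds stretch into seconds.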

The StorageMojo take
Drive vendors may think that non-standards for drive condition reporting are a form of lock-in, but that misses the bigger picture: the quality and timeliness of condition reports – even with a standard format – would be a competitive differentiator.

At the margin it would help slow the move to commodity-based cluster storage by enabling array vendors to improve their error handling and perceived reliability. It would also help disks versus flash SSDs, whose perceived reliability is partly due to the gap between user-judged drive “failures” and vendor “no trouble found” test results.

Storage systems all know how to deal with disk failures – they have to. So drive vendors, how about getting together to help make knowing a drive’s status a lot easier? Hey, IDEMA, make yourself useful!

Courteous comments welcome, of course.

{ 14 comments }

Flash isn’t tier zero

by Robin Harris on Wednesday, 13 August, 2008

A panel discussion on enterprise SSDs at the Flash Memory Summit came to an almost unanimous conclusion: NAND flash is best seen as an extension to DRAM and a layer between DRAM and disk – not as the guts of a disk drive replacement.

I don’t think the guy from Seagate agreed.

Since I was on the panel, my recollections have to be taken with a grain of salt. But I was trying to resist the group think that too many panels fall prey to. Yet I agreed with the result.

Price changes everything
StorageMojo has reported at length on the problems of making a big, quirky EEPROM look like a disk. Flash doesn’t look much like DRAM either, but the two are cousins.

In the last few years price has altered the landscape. On today’s spot market a Gbit of DRAM costs 7-10x as much as a Gbit of MLC NAND.

That wasn’t the case 3 years ago, so substituting flash for DRAM made no sense.

The market resistance to flash drives is because flash costs more than disk. Not a problem when augmenting DRAM.

The performance fit
Disks are millisecond devices; DRAM DIMMs are nanosecond devices; and NAND chips are microsecond devices.
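A back-of-the-envelope sketch shows what a microsecond layer does to average access time. The latencies are order-of-magnitude assumptions and the hit rates are invented for illustration; real devices and workloads vary widely.

```python
# Assumed order-of-magnitude latencies in seconds; real devices vary widely.
DRAM = 100e-9     # nanosecond-class
FLASH = 100e-6    # microsecond-class
DISK = 10e-3      # millisecond-class


def avg_access_time(dram_hit, flash_hit=0.0):
    """Average latency when a request is served by DRAM, else flash, else disk.

    dram_hit:  fraction of requests served from DRAM
    flash_hit: fraction of DRAM misses served from flash (0.0 = no flash layer)
    """
    miss = 1.0 - dram_hit
    return dram_hit * DRAM + miss * (flash_hit * FLASH + (1.0 - flash_hit) * DISK)


without_flash = avg_access_time(dram_hit=0.90)
with_flash = avg_access_time(dram_hit=0.90, flash_hit=0.90)
print(f"{without_flash * 1e6:.0f} us vs {with_flash * 1e6:.0f} us")
```

Under these assumptions the flash layer cuts average access time by roughly 9x, because the disk's milliseconds dominate everything that reaches it.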

More than once it was suggested that maybe it is time to bring back the 3600 RPM drive. Optimized for capacity, power and long life, it would be a good complement to servers with several hundred GB of flash.

The StorageMojo take
Flash as a new storage layer between DRAM and disk just sounds more logical than flash-as-a-disk-like product. Let disks be disks!

And flash be flash.

Courteous comments welcome, of course. More on this topic later. Stay tuned.

{ 19 comments }

StorageMojo at Flash Memory Summit

by Robin Harris on Saturday, 9 August, 2008

If you are attending the Flash Memory Summit in Santa Clara on Tuesday and Wednesday please say hello. Tuesday morning I will be sprinting between my two concurrent sessions.

In the Forum F1B: Laptop Design session I’ll be giving a 25 minute presentation titled “Can The Flash Consumer SSD Be Saved?” In “Flash in Enterprise Storage Systems” a panel will hold forth on the promise of enterprise solid state disks.

For reasons regular readers will appreciate, the latter should be more interesting.

The StorageMojo take
The summit will also have vendors showing their wares. I’m hoping to see some creative work.

The first thought with new technologies is to replicate what we already have. The real benefits from flash will come as we rethink the old architectures.

Courteous comments and questions welcome, of course.

{ 5 comments }