Venturing forth from the Shire
I went back to Silicon Valley last week to see if I could still deal with more than three people in a room. Living in a remote mountain valley and working at home, I don’t get out much. Visited Cisco and some clients. A good trip.

The best surprise was the charming and efficient Sara Delekta Galligan emailing me to say that Steve Hetzler had some free time and would I care to meet with him? Of course! I’d come across some of Steve’s iconoclastic thinking on the web and emailed him to see if we could talk.

Steve works in the gorgeous IBM Almaden research facility up in the hills south of San Jose. And his office has a way nicer view than yours.

Steve Hetzler, storage rock star
Steve’s write-up on the IBM site:

Steven R. Hetzler is an IBM Fellow at IBM’s Almaden Research Center (San Jose, Calif.), where he manages the Storage Architecture Research group.

He is currently focusing on new architectures for creating highly fault tolerant storage systems, iSCSI data storage systems and markets and applications for nonvolatile memories. iSCSI is a protocol for managing storage over IP networks that he initiated within IBM Research and also named. His group wrote the first draft specification, developed the first working iSCSI demonstrations, including the first direct network-attached DVD movie multiplex, and was active in helping develop iSCSI into an industry standard.

A prolific inventor, Hetzler has been issued 35 patents for inventions in a wide range of topics — including data storage systems and architecture, optics, error correction coding and power management. His most notable patents include split-data field recording (issued in 1993) and the No-ID(TM) headerless sector format (issued in 1995), which have been used by nearly all magnetic hard-disk-drive manufacturers for a number of years. He also pioneered the first adaptive power technology for disk drives, which is also widely used in disk drives for mobile computers.

Hetzler has received numerous IBM awards for his work, including three Corporate Awards, and a Corporate Environmental Affairs Excellence Award. He is a member of the American Physical Society, a senior member of the Institute of Electrical and Electronics Engineers and a member of the IBM Academy of Technology.

A native of Red Wing, Minnesota, Hetzler was educated at Carleton College (Northfield, Minn.), where he received a Bachelor of Arts in Physics in 1980, and California Institute of Technology (Pasadena, Calif.), where he received his Masters and Ph.D. degrees in Applied Physics in 1982 and 1986, respectively. He joined IBM Research in November 1985 and was named an IBM Fellow in 1998.

He’s an enthusiastic guy who likes thinking about storage
My note taking skills are pretty bad, so I can’t even begin to transcribe an interview. And it wasn’t really like an interview anyway. Steve just started rolling on storage issues and I tried to hang on.

Here are a few of the topics Steve broached. He thinks and talks fast, and I listen and type slow, so please consider these my impressions rather than quotes. Also, I’ve organized my random notes into topics and inserted what I thought fit. Bottom line: if something sounds stupid it is my fault.

About the industry:

  • People frightened that Moore’s law won’t continue
  • Storage companies tend to ship not what customers want but what the company can deliver
  • Storage requirements:
    • Cheap
    • Reliable
    • Simple

That sounded about right to me.

IP-based storage

  • IP-based storage is about sharing the closet rather than sharing the clothes – the file sharing capability is secondary to having a big convenient place to put stuff
  • Is popular because it is on the IP network, i.e. cheap, reliable, simple

Steve used to work on disks and had some thoughts on the AFR controversy

  • Disk AFR measurements are flawed
  • Accelerated test methodologies are focused on a specific failure mechanism – temperature – which, as Google found, isn’t a critical issue
  • Weibull plots are simply descriptive – no underlying mechanism or ratio is assumed – so they have limited value as a tool for understanding disk behavior
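
For readers who haven't met a Weibull plot, here is a minimal sketch of what one does, in Python. The function names and the failure times are made up for illustration; nothing here comes from Steve or from any vendor's data. The trick is that the Weibull distribution becomes a straight line when you plot ln(-ln(1 - F)) against ln(t), so fitting a line gives you a shape and a scale parameter. Note what is missing: the fit summarizes when drives failed, and says nothing about why.

```python
import math

def weibull_plot_points(failure_times):
    """Map failure times to Weibull-plot coordinates (ln t, ln(-ln(1 - F)))."""
    times = sorted(failure_times)
    n = len(times)
    points = []
    for i, t in enumerate(times, start=1):
        # Bernard's approximation to the median rank of the i-th failure.
        f = (i - 0.3) / (n + 0.4)
        points.append((math.log(t), math.log(-math.log(1.0 - f))))
    return points

def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_weibull(failure_times):
    """Return (shape, scale) read off the fitted line on the Weibull plot."""
    slope, intercept = fit_line(weibull_plot_points(failure_times))
    shape = slope                           # slope of the line is the shape
    scale = math.exp(-intercept / slope)    # intercept is -shape * ln(scale)
    return shape, scale

if __name__ == "__main__":
    # Hypothetical hours-to-failure for a small batch of drives.
    hours = [900, 2100, 4300, 7800, 12500, 16000, 21000, 30000]
    shape, scale = fit_weibull(hours)
    print(f"shape={shape:.2f} scale={scale:.0f} hours")
```

A shape below 1 reads as a failure rate that falls over time, and a shape above 1 as wear-out, but either way the numbers are a description of the curve rather than an explanation of it.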

On storage systems

  • Design challenge: highly unreliable components making a highly reliable system
  • RAID 5 & 6 don’t scale well to petabyte systems. One reason: rebuilds are, in effect, Denial-of-Service attacks: the rebuild typically cuts I/O performance by almost half for the affected RAID
  • Disk capacity up 4 orders of magnitude while hard error rate hasn’t changed
  • Logical incremental improvements of the RAID concept have gotten us to a point where we wouldn’t choose to be if we were designing today from a clean sheet of paper
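
To see why capacity growth with a flat hard error rate hurts RAID, here is a rough back-of-the-envelope sketch in Python. The assumptions are mine, not Steve's: an unrecoverable read error rate of 1 per 10^14 bits read (a commonly quoted spec-sheet figure), errors that are independent, and a RAID 5 rebuild that has to read every surviving disk end to end. The function names are made up for illustration.

```python
import math

def p_read_error(bytes_read, bits_per_error=1e14):
    """Probability of at least one unrecoverable read error in bytes_read bytes.

    Assumes independent errors at a rate of 1 per bits_per_error bits.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

def rebuild_failure_probability(disk_bytes, surviving_disks, bits_per_error=1e14):
    """Chance that a RAID 5 rebuild hits an unreadable sector.

    The rebuild reads every surviving disk in full, so the exposure is
    disk_bytes * surviving_disks.
    """
    return p_read_error(disk_bytes * surviving_disks, bits_per_error)

if __name__ == "__main__":
    # Same 8-disk group (7 survivors), capacities four orders of magnitude apart.
    for label, disk_bytes in (("100 MB", 1e8), ("1 TB", 1e12)):
        p = rebuild_failure_probability(disk_bytes, surviving_disks=7)
        print(f"{label} disks: {p:.4%} chance of a read error during rebuild")
```

Under these assumptions the risk goes from negligible with 100 MB disks to a sizable fraction of all rebuilds with 1 TB disks, with no change in the per-bit error rate. Real drives may do better or worse than their spec sheets, so treat the output as an order-of-magnitude illustration.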

But wait! There’s more!
Steve graciously slowed down long enough to try to educate me on some of the finer points of NAND flash problems as a solid state disk and some issues with 2.5″ drives. I didn’t get most of it but I’ve got a big yellow note on my monitor to “figure it out!” so maybe, one of these days, I will.

After 90 minutes of geek speak the patient Ms. Galligan suggested we might want to wrap it up. Good thing too: I made my flight with only five minutes to spare.

What are Steve and his team working on today?
Steve wouldn’t talk about his current project other than to say it is something cool and maybe, just maybe, he might have something to demo later this year. I’m pretty sure this will be a prototype rather than a product, so don’t start saving your pennies to buy one just yet.

Steve wouldn’t say what he is working on, but that doesn’t stop me from guessing. My SWAG: a cheap (by array standards), highly reliable, block-based storage system, running over IP and managed by a clustered software layer that protects data at the block and file level, rather than using RAID. I can hardly wait to see what it really is.

The StorageMojo take
Data storage and information management are, IMHO, the biggest problems in computer science today. The exponential growth in stored data creates huge challenges and opportunities for scale-out storage architectures. It is good to see smart folks like Steve working on these hard problems.

And kudos to IBM for supporting this kind of basic research. I think it beats the heck out of “checkbook innovation” as widely practiced in the industry. But that’s another post.

Comments welcome, of course. Steve, Sara, hope I didn’t get anything too wrong. If I did I’m happy to go back and update.

Update: Sara sent me a couple of modest suggestions, which I’ve incorporated, as well as a more current bio for Steve.