But not that kind of science project. This is the real deal, already running near a petabyte, needing to upgrade and looking for answers. Sounds like they’ll be spending real money real soon.

I edited for brevity and asked the writer to monitor the comments to help answer any questions.

The neuroscience institute:

I work for a large neuroscience institute. We’re big data generators, now using Sun’s SAM-FS to migrate data between FC-AL tier 1, SATA tier 2 and large LTO4 tape silos for tier 3. We use LTO4 because the cost/benefit of STK’s T10K B drives just didn’t add up!
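
For anyone not steeped in HSM, here is a minimal Python sketch of the three-tier idea: new data lands on fast FC disk, ages onto SATA, and eventually gets archived to LTO4 tape. The thresholds and tier names below are placeholders for illustration, not the actual SAM-FS policy.

    import time

    # Illustrative thresholds only -- placeholders, not a real SAM-FS policy.
    TIER2_AGE_DAYS = 30    # untouched this long -> migrate FC (tier 1) to SATA (tier 2)
    TIER3_AGE_DAYS = 180   # untouched this long -> archive to LTO4 tape (tier 3)

    def pick_tier(last_access_epoch, now=None):
        """Return the tier a file should live on, given its last-access time."""
        now = now if now is not None else time.time()
        age_days = (now - last_access_epoch) / 86400
        if age_days >= TIER3_AGE_DAYS:
            return "tier3-lto4-tape"
        if age_days >= TIER2_AGE_DAYS:
            return "tier2-sata"
        return "tier1-fc"

    # A file last touched 200 days ago belongs on tape:
    print(pick_tier(time.time() - 200 * 86400))   # -> tier3-lto4-tape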

We’ve run this for the last couple of years, nearing a PB between disk and tape.

It’s been a bumpy road, as HSM can be when it’s implemented with end users touching it directly. It’s taken a couple of years, a lot of development work between us and the SAM-QFS engineers, and many sleepless nights to make it work near-seamlessly for end users.

Our problems:

  1. Our metadata slices (for various reasons) sit on 15k RPM FC-AL disk in the STK 6140 arrays and are barely able to keep up with their workload (see the back-of-envelope sketch after this list).
  2. We need to expand our disk infrastructure, both front-end high-performance disk and back-end commodity archive disk (SATA, and lots of it!).
  3. Tape is all good. Cool, calm and collected ;).
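
For scale, here is a back-of-envelope Python sketch of why spindles lose the metadata IOPS fight. The per-device figures are rough assumptions, not benchmarks.

    # Assumed figures: a 15k RPM FC spindle is good for roughly 175-200 small
    # random IOPS; a single enterprise SSD for tens of thousands.
    IOPS_PER_15K_SPINDLE = 180   # assumption
    IOPS_PER_SSD = 20000         # deliberately conservative assumption

    needed_iops = 5000           # "thousands, not hundreds" of metadata IOPS

    spindles = -(-needed_iops // IOPS_PER_15K_SPINDLE)   # ceiling division
    ssds = -(-needed_iops // IOPS_PER_SSD)

    print(f"~{spindles} x 15k spindles vs ~{ssds} x SSD for {needed_iops} IOPS")
    # -> ~28 x 15k spindles vs ~1 x SSD for 5000 IOPS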

HDS tells me their AMS2500 is great: a beautiful SAS backplane, amazing cache partitioning and brilliant scalability. I worry that 3 Gbit/sec SAS spindles will not have the same kind of concurrency that my traditional FC-AL 6140 chassis does (not that it’s helping much!).

Sun tells me their monster HPC array, the STK 6780, will blitz anything. And they’re willing to qualify SSDs inside the FC-AL-connected array shelf to fix metadata I/O constraints.

Further, Sun is suggesting their F5100 (SAS-connected only, if I recall correctly) will be the answer to my metadata latency and IOPS woes. But it’s SAS direct-connected only and doesn’t support an FC-AL loop for failover between hosts.

My take, so far:

  1. Flash and SSD are being pushed hard. It *seems* sensible to win a small IOPS war (I need thousands of IOPS, not hundreds) and the latency war against mechanical disk; see the sketch after this list. But I worry about Sun’s roadmap, future and overall strategy. I’m currently in talks with Fusion-io.
  2. With regard to the OpenStorage promise of “changing storage economics”, I am unconvinced the price is as sharp as it could be, considering I could white-box it myself with some large Supermicro JBOD arrays plus OpenSolaris!
  3. Between HDS and Sun, it’s a tough choice. I’ve not seen HDS play in this space before, and I’m unfamiliar with their market strategy and how they gear towards HPC/big-data-mover scenarios and “science” in general. Are they really built for these kinds of workloads? I’ve only ever seen them in the generic enterprise “Let’s run MS Exchange and Oracle” space.
  4. With Sun, I like the promise of the big 6780 chassis. Everyone says it is the “monster” array that will handle anything, but FC-AL carries a premium price compared to SAS…
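
On the latency point in item 1, here is a quick Python sketch of why flash wins for serial metadata work: per-operation latency caps what any single stream of stats, directory walks and inode updates can do. The latency figures are assumptions, not measurements.

    # Assumed per-op latencies -- not measurements.
    DISK_LATENCY_S = 0.006     # ~6 ms per random op on a 15k FC spindle (seek + rotation)
    FLASH_LATENCY_S = 0.0002   # ~0.2 ms per op on flash/SSD

    for name, lat in [("15k FC disk", DISK_LATENCY_S), ("flash/SSD", FLASH_LATENCY_S)]:
        print(f"{name}: ~{1 / lat:,.0f} serial metadata ops/sec per stream")
    # -> 15k FC disk ~167 ops/sec per stream; flash/SSD ~5,000 ops/sec per stream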

I’d love your thoughts!

The StorageMojo take
Seems like a job for scale-out cluster storage, man: IBRIX, Parascale or Scaleout, for example. It may also be a fit for some of the new tiering-in-a-box vendors like Avere Systems or Storspeed.

But if we stick with the big boys, Sun and HDS, how should they sort this out? Is there anyone else they should look at? Update: Readers suggested Gluster, a very cool scale-out file system, and Isilon, the easy-to-manage cluster storage system. Oh, and Bycast, which is quite big in the medical imaging space through OEM deals with IBM and HP. End update.

Vendors are encouraged to respond. Please do us the favor of identifying yourself as such.

Courteous comments welcome, of course. I did work for IBRIX and Parascale at one time.