A reader writes:

I found your blog after searching for storage alternatives. I have to say, it's really impressive and has helped me a lot so far. I was wondering if you could offer some advice.

We run an online version control service. Currently we are hosted in a VMware environment on an FC SAN (SAS and SATA).

We’re growing into the 3 TB+ range and looking for alternatives, since we’re paying $2.50/GB for FC SAN (crazy). We looked at NetApp, but with all the stuff going on these days I have to think there is something less expensive and more creative.

Basically, our needs are:

  • Fast read and write performance (500+ read/write IOPS – we have over 13,000 commits per day)
  • Shared across many machines. We are currently using NFS.
  • Something that won’t require a team to manage – though we already manage our entire Linux environment.

I noticed a post about Gluster, ParaScale, and Nexenta. They look promising, but my fear is that they will require too much maintenance. SAN and NFS are pretty simple, and if we get NetApp from our hosting provider they manage it for us – though they want to charge us $8,000/mo for it (two shelves, 28 × 450 GB 15k SAS drives).

As I dive into storage I think I get more confused 🙂 Any advice is greatly appreciated.

When I asked if I could publish the note – which has been edited for clarity and anonymity – I had my own questions:

Why do you think that Gluster, ParaScale & Nexenta will require too much maintenance? Also, when you say SAN, are you referring to Fibre Channel or simply a dedicated Ethernet storage network?

The reply illustrated a facet of the marketing problem that new technologies face: uncertainty.

Not sure really, I just have not had experience with any of those solutions yet. Nexenta looks pretty impressive. I’ve also heard of some great results with DRBD.

We have Fibre Channel with HBAs. It’s still shared storage, but really fast.

BTW, DRBD is the name of an open-source software product:

DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network-based RAID-1.
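To make that concrete, here is a minimal two-node resource definition in DRBD’s own configuration syntax – a sketch only, with hypothetical hostnames, disks, and addresses:

    # /etc/drbd.d/r0.res -- hypothetical two-node mirror (DRBD 8.x syntax)
    resource r0 {
      protocol C;                 # synchronous: a write completes only after
                                  # both nodes have it on stable storage
      on node-a {                 # hostname must match `uname -n`
        device    /dev/drbd0;     # the mirrored device applications use
        disk      /dev/sdb1;      # local backing storage
        address   10.0.0.1:7789;  # replication link
        meta-disk internal;
      }
      on node-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }

Each node sees /dev/drbd0 as an ordinary block device; every write is mirrored to the peer over the replication link, which is what makes the RAID-1 analogy apt.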

The StorageMojo take
My first thought is that anyone who manages a technical hosted service costing several $K per month should be able to manage a fairly modest scale-out cluster whose capital cost may be only 2-3 months of rental. And 28 15k drives seem like overkill on both the IOPS and the capacity.
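A quick back-of-envelope sketch makes the point. The per-drive IOPS figure and the usable-capacity factor are my assumptions – rules of thumb, not numbers from the reader’s note:

    # Back-of-envelope check on the proposed 28-drive NetApp config.
    # Assumptions (mine): ~175 IOPS per 15k SAS spindle, and usable
    # capacity of roughly half of raw after RAID, spares and reserves.

    drives = 28
    drive_gb = 450
    iops_per_drive = 175                      # rule of thumb for 15k SAS

    raw_tb = drives * drive_gb / 1000.0       # 12.6 TB raw
    usable_tb = raw_tb * 0.5                  # ~6.3 TB usable (assumed)
    aggregate_iops = drives * iops_per_drive  # 4,900 IOPS

    current_monthly = 3000 * 2.50             # $7,500/mo for 3 TB at $2.50/GB

    print(f"Raw capacity:   {raw_tb:.1f} TB  (need 3+)")
    print(f"Usable approx:  {usable_tb:.1f} TB")
    print(f"Aggregate IOPS: {aggregate_iops}  (need 500+)")
    print(f"Current spend:  ${current_monthly:,.0f}/mo vs. $8,000/mo quoted")
    # Roughly 10x the stated IOPS need and 2-4x the capacity.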

But I don’t know much about version control I/O profiles. Maybe the problem is harder than that.

Readers, what say you?

Courteous comments welcome, of course.