A reader writes:
I found your blog after searching for storage alternatives. I have to say, it's really impressive and has helped me a lot so far. I was wondering if you could offer some advice.
We run an online version control service. Currently we are hosted on a VMware environment using FC SAN (SAS and SATA).
We’re growing into the 3 TB+ range and looking for alternatives, since we’re paying $2.50/GB for FC SAN (crazy). We looked at NetApp, but with all the stuff going on these days I have to think there is something less expensive and more creative.
Basically, our needs are:
- Fast read and write performance (500+ r/w IOPS – we have over 13,000 commits per day)
- Shared across many machines. We are currently using NFS.
- Something that won’t require a team to manage. Although, we already manage our entire Linux environment.
I noticed a post about Gluster, ParaScale, and Nexenta. They look promising, but my fear is that they will require too much maintenance. SAN and NFS are pretty simple, and if we get NetApp from our hosting provider they manage it for us. However, they want to charge us $8,000/mo for it (two shelves, 28 × 450 GB 15k SAS).
As I dive into storage I think I get more confused 🙂 Any advice is greatly appreciated.
When I asked if I could publish the note – which has been edited for clarity and anonymity – I had my own questions:
Why do you think that Gluster, ParaScale & Nexenta will require too much maintenance? Also, when you say SAN, are you referring to Fibre Channel or simply a dedicated Ethernet storage network?
The reply illustrated a facet of the marketing problem that new technologies face: uncertainty.
Not sure really, I just have not had experience with any of those solutions yet. Nexenta looks pretty impressive. I’ve also heard some great results from DRBD.
We have Fibre Channel with HBA cards. It’s still shared storage, but really fast.
BTW, DRBD is the name of an open-source software product:
DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network-based RAID-1.
The StorageMojo take
My first thought is that anyone who manages a technical hosted service that costs several $K per month should be able to manage a fairly modest scale-out cluster whose capital cost may be only 2-3 months of rental. And 28 15k drives seems like overkill on both the IOPS and the capacity.
But I don’t know much about version control I/O profiles. Maybe the problem is harder than that.
Readers, what say you?
Courteous comments welcome, of course.
I’d probably go for this type of setup:
With good commodity hardware. This is a GlusterFS-based solution. Of course, you’ll probably want to plan for a larger cluster, but you can run a proof of concept quite easily in your VMware environment.
DRBD is great, but it’s probably not the right choice for this application.
I really like Nexenta too, and the underlying ZFS capabilities would be very useful. It would be a significant shift relative to what you are used to, but it’s a great setup. ZFS is very powerful indeed.
Of course, the network becomes very important in any distributed storage setup like this, so you’ll want to think about that. There are a number of possible solutions there as well, depending on what version of VMware you are using and what your budget is for getting where you want to go.
Source code management tends to do TONS of small-file I/O – think changes to hundreds of little text files – so you need lots of IOPS to keep up.
I’d look at the Sun 7410/7310. For just simple NFS it will give you the performance of the NetApp at less cost.
Being a version control service, I can only imagine that the duplication of data throughout the storage would be very high! Having said that, I think NetApp would be great for your de-duplication needs. It could save you more than you think in terms of the capacity needed to be installed. What you think would occupy 3 TB might only need 1 TB with NetApp de-duplication.
First, constraints: I assume you have many disjoint projects, so you could partition storage without any problem. Your users can tolerate some kind of disconnect: most protocols are TCP-based, and if any server goes down they will have to reconnect anyway (so we get 99.9%-ish availability).
Next, a possible answer: I would stick with out-of-the-box storage (SAN/NAS) to allow some pretty nice optimizations like thin provisioning and dedup (the most common libraries will be stored only once, and you can also be generous allocating space to projects/servers and only use it as needed). One solution could be getting rid of “expensive” FC and using iSCSI with SATA-based EqualLogic arrays, which could get 3,000 IOPS with small files and, as they have a write-back cache by default, 1–2 ms of write latency. If you are scared of iSCSI, IBM also has the XIV, which could provide many more IOPS at a higher cost. Then I would assign various volumes to the servers (2 or 3 “small” volumes to allow a quicker fsck) and create a cluster of every 2 or 3 servers, so when any server goes down, the others pick up its share of volumes. If you need more resiliency you could replicate via SAN or configure some kind of synchronization at the server level (DRBD or plain rsync).
I wouldn’t recommend a cluster FS like GFS/OCFS: they have problems with large directories, small files, and high write rates (every write takes a lock that has to be honored by all the servers in the cluster). The more cloudy systems work at the object level, with very big namespaces, and are optimized for medium-to-large files, so storing a 2 kB file may consume 1 MB.
Hope this helps
I’d recommend a smaller, SATA-based FAS2020 NetApp solution.
You will easily reach 500 I/Os per second. Get an onsite service contract with it and you are still below two months of what you are quoted now.
Maintenance will most likely require a minimal amount of time.
Combined with a remote hands service from your hosting provider you should be good to go.
P.S. I’m not a NetApp employee, though my company does sell NetApp systems (as well as systems from many other vendors).
Looks to me like a massively overquoted NetApp system – 500 r/w IOPS is an almost trivial load for a storage system. There are unfortunately no recent benchmarks for the NetApp 2xxx series – see http://www.spec.org/sfs2008/results/sfs2008nfs.html for the 3xxx series, where they start in the 40,000 IOPS range.
A FAS2050 with 20 × 300 GB gives 20,000 IOPS – see http://www.spec.org/sfs97r1/results/res2007q3/sfs97r1-20070827-00293.html
Your performance need is low, so very few disks are needed. The problem is how to configure them – you need a sensible RAID set, say 8–14 data disks plus 2 parity for RAID-DP (RAID 6).
(8+2) 1 TB SATA will provide approx. 6 TB of space – plenty to grow into. The only issue with SATA is that latency is higher than FC/SAS, so you may prefer the faster disks anyway.
Using a standard NetApp disk shelf of 14 gives a RAID array of (12+2), which will provide over 3 TB of usable space using 450 GB disks. If you examine your environment and look at the use of dedupe for your data (especially the VMware data), this may be the most appropriate size.
If your provider is simply adding disk to your hosting, the above is probably appropriate; otherwise buying a FAS2040 should work. But be careful about adding in all the parity and hot-spare disks, which do not count towards your usable space – it is often better to use more small disks (burning 1 TB SATA drives on spares and parity adds up very quickly).
Disclaimer: I do not work for NetApp – and please understand that the IOPS in the SPEC benchmark may not compare in any way with the IOPS you are measuring or expecting.
This sounds like a good case for storage with automated tiering. Most modern SCMs “never forget” a revision, so a lot of the data (old revisions) is very cold. There is some complication with files that have not changed much, where deltas need to be retrieved based on the original.
Still, with the IOPS mentioned, a bunch of SATA drives seems to be all that is required. HTTP caching should also be investigated (depending on the SCM and its behavior over HTTP).
Of course the exact SCM involved matters: Git/Bazaar with their DAG storage have very different IO patterns than “per file” sorts of systems like CVS, Subversion, and even Mercurial.
Our bias up front: We design/build/sell hardware and one (and hopefully two) of the software packages mentioned here.
From what I can see, the real issues are a) reliable storage, b) ability to handle 500+ r/w IOPs, c) 3+TB.
None of this is hard/expensive with our hardware, and it is certainly not hard/expensive with GlusterFS.
For the sizes you are looking at, yes, $8000 USD/month doesn’t make much sense. In all likelihood, you could completely solve your capacity, performance and file systems problem for well under 4 months cost at those rates, and quite likely well under 3 months cost. Depends upon what you want to do for backup.
Parascale does look like it would work in this scenario. Nexenta is more on the storage management side … it could present out CIFS/NFS/… to your client machines.
Nothing wrong with this; you probably want your version control to have local and central repositories – git/mercurial both allow for this. If you are using SVN or CVS, there is less that can be done: SVN is centralized no matter what, and CVS is less flexible than SVN.
We use mercurial in house, along with a little git, and use tcp clients to our storage server. Our storage server supplies NFS, CIFS, and iSCSI targets to our users.
We’d be happy to talk more about this if you need. This is a very solvable scenario, one we are covering with application loads on our siCluster offering.
IMO there isn’t much creative stuff being done at that low a level of capacity or IOPS. $2.50/GB is a pretty good price for a FC SAN, depending on what SAN it is.
With only 500 IOPS I would say they may be good candidates for this “cloud” thing I keep hearing about. Check out Terremark (never used them; impressive service though). http://vcloudexpress.terremark.com/
$0.25/GB/mo of space, which works out to about $9,200/yr for 3 TB (3,072 GB × $0.25 × 12 months). But that gives you top-tier storage, stuff that typically has an entry-level cost of over $300k. Another cloud provider we are looking at charges in the range of $40k/yr for 3 TB of space.
I much prefer to own my own stuff rather than send it outside, but it depends on the size and what growth might be. If you think you’ll be fairly static and are only using a few TB and a tiny number of IOPS, then sending it to the cloud may make sense, as you can likely get better service than buying cheap crap hardware and doing it yourself.
However, if you do plan to grow larger, then it probably makes sense to invest in a better system and host on site – something that can scale incrementally. Myself, I am partial to 3PAR of course, but Compellent makes good midrange boxes. Xiotech has a very interesting design as well. EqualLogic may even be a good fit for a *low*-end system. And any NetApp in your price range is likely to be bottom of the barrel. With most systems there is a decent up-front investment in the controllers, but the disks and stuff are cheaper as you add them. Just make sure the system can scale out dynamically – as in re-stripe existing data online; not many systems can, and even fewer do it well.
A local VMware consulting shop here loves to pitch NetApp to people because of the VMware features, though they lean towards EqualLogic for the lower-end stuff because NetApp is much more $$ (when you start bolting on the features, anyway).
I wouldn’t touch DRBD with a 5-mile pole myself – just a matter of personal preference, though; the whole idea behind DRBD scares me.
If it were me and I was really trying to be cheap, I would just use local storage – forget SAN, forget NAS. Use local storage and build your environment to be redundant enough (e.g. at least two of everything) so that if a box goes down you don’t lose much. But this depends on what kind of applications you’re running. I have a couple dozen VM boxes in the field that rely exclusively on local storage (vSphere); if a box dies, no big deal.
I don’t think IOPS will be a concern as long as the solution isn’t based on a handful of TB SATA drives. Without any requirements for availability, it is a bit hard to say whether a Brand X server with a bunch of internal disks running NFS would be sufficient, or whether a commercial array with dual controllers, etc. would be the best fit. Dell/EqualLogic have some really easy-to-manage iSCSI-based arrays that are pretty reasonably priced and offer full redundancy and seamless expansion. Just expose the iSCSI network and LUNs to your NFS server.
NetApp gives you a combination of software and hardware in a box.
Has its advantages & disadvantages.
If you wanna go the open-system way, you have to decide on hardware (JBODs) and the file system that will run on top of that JBOD.
Hardware-wise I would say go with something like the G-SPEED (g-technology.com) or a Dot Hill (dothill.com).
With this capacity and IOPS performance, you obviously need an entry-level small/medium-business kind of storage.
You likely want functionality (snapshots, SNMP, replication, etc.).
500 IOPS is a no-brainer, and 28 SAS disks is real overkill – 10 will be more than enough.
As for the file system: if you are looking at creating a dynamic scale-out file system, then a file system with the functionality you need and scale-out abilities can be nice.
If you want NAS access (CIFS/NFS), there are several options, like Nexenta (commercial) or open source (openfiler.com, OpenNAS, FreeNAS), that you would need to administer and test yourself.
Either way: think about future growth, think open (because you don’t wanna be tied to one vendor), and think dynamic scale-out as you grow.
Decide on the value you need and then look for the right price.
With such a small amount of data, $/GB may not be the appropriate metric; perhaps the reader should look at $/IOPS. I also question using a cluster when a single controller could easily satisfy the requirements. Heck, if there is locality you might be able to fit this in four hard disks and three SSDs…
ParaScale is really simple to manage. They offer a free 4 TB download that can be installed on your own selected commodity hardware running Linux. It can be installed and configured in a morning and does not need much maintenance.
Also once you have a cloud running you can scale it online without any service disruption. They have Thin Provisioning so you can provision a multi TB file system, and throw in additional storage only when needed.
$8,000/mo over the lifetime of the data can really add up. However, if you repurpose some Supermicro or Dell boxes, or purchase some new ones, you can build a storage cloud for much less.
The download is available at http://www.parascale.com
Hosted version control is an embarrassingly horizontal problem. Assuming that no one repository represents a major portion of their load, simply partitioning the repositories horizontally across machines should allow them to use stupid-simple commodity storage in each node.
For a great perspective on the practical issues in scaling an entire large web + version-control service, cf. http://github.com/blog/530-how-we-made-github-fast
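A minimal sketch of that horizontal partitioning, assuming hypothetical node names and plain hash-mod placement (not any particular site’s production scheme):

```python
import hashlib

# Hypothetical storage nodes -- placeholder names, not from the post.
NODES = ["store01", "store02", "store03", "store04"]

def node_for_repo(repo_name, nodes=NODES):
    """Map a repository to a storage node by hashing its name.

    The hash is stable across processes and machines, so every
    front end resolves the same repo to the same node without
    needing a shared lookup table.
    """
    digest = hashlib.sha1(repo_name.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

print(node_for_repo("project-alpha"))  # always the same node for this repo
```

The catch with plain modulo placement is that adding a node remaps most repositories; a small lookup table or consistent hashing avoids that migration.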
Ah, prices from an unnamed major North American managed hosting company who is based in Texas. I know them well.
At the type of scale that your reader describes, there are a variety of viable options, particularly with the fairly low 500 IOPS number stated. That could very easily be handled by a Dell MD3000 with 15 × 600 GB 15k RPM drives connected to a pair of front-end hosts acting as redundant NFS heads, connected via SAS to redundant controllers. At RAID 10, you’d wind up in the neighborhood of 4 TB of usable space and at least a couple of thousand IOPS. Admittedly, that’s a roll-your-own solution that does have a little bit of complexity involved in clustering the NFS heads.
The next step from there would be something like a smaller Isilon cluster. You get NFS out of the box and scale-out capability for both IOPs and storage. Isilon has gotten much more aggressive on pricing in recent months as well, so the price point can be very competitive.
That’s my $0.02 anyway.
Thanks so much for all of the detailed responses. I sent the question in to Robin late at night after researching some options. Let me clarify some of the items:
* The VCS is SVN
* The NetApp was a 2040
* 500 IOPS is typical, but we experience peaks of 800 for writes and 500 for reads
* 3 TB is our current usage, but the NetApp would give us around 5 TB usable.
* We did some tests with dedupe on NetApp. The interesting thing is that VMDKs saw about 50% savings, but data directly on disk saw almost none.
* We used to use GFS, which caused all sorts of problems.
One advantage we have is sharding. Our app is already set up for this, so we are considering a completely “shared nothing” environment with local disks. Getting away from a shared file system would be ideal; while NFS works well, it will eventually be a bottleneck (I think).
If we go with local disks, we just need to figure out the failover/replication scenario. Any suggestions there would be great. So far ZFS/Nexenta looks great for its flexible management and snapshots.
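One common shared-nothing pattern with ZFS is periodic snapshot shipping to a warm standby via zfs send/receive. A sketch of how that pipeline is composed, with made-up dataset and host names and no error handling:

```python
from typing import Optional

def replication_command(dataset: str, prev_snap: Optional[str],
                        new_snap: str, standby: str) -> str:
    """Build the shell pipeline that pushes a ZFS snapshot to a standby.

    With no previous snapshot a full stream is sent; otherwise an
    incremental stream (-i) carries only blocks changed since prev_snap.
    """
    if prev_snap is None:
        send = f"zfs send {dataset}@{new_snap}"
    else:
        send = f"zfs send -i @{prev_snap} {dataset}@{new_snap}"
    # -F rolls the standby back to the last common snapshot before receiving.
    return f"{send} | ssh {standby} zfs receive -F {dataset}"

print(replication_command("tank/svn", "hourly-01", "hourly-02", "standby01"))
# zfs send -i @hourly-01 tank/svn@hourly-02 | ssh standby01 zfs receive -F tank/svn
```

Run from cron every few minutes, this gives an asynchronous replica; on failure the standby mounts the dataset and serves the shard, losing at most one replication interval of commits.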
Btw, I am not an architect or engineer, just trying to personally explore high level options for the biz moving forward.
Hmm, that’s REALLY not that many IOPS. I would look more closely at the Sun offerings – the Unified Storage Systems mentioned above. Google around a bit for “fishworks” – you’ll get an ear/eye/mind-full. The NetApp system you were quoted is massive overkill for what you’re doing with it.
For failover and replication, DRBD is used at many high-profile sites with great success. I can definitely recommend using DRBD for replication. Used together with Heartbeat, DRBD is a great foundation for a highly available shared-nothing block device.
Regarding the hardware used: we are about to launch a storage product especially for *aaS/cloud-providers.
If you would like to learn more about it, please contact me for more: ms [at] mystoragepod.com
What is the ratio of “dead” files (files that are not frequently used) to “live” files (files that are often requested)? The Sun 7000 series has SSDs used as read cache, and write SSDs that absorb the IOPS as a buffer in front of the SATA disks. If only 500 GB of your data is used frequently, you will be very happy with a Sun 7000 system with 500 GB of SSD read cache and some SSDs as write cache to take the IOPS. The Sun 7000 has built-in dedup as well. We use some 7410 clusters (which are overkill for your requirements), and the NFS performance is quite impressive for the price. We’ve tested NetApp, but their best price was still far more expensive.
Chris, instead of DRBD I would suggest two servers with a dual-ported SAS JBOD for HA.
It sounds to me like you need to forget your storage subsystem and switch to a better VCS. Instead of trying to use modern storage to host the crusty old Subversion, why not switch to a modern, distributed, and therefore self-parallelizing, self-backing-up VCS like Mercurial, Git, or Bzr?
How about getting a Drobo?
There’s more than “a little bit” of complexity involved in clustering your NAS heads if you want them to present a single namespace. Fortunately, at 500 or even 800 IOPS you can serve that very easily with a single pair of servers in an active/standby arrangement. For metadata-heavy workloads *any* cluster/parallel filesystem (I’ve worked on several) will struggle to keep up with a single-active-server setup for workloads where the latter suffices. If it were me, I’d just go buy a couple of Joe L’s boxes and be done with it. 😉
Thanks everyone for such great advice and feedback. There are a lot of options and it helps a lot. Now we have to choose a good path 🙂
Once we figure it out I will post with an update.
First time I have made a comment. This is a great blog and love what you write. I work as a senior sales engineer for Rackspace (NYSE:RAX). These comments are my own. It sounds like Chris might be a customer. I would be happy to work with him directly if he wants and you can share my email with him.
Comment for the post; feel free to edit for length –
There are really two ways to answer this, and you can see that in the other comments: the hardware-based approach (NetApp) or the web-scale approach (the GitHub example is good). Notice the difference in the level of expertise required to admin each solution. There are tradeoffs along a couple of different axes: cost vs. complexity, and complexity vs. ease of use.
The IOPS here are really small, so the cheapest option will be a commodity server running NFS with a bunch of TB-sized drives, then sharding the users so that X% go to one server, the next X% to another, and so on. I assume the service has enough users that this can work.