Over at the OpenSolaris Forums’ zfs discussion board, Robert Milkowski has posted some promising test results.
Hard vs. Soft
Possibly the longest running battle in RAID circles is which is faster, hardware RAID or software RAID. Before RAID was RAID, software disk mirroring (RAID 1) was a huge profit generator for system vendors, who sold it as an add-on to their operating systems. With the advent of hardware RAID systems the battle was joined until the hardware array emerged victorious. Software RAID has been relegated to low-end, low-cost applications where folks didn’t want to spend even a few hundred dollars for a PCI RAID controller.
It’s All Software RAID
Yet the fact is that it is all software RAID – it is just a question of where the software runs. Throw a lot of hardware (and cash) at a problem and even dodgy code runs acceptably. Yet the investment that requires is also the Achilles’ heel of hardware RAID: once you get everything working right on a specific platform you want to just keep selling it, even as the hardware becomes technologically obsolete. It is no accident that EMC’s capacity-based pricing tiers made it uneconomic to fully expand a Symmetrix. The platform would max out well before capacity limits were reached because it was running on microprocessors that might be five years old.
Let The Battle Begin, Again!
So I’m excited to see the battle joined again. Server processors usually advance much faster than the add-on co-processors – with the major exception of graphics processors where gamer demand has driven incredible progress – so host-based RAID has a lot of built-in hardware investment behind it. ZFS offers a fundamentally re-architected RAID that is designed to overcome the traditional limitations of host-based RAID – which lacks non-volatile cache – by smart engineering.
So Does It Work, Already?
Short answer: yes. It is still early, both in ZFS development and in testing, but some highly suggestive numbers have been published here and here.
Robert tested against a modern, modular storage array, the Sun StorageTek 3510 FC Array, which offers a gigabyte of cache and 2Gb FC. Not an HDS Tagma, but I’d guess it is pretty close in performance, and that what it lacks is mostly the scalability of the larger enterprise systems.
Results:
With Hardware RAID
Robert ran these tests on a Sun Fire V440 Server. He first ran the filebench varmail test using ZFS on the hardware RAID LUNs the 3510 provides, running each test twice:
IO Summary: 499078 ops 8248.0 ops/s, 40.6mb/s, 6.0ms latency
IO Summary: 503112 ops 8320.2 ops/s, 41.0mb/s, 5.9ms latency
Then he ran the same tests with the 3510 configured as a JBOD (Just a Bunch Of Disks), with ZFS doing the RAID, and got these results:
IO Summary: 558331 ops 9244.1 ops/s, 45.2mb/s, 5.2ms latency
IO Summary: 537542 ops 8899.9 ops/s, 43.5mb/s, 5.4ms latency
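For anyone who wants to replicate the JBOD half of the test, here is a minimal sketch. The device names, the RAID-Z layout and the run length are my assumptions for illustration; Robert’s post doesn’t spell out his exact pool configuration.

# Build a pool straight on the 3510's disks so ZFS does the RAID
# (hypothetical Solaris device names):
zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
zfs create tank/fbtest

# Then run the filebench varmail workload against it:
filebench
filebench> load varmail
filebench> set $dir=/tank/fbtest
filebench> run 60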
Net Net
A strong showing by ZFS: ~10% more IOPS; ~10% lower latency; ~10% more bandwidth. Equivalent performance at a much lower cost. Promising news for ZFS adopters and those of us cheering from the sidelines.
“but I’d guess it is pretty close in performance, and that what it lacks is mostly the scalability of the larger enterprise systems.”
Really? Show your work.
Mark, good question!
Roughly:
- Latency is usually lower with a smaller array, since you don’t have millions of lines of code and multiple switch operations to traverse.
- IOPS scale for large arrays mostly as a function of parallelism – more I/O ports, more I/O processors, more cache, more interconnect bandwidth, more spindles – not because each individual I/O unit is blindingly fast.
- There are only a few vendors of most of these components, so the big arrays are built out of commodity parts. Architecture and firmware are the major differentiators. So, for example, cache access times are fairly constant unless using expensive static RAM. FC chips come from what, two vendors? Microcontrollers from four? Disks from three? What do you expect?
Of course the price-point engineered stuff will be slow. But I bet there is little difference in per-port performance between an enterprise modular array and the big iron Sym’s and Tagmas.
I’ve never seen a direct comparison of single FC port performance across big iron and modular arrays, which also suggests that it isn’t all that different. If you have data that suggests otherwise I encourage you to post it. I’d love to be proven wrong.
I’m very interested in these figures. I realise that this is not comparing like for like as such, but I’ve seen other benchmarks on the net showing 3ware hardware RAID controllers giving local throughput of 220MB/s, which appears much quicker than the above figures for ZFS.
Can ZFS replicate such performance using just software? The 3ware hardware controller is making parallel read/writes to multiple RAID 5 discs simultaneously. Something you can not do with most disc controllers when accessing them via the standard OS using ZFS.
I’m wondering if I’m missing something? Does this assume multiple disc arrays?
Thanks.
I’ll second David’s observation. The throughput numbers are low, indicating Robert encountered a bottleneck other than the RAID implementations, SW or HW. 8K to 9K IOPS and 40MB/s is roughly equal to or less than one disk drive.
You also need to add specifics about the storage configurations before the data can be judged: how many disks per RAID volume, RAID level, stripe size, etc.
David,
To paraphrase, there are lies, damn lies and storage performance numbers – I hesitate to call them statistics for fear of giving statistics an even worse reputation than they’ve already got.
I didn’t dig into the benchmarks Robert used for his test to see what the mix of I/O sizes is. The typical strategy for describing array performance is to run a test that will give the absolute best possible number for the attribute one is measuring.
For bandwidth that means reading and writing really big files – which is fine if you are doing video production or 3D seismic analysis – and totally irrelevant for almost all common workloads. For IOPS numbers that means the smallest possible I/Os as fast as possible – which usually means everything is sitting in cache. While that is nice when it happens, that is also an unlikely event in the real world.
So other than storage marketing people being lying scum, what is the point of benchmarks that only reflect un-real-world performance? Consider all storage benchmarks as simply telling you what the absolute maximum you could ever see in that metric – the vendor’s guaranteed absolutely “will never exceed” number. If you have good reason to believe you’ll need more than that then be afraid – be very afraid.
In the real world, with a mix of I/O sizes and rates, you’d be shocked at what “performance” looks like. Run 2k I/Os on the biggest Sym or Tagma you can imagine – ’cause you certainly can’t afford it! – from dozens of servers across a few dozen FC ports, and I suspect you’d see, maybe, with luck, 100MB/sec of bandwidth. That 3ware controller would probably do single digits. No bad guys here; this is just the nature of the storage I/O problem.
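To put a crude number on that, here is the back-of-envelope arithmetic: bandwidth is just IOPS times I/O size, so a mixed workload averaging about 2k per I/O needs roughly 50,000 aggregate IOPS just to reach 100MB/sec.

# Back-of-envelope only: 50,000 IOPS x 2 KB each, expressed in MB/s
echo "$((50000 * 2 / 1024)) MB/s"    # prints "97 MB/s", i.e. roughly 100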
To me, the point of Robert’s benchmarks is not the absolute numbers, which are respectable for either case, but that the money spent for the RAID controllers bought nothing. I’d argue that even if the software were 20% slower, you’d still want to lose the hardware RAID and its associated bugs, power consumption, cost and maintenance.
Ah. OK. I guess the only way to nail this down, and to get a more realistic view of performance, is to try it out for myself. Going with a better software architecture such as ZFS is much more attractive than a hardware RAID solution. Well, for me anyway. But performance is a key issue too, as I’m having to deal with large video streams. Thanks for the feedback.
I’d really be interested to see what figures you’d get using Bonnie++, which runs a series of I/O tests and dynamically sizes the files it creates based on the system’s memory, to try to avoid the OS cache distorting the results.
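Something along these lines is what I have in mind; the path, sizes and user are just placeholders:

# Hypothetical Bonnie++ run against a ZFS filesystem: the test file (-s)
# is set to twice the RAM size (-r, in MB) so the OS cache can't hold it.
bonnie++ -d /tank/bench -s 32g -r 16384 -n 128 -u nobody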
Well why not just stick to IDE then?
The reason ZFS has slow performance is the fsync(), and it is a safety feature, since ZFS assumes you will have a power outage. Personally I would like to see per-disk ZFS fsync() options, or a setting to fsync() every so many seconds. I have 1 SATA and 2 IDE drives on ZFS and I see performance of around 34-6MB/s in RAIDZ; obviously the slower disks drag down the performance numbers. zpool iostat -v will give a good indication of how ZFS performance is going.
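For example, to watch per-vdev numbers every 5 seconds (the pool name is just a placeholder):

zpool iostat -v tank 5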
If I had 2 RAID controllers I would put RAID 0 on each and mirror the stripes, giving the best performance and a RAID 10 setup, although this does have the problem of one disk failure breaking a stripe; just fix it and bring it back online. Different disks require different settings to get the best performance/reliability; ZFS for me normally ends up with around a 128k stripe in RAIDZ.
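As a rough sketch of that layout, assuming each controller exports its RAID 0 set as a single hypothetical LUN:

# ZFS mirrors the two hardware RAID 0 LUNs, one from each controller,
# giving the striped-and-mirrored (RAID 10 style) setup described above.
zpool create fastpool mirror c1t0d0 c2t0d0
zpool status fastpool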
Hardware vendors do skew results towards their hardware when possible (I know one vendor in particular that can hit 45,000 IOPS, but only if you benchmark the cache and the load never hits the drives), and I have very high hopes for ZFS, but the performance numbers provided seem awfully anemic, even for 6 drives.
I have been looking for other filebench results to make a comparison, but I am having a hard time finding sample benchmarks of other vendors using filebench and varmail online. Does anyone have numbers available for hardware in the same league? I am not asking for numbers on a Sym 6, but maybe a CX500, PS100E, an LH setup with 3 NSMs, and maybe an x4500 with 12-14 drives, so we have some idea of how RAID-Z compares with hardware solutions and scales in comparison to the 6-drive benchmarks.
Could someone tell me what CPU usage ZFS demands to get this performance? While, don’t get me wrong, I think ZFS is very cool, I have yet to see numbers published for the amount of CPU the filesystem will suck, and in what way (bursty, sustained, etc.). The simple fact is that, like software RAID, of which ZFS is really just a far more advanced implementation, ZFS is gonna need processing power from somewhere, just as this article points out at the beginning.
So, can someone give me some numbers?
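For what it’s worth, the stock Solaris observability tools would show it; here is a hedged sketch of what one could run alongside the benchmark (nothing ZFS-specific is assumed):

# Per-CPU utilization every 5 seconds; ZFS checksumming and RAID work
# shows up largely as system (kernel) time:
mpstat 5

# Per-process user vs. system time, also sampled every 5 seconds:
prstat -m 5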
Maybe it’s just me, but has anyone looked at the numbers without comparing them? Random I/O performance is very low considering this is a 12-drive RAID 10 array hooked up over a single channel of 2Gbit Fibre… The performance is less than that of the boot drive…
I will be doing thorough performance testing of ZFS as an iSCSI target, along with some direct benchmarks, using 12 1.5TB SATA drives and a pair of Intel SSDs. I will compare RAID-Z vs RAID 5 in several configurations.
http://communities.vmware.com/blogs/WichitaDataCenters/2009/08/10/san-on-the-cheap
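For the iSCSI side, the plan is roughly the following; the pool, volume name and size are placeholders, and newer builds would use COMSTAR rather than the shareiscsi property:

# Carve a ZFS volume out of the pool and export it with the legacy
# Solaris iSCSI target (zvol name and size are hypothetical):
zfs create -V 500g tank/iscsivol
zfs set shareiscsi=on tank/iscsivol
iscsitadm list target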
I have performed a comprehensive set of tests on a 6-disk RAID 0 (mirror) ZFS pool. The test was run on a T5240 using the tool VDbench and a complex pattern (60% read/40% write, 20% sequential/80% random, and 1 to 64 threads). The results were impressive…
Max throughput of 140.8MB/sec (with 65k blocks)
Max IO rate of 4743.6 IOPS (with 8k blocks)
The T5240 does have a Hardware RAID controller on-board but as yet I have not been able to test this 🙁
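For anyone wanting to reproduce the mix, a rough sketch of the VDbench parameter file follows; the file path, working-set size, block size and thread counts are placeholders rather than the exact profile used:

# Hypothetical VDbench profile: 60% read / 40% write, 80% random,
# swept across several thread counts.
cat > zfs_mix.parm <<'EOF'
sd=sd1,lun=/pool/testfile,size=20g
wd=wd1,sd=sd1,xfersize=8k,rdpct=60,seekpct=80
rd=run1,wd=wd1,iorate=max,elapsed=120,interval=5,forthreads=(1,8,32,64)
EOF
./vdbench -f zfs_mix.parm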
Hi guys, but what exactly are you testing?
Since we all know that the IO subsystem exists to serve applications, I think it is not very wise to benchmark the IO subsystem by itself. For me, the most realistic tests benchmark an application that is tuned for the system’s resources and whose performance is limited by all the related hardware resources (CPU threads, RAM, and disk IO), such as a database under a typical usage case. In other words: the final result we expect is not IOPS from the storage; we expect to dump/recompress/mux our video faster, to query/update/populate our database faster, to serve more users in the same time. That is, we expect application/user operations from the whole computer system.
Something I forgot to mention: when you remove a hardware RAID controller with, say, 512MB of RAM to test the disks connected elsewhere as JBODs, you lower the cache memory available to the system, so you should add 512MB of RAM if the test is to be “fair”.
ZFS is a competitive threat today (1/2012). The appliance scale and maturity of Solaris 11/ZFS have proven effective and viable for global namespace and large file systems with IB, 10Gb, and FC, along with unmatched RAM, flash, and storage capacity and performance.
Lastly – the blog post is dated – I wanted to encourage a revisit, as the ZFS appliance web interface and/or Solaris running atop a SAN backend (pick your favorite) will surely advance the game by sheer integration with Oracle Database, HCC, etc., and its muscle.
Food for thought-
TAJ