Comments on: De-duplicating primary storage

By: Storage Nuts & Bolts

Storage Nuts & Bolts — Mon, 11 Jul 2011 23:01:10 +0000

Shooting Fish in a Barrel…

According to a figure posted in September by Robin Harris over at StorageMojo, NetApp has 13,000 systems running deduplication. That’s an impressive number, especially for a feature that was officially announced in May 2007. In fact, the NetApp……

By: HPCC - HPCC - DELL COMMUNITY

HPCC - HPCC - DELL COMMUNITY — Wed, 15 Apr 2009 13:31:34 +0000

[…] after a small window of time, data is rarely reopened or touched. Robin Harris then wrote another blog that talked about some implications for this study on de-duplication. In particular, he made a […]

By: dirkmeister.de » Blog Archive » Deduplication as Primary Data Storage

dirkmeister.de » Blog Archive » Deduplication as Primary Data Storage — Wed, 05 Nov 2008 08:29:48 +0000

[…] The StorageMojo blog writes in an article: So what percentage de-dup compression of unstructured data is feasible? That is the key to […]

By: Jeremy

Jeremy — Mon, 27 Oct 2008 17:21:34 +0000

I was involved in a project evaluating dedupe for backup, but we ended up moving in the direction of DataDomain’s inline deduplication. In a proof of concept using DataDomain, we were able to get their advertised 1TB/hr rate. We experimented with direct database backups even though DataDomain usually seems to target VTL solutions. We chatted about deduped primary storage, but I haven’t personally been involved in any projects yet to actually try it. And NetApp probably has a better proposition for that; I’m just guessing, but inline dedupe is probably too computationally expensive at the moment to be feasible.

By: Joe Kraska

Joe Kraska — Sun, 12 Oct 2008 02:21:26 +0000

The guarantee is mostly there to provide comfort to buyers. Most of our virtual machine volumes are at or near 80% recoup rates from NetApp’s dedup.

Joe.

By: NetApp’s 50% Guarantee : techmute.com

NetApp’s 50% Guarantee : techmute.com — Mon, 06 Oct 2008 03:38:00 +0000

[…] Robin Harris (Independent Analyst): Robin didn’t discuss the guarantee, other than use it as a jumping-off point for primary storage de-dup. “If the feature is free, de-duping some primary storage will be standard practice in most data centers within 5 years. As the de-dup technology improves and Moore’s Law drives performance, more and more unstructured data will be de-dup’d as a matter of course.” […]

By: Joe Kraska

Joe Kraska — Sun, 05 Oct 2008 14:48:04 +0000

We have NetApp systems running dedup on primary storage in our environment. This doesn’t slow things down in any appreciable manner at all. I believe NetApp is saying that the 7.2.4 release will contain changes to facilitate dup’s and cache hits, which could very well end up providing performance *increases* in a highly duplicative environment, as with VMware.

I only wish I’d known about the 2TB limit long ago. We have some >2TB volumes, and migrating off of them would be… painful.

Joe Kraska

By: Ausmith1

Ausmith1 — Fri, 03 Oct 2008 00:18:38 +0000

Here is the sanitized output from ‘df -s -h’ on one of our filers. It houses about 250 Windows-based ESX development VMs on VMFS volumes.
Filesystem      used    saved  %saved
/vol/vol0/     648MB      0MB      0%
/vol/vol1/     731GB   1230GB     63%
/vol/vol2/     356GB    299GB     46%
/vol/vol3/    9639MB     10GB     53%
/vol/vol4/     108GB   1302GB     92%
/vol/vol5/     158GB    500GB     76%
/vol/vol6/     176GB    903GB     84%
/vol/vol7/     186GB    290GB     61%
/vol/vol8/     148GB     36GB     20%
/vol/vol9/      71GB     53GB     43%
/vol/vola/     150GB    236GB     61%
/vol/volb/     268GB    397GB     60%
/vol/volc/     146GB     42GB     22%

That makes 2.5TB of disk space used and 5.3TB saved by my count.
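
For anyone who wants to re-check that count, here is a minimal Python sketch that sums the columns from the output above. It assumes 1GB = 1024MB and that %saved is saved / (used + saved); that formula is inferred from the per-volume figures rather than taken from NetApp documentation, and the names (ROWS, to_gb, totals, percent_saved) are made up for this example. Under those assumptions it should come out to roughly 2,508GB used and 5,298GB saved, or about 68% saved across all the listed volumes.

```python
# Rows copied from the 'df -s -h' output above: (volume, used, saved).
ROWS = [
    ("/vol/vol0/", "648MB", "0MB"),
    ("/vol/vol1/", "731GB", "1230GB"),
    ("/vol/vol2/", "356GB", "299GB"),
    ("/vol/vol3/", "9639MB", "10GB"),
    ("/vol/vol4/", "108GB", "1302GB"),
    ("/vol/vol5/", "158GB", "500GB"),
    ("/vol/vol6/", "176GB", "903GB"),
    ("/vol/vol7/", "186GB", "290GB"),
    ("/vol/vol8/", "148GB", "36GB"),
    ("/vol/vol9/", "71GB", "53GB"),
    ("/vol/vola/", "150GB", "236GB"),
    ("/vol/volb/", "268GB", "397GB"),
    ("/vol/volc/", "146GB", "42GB"),
]

# Assumption: binary units (1GB = 1024MB), not confirmed by the source.
UNIT_TO_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def to_gb(size):
    """Convert a string such as '731GB' or '648MB' to a float in GB."""
    number, unit = size[:-2], size[-2:]
    return float(number) * UNIT_TO_GB[unit]


def totals(rows):
    """Return (total used, total saved) in GB across all rows."""
    used = sum(to_gb(u) for _, u, _ in rows)
    saved = sum(to_gb(s) for _, _, s in rows)
    return used, saved


def percent_saved(used_gb, saved_gb):
    """Assumed formula: saved space as a share of the pre-dedup size."""
    logical = used_gb + saved_gb
    return 0.0 if logical == 0 else 100.0 * saved_gb / logical


if __name__ == "__main__":
    used, saved = totals(ROWS)
    print(f"used:   {used:.0f}GB")
    print(f"saved:  {saved:.0f}GB")
    print(f"%saved: {percent_saved(used, saved):.0f}%")
```

The same percent_saved formula reproduces the individual rows, for example 1230 / (731 + 1230) rounds to 63% for vol1, which is why it was chosen here.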

There are some volumes that ASIS is not enabled on, so I have not included them in this output. The only reason ASIS is not enabled on them is that they are large (>2TB) volumes created before ASIS was freely available. Whether ASIS can be enabled on a volume depends on the size of the volume relative to the RAM available in the filer; the largest volume this particular filer can handle is 2TB. A 6000 series filer can handle 16TB ASIS volumes.

By: Are you Content Aware? « Storage Optimization

Are you Content Aware? « Storage Optimization — Thu, 02 Oct 2008 18:10:43 +0000

[…] October 2, 2008 Tags: NetApp, Robin Harris, StorageMojo Storage analyst Robin Harris commented on the storage story of the week–NetApp’s Guarantee that virtualization will mean a 50% gain in storage capacity for its […]

By: max

max — Wed, 01 Oct 2008 22:30:45 +0000

FWIW

Have ASIS running w/ ESX on a (primary storage w/ ASIS). In my experience, the 50% is a very low bar for NetApp with this setup in a hosted ESX environment (~400 VMs).

By: open systems storage guy

open systems storage guy — Wed, 01 Oct 2008 20:10:19 +0000

I’ve used it; it’s not for all workloads, but it’s a nice feature for low-use file systems and whatnot. I wouldn’t suggest it on anything that really hits the controllers heavily, because every time a write is done, a process running on the filer hashes the data, which creates something like a 5% processor overhead. During idle times, the overhead goes up considerably, as the algorithm does a byte-by-byte comparison of all suspected duplicate data chunks before pointing both sections of the volume to the same chunk.
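
The hash-on-write, verify-when-idle flow described above can be sketched roughly as follows. This is a toy Python model, not NetApp's implementation: the Volume class, its method names, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration (the comment only says the data is hashed).

```python
import hashlib


class Volume:
    """Toy model of post-process dedup; all names here are hypothetical."""

    def __init__(self):
        self.blocks = {}        # physical block id -> bytes
        self.fingerprints = {}  # physical block id -> digest taken at write time
        self.pointers = []      # logical block index -> physical block id
        self._next_id = 0

    def write(self, data):
        """Store a block and record its fingerprint (the cheap per-write cost)."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.fingerprints[block_id] = hashlib.sha256(data).digest()
        self.pointers.append(block_id)
        return len(self.pointers) - 1  # logical index of the new block

    def read(self, logical_index):
        return self.blocks[self.pointers[logical_index]]

    def dedupe(self):
        """Idle-time pass: returns the number of physical blocks freed."""
        kept_by_digest = {}  # digest -> physical ids that stay on disk
        remap = {}           # duplicate physical id -> surviving physical id
        for block_id in sorted(self.blocks):
            digest = self.fingerprints[block_id]
            candidates = kept_by_digest.setdefault(digest, [])
            for kept in candidates:
                # A matching fingerprint only marks a suspected duplicate;
                # compare the actual bytes before sharing the block.
                if self.blocks[kept] == self.blocks[block_id]:
                    remap[block_id] = kept
                    break
            else:
                candidates.append(block_id)
        # Point every logical block at the surviving copy, then free the rest.
        self.pointers = [remap.get(p, p) for p in self.pointers]
        for block_id in remap:
            del self.blocks[block_id]
            del self.fingerprints[block_id]
        return len(remap)
```

Writing two identical blocks and one different block, then calling dedupe(), leaves two physical blocks behind while all three logical blocks still read back their original contents. The byte comparison is what makes the idle-time pass the expensive part in this model, which lines up with the overhead pattern described in the comment.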

NetApp filers use the overhead everyone’s been complaining about to save space in the end. If you have to clone databases, can thin provision, take snapshots, and have heavily duplicated files, you’ll probably end up with more data stuffed into your filer than you could get in an equivalent traditional disk box. If you don’t, however, then you’ll need more disks in your filer than you would otherwise.

By: Cinetica Blog » Deduplication sullo storage primario

Cinetica Blog » Deduplication sullo storage primario — Wed, 01 Oct 2008 12:36:25 +0000

[…] I have confirmation, for the umpteenth time, also from a post on storagemojo that I read […]

By: Steven Schwartz

Steven Schwartz — Tue, 30 Sep 2008 22:26:14 +0000

Come on Robin, did you read the NetApp release? Everyone has written about it already. They never claim a 50% reduction in required storage due to deduplication; it is claimed on several things… I posted a silly but funny corollary on my blog.

http://thesantechnologist.com/?p=122

By: TylerB

TylerB — Tue, 30 Sep 2008 22:12:20 +0000

Robin-
(disclaimer: I work for an NTAP Partner)
This does work and we have a ton of customers using it. While unstructured data is decent (30% is common), VMware is THE killer app for primary storage dedupe. We have plenty of customers at 70, 80, and even 90% dedupe rates. The beauty of it is that, since it’s post-process, it has no noticeable effect on the live data.
Basically we’ve either been installing new NetApp arrays or fronting older ones with v-series all over the place.