<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: How Yahoo can beat Google</title>
	<atom:link href="http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/</link>
	<description>Data storage info &#38; analysis</description>
	<pubDate>Fri, 21 Nov 2008 16:17:50 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: Larry</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-136953</link>
		<dc:creator>Larry</dc:creator>
		<pubDate>Thu, 25 Oct 2007 07:51:32 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-136953</guid>
		<description>Here is an idea, please look at this:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&#038;articleId=9043942 

For $4,000 or so, I can get eight PS3s that can do the same task that I'd do on a supercomputer


A different approach, and maybe cost-effective?

Larry</description>
		<content:encoded><![CDATA[<p>Here is an idea, please look at this:<br />
<a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&#038;articleId=9043942" rel="nofollow">http://www.computerworld.com/action/article.do?command=viewArticleBasic&#038;articleId=9043942</a> </p>
<p>For $4,000 or so, I can get eight PS3s that can do the same task that I&#8217;d do on a supercomputer</p>
<p>A different approach, and maybe cost-effective?</p>
<p>Larry</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Ball</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-108631</link>
		<dc:creator>Chris Ball</dc:creator>
		<pubDate>Wed, 22 Aug 2007 15:37:56 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-108631</guid>
		<description>I really think that the discussions about the various merits of different hardware miss the real crux of the problem. Google have a clearly defined, burning desire to be and stay number 1. That is the bottom line, whereas Yahoo seems very centred on their own internal politics. Well, from my outside viewpoint anyway.
So I think it is a cultural thing rather than a technology thing. Get the culture right and the technology will follow. (Just my thoughts)
http://www.smtnet.co.uk/</description>
		<content:encoded><![CDATA[<p>I really think that the discussions about the various merits of different hardware miss the real crux of the problem. Google have a clearly defined, burning desire to be and stay number 1. That is the bottom line, whereas Yahoo seems very centred on their own internal politics. Well, from my outside viewpoint anyway.<br />
So I think it is a cultural thing rather than a technology thing. Get the culture right and the technology will follow. (Just my thoughts)<br />
<a href="http://www.smtnet.co.uk/" rel="nofollow">http://www.smtnet.co.uk/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Russ</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-103288</link>
		<dc:creator>Russ</dc:creator>
		<pubDate>Fri, 03 Aug 2007 12:39:22 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-103288</guid>
		<description>Distributed storage architectures built from commodity components (peer-to-peer) have consistently proven to be less expensive than big-iron centralized storage.  Multiple companies have taken advantage of this cost differential over the years, most notably in research and academic circles, where money is always tight.  While Google has become the most widely known user of grid/distributed storage, many companies benefit from the cost savings in hardware and data management every day.</description>
		<content:encoded><![CDATA[<p>Distributed storage architectures built from commodity components (peer-to-peer) have consistently proven to be less expensive than big-iron centralized storage.  Multiple companies have taken advantage of this cost differential over the years, most notably in research and academic circles, where money is always tight.  While Google has become the most widely known user of grid/distributed storage, many companies benefit from the cost savings in hardware and data management every day.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LVE</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-95768</link>
		<dc:creator>LVE</dc:creator>
		<pubDate>Sat, 14 Jul 2007 08:17:31 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-95768</guid>
		<description>Richard, you made 3 important errors when doing the cost comparison between the
raid and google solutions. After fixing these errors, the google solution
appears 20% cheaper instead of 3x more expensive.

1) In the raid solution, you need to buy a total of 480 disks, not 400 as you
assumed in your reasoning, to fill up ten 48-disk chassis.

2) In the google solution, you overestimate the cost per diskless chassis by a
factor of almost 10x! Today you can buy components to build a dual-core 2.0
GHz 2-GB diskless machine for about $230 from newegg ($40 psu, $60 am2 socket
mobo, $60 athlon 64 x2 3600+ 2.0 GHz, $70 ddr2 ram). Google buy similarly
priced components and strap them with velcro on sheets of insulated material
(they used to use cork sheets, but had to change the material because it turned
out to be a fire hazard, I don't know what they use today). So, let's round
this $230 to $250/chassis. This is much less than the $2k/chassis figure you
mention.

3) You assume google use 1-TB disks. As Robin correctly pointed out, they are
on the contrary buying what is more cost effective. Robin estimates $160-$180
per raw TB, I'll be more conservative and assume $220 per raw TB (500-GB disks
are sold for $110 on newegg). Again, this is much cheaper than your figure of
$500 per TB. To account for these half-size disks, you need to double
the number of disks (2400 instead of 1200) and chassis (400 instead of 200).

You found the google solution to be 3x more expensive.
But with these errors now fixed, it is, in fact, 20% cheaper:

raid: 10 chassis x ($20k/chassis) + 480 disks x ($500/disk) = $440,000
google: 400 chassis x ($250/chassis) + 2400 disks x ($110/disk) = $364,000
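
As a quick check of the two totals above, here is a minimal Python sketch. The function and variable names are illustrative only; the counts and unit prices are the ones quoted in this comment.

```python
# Counts and prices are taken from the comment; names are illustrative.
def total_cost(chassis, chassis_price, disks, disk_price):
    """Hardware cost in dollars: chassis plus disks."""
    return chassis * chassis_price + disks * disk_price

raid = total_cost(10, 20_000, 480, 500)
google = total_cost(400, 250, 2400, 110)
saving = 1 - google / raid

print(f"raid:   ${raid:,}")
print(f"google: ${google:,}")
print(f"google is {saving:.0%} cheaper")
```

With these inputs the saving works out to about 17%, which the text rounds to 20%.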

Additionally, there is an even more important reason why they don't use raid
but instead prefer to do 3x replication: raid won't protect you if the whole
server fails, whereas they can do 3x replication on 3 different servers on
3 different racks and take down a whole rack (for maintenance, for example)
without impacting the availability of the data.

That said, I agree that the raid solution offers higher densities and is
probably more power efficient, but it just doesn't offer the same level of
reliability as 3x replication...</description>
		<content:encoded><![CDATA[<p>Richard, you made 3 important errors when doing the cost comparison between the<br />
raid and google solutions. After fixing these errors, the google solution<br />
appears 20% cheaper instead of 3x more expensive.</p>
<p>1) In the raid solution, you need to buy a total of 480 disks, not 400 as you<br />
assumed in your reasoning, to fill up ten 48-disk chassis.</p>
<p>2) In the google solution, you overestimate the cost per diskless chassis by a<br />
factor of almost 10x! Today you can buy components to build a dual-core 2.0<br />
GHz 2-GB diskless machine for about $230 from newegg ($40 psu, $60 am2 socket<br />
mobo, $60 athlon 64 x2 3600+ 2.0 GHz, $70 ddr2 ram). Google buy similarly<br />
priced components and strap them with velcro on sheets of insulated material<br />
(they used to use cork sheets, but had to change the material because it turned<br />
out to be a fire hazard, I don&#8217;t know what they use today). So, let&#8217;s round<br />
this $230 to $250/chassis. This is much less than the $2k/chassis figure you<br />
mention.</p>
<p>3) You assume google use 1-TB disks. As Robin correctly pointed out, they are<br />
on the contrary buying what is more cost effective. Robin estimates $160-$180<br />
per raw TB, I&#8217;ll be more conservative and assume $220 per raw TB (500-GB disks<br />
are sold for $110 on newegg). Again, this is much cheaper than your figure of<br />
$500 per TB. To account for these half-size disks, you need to double<br />
the number of disks (2400 instead of 1200) and chassis (400 instead of 200).</p>
<p>You found the google solution to be 3x more expensive.<br />
But with these errors now fixed, it is, in fact, 20% cheaper:</p>
<p>raid: 10 chassis x ($20k/chassis) + 480 disks x ($500/disk) = $440,000<br />
google: 400 chassis x ($250/chassis) + 2400 disks x ($110/disk) = $364,000</p>
<p>Additionally, there is an even more important reason why they don&#8217;t use raid<br />
but instead prefer to do 3x replication: raid won&#8217;t protect you if the whole<br />
server fails, whereas they can do 3x replication on 3 different servers on<br />
3 different racks and take down a whole rack (for maintenance, for example)<br />
without impacting the availability of the data.</p>
<p>That said, I agree that the raid solution offers higher densities and is<br />
probably more power efficient, but it just doesn&#8217;t offer the same level of<br />
reliability as 3x replication&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93762</link>
		<dc:creator>Richard</dc:creator>
		<pubDate>Mon, 09 Jul 2007 18:45:18 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93762</guid>
		<description>Robin,
I don’t work for anyone in the big (or small) iron camp.

I am not suggesting that anyone can compete with a ‘roll your own’ strategy &#38; have agreed that Yahoo &#38; others must follow this model in order to compete. However, when they do, they should improve their approach. 

GFS environment can run on a much more powerful controller, simultaneously supporting larger, protected disk backends to eliminate waste. All data ends up on disk blocks…somewhere. 

If Google does not have such hardware design capability, then perhaps they should ask one of their early backers to show them how….he is already doing it. 

So all that is left is some spin on ‘commodity’ with velcro, unsubstantiated cost figures and more spin on  ‘how green is my valley’. 

Perhaps Google should ‘vertically integrate’ with a power station building business…. a small clean nuclear type, one per datacenter.    

 As someone said, this is a provocative post…. so let’s end this story.</description>
		<content:encoded><![CDATA[<p>Robin,<br />
I don’t work for anyone in the big (or small) iron camp.</p>
<p>I am not suggesting that anyone can compete with a ‘roll your own’ strategy &amp; have agreed that Yahoo &amp; others must follow this model in order to compete. However, when they do, they should improve their approach. </p>
<p>GFS environment can run on a much more powerful controller, simultaneously supporting larger, protected disk backends to eliminate waste. All data ends up on disk blocks…somewhere. </p>
<p>If Google does not have such hardware design capability, then perhaps they should ask one of their early backers to show them how….he is already doing it. </p>
<p>So all that is left is some spin on ‘commodity’ with velcro, unsubstantiated cost figures and more spin on  ‘how green is my valley’. </p>
<p>Perhaps Google should ‘vertically integrate’ with a power station building business…. a small clean nuclear type, one per datacenter.    </p>
<p> As someone said, this is a provocative post…. so let’s end this story.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Harris</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93514</link>
		<dc:creator>Robin Harris</dc:creator>
		<pubDate>Mon, 09 Jul 2007 03:57:43 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93514</guid>
		<description>Richard,

If EMC, NetApp, StorageWorks or anyone else's RAID or filers were actually competitive with what Google - and let's not forget Amazon - have built, don't you think Google would be buying them?

Also, GFS and BigTable work on files and tablets, not blocks. So when a server fails, the replication happens to wherever there is space for it. The data is replicated, not mirrored.

And again, Google is probably paying today about $160-$180 per raw TB. Triple that to even $600 per TB including packaging - there is very little - and mobo space. Even Apple's Xserve RAID is double that.

Which is why Google is concerned about power: they've cut the cost out of virtually every other aspect of their operations. I've looked at the power numbers for big iron arrays and if all you think about is drive power, then yes, 3x is worse than RAID 6, but not nearly as much as you'd assume. The real issue is the power hungry controllers and all the network infrastructure, FC or IP, required to make it work. And then you still haven't factored in the servers.

I feel a post coming on, Richard. Thanks for writing.

Cheers,

Robin</description>
		<content:encoded><![CDATA[<p>Richard,</p>
<p>If EMC, NetApp, StorageWorks or anyone else&#8217;s RAID or filers were actually competitive with what Google - and let&#8217;s not forget Amazon - have built, don&#8217;t you think Google would be buying them?</p>
<p>Also, GFS and BigTable work on files and tablets, not blocks. So when a server fails, the replication happens to wherever there is space for it. The data is replicated, not mirrored.</p>
<p>And again, Google is probably paying today about $160-$180 per raw TB. Triple that to even $600 per TB including packaging - there is very little - and mobo space. Even Apple&#8217;s Xserve RAID is double that.</p>
<p>Which is why Google is concerned about power: they&#8217;ve cut the cost out of virtually every other aspect of their operations. I&#8217;ve looked at the power numbers for big iron arrays and if all you think about is drive power, then yes, 3x is worse than RAID 6, but not nearly as much as you&#8217;d assume. The real issue is the power hungry controllers and all the network infrastructure, FC or IP, required to make it work. And then you still haven&#8217;t factored in the servers.</p>
<p>I feel a post coming on, Richard. Thanks for writing.</p>
<p>Cheers,</p>
<p>Robin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93509</link>
		<dc:creator>Richard</dc:creator>
		<pubDate>Mon, 09 Jul 2007 03:13:32 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93509</guid>
		<description>Yes, this is all they can do: instead of sparing with a single disk, they need to 'drop' 6 disks and replicate to another system containing an additional six disks. The need to keep many spare systems around… is an extra cost. I hope they power these down, also at extra hardware cost… an intelligent power switch per system. 

This is getting a lot closer to my estimate of  $1K per system or $3K per TB, as suggested earlier…. ignoring power &#38; space issues. 

GFS is a very good vehicle but comes at a price. Internal ‘vertical’ integration is a great concept but not with 'commodity' hardware. 

However, this constant ‘spin’ by Google regarding  ‘commodity’ solutions (which their hardware is not), resulting in low cost per TB and their stated concern for power…. all remain very questionable.</description>
		<content:encoded><![CDATA[<p>Yes, this is all they can do: instead of sparing with a single disk, they need to &#8216;drop&#8217; 6 disks and replicate to another system containing an additional six disks. The need to keep many spare systems around… is an extra cost. I hope they power these down, also at extra hardware cost… an intelligent power switch per system. </p>
<p>This is getting a lot closer to my estimate of  $1K per system or $3K per TB, as suggested earlier…. ignoring power &amp; space issues. </p>
<p>GFS is a very good vehicle but comes at a price. Internal ‘vertical’ integration is a great concept but not with &#8216;commodity&#8217; hardware. </p>
<p>However, this constant ‘spin’ by Google regarding  ‘commodity’ solutions (which their hardware is not), resulting in low cost per TB and their stated concern for power…. all remain very questionable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harold</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93266</link>
		<dc:creator>Harold</dc:creator>
		<pubDate>Sun, 08 Jul 2007 14:21:29 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93266</guid>
		<description>One thing mentioned in passing that people are not (I think) considering is that Google doesn't do field service on their machines as we normally think of it.

Instead, they use what I like to refer to as the "ignore operation".  If a disk fails, they ignore it at the hardware level (at the filesystem level they of course make a copy of what it was holding).  Same for a system.

Sooner or later, I gather that FS people go through a site and pull broken stuff, but one of their key insights is KISS (Keep It Simple, Stupid).  The less fancy their hardware, the less they directly (hands on) interact with it, the cheaper, since people are expensive.  Note Robin's comments that "Google generates 50-60% more revenue with 4,000 fewer people."

Hmmm, all this predates me hearing about their concerns about power consumption, so maybe they can now remotely power down disks, and systems that suffer complete failure (in fact, trying a power cycle to unwedge a machine seldom hurts :-).

However, I would say that overall they have a VERY keen eye on TCO, and to assume they are doing stupid things here is unwise.

- Harold</description>
		<content:encoded><![CDATA[<p>One thing mentioned in passing that people are not (I think) considering is that Google doesn&#8217;t do field service on their machines as we normally think of it.</p>
<p>Instead, they use what I like to refer to as the &#8220;ignore operation&#8221;.  If a disk fails, they ignore it at the hardware level (at the filesystem level they of course make a copy of what it was holding).  Same for a system.</p>
<p>Sooner or later, I gather that FS people go through a site and pull broken stuff, but one of their key insights is KISS (Keep It Simple, Stupid).  The less fancy their hardware, the less they directly (hands on) interact with it, the cheaper, since people are expensive.  Note Robin&#8217;s comments that &#8220;Google generates 50-60% more revenue with 4,000 fewer people.&#8221;</p>
<p>Hmmm, all this predates me hearing about their concerns about power consumption, so maybe they can now remotely power down disks, and systems that suffer complete failure (in fact, trying a power cycle to unwedge a machine seldom hurts :-).</p>
<p>However, I would say that overall they have a VERY keen eye on TCO, and to assume they are doing stupid things here is unwise.</p>
<p>- Harold</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93092</link>
		<dc:creator>Richard</dc:creator>
		<pubDate>Sun, 08 Jul 2007 05:06:35 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93092</guid>
		<description>Robin,
You missed one of my key points.

With these new multiple processor cores, Google should be able to run GFS in conjunction with a large number of backend disks on the *same* controller, with E’net front-end. This is nothing new in terms of ‘architecture’….it is multi-core already. There is not much difference in hardware if they do a 'purpose' built controller already.... and they should not call it 'commodity'

They can easily add protection to the backend to eliminate triplication, save 30KW per 400TB of storage in power and greatly reduce the initial cost. Also, with this they will get 10x datacenter density. 

Imagine the level of saving across the whole infrastructure….on power alone. 

My argument holds with different capacity disks. Disks always get cheaper and all you are trading is the initial cost vs power consumption of more disks and more nodes. 

The problem is that there is no such high performance ‘commodity’ controller on the market today … but it is not a technical problem.

Also, I suggest that Google’s need to triplicate at the local level has little to do with storage and more to do with GFS and the need for bandwidth. To get ultimate protection, they should still replicate across different geographical datacenters.</description>
		<content:encoded><![CDATA[<p>Robin,<br />
You missed one of my key points.</p>
<p>With these new multiple processor cores, Google should be able to run GFS in conjunction with a large number of backend disks on the *same* controller, with E’net front-end. This is nothing new in terms of ‘architecture’….it is multi-core already. There is not much difference in hardware if they do a &#8216;purpose&#8217; built controller already&#8230;. and they should not call it &#8216;commodity&#8217;</p>
<p>They can easily add protection to the backend to eliminate triplication, save 30KW per 400TB of storage in power and greatly reduce the initial cost. Also, with this they will get 10x datacenter density. </p>
<p>Imagine the level of saving across the whole infrastructure….on power alone. </p>
<p>My argument holds with different capacity disks. Disks always get cheaper and all you are trading is the initial cost vs power consumption of more disks and more nodes. </p>
<p>The problem is that there is no such high performance ‘commodity’ controller on the market today … but it is not a technical problem.</p>
<p>Also, I suggest that Google’s need to triplicate at the local level has little to do with storage and more to do with GFS and the need for bandwidth. To get ultimate protection, they should still replicate across different geographical datacenters.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-93080</link>
		<dc:creator>Lawrence</dc:creator>
		<pubDate>Sun, 08 Jul 2007 04:25:45 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-93080</guid>
		<description>Robin,

You made a passing comment about dedupe and backup which I'd like to further address.  NetApp's A-SIS dedupe technology works wonderfully for primary storage outside of any backup context (as well as within backup of course).

We see 20:1 dedupe ratios on master VMware images of all kinds.  Also on structured data sets like Oracle, SQL or (exchange) Jet databases.  Outlook PST files are a perfect example.

If that's any indication, I think Richard's arguments about the savings of advanced RAID arrays could be really powerful in Yahoo's context.  I'd have to imagine their primary Email storage and archived Email storage contains a tremendous amount of data which could be deduped by 80% or more.  If that's the case, Google can't possibly have a more efficient storage infrastructure!

/L.</description>
		<content:encoded><![CDATA[<p>Robin,</p>
<p>You made a passing comment about dedupe and backup which I&#8217;d like to further address.  NetApp&#8217;s A-SIS dedupe technology works wonderfully for primary storage outside of any backup context (as well as within backup of course).</p>
<p>We see 20:1 dedupe ratios on master VMware images of all kinds.  Also on structured data sets like Oracle, SQL or (exchange) Jet databases.  Outlook PST files are a perfect example.</p>
<p>If that&#8217;s any indication, I think Richard&#8217;s arguments about the savings of advanced RAID arrays could be really powerful in Yahoo&#8217;s context.  I&#8217;d have to imagine their primary Email storage and archived Email storage contains a tremendous amount of data which could be deduped by 80% or more.  If that&#8217;s the case, Google can&#8217;t possibly have a more efficient storage infrastructure!</p>
<p>/L.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Harris</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-92760</link>
		<dc:creator>Robin Harris</dc:creator>
		<pubDate>Sat, 07 Jul 2007 14:50:03 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-92760</guid>
		<description>Richard,

Excellent points.  However, this is comparing apples and oranges. GOOG is deploying systems, not "storage" &lt;i&gt;per se&lt;/i&gt;. Further, they configure for lowest cost per unit of goodness, including CPU cycles, cycles per watt, network bandwidth and bytes, rather than highest density. It is a linear programming problem, rather than optimizing for one or two metrics.

For example, GOOG would not use 1TB drives, which are currently around $0.30/GB, in favor of whatever the lowest cost per GB happens to be. I'd guess in their volumes, possibly with special warranty terms, they'd be getting $0.16-$0.18/GB, less if a vendor is overstocked. 

Also, the mobos are purpose-designed as well, with unnecessary PCI slots, graphics etc. removed, but unlike any array vendor they use high-volume parts everywhere on high-volume motherboards. They are buying high-volume parts in high volumes. It doesn't get any cheaper than that. Qual is minimal. They don't care about drive firmware levels. Surface mount SATA connectors eliminate problematic cables. They optimize at the system level, not the server and then the storage and then the network.

What your calculations leave out is the server and networking piece. Sure, you can get a lot of disks into a rack if TB per square meter is the metric. That isn't Google's. Add all the servers and gigE networking you'd need to make a complete cluster solution, plus the low-volume RAID controllers, and you are looking at a very costly infrastructure. For example, Yahoo vs. Google.

Robin</description>
		<content:encoded><![CDATA[<p>Richard,</p>
<p>Excellent points.  However, this is comparing apples and oranges. GOOG is deploying systems, not &#8220;storage&#8221; <i>per se</i>. Further, they configure for lowest cost per unit of goodness, including CPU cycles, cycles per watt, network bandwidth and bytes, rather than highest density. It is a linear programming problem, rather than optimizing for one or two metrics.</p>
<p>For example, GOOG would not use 1TB drives, which are currently around $0.30/GB, in favor of whatever the lowest cost per GB happens to be. I&#8217;d guess in their volumes, possibly with special warranty terms, they&#8217;d be getting $0.16-$0.18/GB, less if a vendor is overstocked. </p>
<p>Also, the mobos are purpose-designed as well, with unnecessary PCI slots, graphics etc. removed, but unlike any array vendor they use high-volume parts everywhere on high-volume motherboards. They are buying high-volume parts in high volumes. It doesn&#8217;t get any cheaper than that. Qual is minimal. They don&#8217;t care about drive firmware levels. Surface mount SATA connectors eliminate problematic cables. They optimize at the system level, not the server and then the storage and then the network.</p>
<p>What your calculations leave out is the server and networking piece. Sure, you can get a lot of disks into a rack if TB per square meter is the metric. That isn&#8217;t Google&#8217;s. Add all the servers and gigE networking you&#8217;d need to make a complete cluster solution, plus the low-volume RAID controllers, and you are looking at a very costly infrastructure. For example, Yahoo vs. Google.</p>
<p>Robin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-92566</link>
		<dc:creator>Richard</dc:creator>
		<pubDate>Sat, 07 Jul 2007 08:53:51 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-92566</guid>
		<description>Robin,

Let’s put some more resolution into this.

I don’t care how ‘green’ Google are with their motherboards…there is no magic. 
New Opteron multi-core processors are very fast, lots of IO bandwidth, able to accelerate the performance of existing open software driving large disk backends. Google may be surprised if they test their GFS with a RAID backend on a multi-core Opteron processor. 

If you don’t mind wasting power, a RAID controller design using X86 technology is much like a ‘commodity’ motherboard…with a small added cost of SATA backend chips to drive a total of 48 disks…some products are shipping already. 

Such a RAID controller is ‘purpose’ designed (i.e. an optimized, cut-down motherboard, no need for PCI expansion slots, etc.): a single hot-plug module into a disk backplane, no internal cabling. Let’s not call it a “RAID” again, just to keep you calm.

A 4U (7 inch) mechanical chassis to house 48 vertically mounted disks, dual controllers and triple power is not difficult to design &#38; was designed &#38; sold *six years ago*. That design used SCSI disks. SATA makes it less expensive &#38; less power hungry.

A typical 42U rack is able to support 10 such enclosures, for a total of 480 RAID6 protected disks, arranged in 40 groups to deliver 400 ‘data’ disks. 

An equivalent Google configuration, presumably using a 2U, 6 disk chassis with triple redundancy, requires 1200 disks, packaged in 200 chassis, consuming 400U of rack space. Note…  they require 10 full racks….i.e. a 10:1 expansion just in floor space.
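
The disk and rack counts above can be reproduced with a short Python sketch. The 12-disk RAID 6 group (10 data plus 2 parity) is an assumption inferred from 480 disks arranged in 40 groups; all names are illustrative.

```python
import math

RACK_UNITS = 42      # U per rack, as stated above
DATA_DISKS = 400     # target number of 'data' disks

# RAID 6 option: ten 4U enclosures with 48 disks each.
raid_disks = 10 * 48
group_size = raid_disks // 40               # 40 groups, so 12 disks per group
raid_data_disks = 40 * (group_size - 2)     # RAID 6 spends 2 disks per group on parity
raid_racks = math.ceil(10 * 4 / RACK_UNITS)

# 3x replication option: 2U chassis holding 6 disks each (the presumed layout).
rep_disks = DATA_DISKS * 3
rep_chassis = rep_disks // 6
rep_rack_units = rep_chassis * 2
rep_racks = math.ceil(rep_rack_units / RACK_UNITS)

print(raid_disks, raid_data_disks, raid_racks)
print(rep_disks, rep_chassis, rep_rack_units, rep_racks)
```

If the real chassis holds a different number of disks or uses a different height, only the two constants in the replication section need to change.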

It would be good if someone could confirm the exact Google chassis configuration.

In terms of power consumption…..

Qty 10, 4U ‘controller’ solution requires 480 disks vs 1200 disks for Google. We have Qty 10 ‘controllers’  vs  200 motherboards, 1 rack vs 10 racks.

The power consumption of the extra 720 disks (1200-480) would be around 10 kW (average)… a guess. The extra power consumption of Google’s 200 motherboards
is 190 x 110 Watts (generous guess)… so a saving of around 20 kW.

Hence the total saving in power is 30 KW, plus the cost of nine extra racks, space &#38; rent. 
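
A minimal Python sketch of the power estimate above. Both wattage inputs are the guesses made in this comment, not measurements, and the names are illustrative.

```python
EXTRA_DISKS = 1200 - 480        # 720 more disks under 3x replication
EXTRA_DISK_POWER_W = 10_000     # guessed average draw of those disks
EXTRA_BOARDS = 200 - 10         # 190 more motherboards than controllers
BOARD_POWER_W = 110             # guessed draw per motherboard

watts_per_disk = EXTRA_DISK_POWER_W / EXTRA_DISKS
board_power_w = EXTRA_BOARDS * BOARD_POWER_W
total_kw = (EXTRA_DISK_POWER_W + board_power_w) / 1000

print(f"implied power per disk: {watts_per_disk:.1f} W")
print(f"extra motherboard power: {board_power_w / 1000:.1f} kW")
print(f"total extra power: {total_kw:.1f} kW")
```

The 10 kW disk guess implies roughly 14 W per disk, so the 30 kW total moves in step with whatever per-disk and per-board figures are actually measured.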

Datacenter experts out there …. perhaps someone could verify the above figures in terms of actual running costs, rent, air conditioning etc … say over a 3-5 year period.

In terms of hardware costs…

From experience, in qty 100, such well designed ‘commodity’ 4U chassis will cost just under $10K to build, including a dual core Opteron based ‘controller’  … so it could sell for $20K. The design is ‘cable-less’ and shipped with 48 disk canisters.  The customer buys and (just) plugs-in the disks. This is the *key issue*  in such ‘commodity’ business model. 

So … the cost of a 10 system  4U chassis  infrastructure is $200K per rack (10 x $20K) for  400TB of usable data…or $500 per TB in ‘diskless’ infrastructure cost ….or about the cost of a single 1TB SATA disk. So the end-user cost per TB looks like $1K per TB. 

On the Google side…. they need to buy 200 chassis, add messy SATA cabling and mount the disks….all of which takes time. I suggest that this may cost $2K per chassis, for a diskless dual core solution  ….so we are looking at  $ 400K for a diskless configuration. 

To this they need to add  1200 x 1TB disks =  $600K (at $ 500 each) … plus nine extra  system racks .. plus power cabling. They would not get any change from $1M and their cost is $ 2.5K per TB of storage.

This figure may come closer to $3K/TB if just the extra cost of air conditioning and power cabling  is included…. not counting the extra cost of floor space  &#38; running costs.

So… there appears to be a 3:1 cost ratio…. what do you think..?</description>
		<content:encoded><![CDATA[<p>Robin,</p>
<p>Let’s put some more resolution into this.</p>
<p>I don’t care how ‘green’ Google are with their motherboards…there is no magic.<br />
New Opteron multi-core processors are very fast, with lots of I/O bandwidth, and are able to accelerate the performance of existing open software driving large disk backends. Google may be surprised if they test their GFS with a RAID backend on a multi-core Opteron processor. </p>
<p>If you don’t mind wasting power, a RAID controller design using X86 technology is much like a ‘commodity’ motherboard…with a small added cost of SATA backend chips to drive a total of 48 disks…some products are shipping already. </p>
<p>Such a RAID controller is ‘purpose’ designed (i.e. an optimized, cut-down motherboard, no need for PCI expansion slots, etc.), a single hot-plug module into a disk backplane, no internal cabling. Let’s not call it a “RAID” again, just to keep you calm.</p>
<p>A 4U (7 inch) mechanical chassis to house 48 vertically mounted disks, dual controllers and triple power is not difficult to design &amp; has been designed &amp; sold, *six years ago*. That design used SCSI disks. SATA makes it less expensive &amp; less power hungry.</p>
<p>A typical 42U rack is able to support 10 such enclosures, for a total of 480 RAID6 protected disks, arranged in 40 groups to deliver 400 ‘data’ disks. </p>
<p>An equivalent Google configuration, presumably using a 2U, 6-disk chassis with triple redundancy, requires 1200 disks (400 data disks x 3 copies), packaged in 200 chassis, consuming 400U of rack space. Note… they require 10 full racks… i.e. a 10:1 expansion just in floor space.</p>
<p>It would be good if someone could confirm the exact Google chassis configuration.</p>
<p>In terms of power consumption…..</p>
<p>Qty 10, 4U ‘controller’ solution requires 480 disks vs 1200 disks for Google. We have Qty 10 ‘controllers’  vs  200 motherboards, 1 rack vs 10 racks.</p>
<p>The power consumption of the extra 720 disks (1200-480) would be around 10 kW (average)… a guess. The extra power consumption of Google’s 200 motherboards vs 10 controllers is 190 x 110 Watts (generous guess) … so a saving of around 20 kW.</p>
<p>Hence the total saving in power is 30 kW, plus the cost of nine extra racks, space &amp; rent. </p>
<p>Datacenter experts out there …. perhaps someone could verify the above figures in terms of actual running costs, rent, air conditioning etc … say over a 3-5 year period.</p>
<p>In terms of hardware costs…</p>
<p>From experience, in qty 100, such a well-designed ‘commodity’ 4U chassis will cost just under $10K to build, including a dual core Opteron based ‘controller’ … so it could sell for $20K. The design is ‘cable-less’ and shipped with 48 disk canisters. The customer buys and (just) plugs in the disks. This is the *key issue* in such a ‘commodity’ business model. </p>
<p>So … the cost of a 10-system, 4U-chassis infrastructure is $200K per rack (10 x $20K) for 400TB of usable data… or $500 per TB in ‘diskless’ infrastructure cost… or about the cost of a single 1TB SATA disk. So the end-user cost, infrastructure plus disks, looks like $1K per TB. </p>
<p>On the Google side… they need to buy 200 chassis, add messy SATA cabling and mount the disks… all of which takes time. I suggest that this may cost $2K per chassis, for a diskless dual core solution… so we are looking at $400K for a diskless configuration. </p>
<p>To this they need to add 1200 x 1TB disks = $600K (at $500 each) … plus nine extra system racks… plus power cabling. They would not get any change from $1M, and their cost is $2.5K per TB of storage ($1M for 400TB).</p>
<p>This figure may come closer to $3K/TB if just the extra cost of air conditioning and power cabling  is included…. not counting the extra cost of floor space  &amp; running costs.</p>
<p>So… there appears to be a 3:1 cost ratio…. what do you think..?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Harris</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-92387</link>
		<dc:creator>Robin Harris</dc:creator>
		<pubDate>Sat, 07 Jul 2007 01:24:27 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-92387</guid>
		<description>Augustus,

Google has search sewn up, but their other services don't fare so well. Yahoo needs to build from the strengths they have while reducing the cost disadvantage.

Richard,

I believe that GOOG has 6 disks per mobo. Also remember that they are doing computing on the servers as well - these are combined storage/compute clusters - not just storage.

Believe me when I say that GOOG is extremely concerned about power efficiency. Their system is much more efficient than the standard enterprise kit.

Brian,

People have to look at more than cost to justify buying ANY big iron storage array. Of course GOOG isn't doing backup on their multipetabyte clusters, so de-dupe is a bit of a yawner.

I've said from the very first that Google doesn't have an infrastructure for handling money, which limits their direct applicability. But Amazon does, and they have a very similar architecture of massive clusters built from commodity parts. If they've done it, others can too.

Robin</description>
		<content:encoded><![CDATA[<p>Augustus,</p>
<p>Google has search sewn up, but their other services don&#8217;t fare so well. Yahoo needs to build from the strengths they have while reducing the cost disadvantage.</p>
<p>Richard,</p>
<p>I believe that GOOG has 6 disks per mobo. Also remember that they are doing computing on the servers as well - these are combined storage/compute clusters - not just storage.</p>
<p>Believe me when I say that GOOG is extremely concerned about power efficiency. Their system is much more efficient than the standard enterprise kit.</p>
<p>Brian,</p>
<p>People have to look at more than cost to justify buying ANY big iron storage array. Of course GOOG isn&#8217;t doing backup on their multipetabyte clusters, so de-dupe is a bit of a yawner.</p>
<p>I&#8217;ve said from the very first that Google doesn&#8217;t have an infrastructure for handling money, which limits their direct applicability. But Amazon does, and they have a very similar architecture of massive clusters built from commodity parts. If they&#8217;ve done it, others can too.</p>
<p>Robin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian</title>
		<link>http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/#comment-92251</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Fri, 06 Jul 2007 19:39:57 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=488#comment-92251</guid>
		<description>Hi Robin,

Nice provocative post as always :-)  Regarding Richard's comment above about NetApp, let me add another perspective.  NetApp customers often look beyond raw $/TB due to the unique space-saving functionality in the SW.  Things like fast RAID-6 vs triple mirroring are bound to have a positive operational impact on Yahoo!'s power, cooling and floor-tile space consumption compared to Google.

I'd also imagine Yahoo! is actively using NetApp's de-dupe technology now which probably yields enormous 20:1 style space savings for email storage and archives.

Looking a little deeper at the real-world storage footprints of both Google &#38; Yahoo!, I think a far more interesting article would be where trendy commodity technology has current limitations (i.e. can it be "greener"?) as opposed to naive projections that it's ready for prime time in all applications....</description>
		<content:encoded><![CDATA[<p>Hi Robin,</p>
<p>Nice provocative post as always <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  Regarding Richard&#8217;s comment above about NetApp, let me add another perspective.  NetApp customers often look beyond raw $/TB due to the unique space-saving functionality in the SW.  Things like fast RAID-6 vs triple mirroring are bound to have a positive operational impact on Yahoo!&#8217;s power, cooling and floor-tile space consumption compared to Google.</p>
<p>I&#8217;d also imagine Yahoo! is actively using NetApp&#8217;s de-dupe technology now which probably yields enormous 20:1 style space savings for email storage and archives.</p>
<p>Looking a little deeper at the real-world storage footprints of both Google &amp; Yahoo!, I think a far more interesting article would be where trendy commodity technology has current limitations (i.e. can it be &#8220;greener&#8221;?) as opposed to naive projections that it&#8217;s ready for prime time in all applications&#8230;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

