Google has pummeled Yahoo into near-obscurity. The early search leader – the Google of the 1990s – now has a market cap that is a fraction of GOOG's, and its search share is a distant second. It is easy to forget that Yahoo is actually a large and highly profitable company by most standards: over $6 billion in sales and more than $700 million in profits in 2006.
But when you're competing against the baddest internet company around, good isn't good enough. Which is why ex-CEO Terry Semel was shown the door and a new team – co-founder Jerry Yang and CFO Sue Decker – is now at bat.
Sclerotic decision-making
Analysts point to Yahoo's inability to make fast decisions to acquire hot properties, like Facebook and YouTube, as indicative of the company's malaise. A compelling vision and crisp decision-making would surely help.
Yet for all their success, Google's top management is hardly more experienced than Yahoo's new team. It is the battle of the billionaire geeks.
But Yahoo has a much bigger problem than corporate culture. How about some basic blocking and tackling?
Profit = Revenue minus Cost.
Financial analysts are focused on Yahoo’s missed revenue opportunities. What about the cost side?
Bringing a knife to a gun fight
Yahoo’s infrastructure is built like a very large enterprise data center with brand name products. For example, Yahoo’s very successful mail system is run on NetApp filers. (Is it just me or does that NetApp page sound a little off?) Yahoo does use as much free software – FreeBSD, Apache and Perl – as they can, so their problem is hardware capital expense and operating expense.
Google’s infrastructure is built on commodity PC products. Cheap SATA drives velcro’d on quad-core mobo’s instead of high-performance network storage. A software layer that optimizes and manages across the cluster, so people don’t have to. No costly RAID arrays. No RAID at all, just 3x – or more – replication.
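To make the replication-versus-RAID tradeoff concrete, here is a quick sketch – my own back-of-the-envelope illustration, not anything from Google's code – of how much raw capacity each scheme turns into usable capacity:

```python
# Usable capacity: 3x replication vs. a RAID 6 (10+2) group.
# Illustrative arithmetic only - not Google's or any vendor's actual figures.

def usable_fraction_replication(copies=3):
    """With N-way replication, 1/N of raw capacity is usable."""
    return 1.0 / copies

def usable_fraction_raid6(data_disks=10, parity_disks=2):
    """A RAID 6 group loses two disks' worth of capacity to parity."""
    return data_disks / (data_disks + parity_disks)

raw_tb = 1200  # raw terabytes purchased

for name, frac in [("3x replication", usable_fraction_replication()),
                   ("RAID 6 (10+2) ", usable_fraction_raid6())]:
    print(f"{name}: {frac:4.0%} usable -> {raw_tb * frac:5.0f} TB of {raw_tb} TB raw")
```

Replication burns far more raw disk; the bet is that commodity disk is cheap enough – and failure handling simple enough – that it doesn't matter.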
Cheaper, better, faster: pick any three
I compared Yahoo and Google IT cost structures a year ago and found that for every dollar Google and Yahoo invest in IT, Google generates 50-60% more revenue with 4,000 fewer people. Since then, I’ve estimated that Google has a 5-8x cost advantage per user I/O over Yahoo. That includes search, mail, and other services.
Google's cost structure gives it two huge advantages:
- Its ROI bar is lower than competitors, so it can afford to make improvements that competitors can only dream of.
- Its lower costs mean that even if a deep-pocketed competitor like Microsoft wanted to eliminate the profits in AdSense, they’d be bleeding red ink while Google broke even.
The low-cost producer doesn’t always win. But they sure have more maneuvering room than higher-cost producers.
So how can Yahoo win?
This isn’t rocket science.
- Get IT costs under control. Go to a Google-like infrastructure using commodity products.
- Defend your market leadership where you have it, like Yahoo mail. Google's marketing is pretty much MIA – as the $1.65 billion purchase of YouTube reflected. Google is pleased with Gmail's growth – largely a function of people opening extra email accounts to collect the $10 Google Checkout promotion – but Yahoo has the users.
- Add capabilities that Google can't easily match, such as transaction processing, to its new commodity-based infrastructure, and use them to drive new business and traffic.
The StorageMojo take
Google looks invulnerable, but their inability to win outside of search and advertising points to just how weak their management and their market position are. They don't have the integration skills of a Cisco – look at how poor YouTube's search function still is, months after purchase by the world leader in search – and while a huge bag of money can rent a lot of love, Microsoft has proven that it can't buy internet success.
Yahoo has an opportunity to catch the next wave of search goodness if they are aggressive about bringing their infrastructure costs down. If not, it doesn’t matter what they do on the revenue side: they will continue a long, slow decline. They’ll be good, but they’ll never be great.
Comments welcome, as always. And as an aside, I'm here to report that Brad Bird's new movie, Ratatouille, is a must-see – even better than The Iron Giant and The Incredibles. The animated short Lifted is hysterical. One caution: I don't think the movie is suitable for some pre-schoolers, due to disturbing images of swarming vermin. Wait for the DVD.
Hey Robin,
I use Yahoo and Google many times a day. I have Ymail and Gmail. I also have both a MyYahoo page and an iGoogle page.
The bad news for Yahoo is that I am in transition away from their site. Yahoo is still my start page, but I enjoy all of Google's services much more. On top of their superior mail and their equally good RSS reader, they have Google Docs, which is actually invaluable to me.
In summary, I am slowly slipping away from Yahoo and I really can't find any reason to stay. The changes suggested will help them financially, but I think that Google is much more inside my head than Yahoo can be.
Robin,
NetApp hardware vs “no RAID at all, just 3x – or more – replication”…
Please, someone explain (with figures) how this is a lower-cost solution… say over a three-year period… disks, cabling and running costs of power included.
Having it stuck together with Velcro does not save costs… it's just bad engineering and does not save anything.
Mike,
I was going to mention marketing – as in user experience – as an area where Yahoo needs to improve. But I beat that horse all the time so I left it off. Thanks for the comment from someone who uses both.
Richard,
Even heavily discounted, NetApp boxes are several bucks per gigabyte, and more when you add the software that does, say, replication. SATA drives are now $0.20 per gigabyte. Add the mobo they're mounted on, which is also a server, and Google's cost per terabyte is about $500. Power is a wash, ethernet cables are cheap. Show me a NetApp that is even $5,000 per TB – and then you still have to buy the server to run your app.
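If you want to check my arithmetic, here it is as a quick script – rough 2007 street prices and my own guesses, not vendor quotes:

```python
# Rough cost per TB for a commodity storage/compute node.
# All figures are 2007-era street prices and guesses, not vendor quotes.

sata_dollars_per_gb = 0.20   # commodity SATA street price
disks_per_node      = 6      # disks per motherboard (my estimate)
tb_per_disk         = 0.5    # 500 GB drives
mobo_cost           = 800    # board, CPU, RAM, PSU - a guess

raw_tb    = disks_per_node * tb_per_disk
node_cost = sata_dollars_per_gb * 1000 * raw_tb + mobo_cost

print(f"{raw_tb:.0f} TB raw per node, ${node_cost:,.0f} per node")
print(f"=> ${node_cost / raw_tb:,.0f} per TB")  # roughly $500/TB
```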
Maybe this should be a post.
Robin
I agree with your observations on the cost structure but not the strategy.
The main problem is that Google has 65% market share in search.
Yahoo has to come up with a strategy that uses its assets in a more profitable manner. Maybe a product that integrates all its properties into one. Or maybe offer advertisers pay per action marketing on search ads etc.
-Augustus
Robin,
There is a lot to be said for a ‘vertically’ integrated, ‘roll your own hardware’ model. Any large data center ‘consumer’ will need to emulate Google in order to compete, especially if ‘storage’ is increasingly being offered for free.
I fully agree with you that Yahoo (and others) must ‘go to a Google-like infrastructure using commodity products’.
However, the 'Google way' on hardware may not be the best way. At a certain point, the savings on hardware will be balanced by the cost of in-house software & support as the infrastructure expands and needs to scale.
Google's hardware cost is much higher than the cost of parts. We should add other associated factors (purchasing, integration, testing, internal warranty & field service, etc.)… so let's agree on, say, $1K per TB… and they need 3x on this… i.e., $3K/TB.
A well-designed RAID controller is able to support a large number of disks.
For example, a 24-disk, front-access, 4U package is becoming common. Vertically mounted disk packaging provides 48 disks in the same 4U vertical space. Such a system will run (say) 4 x RAID 6 (10+2) groups and deliver the equivalent of 40 disks of usable capacity.
I guess that a Google ‘commodity’ server (i.e. with a ‘commodity’ motherboard chipset) will support (just) four disks in a 1U chassis. If so, then Google needs 10 x 3 = 30 motherboard-based enclosures and 30U of vertical rack space.
Ignoring disks, each motherboard in a Google chassis will consume about the same power as a single RAID controller (say 150 watts per chassis)… i.e., approx 4.5 kW of extra power… so the running costs are not trivial.
Google would probably argue a higher level of reliability through the 3x approach. I doubt this, considering the quality of commodity velcro-held hardware. There is probably a good reason why they are not bolted down.
A high-quality, commodity RAID enclosure with unbundled disks, combined with open software….. is still a good option.
Hi Robin,
Nice provocative post as always 🙂 Regarding Richard’s comment above about NetApp, let me add another perspective. NetApp customers often look beyond raw $/TB due to the unique space-saving functionality in the SW. Things like fast RAID-6 vs triple mirroring are bound to have a positive operational impact on Yahoo!’s power, cooling and floor-tile space consumption compared to Google.
I’d also imagine Yahoo! is actively using NetApp’s de-dupe technology now which probably yields enormous 20:1 style space savings for email storage and archives.
Looking a little deeper at the real-world storage footprints of both Google & Yahoo!, I think a far more interesting article would be where trendy commodity technology has current limitations (i.e., can it be "greener"?) as opposed to naive projections that it's ready for prime time in all applications…
Augustus,
Google has search sewn up, but their other services don’t fare so well. Yahoo needs to build from the strengths they have while reducing the cost disadvantage.
Richard,
I believe that GOOG has 6 disks per mobo. Also remember that they are doing computing on the servers as well – these are combined storage/compute clusters – not just storage.
Believe me when I say that GOOG is extremely concerned about power efficiency. Their system is much more efficient than the standard enterprise kit.
Brian,
People have to look at more than cost to justify buying ANY big iron storage array. Of course GOOG isn’t doing backup on their multipetabyte clusters, so de-dupe is a bit of a yawner.
I’ve said from the very first that Google doesn’t have an infrastructure for handling money, which limits their direct applicability. But Amazon does, and they have a very similar architecture of massive clusters built from commodity parts. If they’ve done it, others can too.
Robin
Robin,
Let's put some more resolution into this.
I don’t care how ‘green’ Google are with their motherboards…there is no magic.
New multi-core Opteron processors are very fast, with lots of I/O bandwidth, and able to accelerate the performance of existing open software driving large disk backends. Google may be surprised if they test their GFS with a RAID backend on a multi-core Opteron processor.
If you don’t mind wasting power, a RAID controller design using X86 technology is much like a ‘commodity’ motherboard…with a small added cost of SATA backend chips to drive a total of 48 disks…some products are shipping already.
Such a RAID controller is 'purpose' designed, i.e. an optimized, cut-down motherboard (no need for PCI expansion slots, etc.), a single hot-plug module into a disk backplane, no internal cabling. Let's not call it a "RAID" again, just to keep you calm.
A 4U (7-inch) mechanical chassis to house 48 vertically mounted disks, dual controllers and triple power supplies is not difficult to design – it was designed and sold *six years ago*. That design used SCSI disks. SATA makes it less expensive and less power hungry.
A typical 42U rack is able to support 10 such enclosures, for a total of 480 RAID6 protected disks, arranged in 40 groups to deliver 400 ‘data’ disks.
An equivalent Google configuration, presumably using a 2U, 6 disk chassis with triple redundancy, requires 1200 disks, packaged in 200 chassis, consuming 400U of rack space. Note… they require 10 full racks….i.e. a 10:1 expansion just in floor space.
It would be good if someone could confirm the exact Google chassis configuration.
In terms of power consumption…..
The qty-10, 4U 'controller' solution requires 480 disks vs 1200 disks for Google. We have 10 'controllers' vs 200 motherboards, 1 rack vs 10 racks.
The power consumption of the extra 720 disks (1200 − 480) would be around 10 kW (average)… a guess. The extra power consumption of Google's 200 motherboards is 190 x 110 watts (a generous guess)… so a saving of around 20 kW.
Hence the total saving in power is 30 kW, plus the cost of nine extra racks, space & rent.
Datacenter experts out there …. perhaps someone could verify the above figures in terms of actual running costs, rent, air conditioning etc … say over a 3-5 year period.
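In the meantime, the power arithmetic above reduces to a few lines (every wattage is a guess from my comment, not a measurement):

```python
# Power delta per 400 usable TB. Every wattage is a guess from above.

extra_disks   = 1200 - 480                # disks the replicated design adds
disk_delta_kw = 10                        # ~10 kW for those 720 disks (guess)

extra_mobos   = 200 - 10                  # motherboards vs. RAID controllers
mobo_delta_kw = extra_mobos * 110 / 1000  # ~21 kW at 110 W each (guess)

print(f"extra power for the replicated design: ~{disk_delta_kw + mobo_delta_kw:.0f} kW")
```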
In terms of hardware costs…
From experience, in qty 100, such a well-designed 'commodity' 4U chassis will cost just under $10K to build, including a dual-core Opteron-based 'controller'… so it could sell for $20K. The design is 'cable-less' and ships with 48 disk canisters. The customer buys and (just) plugs in the disks. This is the *key issue* in such a 'commodity' business model.
So… the cost of a 10-system 4U-chassis infrastructure is $200K per rack (10 x $20K) for 400TB of usable data… or $500 per TB in 'diskless' infrastructure cost… about the price of a single 1TB SATA disk. So the end-user cost looks like $1K per TB.
On the Google side… they need to buy 200 chassis, add messy SATA cabling and mount the disks… all of which takes time. I suggest this may cost $2K per chassis for a diskless dual-core solution… so we are looking at $400K for a diskless configuration.
To this they need to add 1200 x 1TB disks = $600K (at $500 each)… plus nine extra system racks… plus power cabling. They would not get any change from $1M, and their cost is $2.5K per TB of storage.
This figure may come closer to $3K/TB if just the extra cost of air conditioning and power cabling is included… not counting the extra cost of floor space & running costs.
So… there appears to be a 3:1 cost ratio…. what do you think..?
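For anyone who wants to audit the 3:1 figure, here is the whole model as a short script (all constants are my estimates from above, not measured data):

```python
# The cost model above, transcribed. All constants are estimates, not data.

USABLE_TB = 400

# RAID side: ten 4U/48-disk enclosures at $20K each; 1TB disks at $500.
# One disk per usable TB, as stated above (parity disks not counted here).
raid_per_tb = (10 * 20_000) / USABLE_TB + 500       # $500 infra + $500 disk

# Google side: 3x replication, 6-disk 2U chassis at $2K; 1TB disks at $500.
goog_disks   = USABLE_TB * 3                        # 1200 disks
goog_chassis = goog_disks // 6                      # 200 chassis
goog_per_tb  = (goog_chassis * 2_000 + goog_disks * 500) / USABLE_TB

print(f"RAID  : ${raid_per_tb:,.0f}/TB")            # $1,000/TB
print(f"Google: ${goog_per_tb:,.0f}/TB")            # $2,500/TB
```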
Richard,
Excellent points. However, this is comparing apples and oranges. GOOG is deploying systems, not "storage" per se. Further, they configure for lowest cost per unit of goodness – including CPU cycles, cycles per watt, network bandwidth and bytes – rather than highest density. It's a linear programming problem rather than optimizing for one or two metrics.
For example, GOOG would not use 1TB drives, which are currently around $0.30/GB; they'd buy whatever has the lowest cost per GB. I'd guess in their volumes, possibly with special warranty terms, they'd be getting $0.16-$0.18/GB, less if a vendor is overstocked.
Also, the mobos are purpose-designed as well, with unnecessary PCI slots, graphics etc. removed, but unlike any array vendor they use high-volume parts everywhere on high-volume motherboards. They are buying high-volume parts in high volumes. It doesn't get any cheaper than that. Qual is minimal. They don't care about drive firmware levels. Surface-mount SATA connectors eliminate problematic cables. They optimize at the system level, not the server and then the storage and then the network.
Your calculations leave out the server and networking piece. Sure, you can get a lot of disks into a rack if TB per square meter is the metric. That isn't Google's. Add all the servers and GigE networking you'd need to make a complete cluster solution, plus the low-volume RAID controllers, and you are looking at a very costly infrastructure. For example: Yahoo vs. Google.
Robin
Robin,
You made a passing comment about dedupe and backup which I’d like to further address. NetApp’s A-SIS dedupe technology works wonderfully for primary storage outside of any backup context (as well as within backup of course).
We see 20:1 dedupe ratios on master VMware images of all kinds. Also on structured data sets like Oracle, SQL or (Exchange) Jet databases. Outlook PST files are a perfect example.
If that's any indication, I think Richard's arguments about the savings of advanced RAID arrays could be really powerful in Yahoo's context. I'd have to imagine their primary email storage and archived email storage contains a tremendous amount of data which could be deduped by 80% or more. If that's the case, Google can't possibly have a more efficient storage infrastructure!
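For anyone who hasn't seen how dedupe earns ratios like that, here is a toy content-addressed sketch – a generic illustration of block-level dedupe, not NetApp's actual A-SIS implementation:

```python
import hashlib
import os

# Toy block-level dedupe: store each unique 4 KB block once, keyed by its
# content hash. A generic illustration, not NetApp's A-SIS implementation.

BLOCK = 4096
store = {}  # content hash -> stored block

def write(data):
    """Chunk into fixed-size blocks; duplicate blocks cost no extra space."""
    refs = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        refs.append(digest)
    return refs

# The same 1 MB attachment saved in 20 mailboxes:
attachment = os.urandom(1024 * 1024)
for _ in range(20):
    write(attachment)

logical  = 20 * len(attachment)
physical = len(store) * BLOCK
print(f"dedupe ratio ~ {logical / physical:.0f}:1")  # ~20:1
```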
/L.
Robin,
You missed one of my key points.
With these new multi-core processors, Google should be able to run GFS in conjunction with a large number of backend disks on the *same* controller, with an Ethernet front-end. This is nothing new in terms of 'architecture'… it is multi-core already. There is not much difference in hardware if they are doing a 'purpose-built' controller already… and they should not call it 'commodity'.
They can easily add protection to the backend to eliminate triplication, save 30 kW per 400TB of storage in power, and greatly reduce the initial cost. Also, with this they will get 10x datacenter density.
Imagine the level of saving across the whole infrastructure….on power alone.
My argument holds with different capacity disks. Disks always get cheaper and all you are trading is the initial cost vs power consumption of more disks and more nodes.
The problem is that there is no such high performance ‘commodity’ controller on the market today … but it is not a technical problem.
Also, I suggest that Google's need to triplicate at the local level has little to do with storage and more to do with GFS and the need for bandwidth. For ultimate protection, they should still replicate across geographically separate datacenters.
One thing mentioned in passing that people are not (I think) considering is that Google doesn’t do field service on their machines as we normally think of it.
Instead, they use what I like to refer to as the “ignore operation”. If a disk fails, they ignore it at the hardware level (at the filesystem level they of course make a copy of what it was holding). Same for a system.
Sooner or later, I gather that field service people go through a site and pull broken stuff, but one of their key insights is KISS (Keep It Simple, Stupid). The less fancy their hardware, and the less they directly (hands-on) interact with it, the cheaper, since people are expensive. Note Robin's comment that "Google generates 50-60% more revenue with 4,000 fewer people."
Hmmm, all this predates me hearing about their concerns about power consumption, so maybe they can now remotely power down disks, and systems that suffer complete failure (in fact, trying a power cycle to unwedge a machine seldom hurts :-).
However, I would say that overall they have a VERY keen eye on TCO, and to assume they are doing stupid things here is unwise.
– Harold
Yes, this is all they can do… instead of sparing with a single disk, they need to 'drop' 6 disks and replicate to another system containing an additional six disks. The need to keep many spare systems around is extra cost. I hope they power these down, also at extra hardware cost… an intelligent power switch per system.
This is getting a lot closer to my estimate of $1K per system or $3K per TB, as suggested earlier…. ignoring power & space issues.
GFS is a very good vehicle but comes at a price. Internal ‘vertical’ integration is a great concept but not with ‘commodity’ hardware.
However, this constant 'spin' from Google regarding 'commodity' solutions (which their hardware is not), the resulting low cost per TB and their stated concern for power… all remain very questionable.
Richard,
If EMC, NetApp, StorageWorks or anyone else’s RAID or filers were actually competitive with what Google – and let’s not forget Amazon – have built, don’t you think Google would be buying them?
Also, GFS and BigTable work on files and tablets, not blocks. So when a server fails, the replication happens to wherever there is space for it. The data is replicated, not mirrored.
And again, Google is probably paying about $160-$180 per raw TB today. Triple that to even $600 per TB including packaging – there is very little – and mobo space. Even Apple's Xserve RAID is double that.
Which is why Google is concerned about power: they’ve cut the cost out of virtually every other aspect of their operations. I’ve looked at the power numbers for big iron arrays and if all you think about is drive power, then yes, 3x is worse than RAID 6, but not nearly as much as you’d assume. The real issue is the power hungry controllers and all the network infrastructure, FC or IP, required to make it work. And then you still haven’t factored in the servers.
I feel a post coming on, Richard. Thanks for writing.
Cheers,
Robin
Robin,
I don't work for anyone in the big (or small) iron camp.
I am not suggesting that anyone can compete with the 'roll your own' strategy, and I have agreed that Yahoo & others must follow this model in order to compete. However, when they do, they should improve on the approach.
The GFS environment can run on a much more powerful controller, simultaneously supporting larger, protected disk backends to eliminate waste. All data ends up on disk blocks… somewhere.
If Google does not have such hardware design capability, then perhaps they should ask one of their early backers to show them how….he is already doing it.
So all that is left is some spin on ‘commodity’ with velcro, unsubstantiated cost figures and more spin on ‘how green is my valley’.
Perhaps Google should ‘vertically integrate’ with a power station building business…. a small clean nuclear type, one per datacenter.
As someone said, this is a provocative post… so let's end this story.
Richard, you made 3 important errors when doing the cost comparison between the RAID and Google solutions. After fixing these errors, the Google solution appears roughly 20% cheaper instead of 3x more expensive.
1) In the RAID solution, you need to buy a total of 480 disks, not 400 as you assumed in your reasoning, to fill up ten 48-disk chassis.
2) In the Google solution, you overestimate the cost per diskless chassis by a factor of almost 10x! Today you can buy the components for a dual-core 2.0 GHz, 2-GB diskless machine for about $230 from Newegg ($40 PSU, $60 AM2-socket mobo, $60 Athlon 64 X2 3600+ 2.0 GHz, $70 DDR2 RAM). Google buys similarly priced components and straps them with velcro on sheets of insulated material (they used to use cork sheets, but had to change the material because it turned out to be a fire hazard; I don't know what they use today). So let's round this $230 up to $250/chassis. This is much less than the $2K/chassis figure you mention.
3) You assume Google uses 1-TB disks. As Robin correctly pointed out, they are on the contrary buying whatever is most cost-effective. Robin estimates $160-$180 per raw TB; I'll be more conservative and assume $220 per raw TB (500-GB disks sell for $110 on Newegg). Again, this is much cheaper than your figure of $500 per TB. To account for these half-size disks, you need to double the number of disks (2400 instead of 1200) and chassis (400 instead of 200).
You found the Google solution to be 3x more expensive. But with these errors fixed, it is in fact about 20% cheaper:
raid: 10 chassis x ($20K/chassis) + 480 disks x ($500/disk) = $440,000
google: 400 chassis x ($250/chassis) + 2400 disks x ($110/disk) = $364,000
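The same comparison as a quick script:

```python
# The corrected comparison, using the figures above (2007 estimates).

raid   = 10 * 20_000 + 480 * 500  # ten $20K chassis + 480 x 1TB disks @ $500
google = 400 * 250 + 2400 * 110   # 400 x $250 chassis + 2400 x 500GB disks @ $110

print(f"raid  : ${raid:,}")                                        # $440,000
print(f"google: ${google:,} (~{(1 - google / raid):.0%} cheaper)")  # ~17-20%
```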
Additionally, there is an even more important reason why they don't RAID but instead prefer 3x replication: RAID won't protect you if the whole server fails. With 3x replication on 3 different servers in 3 different racks, they can take down a whole rack (for maintenance, for example) without impacting the availability of the data.
That said, I agree that the RAID solution offers higher densities and is probably more power efficient, but it just doesn't offer the same level of reliability as 3x replication…
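To illustrate the rack-failure point, here is a minimal sketch of rack-aware placement – a generic GFS-style policy, not Google's actual code:

```python
import random

# Rack-aware replica placement: put each of a chunk's three replicas on a
# server in a different rack, so losing a whole rack leaves two live copies.
# A generic GFS-style policy sketch, not Google's actual code.

RACKS = {f"rack{r}": [f"rack{r}-srv{s}" for s in range(10)] for r in range(5)}

def place_replicas(copies=3):
    racks = random.sample(sorted(RACKS), copies)  # three distinct racks
    return [random.choice(RACKS[rack]) for rack in racks]

replicas = place_replicas()
print("replicas on:", replicas)

# Take one whole rack down for maintenance:
down_rack = replicas[0].split("-")[0]
survivors = [s for s in replicas if not s.startswith(down_rack + "-")]
print(f"{down_rack} down -> {len(survivors)} copies still available")
```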
Distributed storage architectures built from commodity components (peer-to-peer) have consistently proven to be less expensive than big-iron centralized storage. Multiple companies have taken advantage of this cost differential over the years, most notably in research and academic circles where money is always tight. While Google has become the most widely known user of grid/distributed storage, many companies benefit from the cost savings in hardware and data management every day.
I really think that the discussions about the various merits of different hardware miss the real crux of the problem. Google have a clearly defined, burning desire to be and stay number 1. That is the bottom line, whereas Yahoo seems very centred on their own internal politics. Well, from my outside viewpoint anyway.
So I think it is a cultural thing rather than a technology thing. Get the culture right and the technology will follow. (Just my thoughts)
http://www.smtnet.co.uk/
Here is an idea – please look at this:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9043942
"For $4,000 or so, I can get eight PS3s that can do the same task that I'd do on a supercomputer."
A different approach – maybe more cost effective?
Larry
To be honest, I used to use Yahoo… but I just hated how much rubbish they had on their home page. Google is just so much more attractive and simple – I am not told the news, weather, or sports (especially annoying when you are trying to avoid results). And the fact is that most people start from their home page, which is why so many companies desperately try to get you to change to theirs or a sponsor's. This is less so with the development of toolbars, but for me Google = the epitome of Keep It Simple, Stupid, whereas Yahoo = stupid.