After reviewing the impressive Google File System, I wondered about Google’s competitors: MSN, AOL and Yahoo. Is it possible to quantify the economic advantage of GFS over conventional enterprise architectures?
NetApp’s web site notes that Yahoo Mail uses NetApp equipment. They also claim in one of their 10-K reports:
NetApp success to date has been in delivering cost-effective enterprise storage solutions that reduce the complexity associated with managing conventional storage systems. Our goal is to deliver exceptional value to our customers by providing products and services that set the standard for simplicity and ease of operation.
Uh-huh. Like those 520 byte sector disk drives with the Advanced Margin Enhancement Technology?
The Smackdown: Yahoo vs Google
The idea: compare the revenue returned for each dollar of IT capital cost for two tech-savvy, leading-edge internet firms. For every dollar they invest in IT, what do they get back in revenue? Capitalism 101. Since IT is virtually all they do, the differences should be stark.
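As a minimal sketch of the metric described above, the comparison comes down to one division per firm and then a ratio of the two results. The firm names and dollar figures here are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not Yahoo's or Google's actual numbers, and the function name is my own.

```python
def revenue_per_it_dollar(revenue: float, it_capital: float) -> float:
    """Return revenue earned per dollar of IT capital cost."""
    if it_capital <= 0:
        raise ValueError("it_capital must be positive")
    return revenue / it_capital


# Hypothetical placeholder figures (in $ millions), NOT real financials.
firms = {
    "FirmA": {"revenue": 1_000.0, "it_capital": 250.0},
    "FirmB": {"revenue": 1_000.0, "it_capital": 100.0},
}

# Revenue returned per IT dollar for each firm.
ratios = {name: revenue_per_it_dollar(**f) for name, f in firms.items()}

# How many times more revenue FirmB gets per IT dollar than FirmA.
advantage = ratios["FirmB"] / ratios["FirmA"]

for name, ratio in ratios.items():
    print(f"{name}: ${ratio:.2f} of revenue per IT dollar")
print(f"FirmB advantage over FirmA: {advantage:.1f}x")
```

With equal revenue, the firm that spends less IT capital shows the higher ratio, which is the kind of gap the comparison is meant to expose.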
I chose Yahoo! to compare to Google, since they are roughly similar in revenue, they each run always-on data centers with at least 100,000 servers, and they offer a similar range of services. AOL and MSN are both part of larger companies, so digging out numbers would be difficult if not impossible.
Another YHOO/GOOG similarity: Yahoo also uses open source software: FreeBSD, Apache, and Perl. So the differences between Yahoo and Google should be mostly hardware, not software, except for, I’d guess, proprietary management software. And since storage is typically the largest part of IT capital expense, that hardware should be mostly storage. NetApp for example.
Stalking the Wild IT Numbers
This is where I explain where the numbers come from. If you are a financial type you’ll want to know, but most of you can skip ahead to The Bottom Line. The numbers are conservative. The YHOO problem is worse than they indicate.
This is a great stream on how Google builds out their architecture – lord knows it was hidden for so long. That said, it is dangerous to extrapolate the operations of web-based organisations like Google & Yahoo to “normal” enterprise-type businesses, much as equating scientific processing to business processing is fraught with peril. GFS would not work in a corporate environment. The data access there is primarily database-related, with a lower tier of file-type access. This is why the big iron is thrown at things. Sure, people like Oracle have tried to do some distributed processing with RAC – but you want high OPEX? Let me introduce you to Oracle RAC on Red Hat Linux!
Cheers,
Simon
Simon,
Your points are well taken — and I agree. In the review of GFS I concluded that it would NOT be a successful commercial product: it is too tightly tuned to Google’s unique workloads. The “relaxed” consistency model would give any CIO hives as well.
Yet from a future-tech perspective, as postulated in Storage and Cosmology, the digital data universe is cooling. It has to, as storage gets more affordable. Which means that the hot, transaction-intensive storage that is the pinnacle of storage engineering will find itself in a niche whose growth, relative to the total storage market, will lag. Think mainframes.
To me the importance of Google is to show (“he who has eyes to see, let him see”) that for a demanding set of 7×24 apps it is possible to create a significant competitive advantage using IT. A few years ago Nick Carr wrote an essay titled “IT Doesn’t Matter”. Really, how could it NOT matter, unless we refuse to take risks? Google has taken a risk, and they are kicking it. Their marketing is pathetic in the extreme, so their hard-won advantage is at serious risk, but for now they are a shining avatar of what IT can and should aspire to.