A reader sent me a note asking a question that I think is on the minds of many data center folks. He put it well himself, so I’ll quote liberally, starting with the compliment.
I really enjoy reading your blogs!
One thing I’ve noticed in your blog, other blogs, and all over the computer media is how Google keeps coming up as an example. It’s how Google leverages commodity servers, drives, custom software (Google File System), etc., etc. It’s as if everyone thinks Google should be emulated.
OK... I can buy that... but I wonder how much of what Google does is really applicable to my work. I work for a utility. The vast majority of our IS computer resources go toward transaction processing systems.
What you _NEVER_ read about Google is what computer resources they use for their internal business processes. You only hear about the resources they use for their products (search, Gmail, Maps, etc.). When I read about how wonderful the Google infrastructure is and how it should be emulated, I always start wondering whether the described infrastructure (commodity servers, cheap disk, Google File System and massive parallelism) is also used for their billing, payroll, accounting and whatever other systems. Do they use custom-written software for these, or Oracle apps or SAP? Do they use a database, and which one? What disk systems do they use with it? How is it laid out?
It’s always interesting to hear how Google does things, especially since they are so secretive about it, but I’m not convinced that what does come out is useful to us, or that it’s a very complete picture of their infrastructure. I guess I wonder whether buried deep in their data centers is a more normal infrastructure like ours.
Good questions, and ones I’ve often asked myself.
Google, the 10,000-person company, doesn’t have nearly the clout of Google, the world’s largest internet advertising company and, more importantly, a buyer of 500,000 servers a year. If they asked IBM to clusterize MVS to win an order, they’d get laughed at just like anyone else.
But dangle a few hundred million dollars a year in front of Intel while asking them to do things their engineers want to do anyway, and the reception is much warmer.
I suspect that their offices and internal data centers look a lot like yours, at least for the database business apps: the corporate underwear. But I bet they back up their unstructured data on GFS. Why not?
Linux, PCs and Macs
I know they use Macs and PCs and that, at the very least, they outsource some of their IT work to people using Microsoft server products. They may even have Microsoft servers inside the company, though I’ve never seen evidence of it.
However, I have never held up Google’s infrastructure as one that could be used to count money. I said as much in the StorageMojo take on the Google File System.
Amazon is a different story
The more appropriate example is Amazon. They have millions of customers, they count billions of dollars, they customize each web page on the fly, and they do it with a distributed, services-based architecture running on clusters of open source software. They scale well. And they arrived at that architecture only after trying all the “enterprise” products, including a mainframe. They didn’t just build it; they migrated to it from a very large installed base.
If Werner Vogels ever decides to build his own company, that would be the pitch.
Amazon does transaction processing on a cluster. That is the enterprise problem.
Amazon is the company IT architects should be studying. They just don’t publish very much.
I don’t believe that “enterprise” hardware and software are going away in my lifetime, any more than the mainframe has gone away or probably will. What will shift is the growth. When the market shifts, the weaker players will fold or consolidate, just as they did in the mainframe market.
But with 85%+ of digital data in ordinary files, even mid-range RAID solutions are overkill. Big blobs of cheap cluster storage would solve all kinds of IT problems. Backup window closing fast? Back up to a storage cluster sized to be a 6-10 week FIFO buffer. I suspect there are many data center applications for cheap cluster storage today, if someone offered a reasonable product and notoriously conservative IT managers were willing to try it.
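To make the FIFO idea concrete, here is a minimal sketch of how such a buffer behaves, assuming weekly full backups of roughly constant size. The class and function names (`Backup`, `FifoBackupBuffer`, `required_capacity_tb`) are hypothetical illustrations, not any product’s API, and a real cluster would also have to account for replication overhead and growth.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Backup:
    week: int       # week number in which the backup was taken
    size_tb: float  # size of the backup set in terabytes


class FifoBackupBuffer:
    """Hypothetical cluster-backed backup store that keeps only the most
    recent `retention_weeks` weekly backups, discarding the oldest first."""

    def __init__(self, retention_weeks: int):
        if retention_weeks < 1:
            raise ValueError("retention_weeks must be at least 1")
        self.retention_weeks = retention_weeks
        self._backups: deque[Backup] = deque()

    def add(self, backup: Backup) -> list[Backup]:
        """Append the newest backup and return any old ones evicted to make room."""
        self._backups.append(backup)
        evicted = []
        while len(self._backups) > self.retention_weeks:
            evicted.append(self._backups.popleft())  # oldest out first (FIFO)
        return evicted

    def used_tb(self) -> float:
        """Total space currently occupied by retained backups."""
        return sum(b.size_tb for b in self._backups)

    def weeks_held(self) -> list[int]:
        """Week numbers of the backups still in the buffer, oldest first."""
        return [b.week for b in self._backups]


def required_capacity_tb(weekly_backup_tb: float, retention_weeks: int) -> float:
    """Raw capacity needed to hold `retention_weeks` full weekly backups."""
    return weekly_backup_tb * retention_weeks
```

With an assumed 8-week retention and 2.5 TB weekly backups, `required_capacity_tb(2.5, 8)` gives 20.0 TB, and after ten weekly `add` calls the buffer holds weeks 3 through 10, having evicted weeks 1 and 2. The point of the FIFO design is that nobody has to manage tapes or expiration schedules: the oldest backup simply falls off the end.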
Enterprise growth rate
Moore’s Law is driving up CPU power faster than enterprise application growth rates. The enterprise market share has been shrinking for years, and in the next five years that market’s growth could stall entirely.
The StorageMojo take
Google is a fun story, the way Microsoft was in the 1980s. They picked up a lot of ideas that folks had been working on for, in some cases, decades and rolled them out in a big way. They’ve produced something we’d never seen before, even though much of it had been percolating around CompSci departments for years. The antics of the boy billionaires make good copy.
The real power of Google will be seen when the computer scientists who are now multi-millionaires get tired of working for a big company and decide to see if lightning can strike twice. For most of them it won’t, but what the hey, they didn’t go into it for the money anyway. They’re the hot rodders of the digital age, channeling, chopping, stroking and boring the bits to create beauty, handling and speed. With luck, all three.
Comments welcome, of course. Have a good weekend!