So, Mr. Tucci, Where Are EMC’s Google Application Notes?

by Robin Harris on Saturday, 29 April, 2006

They’ve come out of nowhere and in a few short years built one of the world’s largest always-on data centers, supporting data- and compute-intensive applications such as search, mail, chat, mapping, blogging and much more. They roll out new applications faster than anyone in the business, including such deep-pocketed and savvy competitors as Microsoft and Yahoo.

Someone has to ask: how could more than 20,000 terabytes of mission-critical storage be built and operated WITHOUT any of the “big iron” storage vendors? Google collectively probably hasn’t spent two minutes thinking about EMC, but the gnomes of Hopkinton are praying that their big customers don’t notice. Fat frickin’ chance!

With all the chatter about SOA and Web 2.0, Google is exhibit “A” of someone actually doing it: not for a specialized application with a single dominant data type, but for a dozen widely divergent data types, 7×24. The Google platform is clearly an incredible competitive advantage.

So where are all the proud “application notes” that vendors buff up to show just how indispensable they are to the creation of these vital money-spinning applications?

EMC? HP? IBM? Sun? Anyone?

They don’t exist.

I’ll be exploring the implications in future posts, but consider this:

  • Numbers are scarce (no application notes?), but the best estimates are that Google is currently managing well in excess of 20,000 terabytes of storage. Only the NSA’s domestic surveillance program is likely to significantly exceed that, and the NSA doesn’t have to file quarterly reports with the SEC.
  • Other than possibly some form of software disk mirroring (RAID 1), Google appears to do all this without any big-iron RAID boxes.
  • This is less certain, but it also appears that Google has built its platform on PATA and SATA disks. You know, the ones that are so flaky that the Conventional Wisdom is stampeding to RAID 6 as the only (and surprisingly expensive) way to safely incorporate them into mission-critical 7×24 production infrastructures like, you know, the ones everyone but Google needs.

I’ve been puzzled for years over why cheap, high-volume storage hasn’t made it into the data center the way so many other high-volume consumer technologies have. In Google, I think I have my answer: it has, using hardware so cheap that the people who build it can’t afford slick “application notes”, big user groups, fat contracts for the “independent” analysts and four-color ads in all the IT publications. Not to mention that Google has no incentive to give its secrets away.

Developing . . . .

