Talking to Bryan Cantrill, one of the ZFS engineers, in San Francisco a few weeks ago, I mentioned that it seemed Google could really kick eBay’s tail. Bryan, selected by MIT’s Technology Review as one of the TR35, quickly responded that no one can “spray on” transaction processing (TP) capability. No doubt he is correct.
As noted in the Google File System review, GFS is not optimized for small reads and writes, so there is no way it is the basis for a big TP system. Yet with the advent of Google Checkout, a heavy-duty TP application, the company must have one.
What architecture is Google using to provide high-performance, large-scale transaction processing? Unlike GFS, there are no papers covering this architecture or even hinting around that Google has done anything at all with transaction processing. Are they using the open-source MySQL or Postgres? If so, it would be by far the biggest installation of either of these databases in the world. If anyone could make it work, it would be Google, yet it would be a huge risk.
The most likely conclusion is that Google is using a commercial solution for TP – one that no one in the industry – let alone Google – is talking about. I’ve combed through Google searching for any indication they’ve built a TP infrastructure and haven’t found one. The only other possibility is that they’ve developed TP technology so amazing that they aren’t even letting their researchers publish, which doesn’t seem to fit with their free-wheeling PhD seminar culture.
Anyone have any insight into which lucky vendor is getting many tens of millions from Google for big TP systems? Sun is an obvious candidate, since Schmidt used to work there. Oracle is another: it prefers simple disk architectures, which we know Google favors, so is it getting big POs from Google?
Email me or comment with your thoughts.
Update 1.0
A couple of smarter-than-me folks point out in the comments that there is no reason for Google Checkout to be a TP application. My apologies.
But that made me dig deeper, using links suggested by other readers, and I discovered a story of Google using MySQL for their AdWords database – followed by lots of entertaining (if you’re a nerd) discussion of database gack. Personally I love to listen to engineers argue because you can learn a lot about engineers (and a little about engineering) without the bother of being one.
Net-net: So it may very well be that Google is still preserving its open-source purity. I root for underdogs, so I hope so.
Update 2.0
Bryan is NOT a member of the ZFS team. They brought him along and he talked a lot, so I just assumed he was. He is actually the Jedi Master of DTrace at Sun.
Hello, Robin Harris,
I am working for a telecom company which has about 400 million users. We now have a site with about 4 million unique visitors per day, and we need to record all of the users’ actions for analysis so we can generate personalized services and recommend goods to them based on their behavior. Each visitor may generate up to 100 records per day. We want to find a way to store all of this, but because there are so many small writes and reads, I cannot find a good approach. Could you give me some advice? Sorry for my poor English.
Robin,
You may want to check out the Xooglers blog, http://xooglers.blogspot.com. It may give you some clues to the early architecture of Google’s infrastructure.
One of the Xooglers posts mentions using MySQL. It also mentions that they tried a commercial DB but couldn’t get the performance they desired.
– Anil
So what’s actually required for Google’s system is “once only” behaviour, which is in many cases achieved with transactions but doesn’t have to be.
There are various methods based on message idempotency and logs that could be used instead (sketched below, after this comment). You layer on top of that some form of “guaranteed message delivery”, which amounts to ensuring enough copies are sent out that at least one message survives to reach its intended destination. You might also expect to see gossip protocols used in such systems.
I think the above would fit very nicely with Google’s existing infrastructure, including GFS’s append-only model, and with some of the things they’ve done in BigTable.
Werner Vogels (Amazon) has put some notes on this sort of thing in some of his slides before now. I can probably dig them out if you’re interested.
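Here is a minimal sketch of the idempotency-plus-retries idea described in the comment above, in Python. All of the names (process_once, deliver_with_retries, the in-memory set standing in for a durable log) are hypothetical illustrations, not anything Google has described.

```python
# Sketch: "once only" effects from at-least-once delivery, using message
# idempotency instead of a distributed transaction.

processed_ids = set()   # stand-in for a durable, replicated record of handled message IDs

def process_once(message_id, payload, apply_fn):
    """Apply the message's side effect only if this ID hasn't been seen before."""
    if message_id in processed_ids:
        return "duplicate-ignored"       # a redelivered copy is harmless
    apply_fn(payload)                    # the actual effect, e.g. append a log record
    processed_ids.add(message_id)        # remember the ID so retries become no-ops
    return "applied"

def deliver_with_retries(message_id, payload, send_fn, attempts=3):
    """'Guaranteed delivery' by brute force: keep sending until someone acknowledges."""
    for _ in range(attempts):
        if send_fn(message_id, payload) == "ack":
            return True
    return False
```

The receiver tolerates duplicates, so the sender is free to over-deliver; that is the trade that lets this class of problem skip full ACID transactions.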
From what I can tell, Google Checkout is not a transaction-oriented system at all — let alone a “heavy-duty” one: it simply remembers your payment information and fills it in for you automatically if you shop at one of their partners. Yawn. Other than actively corrupting your credit card information (or posting your credit card information to some underground warez BBS), there isn’t much here for Google to screw up — and they certainly do not need the ACID properties associated with transaction-oriented systems. Finally, for whatever it’s worth: while I love ZFS, I’m not a member of the ZFS engineering team — when it comes to ZFS, I’m just a user/crank/critic/fan-boy…
aLong – I’m not a systems engineer, but from what you’ve said some basic parameters can be mapped out (see the quick arithmetic sketch after this comment). It sounds like you need to do about 400 million writes a day. Given that there are 86,400 seconds in a day, that averages out to about 5,000 I/Os per second, and you’d probably have some serious peaks. Using SATA drives you’d probably want about 200 drives working. You didn’t mention a database, but perhaps your best bet would be Oracle on Linux with a cheap, fast JBOD like Apple’s Xserve RAID. But again, I’m not a systems engineer; more information is undoubtedly needed to actually design something that would work.
Regards,
Robin
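For what it’s worth, here is the back-of-the-envelope arithmetic behind Robin’s reply above, as a small Python sketch. The per-drive IOPS figure is purely an assumption, used only to show how a drive count in the neighborhood of 200 falls out of the numbers.

```python
# Rough sizing for aLong's workload, using the figures from the comments above.
visitors_per_day    = 4_000_000     # unique visitors per day
records_per_visitor = 100           # "up to 100 records per day" per visitor
seconds_per_day     = 86_400

writes_per_day = visitors_per_day * records_per_visitor    # 400 million
avg_iops = writes_per_day / seconds_per_day                 # ~4,600, call it ~5,000

# Assumption: budget only a few dozen small random writes per SATA drive per
# second, leaving headroom for peaks; ~25 IOPS/drive gives roughly 200 drives.
assumed_iops_per_drive = 25
drives_needed = avg_iops / assumed_iops_per_drive            # ~185

print(f"~{avg_iops:,.0f} writes/sec average; ~{drives_needed:.0f} SATA drives at {assumed_iops_per_drive} IOPS each")
```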
Don, Bryan, you are both saying something similar: this ISN’T a hardcore TP application. Yet at some point it seems like they are going to have to do *something* that requires heavy-duty TP. I’ll re-run the post with the name changed at that time. Thanks for taking the time to inform me.
Following a link suggested by another reader I did find this http://xooglers.blogspot.com/2005/12/lets-get-real-database.html – a charming story of how Google used MySQL for AdWords. Great discussion in the comments. Trying to distill it down past the religion to a user take-away, it sounds like some commercial database feature sets exceed the requirements of many, maybe even most, applications.
And Bryan, pardon me for assuming the guy doing most of the talking during lunch with the ZFS team was *part* of the ZFS team. 😉 For the record, Bryan is the Jedi Master of DTrace at Sun.
What about SQLite? It has strong support from Google and fills the gap between big files on GFS and small updates for TP very well. If Google can route traffic (and I have no doubt that this is ‘easily’ possible) to a big farm of SQLite DBs, they can scale… a rough sketch of that routing idea follows this comment.
just my $0.02
Stefan
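A minimal sketch of the routing idea Stefan suggests: hash a key to pick one SQLite database out of a farm, so each individual file only sees a small slice of the small reads and writes. The shard count, file names, and schema here are invented for illustration; nothing is known about how (or whether) Google actually does this.

```python
import hashlib
import sqlite3

NUM_SHARDS = 16   # illustrative; a real farm would be far larger and spread across machines

def shard_for(user_id: str) -> sqlite3.Connection:
    """Hash the key to choose one SQLite file out of the farm."""
    shard = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS
    conn = sqlite3.connect(f"shard_{shard:02d}.db")
    conn.execute("CREATE TABLE IF NOT EXISTS actions (user_id TEXT, action TEXT)")
    return conn

def record_action(user_id: str, action: str) -> None:
    conn = shard_for(user_id)
    with conn:   # each small write is a short, local transaction on a single shard
        conn.execute("INSERT INTO actions VALUES (?, ?)", (user_id, action))
    conn.close()

record_action("user42", "viewed:product123")
```

Each shard stays small and local, which is what makes the many-small-writes problem tractable; the hard part – routing and rebalancing across machines – is exactly the piece this sketch waves away.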
For aLong: you may want to look at AT&T’s Daytona DB architecture for some pointers:
http://www.research.att.com/viewProject.cfm?prjID=69