How can Microsoft’s MSN compete with Google’s powerful cost advantage in large-scale web services? After all, Google’s infrastructure is a clean-sheet design, intended to be the world’s most scalable Internet data center. And Microsoft is going to beat that with Windows? In their dreams.

Not so fast. The suits at Microsoft may be clueless, but there are some very smart technologists there. Want to beat Google’s cost advantage with an all-Microsoft solution? The folks at Microsoft Research Silicon Valley have a modest suggestion: Boxwood.

Using Windows the way Google uses Linux – to host its higher-level software – Boxwood expects disk, server and network failures and automatically manages around them. Boxwood has several significant advantages that could give Microsoft an edge in infrastructure cost and scalability. Remember, Microsoft just has to stay in the game long enough for Google to stumble and then shovel cash at the opportunity. Yet without cost parity with Google, Microsoft will always be playing catch-up, because Google could simply cut prices to spur demand.

Microsoft’s lead techie on Boxwood is Chandramohan A. Thekkath, a leading scholar of distributed infrastructure systems. Earlier he was a developer of Petal, a distributed virtual disk system, and Frangipani, a distributed file system. A very smart guy who’s been working on this stuff for decades. Has Google met its match?

Microsoft, MICROSOFT, doing something cool?
Don’t get your knickers in a twist – this is Microsoft Research – and billg is to be commended for hiring a lot of brilliant computer scientists and giving them a big, money-lined sandbox to play in. Industry giants like Gordon Bell and Jim Gray work there, as well as many other really smart folks. I only wish that more of their research turned into products.

Low-cost clustered storage for Windows enterprise and SMBs?
Boxwood’s basic idea is to hide disks from applications by creating a set of higher-level infrastructure services. The goal: to simplify building applications. This approach, as stated in the MSR paper on Boxwood (see below), offers three main advantages:

. . . by directly integrating data structures into the persistent storage architecture, higher-level applications are simpler to build, while getting the benefits of fault-tolerance, distribution, and scalability at little cost. Furthermore, abstractions that can inherently deal with sparse and non-contiguous storage free higher level software from dealing with address-space or free-space management. . . . A third advantage is that using the structural information inherent in the data abstraction can allow the system to perform better load-balancing, data prefetching, and informed caching. These mechanisms can be implemented once in the infrastructure instead of having to be duplicated in each subsystem or application.
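
To make the second advantage a little more concrete, here is a toy sketch in Python. It is not Boxwood's interface: the class and method names (ToyBlockDevice, ToySparseMap, put, get, delete, free_blocks) are all hypothetical, and everything lives in memory on one machine. It only illustrates the general idea that an application can store data under sparse keys while the infrastructure layer owns block allocation and the free list.

```python
class ToyBlockDevice:
    """A stand-in for a raw disk: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks


class ToySparseMap:
    """Hypothetical infrastructure service (an assumption for illustration,
    not the paper's API). Callers use arbitrary keys; block numbers and
    free-space bookkeeping stay hidden inside this class."""

    def __init__(self, device):
        self._device = device
        self._free = list(range(len(device.blocks)))  # free list owned here
        self._index = {}  # key -> block number

    def put(self, key, value):
        if key in self._index:
            block = self._index[key]  # overwrite in place
        else:
            if not self._free:
                raise RuntimeError("out of space")
            block = self._free.pop()  # allocation is the service's job
            self._index[key] = block
        self._device.blocks[block] = value

    def get(self, key):
        block = self._index.get(key)
        return None if block is None else self._device.blocks[block]

    def delete(self, key):
        block = self._index.pop(key, None)
        if block is not None:
            self._device.blocks[block] = None
            self._free.append(block)  # space is reclaimed without caller help

    def free_blocks(self):
        return len(self._free)


store = ToySparseMap(ToyBlockDevice(4))
store.put(10**9, "far-away key")  # sparse key, yet only one block is used
store.put(7, "nearby key")
```

The application code at the bottom never picks a block number or checks for free space, which is the kind of work the paper says higher-level software should be freed from. A real system would also have to handle persistence, replication and failures, none of which this sketch attempts.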

They compare the services Boxwood offers to standard storage arrays and conclude:

. . . even sophisticated virtual disk systems that provide scalability and ease of management require higher layers like the file system to deal with free space management, data placement, and maintaining user-visible abstractions.

In short, even state-of-the-art disk arrays don’t do what these folks think they should, even if they were free, which they most assuredly are not. The whole concept of managing storage on disks, physical or virtual, is broken. But you knew that already.

Acknowledgement: The Boxwood information is from a 16-page technical paper authored by John MacCormick, Nick Murphy, Marc Najork, Chandramohan A. Thekkath, and Lidong Zhou.

Part II tomorrow: Boxwood’s organization. I expect this article to run to three parts, though a fourth may be lurking. Part III and, if need be, Part IV will follow within 10 days. Sorry for the stretched-out schedule, but I’m pretty sure there will be some good stuff coming out of Storage Network World, which I am attending this week.

Comments welcome, of course. Moderation is turned on to control comment spam.