Concall today with Bryan Cantrill, the smart guy behind Dtrace. Dtrace was the engine behind Sun’s (now Oracle’s) Fishworks server and application monitor. Dtrace has also been incorporated into OS X.
Bryan left Oracle last week and started Monday at Joyent, the cloud infrastructure provider, as VP of engineering. Why?
Bryan is an instrumentation geek. He really wants to know what’s going on. Instrumentation in the cloud is the next big challenge.
That makes sense: there are so many moving parts that understanding and resolving performance and availability issues will be critical to the widespread adoption of cloud.
Tech epiphanies
Bryan described 3 technology epiphanies that he’s enjoyed. The 1st was when he saw Java for the first time back in 1995. The 2nd was when he saw a Ruby on Rails video about deploying a web app.
His 3rd epiphany came recently when he saw something called node.js. Developed by Ryan Dahl, it turns the JavaScript paradigm on its head: node.js runs on the server, not the client.
Latency bubbles
We know that server I/O latency can kill performance. It’s even worse in the cloud.
A single bad drive can hose a server if the app is holding locks. What if you have a webpage that relies on five different Web services or, as many Amazon pages do, 150 services?
You need an infrastructure that is resilient in the face of long latency while maintaining high throughput. Bryan says that most failures are not hard failures but are latency bubbles that cascade out and lock up the rest of the infrastructure.
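To make the latency bubble idea concrete, here is a minimal sketch of a page that fans out to several services and gives each one a latency budget. It is not anything Bryan or Ryan showed: `fakeService`, `withTimeout`, `renderPage`, the service names and the millisecond values are all hypothetical stand-ins, and the "services" are just timers.

```javascript
// Hypothetical stand-in for a remote web service: resolves after `ms` milliseconds.
function fakeService(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(`${name}: ok`), ms));
}

// Race a promise against a timer so a slow dependency yields a fallback
// value instead of stalling the caller.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Clear the timer once the race is settled, whichever side won.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assemble a "page" from several services; each call gets the same budget.
function renderPage(services, budgetMs) {
  const calls = services.map(({ name, latencyMs }) =>
    withTimeout(fakeService(name, latencyMs), budgetMs, `${name}: timed out`)
  );
  return Promise.all(calls);
}

renderPage(
  [
    { name: "catalog", latencyMs: 10 },
    { name: "reviews", latencyMs: 500 }, // the one with the bad drive behind it
    { name: "pricing", latencyMs: 20 },
  ],
  100 // assumed per-service budget in milliseconds
).then((parts) => console.log(parts));
```

With these made-up numbers the slow service is reported as timed out while the other two come back normally, so one bubble degrades the page rather than locking it up.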
Ryan took Google’s V8 JavaScript engine and extended it so you can handle long-latency events without locking up the server.
Ryan did a fine job introducing node.js in a one-hour Google Tech Talk last week. He outlined how to build a server that can handle 10,000 or more users. His goal with node.js was to make it easy to write high-performance servers.
There is an arms race out there for performance – Google, Apple, Mozilla, Opera, Microsoft – to win the hearts and eyeballs of hundreds of millions of consumers. Fickle consumers.
Node.js only exposes nonblocking asynchronous interfaces to the programmer. It has very few abstractions. Its power lies in the fact that it moves you away from certain interfaces, like synchronous I/O, that you shouldn’t use.
You don’t have to worry about some event completing and taking over while you’re in the middle of something else. Each node.js instance is a single thread. If you want to do more work you start multiple node.js instances and let the kernel do the load balancing.
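Here is a minimal sketch of the "nothing takes over while you’re in the middle of something else" property. The function name `runToCompletion`, the labels and the loop size are hypothetical; the loop simply stands in for any stretch of synchronous work.

```javascript
// Shows that a callback never interrupts code that is already running.
function runToCompletion() {
  const log = [];
  return new Promise((resolve) => {
    // This timer is ready to fire almost immediately...
    setTimeout(() => {
      log.push("timer callback");
      resolve(log);
    }, 0);

    // ...but it cannot run until this synchronous block has returned.
    log.push("start critical section");
    let sum = 0;
    for (let i = 0; i < 1e6; i++) sum += i;
    log.push("end critical section");
  });
}

runToCompletion().then((log) => console.log(log.join(" -> ")));
```

Because there is a single thread, the timer callback always lands after the critical section ends, with no locks involved. The flip side is that a long synchronous loop delays every other callback in that instance.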
Memory isolation is enforced at the process boundary. The kernel manages it, not the coder. That’s a good thing.
The StorageMojo take
Latency is the app killer of the cloud. The current cloud focus on write once/read never apps reflects that.
The fight against latency proceeds on many fronts: storage; network; CPU; and software. Asankya and others have good ideas for reducing Internet latency. Flash architectures are undergoing rapid evolution. Multicore and multiprocessor servers are attacking throughput.
Node.js is a big step in the right direction. Removing the dependencies that synchronous I/O creates means a more resilient and higher-performance infrastructure. Ryan reports that a Japanese website is already running several hundred thousand users on node.js instances.
As for Bryan, he’ll bring the same intelligence and energy to Joyent that he brought to Dtrace and Fishworks. Expect more great things.
Courteous comments welcome, of course. Update: The other smart guys behind Dtrace are the redoubtable Adam Leventhal and Mike Shapiro.
I’m afraid others already follow his idea. I start to see lots of people leaving Oracle/Sun lately. Cisco and others are grabbing them in sales to promote UCS.
From what I saw with node.js it is a promising stack. A system built with long latency in mind. Clever!
If you have a chance look at cloud service onlive.com. Latency is also part of their design. Demo at http://video.allthingsd.com/
Cantrill isn’t exactly a great catch for Joyent. Sun’s “open storage” DTRACE GUI was a disaster in terms of how much price premium they were expecting. I mean, there wasn’t that much magic behind their 7410 line, it was just a regular server attached to a bunch of JBODs with some pretty Dtrace graphics on an HTTP server, something NexentaStor copied over in a few months.
There are far better catches inside Oracle: Mike Shapiro, Adam Leventhal, Brendan Gregg, Eric Schrock, just to name a few. Nexenta is likely to grab a few more of them.
Wonderful insightful post.
Fits right in with your continuing theme of instrumenting the Lower Metrics. An accomplishment devoutly to be desired. People keep talking about boxen like those are why you are there and how you make money. Boxen are 100% TCO.
Take how you get to work everyday. Most people would prefer to drive their luxury vehicle of choice and have assigned, covered parking at the destination. This doesn’t happen for many people.
Walk, bike, bus, train, etc. Depends on your pocketbook, preferences and opportunities.
So does good management of Information. “Good Management of Information” is Operationally Defined as proper (a bit more than just the required if you can afford it) care, feeding, retrieval, presentation and graceful End-of-Life. This Information has worked hard for you and paid your bills.
Have you seen an infographic like this for storage?
http://cohort11.americanobserver.net/latoyaegwuekwe/multimediafinal.html
This one is supposed to be from government records but I have not verified that. Looking for a job is tough in this economy, and this map shows why. Runs from Jan 2007 to May 2019 and is a very graphic presentation of what has happened. A very interesting use of Infographics.
We need some Infographics like this to visualize the IT infrastructures and what is happening to Information and Information flow. An Event Management System would do this and more.
I have tried selling several people on having a product like this to display both a dynamic and static timeline of their IT infrastructure. Of particular interest would be the “Lower Metrics” visual of the Event Management System like is mentioned in this.
I thought years ago we would have something great by now. I was really encouraged by EMC’s MCC but the wheels came off that project, as well as many others. There have been many others.
The HighGround SRM was a thing of beauty until Sun bought it. Just like the Amiga or OS/2. Commodore killed the Amiga through lack of vision and ability and IBM killed OS/2 to protect other products.
Seems it only takes one really strong personality to derail many good ideas. The old NIH dilemma.
I really appreciate your twitter and the article pointed to http://twitter.com/StorageMojo/status/19690590745 . Starting with your twitter comment I could not stop reading it and its implications for IT between now and 2030. We need those Lower Metrics instrumented.
Big changes are coming…