How high redundancy can hurt availability

by Robin Harris on Monday, 24 July, 2017

I wrote about how clouds fail on ZDNet today, but there was another wrinkle in the paper that I found interesting: high redundancy hurts. Counterintuitive?

This comes from the paper Gray Failure: The Achilles’ Heel of Cloud-Scale Systems, by Peng Huang, Chuanxiong Guo, Lidong Zhou, and Jacob R. Lorch, of Microsoft Research, and Yingnong Dang, Murali Chintalapati, and Randolph Yao, of Microsoft Azure. The paper explores the “gray failure” problem, where component failures are subtle, often intermittent, and thus difficult to detect and correct.

Go read the ZDNet piece to get the gist of their findings. This post focuses on the problem of redundancy reducing availability.

Department of redundancy department
Cloud networks are configured with high redundancy to better tolerate failures. A switch stoppage is usually a non-event because the protocols re-route packets through other switches. Thus redundancy increases availability in the case of a switch failure.

But some switch failures are intermittent gray failures: random and silent packet drops. The protocols see the dropped packets and resend them, so the packets are not re-routed. But the applications see increased latency or other glitches as those lost packets are resent.

Let’s say your cloud has a front-end server that fans out a request to many back-end servers, and the front-end must wait until almost all of the back-end servers respond. If you have 10 core switches that fan out to 1,000 back-end servers, almost every front-end request sends at least one flow through any given core switch, so a gray failure at any one core switch has an almost 100% chance of delaying nearly every front-end request.

Thus, the more core switches you have, the more likely you are to have a gray failure, and, with a high fan-out factor, the more likely you are to have a gray failure that delays nearly every front-end request.
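The arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope model, not anything taken from production: it assumes each back-end flow is spread independently and uniformly across the core switches, and the per-switch gray failure probability `q` is a made-up number for illustration only.

```python
def p_request_crosses_switch(n_core_switches: int, fan_out: int) -> float:
    """Chance that at least one of `fan_out` flows crosses one given core switch.

    Assumes each flow picks a core switch uniformly and independently.
    """
    miss_one_flow = (n_core_switches - 1) / n_core_switches
    return 1.0 - miss_one_flow ** fan_out


def p_any_switch_gray(n_core_switches: int, q: float) -> float:
    """Chance that at least one core switch is in a gray failure state.

    `q` is a hypothetical, independent per-switch probability.
    """
    return 1.0 - (1.0 - q) ** n_core_switches


if __name__ == "__main__":
    # Higher fan-out: a single bad switch touches more and more requests.
    for fan_out in (1, 10, 100, 1000):
        p = p_request_crosses_switch(10, fan_out)
        print(f"10 switches, fan-out {fan_out:>4}: {p:.6f}")

    # More switches: more chances that one of them is quietly misbehaving.
    for n in (10, 20, 40):
        print(f"{n} switches, q=0.01: {p_any_switch_gray(n, 0.01):.4f}")
```

With a fan-out of 1, a bad switch touches about one request in ten. At a fan-out of 1,000 the probability is indistinguishable from 1, which is the "almost 100%" above.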


The StorageMojo take
The paper is a highly recommended read if you architect for or rely upon one of the major cloud vendors, especially if your main focus is software. While human errors are a major cause of cloud outages, the authors make the point that undetected gray failures tend to accumulate over time, stressing the healthy infrastructure, and can lead to cascading failures and a major outage.

As anyone experienced with hardware can tell you, gray failures are regrettably common, and a total bear to diagnose and correct. The late, great Jim Gray coined the term Heisenbugs to describe them, because, like quantum particles, they behave differently when you try to observe them.

The bigger lesson of the paper, though, is that scale changes everything. Even the kinds of bugs that can take a 100,000-server system down.

Courteous comments welcome, of course. If you’re a cloud user, have you seen behavior that gray failures might explain? Please comment!


Hike blogging: 07-17-2017

by Robin Harris on Monday, 17 July, 2017

Hike blogging has been on hiatus for several reasons, including no good pictures, packing up for a short move, too much rain – it’s monsoon time now – and I’ve been getting back to biking as well.

But this morning I got out at 6:30 onto the Twin Buttes/Hog Heaven/Hog Wash loop. It’s about 4.5 miles, with about 370 feet of vertical.

The Hog Heaven portion is a double black diamond mountain bike trail. Given that I find it a little hairy on foot, I can’t imagine how skilled – or crazy – you have to be to bike it.

But the views were fabulous in the early morning light. Here’s one:


The StorageMojo take
Let me know if you come to town. It’s a beautiful place and well worth a visit. Happy to recommend hikes and places to go in town for food, wine, music, and art.

Courteous comments welcome, of course.


Flash Memory Summit next month

by Robin Harris on Monday, 17 July, 2017

StorageMojo’s crack analyst team will be attending next month’s Flash Memory Summit. The dates are August 8-10, at the Santa Clara Convention Center.

Wasn’t able to attend last year, but the 2015 summit was the best storage show I’d seen in years. Flash is where the action is, with NVRAM coming along as well.

I’ve got a couple of meetings scheduled, but if your company is doing something early stage, I’d like to talk to you. Comment below to set up a meeting. I won’t publish invites.

The StorageMojo take
With flash products moving into maturity, StorageMojo is really interested in NVRAM technologies and in how they are affecting system architectures. Especially interested in emerging concepts.

Courteous comments welcome, of course.


The moving target problem

by Robin Harris on Tuesday, 11 July, 2017

With the news that Toshiba has developed 3D quad-level cell flash with 768Gb die capacity, I’m reminded of the moving target problem. This is a problem whenever a new technology seeks to carve out a piece of an existing technology’s market.

Typically, a startup seeks funding based on producing a competitive product in, say, two years. Good analysis will allow for the fact that competition will improve, typically based on then-current improvement trends.

Often two things happen to derail the projections. The most likely is that the new product development cycle slips out, so when the product ships it is up against another 6-18 months of incumbent improvement.

But sometimes the pace of incumbent improvement rises, so even if the newtech meets its schedule projections – when does THAT ever happen? – it is still facing a tougher competitor than planned.

Disk vs flash
Flash had this problem for a couple of decades with disk. In the early 90s I bought an HP Omnibook 300 and forked over another $400 for a 10MB Compact Flash card to replace the power hungry disk. Some flash proponents probably hoped this was the beginning of a trend.

But it was not to be. Disk vendors discovered how to increase bit density on a regular basis, and disk capacities and areal densities started rising at ≈40% a year. They also built rugged 2.5″ drives for the burgeoning notebook market, and invested in power-saving technologies.
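To put rough numbers on the moving target, here is a small sketch using figures already in this post: ≈40% a year incumbent improvement, a two-year development plan, and a 6-18 month slip. The steady compound rate is a simplifying assumption. Real improvement comes in steps.

```python
def incumbent_gain(annual_rate: float, months: float) -> float:
    """Factor by which the incumbent improves over `months`,
    assuming steady compound improvement at `annual_rate` per year."""
    return (1.0 + annual_rate) ** (months / 12.0)


if __name__ == "__main__":
    rate = 0.40          # the ≈40% a year cited for disk
    planned_months = 24  # "say, two years"

    planned = incumbent_gain(rate, planned_months)
    print(f"on schedule: incumbent is {planned:.2f}x better at ship date")

    for slip in (6, 12, 18):
        actual = incumbent_gain(rate, planned_months + slip)
        extra = actual / planned
        print(f"{slip:>2}-month slip: {actual:.2f}x better "
              f"({extra:.2f}x tougher than planned)")
```

The point is the last column: every month of slip compounds, so the competitor the product finally meets is noticeably tougher than the one in the business plan.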

That helped keep flash at bay for another 15 years.

But finally, the flash cost-per-bit dropped below that of DRAM, and the floodgates opened. Flash won the smartphone market, which powered investment in huge fabs, and soon flash prices were dropping faster than disk prices.

But the key was that flash found niches that disks could not serve. And when one of those niches exploded into industry-altering size, the economics of critical mass and mass production kicked in.

I’ve been following NVRAM with great interest for years. That’s partly due to interest in what it could mean for system architecture, but also for its potential as a substitute for flash.

While it’s clear that the NAND flash cost advantage is good for the next decade, it’s also clear that flash has been shoehorned into applications – such as caches – for which it is suboptimal. NVRAM will encroach around the edges of the flash market, not the heart.

MRAM, for example, is already doing a good business in the automotive and mil-spec sectors, because it is really tough. Diablo’s current hybrid NVDIMMs – combo DRAM and flash – could certainly benefit from a pure NVRAM solution if the price was right.

The key is that NVRAM’s sweet spot is well away from flash’s cost-per-bit and density sweet spots. A fact that Toshiba’s announcement exemplifies.

The StorageMojo take
Watching how flash and NVRAM interact in the marketplace over the next decade will be instructive for students of technology diffusion. The two technologies are close in some ways, but differ dramatically in others, so simple flash out/NVRAM in stories will be the exception.

That also ignores the potential creativity of architects and engineers as they explore the capabilities of new kinds of NVRAM. Or the potential for a new class of devices that drive NVRAM adoption, as the smartphone drove flash.

In any case the calculus of the moving target will remain. To the nimble go the spoils.

Courteous comments welcome, of course.


Why startups fail

by Robin Harris on Wednesday, 21 June, 2017

A great piece at CB Insights. They collected the failure stories of 101 startups and then broke those failures into 20 categories.

Spoiler alert!
Here are the top 10 reasons for failure, as compiled by CB Insights.


What I find interesting is that 8 of the top 10 reasons are marketing related.

  • No market need.
  • Get outcompeted.
  • Pricing, cost issues.
  • Poor product.
  • Need/lack business model.
  • Poor marketing.
  • Ignore customers.
  • Product mis-timed.

Across the cultural divide
Tech founders tend to be techies, and techies tend to have a problem with folks of the sales/marketing persuasion. One problem is that many marketing people don’t really understand the technology they are marketing, which means they can’t be full partners to the tech team.

Another problem is that marketing people tend to be well-versed in the arts of persuasion. If the marketer takes a position, especially with regard to technology they don’t appreciate, they can easily steer the startup in the wrong direction.

Plus, every techie has a story where they’ve felt misled by a sales or marketing person, and that anger or regret can bleed into professional relationships in a startup.

Finally, techies rarely have a handle on what to look for in their marketing hires. Based on more than 35 years’ experience, StorageMojo has a suggestion.

The StorageMojo take
My sympathies are with the engineers when it comes to their feelings about marketing. As I said in the link above:

They’d get flayed for every decommit and slip. They’d sweat blood figuring out solutions to hundreds of subtle problems.

Then, after 2 to 3 years of effort, they’d deliver the product to marketing and, all too often, watch their hard work go for naught.

Maybe marketing missed some key features. Didn’t position the product properly. Training failed to equip the field. Mis-pricing. Tougher competition than expected.

That last paragraph captures many of the issues that the CB Insights survey did. Which shouldn’t be a surprise.

Startups exist to sell a product. Development is only a means to that end.

Courteous comments welcome, of course. Disclosure: I offer services to help startups with every phase of product development.


A transaction processing system for NVRAM

by Robin Harris on Monday, 19 June, 2017

Adapting to NVRAM is going to be a lengthy process. This was pointed out by a recent paper. More on that later.

Thankfully, Intel wildly pre-announced 3D XPoint. That has spurred OS and application vendors to consider how it might affect their products.

As we saw with the adoption of SSDs, it takes time to unravel the assumptions built into products. Take databases: they spent decades optimizing for hard drives, and when SSDs came along many of those optimizations became detrimental.

Durable transactions
On the face of it, it shouldn’t be that hard. You want a durable transaction, you have persistent NVRAM. Are we good here?


In a paper published by Microsoft Research, DUDETM: Building Durable Transactions with Decoupling for Persistent Memory, the authors (Mengxing Liu, Mingxing Zhang, Kang Chen, Xuehai Qian, Yongwei Wu, Jinglei Ren) go into the issues:

While persistent memory provides non-volatility, it is challenging for an application to ensure correct recovery from the persistent data on a system crash, namely, crash consistency. A solution . . . is using crash-consistent durable transaction[s]. . . .

Most implementations of durable transactions enforce crash consistency through logging. However, the. . . dilemma between undo and redo logging is essentially a trade-off between update redirection cost and persist ordering cost.

The authors make a bold claim:

[O]ur investigation demonstrates that it is possible to make the best of both worlds while supporting both dynamic and static transactions. The key insight of our solution is decoupling a durable transaction into three fully asynchronous steps.

To create a fully decoupled transaction system for NVRAM, the researchers made three key design decisions.

  • A single, shared, cross-transaction shadow memory.
  • An out-of-the-box transactional memory.
  • A redo log as the only way to transfer updates from shadow memory to persistent memory.

These design choices enabled building an ACID transaction in three decoupled, asynchronous steps.

  • Perform: execute the transaction in a shadow memory, and produce a redo log for the transaction.
  • Persist: flush the redo log of each transaction to persistent memory in an atomic manner.
  • Reproduce: modify original data in persistent memory according to the persisted redo log.
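The three steps are easier to see in a toy model. The sketch below is my own illustration, not the authors’ code: plain Python dicts stand in for shadow and persistent memory, the class and method names are hypothetical, and the steps run one after another here, where the paper runs them asynchronously.

```python
from collections import deque


class ToyDudeTM:
    """Toy model of Perform / Persist / Reproduce. Not the real DUDETM."""

    def __init__(self, initial):
        self.persistent = dict(initial)  # stands in for data in persistent memory
        self.shadow = dict(initial)      # single shared shadow copy (volatile)
        self.volatile_logs = deque()     # redo logs from Perform, not yet durable
        self.persisted_logs = deque()    # redo logs made durable, not yet applied
        self.next_txid = 0

    def perform(self, updates):
        """Step 1: run the transaction in shadow memory, emit a redo log."""
        txid = self.next_txid
        self.next_txid += 1
        self.shadow.update(updates)
        self.volatile_logs.append((txid, dict(updates)))
        return txid

    def persist(self):
        """Step 2: move each pending redo log, whole, into the durable log."""
        while self.volatile_logs:
            self.persisted_logs.append(self.volatile_logs.popleft())

    def reproduce(self):
        """Step 3: apply durable redo logs, in order, to the original data."""
        while self.persisted_logs:
            _, updates = self.persisted_logs.popleft()
            self.persistent.update(updates)

    def crash_and_recover(self):
        """Simulated crash: volatile state is lost, durable logs are replayed."""
        self.volatile_logs.clear()
        self.reproduce()
        self.shadow = dict(self.persistent)


if __name__ == "__main__":
    tm = ToyDudeTM({"a": 1, "b": 2})
    tm.perform({"a": 10})
    tm.persist()             # this transaction is now durable
    tm.perform({"b": 20})    # performed, but never persisted
    tm.crash_and_recover()
    print(tm.persistent)     # {'a': 10, 'b': 2}
```

The update to `a` survives the simulated crash because its redo log was persisted. The update to `b` is lost because it only ever lived in shadow memory. The redo log is the only path from shadow memory to persistent memory, which is the third design decision above.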

The paper is lengthy and a recommended read for those professionally interested in transaction processing on NVRAM. But here’s their performance summary.

Our evaluation results show that DUDETM adds guarantees of crash consistency and durability to TinySTM by adding only 7.4% ∼ 24.6% overhead, and is 1.7× to 4.4× faster than existing works Mnemosyne and NVML.

The StorageMojo take
As we’ve seen with the transition from hard drives to SSDs, unwinding decades of engineered-in assumptions in the rest of the stack is a matter of years, not months. There’s the issue of rearchitecting basic systems, such as transaction processing, or databases, and then the hard work of stepwise enhancement of those new architectures as we gain knowledge about how they intersect with the new technology and workloads.

There are going to be many opportunities for startups that focus on NVRAM. The technology is coming quickly and with more technology diversity – there are several types of NVRAM already available, with more on the way, and each has different trade-offs – which means that the opportunities for creativity are legion.

Courteous comments welcome, of course.


A distributed fabric for rack scale computing

June 12, 2017

After years of skepticism about rack scale design (RSD), StorageMojo is coming around to the idea that it could work. It’s still a lab project, but researchers are making serious progress on the architectural issues. For example, in a recent paper, XFabric: A Reconfigurable In-Rack Network for Rack-Scale Computers, Microsoft Researchers Sergey Legtchenko, Nicholas Chen, […]


Infinidat sweetens All Flash Array Challenge

June 6, 2017

In response to yesterday’s StorageMojo post on Infinidat, Brian Carmody of Infinidat tweeted: Robin, Verde Valley is a great organization. @INFINIDAT will donate $10K for every Infinidat Challenge customer who mentions your blog post. — Brian Carmody (@initzero) June 5, 2017 Thanks, Brian! The StorageMojo take Verde Valley Sanctuary is a fine organization that StorageMojo […]


Infinidat’s sweet AFA challenge

June 5, 2017

StorageMojo has observed, many times, that great marketing of a mediocre product beats mediocre marketing of a great product all the time. Thus it is always of interest when someone comes up with an innovative marketing wrinkle. That’s what Infinidat has done with their Faster than all flash challenge. Their claim is that their system […]


Hike blogging: Devils Creek Road

June 3, 2017

Taking a vacation from the usual slog in NoAZ. I’m some 60 miles north of Seattle, working on my rain tan. The weatherman claims we’ll break 70 degrees sometime during my visit, but I’m not counting on it. Occasional patches of blue sky remind me of what is possible, if not likely. Took a 4.5 […]


Routing the I/O stack

May 30, 2017

Lots of energy around the concept of Rack Scale Design (Intel’s nomenclature) in systems design these days. Instead of depositing a CPU, memory, I/O, and storage on a single motherboard, why not have a rack of each, interconnected over a high-bandwidth, low-latency network – PCIe is favored today – and use software to define bundles […]


Liqid’s composable infrastructure

May 8, 2017

The technology wheel is turning again. Yesterday it was converged and hyperconverged infrastructure. Tomorrow it’s composable infrastructure. Check out Liqid, a software-and-some-hardware company that I met at NAB. The software – Element – enables you to configure custom servers from hardware pools of compute, network, and, of course, storage. I met Liqid co-founder Sumit Puri […]


NAB 2017 storage roundup

May 4, 2017

Spent two days at the annual National Association of Broadcasters (NAB) confab in Las Vegas. With 4k video everywhere, storage was a hot topic as well. Here’s what caught my eye. Object storage – often optimized for large files – continues to be a growth area. Scality, Dynamic Data Pool, Object Matrix, HGST, Data IO, […]


Is NetApp still doomed?

April 20, 2017

A reader wrote to ask for the StorageMojo take on NetApp now, as opposed to the assessment in How doomed is NetApp? two years ago. Q3 had some good news for NetApp. In their latest 10Q filing, they noted that while revenues for the first 9 months of the year were down 3%, for the […]


Spin Transfer Technologies: next up in the MRAM race

April 19, 2017

MRAM technology is hot. I’ve written about Everspin – they’ve been shipping for years and just IPO’d – and now I’d like to introduce Spin Transfer Technologies. They’ve kept a low profile – they AREN’T shipping, are sampling protos, and they do have some nice Powerpoints. I spoke to their CEO, Barry Hoberman, and the […]
