Storage surprises at NAB 2016

by Robin Harris on Friday, 22 April, 2016

I did NAB a little differently this year: attended on Wednesday and Thursday, the last two days of the floor exhibits. Definitely easier, although many of the execs left Wednesday.


But that wasn’t a surprise. Here’s what did surprise me:

  • EMC seemed to have less of a presence than in past years. I expected more.
  • HGST is pushing aggressively on its – and WD’s – systems business. They’re one to watch.
  • Thunderbolt 3 storage is definitely a thing: 40Gb/s of bandwidth essentially for free? Or 20Gb/s for even less? Of course!
  • Thunderbolt-based clusters may also be a thing. Need to learn more.
  • Several companies I hadn’t seen before: OpenIO, Quobyte, Symply, Glyph and Dynamic Drive Pool. The last had a good-sized booth and has been in business for 10 years – but I’d never heard of them.
  • All forms of video – 4k/8k, drone, 360°, surveillance, streaming, phone – are growing rapidly. OK, not 8k – yet.

The StorageMojo take
I’ll be writing some more about what I saw at NAB. I also asked a number of companies for briefings, including Pure.

There are some larger trends, beyond hardware, that I saw. The big one: complex storage systems are on the way out. More on that later.

Courteous comments welcome, of course.


NABster 2016

by Robin Harris on Monday, 18 April, 2016

Tomorrow the top StorageMojo superforecasting analysts are saddling up for the long ride to the glittering runways of Las Vegas. The target: NAB 2016.

As much as I like CES, NAB is my favorite mighty tradeshow. It is a toy show for people with very large budgets – and we all know who gets the best toys.

The StorageMojo take
If you have some storage coolness you would like to share with the world, shoot me a comment with your show floor address and I’ll come for a visit. Really. I’m looking for you!



I see a forecast in your future

by Robin Harris on Monday, 18 April, 2016

A few months ago I wrote about the best single metric for measuring marketing. That metric:

It’s the forecast, when compared to actuals. If the forecast is accurate to ±3%, you’ve got great marketing. If ±10% you’ve got good marketing.
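As a back-of-the-envelope sketch of that metric – the thresholds are just the ±3% and ±10% bands above, and the function name and labels are mine:

```python
def marketing_grade(forecast, actual):
    """Grade marketing by absolute forecast error versus actuals."""
    error_pct = abs(forecast - actual) / actual * 100
    if error_pct <= 3:
        return "great"
    if error_pct <= 10:
        return "good"
    return "needs work"

# A forecast of 1,030 units against 1,000 actual is a 3% miss.
print(marketing_grade(1030, 1000))  # great
```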

So I was happy to see a book on the scientific study of forecasting. Short review: a great book for those curious about the future.

America spends over $50B annually on our various “intelligence” services. There’s no exact number because the budgets are classified.

The Intelligence Community includes the CIA, NSA, DIA and another dozen or so alphabet agencies. When agency heads aren’t busy lying to Congress, they’re working hard to plant backdoors in the Internet – another forecastable fiasco.

They also aren’t very good forecasters. They missed 9/11 – well, George W. Bush blew off their warning, too – but the worst was their “slam dunk” assessment of WMDs in Iraq. A trillion dollars and many thousands of casualties later, oopsie!

The Good Judgment Project
To their credit, the Intelligence Advanced Research Projects Activity (IARPA) decided to see what could be done to improve IC accuracy. That’s where the authors of Superforecasting – Penn prof Philip E. Tetlock and writer Dan Gardner – come in.

IARPA set up a prediction tournament and invited five scientific teams to make frequent and measurable predictions, along with a control group and a team from the IC that had all the intelligence that $50B can buy. The contest was supposed to run for five years, but the other teams were dropped after two because Tetlock’s team blew them all away, including the IC team.

The top two percent of Tetlock’s Good Judgment Project team excelled and, for the most part, continued to offer excellent forecasts for the remainder of the contest. The superforecasters weren’t geniuses either. Just bright, curious, flexible and careful to make fine distinctions. Yes, making fine distinctions makes you more accurate – among other best practices.

The StorageMojo take
In short, anyone who is a product or marketing manager at a tech company should be able to dramatically improve their forecast skills if they take the book’s lessons to heart. And it describes the issues that are common to bad forecasts: use it to amp up your BS detector.

The book is well-written – thanks Dan! – and is a gentle but thorough introduction to forecasting pitfalls, both psychological and statistical. There’s a helpful appendix, Ten Commandments for Aspiring Superforecasters, but this is one case where the book is meaty enough to reward a careful read.

Highly recommended!

Courteous comments welcome, of course.


Smart storage for big data

by Robin Harris on Friday, 15 April, 2016

IBM researchers are proposing – and demoing – an intelligent storage system that works something like your brain. It’s based on the idea that it’s easier to remember important things, like a sunset over the Grand Canyon, than routine ones, like the last time you waited at a traffic light.

We’re facing a data onslaught like we’ve never seen before. Some are forecasting that we’ll be generating more data than we’ll have capacity to store once IoT gets rolling.

But even with new tools, such as graph databases, making sense of massive amounts of raw data is getting harder every day. That’s why data scientists are in high demand.

Just as any software problem can be solved by adding a layer of indirection, any analytics problem can be solved by adding a layer of intelligence. Of course, we know a lot more about indirection than we do intelligence.

For IBM – a world leader in AI, as its success with Watson has demonstrated – applying intelligence to the problem of storage is a natural fit.

Wheat vs chaff
In Cognitive Storage for Big Data (paywall), IBM researchers Giovanni Cherubini, Jens Jelitto, and Vinodh Venkatesan, of IBM Research—Zurich, described their prototype system. The key is using machine learning to determine data value.

If you’re processing IoT data sets, the storage system’s AI would have learned what was important about prior data sets and would apply those criteria – access frequency, protection level, divergence from norms, time value – to the incoming data. As the system watches humans interact with the data set, it learns what is important and tiers, protects and stores data according to user needs.

Experimental results
The researchers use a learning algorithm known as the “Information Bottleneck” (IB):

. . . a supervised learning technique that has been used in the closely related context of document classification, where it has been shown to have lower complexity and higher robustness than other learning methods.

IB, essentially, relates the information’s metadata values to cognitive system relevance values with the goal of preserving the mutual information between the two. The greater the mutual information, the more valuable the data and, hence, the higher the level of protection, access, and so on.
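The paper’s full IB machinery is more involved, but the mutual-information score at its heart is easy to illustrate. A minimal sketch – the metadata feature (file owner) and relevance labels below are invented for illustration, not taken from the paper:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Hypothetical file metadata: each file's owner, and whether users treated it as relevant.
owners    = ["alice", "alice", "bob", "bob", "alice", "bob"]
relevance = ["high",  "high",  "low", "low", "high",  "low"]

# Owner perfectly predicts relevance here, so MI equals the full 1 bit of label entropy.
print(mutual_information(owners, relevance))  # 1.0
```

In a cognitive storage system, a metadata field with high mutual information against observed relevance is a good predictor of data value; one with near-zero MI can be ignored.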

With relatively small data sets, the team found that

As the training set got larger, the accuracy for the smaller classes improved, reaching nearly 100 percent accuracy at around 30 percent of the training data included.

Sounds workable!

The StorageMojo take
An interesting question for the future from the paper is “. . . whether storage capacity growth will fall behind data-growth rates, meaning that the standard model of storing all data forever will no longer be sustainable because of a shortage of available storage resources.” The chances of that are, in my estimation, close to certain.

So enabling machine intelligence to trash data is probably cognitive storage’s most essential value – and the capability most likely to frighten users. Establishing human trust in machine intelligence is a major domain problem – see Will Smith’s character in I, Robot.

Sure, you can schlep unlikely-to-be-needed data off to low cost tape – IBM is a leading tape drive vendor – but the “store everything forever” algorithm doesn’t scale – and if something can’t scale forever, it won’t.

The other issue – beyond the scope of the paper – is also scale-related: how large will the storage system need to be to justify the cost and overhead of cognition? Is this going to work at enterprise scale or will it be a purely web-scale service?

There have been sporadic attempts over the decades to add intelligence to storage systems, and they’ve all come to grief because the cost of the intelligence was higher than the cost of additional storage. Storage costs continue to fall faster than computational costs, creating a difficult economic dynamic for cognitive storage.

Nonetheless, the IBM team is doing important work. While the applications of machine intelligence are many, they aren’t infinite. Understanding its limits with respect to the foundation of any digital civilization – storage – is critical to our cultural infrastructure.


Hike blogging: Soldiers Pass

by Robin Harris on Tuesday, 12 April, 2016

If you’ve been wondering why the dearth of hike blogging the last few months, wonder no more: I wasn’t hiking. A nasty bug made its way through Arizona and I caught it, thought I shook it, went to CES, and relapsed, big time.

So I’ve been taking it easy. But I’ve started up again, and here’s proof:

Click to enlarge.


There’s an unofficial trail up to some caves – the arches – a couple of hundred feet up on the east side of Soldiers Pass canyon. The photo was taken just below the entrance to the caves, looking northwest.

Clouds make the sky so much more interesting than the usual unrelieved Arizona blue, and I love the sun-dappled rocks. A beautiful hike!


Qumulo comes of age

by Robin Harris on Tuesday, 12 April, 2016

Qumulo is crossing the chasm: they have 50 paying customers with over 40PB in production. Real production, not POCs.

That includes clusters from 4 nodes to more than 20 nodes with over 4PB at a large telco. They practice agile development with 24 software releases in the last year. Roughly a drop every two weeks.

Their model is 100% software, delivered on commodity hardware and Linux, and sold 100% through the channel. Their key product is QSFS, the Qumulo Scalable File System.

What they do

  • Data-aware scale-out file & object storage software
  • Real-time analytics built into the file system for visibility into data footprint, usage, performance

Their key markets: commercial HPC, large scale unstructured data, and machine data. What’s missing? Oil and gas, but that’s due to the global collapse of oil prices, not their product.

So what’s new?
They’re announcing QSFS v2.0, aka Qumulo Core 2. New features include:

  • Reed-Solomon erasure coding. They’ve mirrored until now, so this will use current capacity more efficiently.
  • Capacity trending. Ever come in Monday morning and there’s no free capacity? Qumulo can tell you why.
  • New parallel rebuild of less protected data in a few minutes. A data rebuild, not a drive rebuild.
  • New appliance line with up to 260 TB of raw capacity per 4U node.
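To see why erasure coding uses capacity more efficiently than mirroring, compare raw-to-usable ratios. A quick sketch – Qumulo’s actual coding parameters aren’t disclosed here, so the (6, 2) layout below is purely illustrative:

```python
def raw_per_usable(data_blocks, parity_blocks):
    """Raw capacity consumed per unit of usable data."""
    return (data_blocks + parity_blocks) / data_blocks

print(raw_per_usable(1, 1))  # mirroring: 2.0x raw capacity per usable byte
print(raw_per_usable(6, 2))  # RS(6+2): ~1.33x, yet still survives any two lost blocks
```

The win grows with wider stripes: the same two-failure tolerance at (10, 2) costs only 1.2x, which is why the move off mirroring frees up so much existing capacity.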

The StorageMojo take
While Qumulo and Data Gravity have a similar focus on data analytics, they address different markets. Qumulo’s scale out architecture favors large scale deployments while Data Gravity is an SMB appliance play. Room enough for both.

Isilon is Qumulo’s real competition. Qumulo’s modern architecture and data analytics give them a significant lead over Isilon.

NetApp should buy Qumulo and keep it a separate product line – no “dis-integration” into ONTAP – to fight Isilon and to provide a viable alternative to customers whose workloads aren’t amenable to cloud hosting. NetApp’s sales force could accelerate Qumulo’s growth, if the company can overcome its NIH mindset.

Courteous comments welcome, of course.


Plexistor’s Software Defined Memory

April 5, 2016

What is Software Defined Memory? A converged memory and storage architecture that enables applications to access storage as if it were memory, and memory as if it were storage. Unlike most of the software defined x menagerie, SDM isn’t simply another layer that virtualizes existing infrastructure for some magical benefit. It addresses the oncoming reality […]


So, how much will Optane SSDs cost?

April 1, 2016

I opined recently on ZDNet that I expected Optane SSDs would come out at $2/GB. Josh Goldenhar of Excelero had a thoughtful rejoinder: . . . I think Optane will be more expensive. You mentioned $0.20/GB for flash – but I think that’s for SATA flash or consumer flash. The Intel NVMe DC P3xxx line […]


StorageMojo’s 10th birthday

March 29, 2016

On March 29, 2006, StorageMojo published its first posts to universal indifference. The indifference didn’t last long: in the second week of StorageMojo’s existence I published 25x Data Compression Made Simple. The post was /.’d and the vituperation rolled in claiming no such thing was possible: over 400 mostly negative comments on /. and dozens more […]


CloudVelox: building a freeway into the cloud

March 23, 2016

You have a data center full of Windows and Linux servers running your key applications. How do you migrate them to the cloud; or, at the very least, enable cloud-based disaster recovery? That’s the question CloudVelox is trying to answer. Their software enables enterprises to move their entire software stack to a public cloud. I […]


Integrating 3D Xpoint with DRAM

March 21, 2016

Intel is promising availability of 3D Xpoint non-volatile memory (NVM) this year, at least in their Optane SSDs. But Xpoint DIMMs are coming soon, and neither will give you anything close to a 1,000x performance boost. In a recent paper, NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories, Jian Xu and Steven Swanson of […]


RRAM vendor inks deal with semi foundry

March 11, 2016

I’ve been a fan of RRAM – resistive random access memory – for years. It is much superior to NAND flash as a storage medium, except for cost, density and industry production capacity. Hey, you can’t have everything! But Crossbar announced today that they’ve inked a licensing deal with the Chinese Semiconductor Manufacturing International Corporation […]


The cult of memory

March 11, 2016

Persistent digital storage is an absolute requirement for a persisting digital civilization – and I remain concerned that with the exception of M-disc we don’t have digital media with a 500 year life. But I also take a broader view of storage, including that which we carry around in our heads, which has way more […]


Asymmetric innovation: legacy vs cloud

March 9, 2016

Here’s a good question: why are cloud vendors able to innovate so much faster than legacy vendors? AWS brags about the hundreds of new features and services they implement every year – and their accelerating pace. But here’s a better question: why don’t enterprise buyers accept the same level of innovation from legacy vendors as […]



March 3, 2016

Why do we focus on I/O? Because our architectures are all about moving data to the CPU. But why is that the model? Because Turing and von Neumann? Universal Turing Machines (UTM) have a fixed read/write head and a movable tape that stores data, instructions and results. Turing’s work formed the mathematical basis for the […]
