VMworld next week

by Robin Harris on Friday, 26 August, 2016

The StorageMojo crack analyst team is busy polishing their cowboy boots and ironing their jeans to get respectable (why now?) for next week’s VMworld in Las Vegas. Las Vegas is a short – by Western standards – 4 to 5 hour drive from the high pastures of northern Arizona, and a favorite place for the boys to let off some steam.

The StorageMojo take
Looking forward to catching up with storage in the virtual world, especially after missing the Flash Memory Summit. Please leave a comment if you’d like to meet.

Courteous comments welcome, of course.


Excel may be dangerous to your health – and your nation

by Robin Harris on Friday, 26 August, 2016

Over on ZDNet I’ve been doing a series looking at the issues we face incorporating Big Data into our digital civilization (see When Big Data is bad data, Lying scientists and the lying lies they tell, and Humans are the weak link in Big Data). I’m not done yet, but I wanted to share a couple of cautionary Excel tales.

The latest comes by way of a paper, Gene name errors are widespread in the scientific literature. The researchers

. . . downloaded and screened supplementary files from 18 journals published between 2005 and 2015 using a suite of shell scripts. Excel files (.xls and .xlsx suffixes) were converted to tabular separated files (tsv) with ssconvert (v1.12.9). Each sheet within the Excel file was converted to a separate tsv file. Each column of data in the tsv file was screened for the presence of gene symbols.

Result: 20% of the papers had errors. Specifically

In total, we screened 35,175 supplementary Excel files, finding 7467 gene lists attached to 3597 published papers. We downloaded and opened each file with putative gene name errors. Ten false-positive cases were identified. We confirmed gene name errors in 987 supplementary files from 704 published articles

The cause?

The problem of Excel . . . inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004 [1]. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. ‘SEPT2’ converted to ‘2006/09/02’).
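The conversions quoted above are easy to screen for once you know what they look like. Here is a minimal Python sketch that flags cells resembling Excel's date and floating-point rewrites. The regular expressions and the function names (`classify`, `suspicious_cells`) are my own illustrative assumptions, not the scripts the researchers used.

```python
import re

# Illustrative patterns only: assumptions based on the examples quoted
# above, not the screening rules from the paper.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)  # e.g. 2-Sep
ISO_DATE_LIKE = re.compile(r"^\d{4}/\d{2}/\d{2}$")                  # e.g. 2006/09/02
SCI_NOTATION = re.compile(r"^\d\.\d+E\+\d+$", re.IGNORECASE)        # e.g. 2.31E+13


def classify(cell: str) -> str:
    """Return 'date', 'float', or 'ok' for one cell of a gene-symbol column."""
    value = cell.strip()
    if DATE_LIKE.match(value) or ISO_DATE_LIKE.match(value):
        return "date"
    if SCI_NOTATION.match(value):
        return "float"
    return "ok"


def suspicious_cells(column):
    """List (row index, cell, kind) for every cell that looks converted."""
    flagged = []
    for index, cell in enumerate(column):
        kind = classify(cell)
        if kind != "ok":
            flagged.append((index, cell, kind))
    return flagged


if __name__ == "__main__":
    column = ["SEPT2", "2-Sep", "1-Mar", "2.31E+13", "2006/09/02", "TP53"]
    for row in suspicious_cells(column):
        print(row)
```

A screen like this only finds candidates. As the paper's ten false positives show, a human still has to open the file and look.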


Nation unbuilding
Another, older, Excel misadventure occurred in Ken Rogoff’s and Carmen Reinhart’s paper, Growth in a Time of Debt, which was the intellectual justification for widespread national austerity in the last 7 years. That austerity put millions of people out of work and slowed – and in some cases reversed – economic recovery after the Great Recession.

Too bad for the unemployed who lost homes, life savings, families, and self-respect, but the academics made some key Excel mistakes that weren’t uncovered until a grad student tried to replicate their results. As this piece in The Atlantic notes, the paper itself was suitably conservative, but the academics oversold their results to Congress and other policy-making bodies.

The StorageMojo take
Given that the genetic issue was first identified in 2004, it is unsettling that Microsoft, with its vast resources and world-class research organization, hasn’t been proactive in helping Excel users avoid these issues. Word has a grammar checker, and helping users avoid common mistakes seems doubly applicable to numerical data that most readers assume is correct because, after all, the computer did it.

Perhaps a smarter Excel would have noted that Rogoff failed to include five countries in the data set in the final calculations – and maybe a neural-net data checker could flag problems like that – but it isn’t the Excel team’s fault that economists oversold their faulty results. Publishing the spreadsheets along with papers – as they do in genome research – would be a help.

But the larger takeaway is that while our computers are usually accurate, our human brains are riddled with cognitive and logical bugs. While Computer-Assisted-Everything has enormous potential, we must remember to keep our BS detectors tuned up and running.

Courteous comments welcome, of course.


NetApp’s surprising Q1

by Robin Harris on Tuesday, 23 August, 2016

NetApp’s Q1 was a happy surprise for Wall Street: earnings blew past estimates and the stock spiked over 16%. But the quarterly 8-K report was more downbeat.

Product revenues
Net revenue was down $41 million year over year. Products the company calls Strategic – presumably hybrid cloud and flash, but not defined in the 8-K – were up $77m YOY, but the Mature products were down $81m, leaving the product segment down slightly YOY.

Flat product sales don’t sound too bad, given the wrenching turnaround CEO George Kurian is trying to execute. But product gross margins have dropped to the mid-40s, down over 300 basis points YOY, evidence that NetApp is buying business to keep product revenues up.

Maintenance warning
The bigger problem was hardware maintenance contracts, down $23m – over 7% – YOY. Maintenance revenues are tied to annual contracts, and thus highly predictable.

Given that total product sales were relatively flat, this says that either old kit is coming off contract and not being replaced, OR that new kit isn’t going on lucrative 24/7 contracts. Since maintenance gross margins are up almost 400 basis points – from 64.1% in Q1/15 to 67.9% in Q1/16 – this also suggests that NetApp is milking the base – and customers don’t like it.
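For anyone who wants to check the arithmetic behind "relatively flat" and "almost 400 basis points", here is a small Python sketch. The variable names are mine, and it assumes the Mature product decline quoted above is $81 million.

```python
from decimal import Decimal


def basis_points(before_pct: str, after_pct: str) -> Decimal:
    """Change between two percentages, in basis points (1 point = 100 bp)."""
    return (Decimal(after_pct) - Decimal(before_pct)) * 100


# Figures quoted in the post, in $ millions (Mature decline assumed to be $81m).
strategic_change_m = 77
mature_change_m = -81
product_change_m = strategic_change_m + mature_change_m

margin_change_bp = basis_points("64.1", "67.9")

if __name__ == "__main__":
    print(f"Product revenue change: {product_change_m}m")   # -4m
    print(f"Gross margin change: {margin_change_bp} bp")     # 380.0 bp
```

Decimal is used rather than float so that 67.9 minus 64.1 comes out as exactly 3.8 points.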

Cash bonfire
Over the last quarter NetApp has burned through more than $1.2 billion in total assets. Presumably much of that – $850m – went to pay back a short term loan used to buy SolidFire.

But that leaves another $350 million, of which $228m was spent on stock buybacks and dividends – which is more than they put into R&D. Propping up the stock price more important than new products?
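The cash figures work out as follows, in a minimal Python sketch. The names are mine, the $1.2 billion is a floor ("more than"), and the "unaccounted" line is simply what is left after the two items named in the post.

```python
# All figures in $ millions, as quoted in the post.
total_asset_drop_m = 1200        # "more than $1.2 billion", so a lower bound
solidfire_loan_m = 850           # short-term loan repayment
buybacks_and_dividends_m = 228

remainder_m = total_asset_drop_m - solidfire_loan_m
unaccounted_m = remainder_m - buybacks_and_dividends_m
shareholder_share_pct = round(buybacks_and_dividends_m / remainder_m * 100, 1)

if __name__ == "__main__":
    print(f"Left after loan repayment: {remainder_m}m")              # 350m
    print(f"Buybacks and dividends share: {shareholder_share_pct}%")  # 65.1%
    print(f"Not itemized in the post: {unaccounted_m}m")             # 122m
```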

The StorageMojo take
All NetApp has to do is build products that customers want to buy and that have good margins. The growth in strategic product sales – whatever those are – says they are having some luck with customers, but the gross margins suggest they are buying the business rather than winning it.

That’s not a good long-term strategy.

While the company has some good news to share – they won the Flash Memory Summit Best of Show for Customer Implementation, and won the all-flash NAS and unified SAN/NAS array categories in the 2016 IT Brand Leader Survey – the brand has been badly damaged.

Missing the flash transition – which was obvious from the first Fusion-io demo – was beyond clueless.

The movement of storage to inside servers is a fundamental threat to network storage at the high end.

Object storage threatens the NAS market from the active archive side, whether cloud or on-premise.

Finally, bringing in a new Marketing VP from IBM during the SolidFire integration is a risky move. Clearly, NetApp’s culture needs a good shaking, but whatever caused multiple flash projects to fail is at the core of the problem, with training the sales force to sell multiple product lines next.

The only strategy I see for NetApp going forward is to be acquired for their world-wide sales and service organization. Perhaps one of the cloud vendors that they are courting will take the plunge.

Courteous comments welcome, of course.


World’s largest manufacturer of vinyl records

by Robin Harris on Monday, 22 August, 2016

A story from the byways of data storage.

Vinyl audio records have been making something of a comeback. Fans prefer the sound, and DJs like to “scratch” them, which is pretty cool the first hundred times you hear it.

A series of pieces in the UK paper the Guardian describes the current state of vinyl, including a Czech Republic firm – a holdover from Communist days – that is now the world’s largest presser of vinyl disks. The company, once down to 500 employees, now employs 2,000, and business is growing to the tune of 25 million records this year.

Another piece describes the world’s oldest record store. Based in Cardiff, Wales, UK, Spillers Records has been in business since 1894, back when records were cylinders, not disks.

A third piece covers the 6 million records of Brazilian Zero Freitas’ collection.

And finally, since vinyl has become so popular, there are again best seller charts that cover vinyl records.

The StorageMojo take
Imagine that: emotional attachment to a storage medium. I used to have a few hundred LPs, but now my collection is down to a couple of dozen rarities and sentimental favorites. I even have a dust-covered turntable somewhere.

I lament the loss of the large canvas for album art that 33 1/3 RPM Long Playing records offered. But like most people I was happy to graduate to CDs for their convenience, even though I taped my LPs so wear was not an issue.

Now, of course, I have thousands of MP3 tracks, most from my ripped CDs.

While the vinyl revival might appear irrational, I note that almost no one is agitating for the return of 8 track tape or wax cylinders. There is something intrinsically satisfying about watching the tone arm’s progress down that one long groove on a slowly rotating disk.

And, of course, there is the respect and ceremony in removing a record from its jacket and sleeve, placing it on the turntable, and dropping the needle on the groove. Now I can punch up Bob Wills’ Take Me Back to Tulsa or Florence + The Machine’s Kiss With A Fist in a few seconds.

Instant gratification, yes. Reverence, not so much. I hope the vinyl LP has a long life.

Courteous comments welcome, of course.


Flash Memory Summit next week

by Robin Harris on Monday, 1 August, 2016

And sad to say, for the first time in years, StorageMojo won’t be there. Dang it!

A physical condition is cramping my style. It’s temporary and will be fixed by early next year.

So I’ll be looking for whatever gets posted online, but missing the show floor.

The StorageMojo take
For a few years the best storage shows weren’t focussed on storage. VMworld took the crown because virtualization and its storage problems were all the rage. But the Flash Memory Summit I went to last year was even better than VMworld.

I have no doubt FMS will be an even stronger storage show this year. The flash revolution continues.

Courteous comments welcome, of course.


A look at Symbolic IO’s patents

by Robin Harris on Friday, 22 July, 2016

Maybe you saw the hype:

Symbolic IO is the first computational defined storage solution solely focused on advanced computational algorithmic compute engine, which materializes and dematerializes data – effectively becoming the fastest, most dense, portable and secure, media and hardware agnostic – storage solution.

Really? Dematerializes data? This amps it up from using a cloud. What’s next? Data transubstantiation?

I haven’t talked to anyone at Symbolic IO, though I may. In general I like to work from documents, because personal communications are low bandwidth and fleeting, while documents can be reviewed and parsed.

So I went to look at their patents.

Fortunately the founder, Brian Ignomirello, has an uncommon name, which makes finding his patents easy. There are two of particular interest: Method and apparatus for dense hyper io digital retention and Bit markers and frequency converters.

The former seems to have had a lot of direct input from Mr. Ignomirello, as it is much easier to understand than the usual, lawyer-written patent. How do patent examiners stay awake?

The gist
There are two main elements to Symbolic IO’s system:

  • An efficient encoding method for data compression.
  • A hardware system to optimize encode/decode speed.

The system analyzes raw data to create a frequency chart of repeated bit patterns or vectors. These bit patterns are then assigned bit markers, with the most common patterns getting the shortest bit markers. In addition, these patterns are further shortened by assuming a fixed length and not storing, say, trailing zeros.

Since the frequency of bit patterns may change over time, there is provision for replacing the bit markers to ensure maximum compression with different content types. Bit markers may be customized for certain file types, such as mp3, as well.
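To make the frequency-chart idea concrete, here is a minimal Python sketch. It is not Symbolic IO's method: I have swapped in textbook Huffman coding over fixed-size chunks, which has the same property that the most common patterns get the shortest markers. The function names and the one-byte chunk size are my assumptions, and the trailing-zero trick is left out.

```python
import heapq
from collections import Counter
from itertools import count


def build_markers(data: bytes, chunk_size: int = 1) -> dict:
    """Map each fixed-size chunk to a prefix-free bit string (Huffman code)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    freq = Counter(chunks)  # the "frequency chart" of repeated patterns
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(n, next(tie), {chunk: ""}) for chunk, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two rarest groups; rare patterns end up with longer markers.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(data: bytes, markers: dict, chunk_size: int = 1) -> str:
    """Replace every chunk with its bit marker."""
    return "".join(
        markers[data[i:i + chunk_size]] for i in range(0, len(data), chunk_size)
    )


def decode(bits: str, markers: dict) -> bytes:
    """Rebuild ("materialize") the data by looking up markers one at a time."""
    reverse = {code: chunk for chunk, code in markers.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return b"".join(out)


if __name__ == "__main__":
    data = b"aaaaaaaabbbbccd"
    markers = build_markers(data)
    bits = encode(data, markers)
    print(markers)
    print(len(bits), "bits versus", len(data) * 8, "raw")
    assert decode(bits, markers) == data
```

Note that the marker table has to be stored and kept intact alongside the data, and has to be rebuilt when the pattern frequencies shift, which is the replacement provision described above.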

Symbolic IO’s patent for digital retention discusses how servers can be optimized for their encoding/decoding algorithms. Key items include:

  • A specialized driver.
  • A specialized hardware controller that sits in a DIMM slot.
  • A memory interface that talks to the DIMM-based controller.
  • A box of RAM behind the memory interface.
  • Super caps to maintain power to the RAM.

Lots of lookups to “materialize” your data, so using RAM to do it is the obvious answer. Adding intelligence to a DIMM slot offloads the work from the server CPU.

Everything else is normal server stuff. Here’s a figure that shows what is added to the DIMM socket.

Diagram showing where Symbolic IO adds hardware to a server.


The StorageMojo take
Haven’t seen any published numbers for the compression ratio, but clearly such a system could far exceed Shannon’s nominal 50% compression. I can even see how it could further compress already compressed – and therefore apparently random and incompressible – bit streams.

Reconstructing the data from a cache kept in RAM on the memory bus to achieve extreme data rates would be possible.

The controller in a DIMM slot is genius – and it won’t be the last, I’m sure. That’s the fastest bus available to third parties, so, yeah! Super caps for backup power? Of course!

Concerns? Much depends on the stability of the bit patterns over time. Probably a great media server. The analytical overhead required to develop the dictionary of bit patterns could make adoption problematic for highly varied workloads. But who has those?

Also, all the data structures need to be bulletproof, or you’ve got very fast write-only storage.

Marketing: pretty sure that “dematerialize my Oracle databases” is not on anyone’s To Do list. Love to see some benchmarks that back up the superlatives.

But overall, a refreshingly creative data storage architecture.

Courteous comments welcome, of course.


Bandwidth reduction for erasure coded storage

July 12, 2016

In response to Building fast erasure coded storage, alert readers Petros Koutoupis and Ian F. Adams noted that advanced erasure coded object storage (AECOS) isn’t typically CPU limited. The real problem is network bandwidth. It turns out that the same team that developed Hitchhiker also looked at the network issues. In the paper A Solution […]


Building fast erasure coded storage

July 11, 2016

One of the decade’s grand challenges in storage is making efficient advanced erasure coded object storage (AECOS) fast enough to displace most file servers. Advanced erasure codes can give users the capability to survive four or more device failures – be they disks, SSDs, servers, or datacenters – with low capacity overhead. By low I […]


The top storage challenges of the next decade

July 6, 2016

StorageMojo recently celebrated its 10th anniversary, which got me thinking about the next decade. Think of all the changes we’ve seen in the last 10 years: Cloud storage and computing that put a price on IT’s head. Scale out object storage. Flash. Millions of IOPS in a few RU. Deduplication. 1,000 year optical discs. There’s […]


July 4th, 2016: Mormon Canyon

July 4, 2016

July 4th is when the United States of America celebrates the signing of the Declaration of Independence. For most Americans Independence Day is the most important secular holiday of the year. Of course, July 4th wasn’t the actual date of the signing – July 2nd was – but no matter. Of greater interest is the […]


Meeting young Mr. Trump

June 30, 2016

Back in 1980 I met Donald Trump. He came to a finance class to talk about real estate finance. I have no recollection of his talk. But I DO remember the visit and, given what I’ve read about Mr. Trump, some readers may find my recollection an interesting footnote. Ivana To set the scene, this […]


Enterprise storage goes inside

June 20, 2016

Some interesting numbers out of IDC by way of Chris Mellor of the Reg. First up: the entire enterprise storage market in the latest quarter: Note that HPE is #1. Then the numbers for the external enterprise storage market: HPE is now #3 with $535.7 million. The difference is internal storage That means that HPE […]


Hike blogging: Hog Heaven trail

June 12, 2016

Mountain biking is very popular in the surrounding national forest. Enthusiasts have built many challenging bike trails, which I like to hike – not bike. I’ve never broken a bone and don’t intend to start now. Hog Heaven is one of the toughest of the local trails, with a double black diamond rating. Yesterday morning […]


EMC perfumes the pig

June 10, 2016

I feel sorry for EMC’s marketers: they have to make 10-20 year old technology seem au courant. It’s an uphill battle, but that’s why they get the big bucks. The latest effort to perfume the pig – hold still, dammit! – is EMC Unity. In a piece that – and this is a sincere compliment […]


Commoditizing public clouds

June 8, 2016

I’m a guest of Hewlett-Packard Enterprise at Discover 2016 in Las Vegas, Nevada this week. I enjoy catching up with the only remaining full-line computer company. HP was a competitor in my DEC days, and since the Compaq purchase they incorporate the remains of DEC as well. One of their themes this year is multi-cloud […]
