A look at Symbolic IO’s patents

by Robin Harris on Friday, 22 July, 2016

Maybe you saw the hype:

Symbolic IO is the first computational defined storage solution solely focused on advanced computational algorithmic compute engine, which materializes and dematerializes data – effectively becoming the fastest, most dense, portable and secure, media and hardware agnostic – storage solution.

Really? Dematerializes data? This amps it up from using a cloud. What’s next? Data transubstantiation?

Patents
I haven’t talked to anyone at Symbolic IO, though I may. In general I like to work from documents, because personal communications are low bandwidth and fleeting, while documents can be reviewed and parsed.

So I went to look at their patents.

Fortunately the founder, Brian Ignomirello, has an uncommon name, which makes finding his patents easy. There are two of particular interest: Method and apparatus for dense hyper io digital retention and Bit markers and frequency converters.

The former seems to have had a lot of direct input from Mr. Ignomirello, as it is much easier to understand than the usual, lawyer-written patent. How do patent examiners stay awake?

The gist
There are two main elements to Symbolic IO’s system:

  • An efficient encoding method for data compression.
  • A hardware system to optimize encode/decode speed.

Encoding
The system analyzes raw data to create a frequency chart of repeated bit patterns or vectors. These bit patterns are then assigned bit markers, with the most common patterns getting the shortest bit markers. In addition, these patterns are further shortened by assuming a fixed length and not storing, say, trailing zeros.

Since the frequency of bit patterns may change over time, there is provision for replacing the bit markers to ensure maximum compression with different content types. Bit markers may be customized for certain file types, such as mp3, as well.
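As a sketch, the frequency-then-markers idea is essentially dictionary coding with a prefix-free code: count fixed-size patterns, then hand the shortest markers to the most frequent ones. Here's a toy Python version using Huffman coding over byte chunks. This is my reading of the patent, not Symbolic IO's actual algorithm, and the chunk size and sample data are illustrative:

```python
import heapq
from collections import Counter

def build_markers(data: bytes, chunk_size: int = 2) -> dict:
    """Huffman-style sketch: count fixed-size patterns, then give
    the most frequent ones the shortest prefix-free bit markers."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    freq = Counter(chunks)
    # (frequency, tiebreak id, subtree); leaves are the raw chunks.
    heap = [(f, i, chunk) for i, (chunk, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    markers = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a data chunk
            markers[node] = prefix or "0"
    walk(heap[0][2])
    return markers

data = b"AAAAAABBBBCCD" * 100               # skewed frequencies compress well
markers = build_markers(data)
encoded_bits = sum(len(markers[data[i:i + 2]]) for i in range(0, len(data), 2))
print(f"raw: {len(data) * 8} bits, encoded: {encoded_bits} bits")
```

Swapping in a new marker table as content changes, as the patent describes, would just mean re-running the frequency analysis and re-encoding.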

Optimizing
Symbolic IO’s patent for digital retention discusses how servers can be optimized for their encoding/decoding algorithms. Key items include:

  • A specialized driver.
  • A specialized hardware controller that sits in a DIMM slot.
  • A memory interface that talks to the DIMM-based controller.
  • A box of RAM behind the memory interface.
  • Super caps to maintain power to the RAM.

Lots of lookups to “materialize” your data, so using RAM to do it is the obvious answer. Adding intelligence to a DIMM slot offloads the work from the server CPU.

Everything else is normal server stuff. Here’s a figure that shows what is added to the DIMM socket.

Diagram showing where Symbolic IO adds hardware to a server.


The StorageMojo take
I haven’t seen any published numbers for the compression ratio, but clearly such a system could far exceed Shannon’s nominal 50% compression. I can even see how it could further compress already compressed – and therefore apparently random and incompressible – bit streams.

Reconstructing the data from a cache kept in RAM on the memory bus could achieve extreme data rates.

The controller in a DIMM slot is genius – and it won’t be the last, I’m sure. That’s the fastest bus available to third parties, so, yeah! Super caps for backup power? Of course!

Concerns? Much depends on the stability of the bit patterns over time. Probably a great media server. The analytical overhead required to develop the dictionary of bit patterns could make adoption problematic for highly varied workloads. But who has those?

Also, all the data structures need to be bulletproof, or you’ve got very fast write only storage.

Marketing: pretty sure that “dematerialize my Oracle databases” is not on anyone’s To Do list. Love to see some benchmarks that back up the superlatives.

But over all, a refreshingly creative data storage architecture.

Courteous comments welcome, of course.


Bandwidth reduction for erasure coded storage

by Robin Harris on Tuesday, 12 July, 2016

In response to Building fast erasure coded storage, alert readers Petros Koutoupis and Ian F. Adams noted that advanced erasure coded object storage (AECOS) isn’t typically CPU limited. The real problem is network bandwidth.

It turns out that the same team that developed Hitchhiker also looked at the network issues. In the paper A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster, K. V. Rashmi, Nihar B. Shah, and Kannan Ramchandran of UC Berkeley, and Dikang Gu, Hairong Kuang, and Dhruba Borthakur of Facebook, looked at the problem of network overhead during data recovery.

Replicas vs Reed Solomon erasure codes
Recovering data from replicas is easy: copy it. Since three copies is the norm, the recovery process only minimally impacts operations.

With RS codes, though, there is no replica. For a system that encodes k units of data with r parity units, the original data is recoverable from any k of the (k+r) stored units.

Thus an HDFS system like Facebook’s, which encodes data into 10 data units and 4 parity units, can survive the loss of any 4 drives, servers, or even data centers – depending on how the units are distributed. That’s way better than RAID 5 or 6 on legacy RAID arrays.

But you see the problem: during data recovery the system has to read and transfer k units. And the units can be quite large – depending on the AECOS configuration – typically up to 256MB. In that case the system would transfer 2.56GB to recover one unit.
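The arithmetic is worth making explicit. Here's a back-of-envelope sketch for the 10 data + 4 parity, 256MB configuration above; the 10 Gb/s link speed is my assumption for illustration, not a number from the paper:

```python
# Recovery traffic for a (k=10, r=4) Reed-Solomon layout with 256MB units.
UNIT_MB = 256
k = 10

traffic_gb = k * UNIT_MB / 1000          # must read k surviving units
print(f"{traffic_gb:.2f} GB moved to rebuild one {UNIT_MB} MB unit")

# Rough wire time, assuming a 10 Gb/s link (my assumption, not the paper's).
seconds = traffic_gb * 8 / 10
print(f"~{seconds:.1f} s of a 10 Gb/s link per rebuilt unit")
```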

Of course, if it is a server failure, there will be many units to recover, bringing the typical top-of-rack switch to its knees. Here’s the data from a Facebook facility:

Click to enlarge.

Piggyback on RS codes
As in the Building fast erasure coded storage post, the team piggybacked a single extra byte per stripe onto the RS code, saving around 30% on average in read and download volume for single block failures. With 256MB block sizes, recovery speed is limited by network and disk bandwidth, so the reduction should significantly cut recovery time.

Added bonus: because recovery is quicker – and disk failures are correlated – the piggybacked RS code should be even more reliable than a straight RS code.

The StorageMojo take
Much appreciate the readers who pointed out the critical role of bandwidth in AECOS systems. I hope this discussion helps address my oversight.

Courteous comments welcome, of course.


Building fast erasure coded storage

by Robin Harris on Monday, 11 July, 2016

One of the decade’s grand challenges in storage is making efficient advanced erasure coded object storage (AECOS) fast enough to displace most file servers.

Advanced erasure codes can give users the capability to survive four or more device failures – be they disks, SSDs, servers, or datacenters – with low capacity overhead. By low I mean 40% over the net stored data, rather than the 3x replication default for many object stores today.
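The capacity math is simple enough to sketch. The 100 TB figure below is just for illustration:

```python
# Raw capacity needed for 100 TB of user data, comparing 3x replication
# with a 10 data + 4 parity erasure code.
net_tb = 100

replication_raw = net_tb * 3              # three full copies, survives 2 losses
ec_raw = net_tb * (10 + 4) / 10           # 40% overhead, survives 4 losses

print(f"replication: {replication_raw} TB raw; erasure coded: {ec_raw} TB raw")
```

Less than half the raw capacity, with twice the fault tolerance.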

Advanced erasure codes are capacity efficient, but at a price: computational and read latency overhead. Accessing the data requires a lot of processing, which is why most of these codes are used for archives, not active data.

Yet as data volumes rise faster than either areal density grows (for disks) or cost-per-bit drops (for SSDs), capacity efficiency will become critical. Despite multiple variables and leverage points, there is a growing need for capacity efficient storage with at least good, and preferably great, performance.

Hitchhiker
Since CPUs and networks are not getting (much) faster, the obvious place to look for the needed performance improvements is in the algorithms underlying advanced erasure codes. In the paper A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers, K. V. Rashmi, Nihar B. Shah, and Kannan Ramchandran of UC Berkeley, and Dikang Gu, Hairong Kuang, and Dhruba Borthakur of Facebook, present

. . . Hitchhiker, a new erasure-coded storage system that reduces both network traffic and disk IO by around 25% to 45% during reconstruction of missing or otherwise unavailable data, with no additional storage, the same fault tolerance, and arbitrary flexibility in the choice of parameters, as compared to RS-based systems. Hitchhiker “rides” on top of RS codes, and is based on novel encoding and decoding techniques. . . .

Piggybacking
The unintuitive part of Hitchhiker is that it builds on top of existing Reed-Solomon codes. So how does adding more data to an existing code make it more, rather than less, efficient?

At this point, those with a professional interest should download the PDF for a detailed explanation. Essentially, the piggyback framework uses finite field arithmetic to add one byte of data that imparts new properties to the underlying RS code.

These added properties can be designed to achieve different goals. The team focused on reconstruction efficiency.

The authors present three different codes to demonstrate the concept and to test for production efficiency. Two of the codes use only low-overhead XOR operations, while the third – and most efficient – requires complex finite field arithmetic.
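Here's a toy version of the piggyback trick, shrunk to 2 data units, 2 parities, and arithmetic mod 7 standing in for the real finite field math. This is my simplification to show the concept, not the paper's actual construction. Stuffing one symbol from the first substripe into a parity of the second lets us rebuild a failed node from 3 symbol reads instead of 4:

```python
P = 7  # toy "field": arithmetic mod 7 stands in for real GF(2^8) math

def encode(a1, a2, b1, b2):
    """k=2 data nodes, r=2 parities, two substripes (a and b).
    q2 carries a piggyback: one symbol (a1) from the first substripe."""
    p1, p2 = (a1 + a2) % P, (a1 + 2 * a2) % P
    q1, q2 = (b1 + b2) % P, (b1 + 2 * b2 + a1) % P  # + a1 is the piggyback
    return p1, p2, q1, q2

a1, a2, b1, b2 = 3, 5, 2, 6
p1, p2, q1, q2 = encode(a1, a2, b1, b2)

# Node 1 dies, taking a1 and b1 with it. Plain RS reads 4 symbols
# (a2, p1, b2, q1); with the piggyback we need only 3: b2, q1, q2.
b1_rec = (q1 - b2) % P                  # substripe b decodes as ordinary RS
a1_rec = (q2 - b1_rec - 2 * b2) % P     # peel the piggyback out of q2
print(a1_rec == a1, b1_rec == b1)       # True True
```

The same extra symbol costs nothing in storage, which is why the savings look like a free lunch.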

Test implementation
The team implemented their algorithms in the Hadoop Distributed File System (HDFS), which is widely used at Facebook. They built on the HDFS-RAID module using RS codes as normally deployed in Facebook infrastructure. Here’s a diagram of what they implemented:

Click to enlarge.

Results
The team evaluated both computation times (for encoding and degraded reads) and read times (for degraded reads and recovery). As expected, the additional computation overhead for encoding the Hitchhiker variants is higher than for straight RS codes.

Bottom line: substantial improvement in read times over traditional RS codes:

Click to enlarge.

This graph compares encoding times:

Click to enlarge.

The StorageMojo take
Most files aren’t accessed more than a handful of times. So why put them on costly high performance file servers?

Better to use commodity object storage with advanced erasure codes to get lower cost and higher availability than legacy active-active file servers can provide. But the performance penalty for advanced erasure coding has been a problem, as Cleversafe and others found.

Nonetheless, this paper demonstrates that significant progress is possible. Expect a decade of stepwise enhancements until AECOS displaces the vast majority of enterprise file servers.

Courteous comments welcome, of course. AECOS is a terrible acronym. Anyone have a better idea?


The top storage challenges of the next decade

by Robin Harris on Wednesday, 6 July, 2016

StorageMojo recently celebrated its 10th anniversary, which got me thinking about the next decade.

Think of all the changes we’ve seen in the last 10 years:

  • Cloud storage and computing that put a price on IT’s head.
  • Scale out object storage.
  • Flash. Millions of IOPS in a few RU.
  • Deduplication.
  • 1,000 year optical discs.

There’s more, like new file systems, advanced erasure coding, data analytics, and remote storage management. All great stuff, making storage more reliable, robust, and easier to manage.

But hey, that was then. This is now.

Don’t worry: the next decade is shaping up to be even more exciting and disruptive than the last. OK, some of you should worry.

Grand challenges
For the next decade the storage industry has a new set of challenges. With the flood of data, especially video and IoT, we’ll need more capacity, at lower cost, using fewer human cycles than ever before.

That implies a number of new market opportunities for storage entrepreneurs. And more emerging storage technologies!

What are these grand challenges? Here’s my list in no particular order:

  • Data-centric infrastructure. Hyper-converged is a good start, but not the end-game.
  • Eliminate backup. Finally.
  • Fast object storage. Make scale-out advanced erasure codes fast and efficient enough to enable object stores to displace file servers.
  • Autonomous storage. Storage with enough AI to manage itself, including deleting data.
  • NVRAM optimized CPUs, I/O stacks and storage systems.
  • Much lower I/O latencies.
  • High density, low access time archives. Even more active than today’s “active” archives.

The StorageMojo take
I expect to write about each of these in the coming years. But the fundamental driver is that we do IT for the information, not the infrastructure.

Now that the rate of performance improvement is slowing – especially in CPUs, but also in networks and storage – we are forced to focus on important second order gains: reducing costs, tightening integration, and increasing flexibility.

Yes, there are breakthrough technologies ahead. But the future will be won by smarter architectures, not brute force, solving the big challenges of future storage.

Courteous comments welcome, of course.


July 4th, 2016: Mormon Canyon

by Robin Harris on Monday, 4 July, 2016

July 4th is when the United States of America celebrates the signing of the Declaration of Independence. For most Americans Independence Day is the most important secular holiday of the year.

Of course, July 4th wasn’t the actual date of the signing – July 2nd was – but no matter. Of greater interest is the fact that the colonials were deeply divided on the independence issue, with perhaps a third supporting the British throne, another third indifferent, and the final third pushing for revolution. A deeply divided America is nothing new.

Not only is the date wrong, but the name is too. America wasn’t independent on that day or for years after – and wouldn’t have been but for French aid. Not the picture that the name “Independence Day” conjures for most Americans.

Revolution Day
A better name is “Revolution Day”, for that is what began in 1776. These are perhaps the most famous words in America’s political history, and among the most revolutionary in world history:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

Much less famous, yet reflecting yet another deep division that bedevils America to this day, is another, less generous statement in the Declaration:

[The King] has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.

Mormon Canyon
Senator Mike Lee of Utah recently noted that Mr. Trump’s anti-Muslim rhetoric did not play well in a state dominated by a religious minority whose ancestors faced violent discrimination, as exemplified by the infamous Missouri Executive Order Number 44 of 1838, issued by Gov. Lilburn Boggs, which said, in part:

The Mormons must be treated as enemies, and must be exterminated or driven from the state if necessary for the public peace. . . .

Executive Order Number 44 was not officially rescinded until 1976, by then-Missouri Gov. Kit Bond.

Arizona’s beautiful Mormon Canyon has no bloody history that I know of. The name reflects the fact that Arizona, like Nevada and Idaho, has a long history of Mormon settlement.

I took this picture walking north on the Brins Mesa trail at 7:05 AM on July 3rd, 2016. The edge of Brins Mesa is to the left, at an altitude of almost 5,100 feet.

What I particularly liked was the sunshine lighting the leftmost peak. It reminded me of the oft-used “shining city on a hill” metaphor for American exceptionalism.

Mormon Canyon, Sedona AZ

Click to enlarge.

It is the nature of ideals that we often fail to live up to them. But it is important to try, and to continue to try, despite many failures. I wish all readers, American or not, a happy and peaceful July 4th.


Meeting young Mr. Trump

by Robin Harris on Thursday, 30 June, 2016

Back in 1980 I met Donald Trump. He came to a finance class to talk about real estate finance.

I have no recollection of his talk. But I DO remember the visit and, given what I’ve read about Mr. Trump, some readers may find my recollection an interesting footnote.

Ivana
To set the scene, this was a graduate MBA course in finance, at a top business school – Wharton – with maybe a couple dozen students, almost all male, in the class.

Trump was then about 12 years out of Wharton undergrad, and had notched a major success with the Grand Hyatt in midtown Manhattan in 1976. Trump partnered with the Pritzker family on that project and, after a falling out, sold his half for $140 million in 1996.

What WAS memorable was that he brought a tall, slim blonde. He introduced her as his wife Ivana, a former Czech Olympic skier (evidently not true). Mrs. Trump spent the entire time with a deer-in-the-headlights look, as if someone might ask her about finance.

No one did.

The StorageMojo take
Maybe the professor thought Mr. Trump would say something useful. Or he wanted a day off.

In retrospect though, the only reason to bring Ivana was as a prop. Given that Mr. Trump had actually pulled off a major success in the tough Manhattan real estate market, that was unnecessary.

Draw your own conclusions about what this says about the young Mr. Trump. Based on what I’ve seen of the old Mr. Trump, he would be a disaster for America and the world as President of the United States of America. There’s a reason we rarely elect business people as Presidents: politics requires totally different skills.

Courteous comments welcome, of course. Yes, this is off-topic for StorageMojo. Back to our regularly unscheduled programming soon.


Enterprise storage goes inside

June 20, 2016

Some interesting numbers out of IDC by way of Chris Mellor of the Reg. First up: the entire enterprise storage market in the latest quarter: Note that HPE is #1. Then the numbers for the external enterprise storage market: HPE is now #3 with $535.7 million. The difference is internal storage. That means that HPE […]


Hike blogging: Hog Heaven trail

June 12, 2016

Mountain biking is very popular in the surrounding national forest. Enthusiasts have built many challenging bike trails, which I like to hike – not bike. I’ve never broken a bone and don’t intend to start now. Hog Heaven is one of the toughest of the local trails, with a double black diamond rating. Yesterday morning […]


EMC perfumes the pig

June 10, 2016

I feel sorry for EMC’s marketers: they have to make 10-20 year old technology seem au courant. It’s an uphill battle, but that’s why they get the big bucks. The latest effort to perfume the pig – hold still, dammit! – is EMC Unity. In a piece that – and this is a sincere compliment […]


Commoditizing public clouds

June 8, 2016

I’m a guest of Hewlett-Packard Enterprise at Discover 2016 in Las Vegas, Nevada this week. I enjoy catching up with the only remaining full-line computer company. HP was a competitor in my DEC days, and since the Compaq purchase they incorporate the remains of DEC as well. One of their themes this year is multi-cloud […]


Thunderbolt: a fast and cheap SAN

June 2, 2016

If memory serves – and mine often doesn’t – I asked a panel at the NVM Workshop at UCSD their opinion on using Thunderbolt as a cheap, fast, and flexible interconnect. After all, I thought, academics always need more than they can afford, so these guys would have been looking into it. Nope! They laughed […]


Hike blogging: Memorial Day 2016

May 31, 2016

Yesterday I took a seven mile out-and-back hike along the Chuckwagon Trail. This is an area I want to explore more. It didn’t look like a good day for pictures, so I didn’t take the Canon EOS-M. But the smoke from a couple of burns on the Rim cleared and some nice clouds appeared, so […]


The array IP implosion

May 23, 2016

We’ve seen this movie before The value of legacy array intellectual property is collapsing. This isn’t complicated: SSDs have made IOPS – what hard drive arrays were optimizing for the last 25 years – easy and cheap. Think of all the hard-won – well, engineered – optimizations that enabled HDD-based arrays to dominate the storage […]


WD is not a disk drive company – and not a moment too soon

May 20, 2016

While you weren’t looking Western Digital stopped being a hard drive company, morphing into a storage company. Such transitions are nothing new for a company that started life making calculator chips in the 1970s, morphed into SCSI, ATA and graphics in the 80s, and built its disk drive business in the 90s and 00s. The […]


Scale and the all-flash datacenter

May 9, 2016

There’s a gathering vendor storm pushing the all-flash datacenter as a solution to datacenter ills, such as high personnel costs and performance bottlenecks. There’s some truth to this, but its application is counter-intuitive. Most of the time, storage innovations benefit the largest and – for vendors – most lucrative datacenters. OK, that’s not counter-intuitive. But […]
