What is “primary” storage?

by Robin Harris on Monday, 26 July, 2010

A commenter recently asked:

Archivas was focused on archive, do you expect the new solution to sustain performance for primary storage as well?

Which is a good question, if you know what “primary” means. Do we?

Tiers of a clown
10 years ago we all agreed on 1st tier or primary storage: block-based; RAID 5; enterprise FC or SCSI drives; SCSI, FC or ESCON host connects; optimized for transactional workloads; and large mirrored (with 1 notable exception) caches. When SANs took off we stuck FC switches in front of the boxes and called it good.

But something happened to that consensus: iSCSI; NFS; CIFS; SSD; memcached; Internet scale-out; InfiniBand; 10GigE; storage & processor virtualization; CDNs; web-serving; pNFS; and lower-cost out-sourced high-scale infrastructure (i.e. cloud). And more – such as non-SQL data management – is coming.

Will the real primary storage please stand up?
Amazon runs a high-growth $25B/yr business on scale-out storage, servicing millions of customers, taking real money and shipping real goods, 7x24x365. Smells like enterprise spirit.

Is Amazon’s storage “primary” and, if so, what makes it primary?

Yes, it is primary storage. No, it isn’t the logo that makes it so.

Workload & service level
It’s tempting to consider workload, but what workload? IOPS? Bandwidth?

How about parallelism? Web service is highly parallel. ACID database updates less so.

And what about files vs blocks? Blocks don’t require as much processing as files, as the host is handling the file system.

It is clear that most files aren’t often accessed. Does primary storage for files mean availability and reasonable performance? Or is there little difference between archive and primary for files?

NetApp is deduping primary storage. Others will follow, whether it makes sense or not, at least in messaging. Skeptics ask “If it is deduped, is it really primary?”

The StorageMojo take
We do a disservice to customers if we talk about “primary” storage as a class of equipment. It isn’t.

Primary storage is whatever works as primary storage for your application, from bare SATA drives Velcro’d to motherboards to a big cluster of DMXs. Both are in use in major enterprises for mission critical applications – and they both work.

The 60 year secular trend to cooler data is the cause – an inverse of Moore’s Law. As the average number of accesses to data declines, technologies that meet the need at a lower cost become attractive, find a market, and grow. Niche products become mainstream – and perhaps “primary” – for their markets.

At the same time Moore’s Law is working its magic: creaky slow 10Mbit Ethernet becomes 10GigE. Board level controllers become chips. Storage software migrates from firmware to a stack running on commodity processors. Yesterday’s “archive” storage is tomorrow’s “primary” storage for the right apps.

Even the term “enterprise” is losing its meaning. As firms begin the 10 year migration to private clouds for cooler data, commodity hardware – servers, unmanaged switches, SATA drives – will be knit by cluster software that may even be open source. It is “enterprise” because an enterprise is using it.

This is why all the big iron vendors are migrating their software from embedded firmware to stacks running on commodity processors and operating systems. For the mainstream market the commodities are fast enough and the economics are compelling.

If it works for you, it’s primary.

Courteous comments welcome, of course. BTW, I’m getting a briefing from HDS on the old Archivas product, so maybe I’ll have more to say RSN.


HDS: masters of stealth marketing

by Robin Harris on Thursday, 22 July, 2010

Winding up the week – it is Friday here – in Japan as a guest of Hitachi Data Systems. Fine hospitality from my American and Japanese hosts in steamy mid-summer Tokyo. Looking forward to Arizona.

The practitioners in the group – one who loves XIV, others with EMC and NetApp kit – were surprised by what the HDS stuff does. Such as virtualizing and managing your current storage platforms, regardless of vendor.

Seems like the big guys have been promising that for years. HDS delivered? Whoa.

A couple of things impressed me:

  • The senior Japanese execs weren’t the starchy, face-saving guys I’d expected. The Chairman of Hitachi made a speech, without a tie, to about 10,000 people, and all the other execs I spoke to followed suit. Even giving careful non-answers they came across as relaxed and realistic. Are they also decisive? We’ll see.
  • HDS has a clustered object store. I hope to get briefed on it next month.
  • The parent company has a vision for using massive amounts of data to improve our quality of life. Since they also produce power systems and high-speed trains they have a direct line into some critical issues.

The StorageMojo take
HDS is a multi-billion dollar company with some leading edge products and technologies. They’re about the size of NetApp – and I know you’ve heard of them.

As their OEM relationship with Sun winds down – or at least I expect it to – they’ll have more direct contact with a new group of customers. Now is the time for HDS to sharpen their messaging and turn up the volume.

Sadly that isn’t likely. The internal dynamics of the company seem to lead to generic messaging that fails to plant a hook. Maybe it is a consensus thing. But they aren’t doing customers any favors.

Courteous comments welcome, of course. Any recent experience with HDS?


Off to Tokyo

by Robin Harris on Sunday, 18 July, 2010

The friendly folks at Hitachi are flying me and a number of other analysts and bloggers to Tokyo. They want to tell us about their plans for – well, I don’t know what – and it’s under NDA.

Normally I don’t sign those, but between Tokyo – which I like – and the promise of seeing Hitachi’s strategy, I was reminded of Emerson’s comment:

A foolish consistency is the hobgoblin of little minds.

The StorageMojo take
I’ll be arriving at Narita about 4am PDT, so don’t expect crisp comment moderation. I will try to post this week though.

Courteous comments welcome, of course. Anything you want me to ask Hitachi, even though I won’t be able to tell you what they said?


A cloud app for the masses

by Robin Harris on Friday, 16 July, 2010

Cloud computing gets a bad rap because it can’t replace corporate data centers for mission critical apps. But new computing paradigms never do that: it is the new capabilities they enable that drive adoption. Case in point: transcoding.

Why?
Anyone who shoots video soon discovers that changing from, say, AVCHD to an editing-friendly codec and then to H.264 for distribution takes a lot of compute cycles. Conversion from one codec to another is called transcoding. It is the price we pay for high quality compressed content.

Compression and format conversion are necessary because highly compressed video – the kind most camcorders shoot – isn’t easy to edit. And the stuff that’s easy to edit has large files that chew up bandwidth and storage.

So we transcode. Add to that the number of formats we use – ranging from iPhones to flash to SD and 1080p – and transcoding is a major CPU cycle sink.

Fortunately, transcoding can be a highly parallel operation. A frame – or a series of frames – can be divided and split among multiple cores and CPUs.
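
The divide-and-distribute idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Zencoder or any real transcoder works: `transcode_segment` is a hypothetical stand-in that only reports how many frames it was handed, and the segment length is arbitrary. A thread pool keeps the sketch self-contained; real CPU-bound encoding would hand segments to separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_segments(total_frames, segment_len):
    """Return (start, end) frame ranges, end exclusive, covering every frame once."""
    return [(start, min(start + segment_len, total_frames))
            for start in range(0, total_frames, segment_len)]

def transcode_segment(segment):
    """Hypothetical stand-in for an encoder call: returns the frame count handled."""
    start, end = segment
    return end - start

def transcode_parallel(total_frames, segment_len, workers=4):
    segments = split_into_segments(total_frames, segment_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves segment order, so results can be stitched back in sequence
        done = list(pool.map(transcode_segment, segments))
    return segments, done

segments, done = transcode_parallel(total_frames=1000, segment_len=300)
print(segments)   # [(0, 300), (300, 600), (600, 900), (900, 1000)]
print(sum(done))  # 1000
```

Real codecs constrain where a stream can be cut (typically at keyframes), so a production splitter would pick segment boundaries from the video itself rather than from a fixed length.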

Where?
Where can you find a lot of CPUs for a quick job? Right, the cloud. Which is why there are a number of online services that front-end Amazon Web Services to provide transcoding.

I spoke to the CEO of startup Zencoder, Jon Dahl, to learn more.

Zencoder
Zencoder is a transcoding service provider that uses Amazon as a cloud provider. The Zencoder team has developed transcoding infrastructure for several startups and finally decided to build a general-purpose service.

While they use open source software in their stack – as do most transcoding providers – their major value-add is in a high-performance scalable interface. Handling 100,000 concurrent transcodes is non-trivial.

They also look out for problems common in transcoding such as audio/video getting out of sync and aspect ratio distortion. They can transcode 1080p faster than real time. And they’ve licensed the proprietary formats as well.

Amazon offers Linux as a service and a file service. S3’s files are limited to 5 GB, but that isn’t a problem for Zencoder: customers can specify input and output locations, bypassing Amazon storage.

Also they don’t transcode Mac ProRes – Final Cut Pro’s preferred editing format – today. But they do handle QuickTime movies.

The StorageMojo take
So the glass house doesn’t want to outsource cloud infrastructure. Who cares? They’re the last to adopt new technology anyway.

It is apps like transcoding that drive the business. In 5 years much, perhaps most, transcoding will be cloud-based.

Before the digital video craze in the last 5 years there wasn’t much demand for transcoding. But today, with HD video smartphones, millions are producing videos that they want to share and save.

Your smartphone won’t have the cycles to do it, but the cloud does. Expect transcoding vendors to add new features, such as noise-reduction or sharpening.

Business units are discovering the power of short videos to inform, train, persuade and excite. All at a fraction of the cost of 4-color brochures.

The outlook for storage vendors is mixed. Yes, much more storage will be sold – but cost-conscious cloud managers will be buying it. And as more new services develop on the cloud, consumers will be as hazy about “local” and “cloud” as they are about “memory” and “disk” today. Branding nightmare, but that’s where those petabytes will be.

Courteous comments welcome, of course.


Making data Vanish

by Robin Harris on Friday, 9 July, 2010

Given how hard it is to save data you want to keep (see The Universe hates your data), losing data on the web should be easy. It isn’t, because it gets stored in so many places in its travels.

Problem
But the power of the web means that silliness can now be stored and found with the speed of a Google search. You don’t want sexy love notes – or pictures – to a former flame posted after infatuation ends.

Or maybe you want to discuss relationship, health or work problems with a friend over email – and don’t want your musings to be later shared with others. Wouldn’t it be nice to know that such messages will become unreadable even if your friend is unreliable?

Researchers built a prototype service – Vanish – that seeks to:

. . . ensure that all copies of certain data become unreadable after a user-specified time, without any specific action on the part of a user, without needing to trust any single third party to perform the deletion, and even if an attacker obtains both a cached copy of that data and the user’s cryptographic keys and passwords.

That’s a tall order. Their 1st proof-of-concept failed. But they are continuing the fight.

Vanish
In Vanish: Increasing Data Privacy with Self-Destructing Data Roxana Geambasu, Tadayoshi Kohno, Amit A. Levy and Henry M. Levy of the University of Washington computer science department present an architecture and a prototype to do just that.

Ironically, the project utilizes the same P2P infrastructure that preserves and distributes data: the distributed hash table (DHT) of the Vuze BitTorrent client.

The basic idea is this: Vanish encrypts your data with a random key, splits the key into pieces, sprinkles those pieces across random nodes of the DHT, and destroys the local copy of the key. You tell the system when the key should disappear and your data goes poof!

They developed a data structure called a Vanishing Data Object (VDO) that encapsulates user data and prevents the content from persisting. And the data becomes unreadable even if the attacker gets a pristine copy of the VDO from before its expiration and all the associated keys and passwords.

Here’s a timeline for that attack:

[Figure: attack timeline]

DHT overview

A DHT is a distributed, peer-to-peer (P2P) storage network. . . . DHTs like Vuze generally exhibit a put/get interface for reading and storing data, which is implemented internally by three operations: lookup, get, and store. The data itself consists of an (index, value) pair. Each node in the DHT manages a part of an astronomically large index name space (e.g., 2^160 values for Vuze).

DHTs are available, scalable, broadly distributed and decentralized with rapid node churn. All these properties are ideal for an infrastructure that has to withstand a wide variety of attacks.
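
To make the put/get interface and the effect of churn concrete, here is a toy, in-memory sketch. It is not Vuze code. The class name, the node names, and the use of SHA-1 to produce 160-bit indices are assumptions made for illustration, and "closest node by XOR" is borrowed from Kademlia-style DHTs.

```python
import hashlib

def to_index(name: bytes) -> int:
    """Map arbitrary bytes to a 160-bit index (a SHA-1 digest is exactly 160 bits)."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")

class ToyDHT:
    def __init__(self, node_names):
        # each node gets an ID in the same space as the data indices
        self.nodes = {to_index(n.encode()): {} for n in node_names}

    def lookup(self, index):
        """Return the ID of the node responsible for index (closest by XOR)."""
        return min(self.nodes, key=lambda node_id: node_id ^ index)

    def put(self, index, value):
        # put = lookup the responsible node, then store the pair on it
        self.nodes[self.lookup(index)][index] = value

    def get(self, index):
        # get = lookup the responsible node, then read the pair from it
        return self.nodes[self.lookup(index)].get(index)

    def churn(self, node_id):
        """A node leaves the network and takes its stored pairs with it."""
        del self.nodes[node_id]

dht = ToyDHT([f"node{i}" for i in range(8)])
idx = to_index(b"share-1")
dht.put(idx, b"piece of K")
print(dht.get(idx))          # b'piece of K'
dht.churn(dht.lookup(idx))   # the responsible node disappears
print(dht.get(idx))          # None
```

Real DHTs store each value on several nodes and republish it, so one departure does not lose data immediately. The toy keeps a single copy to make the loss visible.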

Vanish architecture

Data (D) is encrypted (E) with key (K) to deliver ciphertext (C). Then K is split into N shares – K1,…,KN – and distributed across the DHT using a random access key (L) and a secure pseudo-random number generator. The K split uses a redundant erasure code so that any subset of the N shares that meets a user-defined threshold can reconstruct the key.

The erasure codes are needed because DHTs lose data due to node churn. It is a bug that is also a feature for secure destruction of data.
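
A minimal sketch of the key-splitting step, assuming Shamir's threshold secret sharing as the splitting scheme, since it gives exactly the "enough shares rebuild K, too few reveal nothing" property described above. The function names, the choice of prime, and the 10-share, 7-threshold numbers are illustrative and not taken from the Vanish prototype. Pushing the shares into the DHT is left out.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit key

def split_key(key: int, n: int, threshold: int):
    """Split key into n shares; any `threshold` of them can rebuild it."""
    if not 0 <= key < PRIME:
        raise ValueError("key out of range")
    if not 1 <= threshold <= n:
        raise ValueError("need 1 <= threshold <= n")
    # random polynomial of degree threshold-1 whose constant term is the key
    coeffs = [key] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_key(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the key back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(128)
shares = split_key(key, n=10, threshold=7)
print(recover_key(shares[:7]) == key)   # True: any 7 shares suffice
print(recover_key(shares[3:]) == key)   # True: a different 7 work too
```

The threshold is the knob that trades durability against secrecy. A low threshold survives more churn, while a high one means the key becomes unrecoverable sooner as shares drop out of the DHT.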

Prototype
They built a Firefox plug-in for Gmail to create self-destructing emails and another – FireVanish – for making any text in a web input box self-destructing. They also built a file app, so you can make any file self-destructing. Handy for Word backup files that you don’t want to keep around.

The major change to the Vuze BitTorrent client was less than 50 lines of code to prevent lookup sniffing attacks. Those changes only affect the client, not the DHT.

The Vanish proto was cracked by a group of researchers at UT Austin, Princeton, and U of Michigan. They found that an eavesdropper could collect the key shards from the DHT and reassemble the “vanished” content.

Who is going to collect all the shard-like pieces on DHTs? Other than the NSA and other major intelligence services, probably no one. For extra security the data can be encrypted before VDO encapsulation.

The StorageMojo take
The Internet is paid for with our loss of privacy. Young people may think it no great loss. Check back in 20 years and we’ll see what you think then.

It is slowly dawning on the public that their lives are an open book on the Internet. Expect a growing market for private communication and storage if ease-of-use and trust issues can be resolved.

You don’t have to be Tiger Woods to want to keep your private life private. I hope the Vanish team succeeds.

Courteous comments welcome, of course. Figures courtesy of the Vanish team.


Greg Reyes sentenced

by Robin Harris on Saturday, 26 June, 2010

Greg Reyes, former CEO of Brocade, received a sentence of 18 months and a $15 million fine for his conviction on 10 felony counts related to options backdating. Prosecutors had asked for 37 months and a $137 million fine. Mr. Reyes was emotional at his sentencing:

When Reyes got his opportunity to address Breyer, he stood at the lectern silently for a few seconds, and then broke down sobbing. [His attorney] read his statement for him.

“I am a shell of the man I once was,” he read.

Breyer said he was quite moved by the 400 letters sent in on Reyes’ behalf, as well as the financial and emotional support he extends toward others. Yet a message must be sent to executives that deceiving the public markets is a serious crime, Breyer said.

The judge cited one more reason for a prison term.

“White-collar defendants, unlike most defendants I see in court every day, have choices,” Breyer said, adding that he had just sentenced a man to more time than Reyes because he illegally re-entered the United States to see his 5-year-old son.

In two weeks, Breyer will sentence another man whose drug addiction began when his father shot him up with heroin when he was 11.

“What choices did that young boy have?” Breyer said.

[From Law.com]

The best CEO of any high tech company?
I met Mr. Reyes a couple of times when both of us wanted Sun to buy FC switches to make Sun’s early FC array more maintainable. I was at Sun at the time. He was an excellent salesman, but some idiot had decreed no FC switches for the storage group.

Storage Newsletter had an odd bit of history as well:

In 2002, we asked Steve Duplessie, well known consultant, to told [sic] us who was the best CEO in the storage industry. His answer: “[The best CEO] would be Greg Reyes of Brocade.”

The 2 critical success factors for a salesman are: a capacity for self-delusion – so you can sincerely and honestly tell your prospects how good it is; and a resolutely short term focus, because making this quarter’s numbers is what counts. Don’t hire a salesman to design your products or your strategy.

The StorageMojo take
Given Brocade’s current problem – they’ve been for sale for over 9 months and there are no takers – and his own, Mr. Reyes was no strategist. But Brocade’s IPO timing made fortunes for Mr. Reyes and co-founders Paul Bonderson and Kumar Malavalli. Isn’t that what really counts?

But Mr. Reyes can be forgiven if he feels unfairly singled out. Here we are 2 years after the big Wall Street meltdown, where the big ibanks were packaging and selling crap and calling it gold, when mortgage companies and rating agencies had gone wild, and who’s gone to jail for that?

At the same time, Maher Arar, a Canadian who was arrested in 2002 by U.S. officials while changing planes in New York on a trip to Montreal and then rendered by US officials to a Syrian jail was denied a hearing by the US Supreme Umpires. According to the findings of fact, Mr. Arar

. . . was in Syria for a year, the first ten months in an underground cell six feet by three, and seven feet high. He was interrogated for twelve days on his arrival in Syria, and in that period was beaten on his palms, hips, and lower back with a two-inch-thick electric cable and with bare hands.

So buck up, Mr. Reyes, things could be worse. In 18 months you will have paid your debt to stockholders and you will still be among the richest 30,000 or so people in the world.

Courteous comments welcome, of course. America is a nation of laws, not of men, unless the men are fighting terrorism.


How tape dies

by Robin Harris on Wednesday, 16 June, 2010

Storage Newsletter reports that Tape Drive and Media Revenues Decreased by 25% in 2009. The data comes from a report by the Santa Clara Consulting Group.

The numbers show us how old tape formats die: slowly. While the overall market for drives and media was $1.58B it was split among LTO, DLT, DAT, 8mm and even, gasp, QIC.

The good news: drive sales were $629M, suggesting that media sales will continue for years to come. LTO had over 83% of drive sales – $534M – with DAT (!) drives making most of the rest – $69M – and DLT much of the remainder.

The media numbers are revealing. Overall, media sales were only about 50% greater than drive sales or $955M. But in the case of DAT, media sales of $45M were less than drive sales. Buyers aren’t making much use of their new drives.
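
The arithmetic behind those percentages is easy to check. The sketch below uses only the dollar figures quoted above (in millions). The variable names are mine.

```python
# Figures quoted above, in millions of US dollars (2009)
drive_sales = 629
media_sales = 955
lto_drives = 534
dat_drives = 69
dat_media = 45

total = drive_sales + media_sales                      # drives plus media
lto_share = lto_drives / drive_sales                   # LTO's slice of drive revenue
media_premium = media_sales / drive_sales - 1          # how much media exceeds drives
other_drives = drive_sales - lto_drives - dat_drives   # DLT, 8mm, QIC and the rest
dat_ratio = dat_media / dat_drives                     # DAT media dollars per drive dollar

print(f"total market: ${total / 1000:.2f}B")                    # $1.58B
print(f"LTO share of drives: {lto_share:.1%}")                  # 84.9%
print(f"media above drives: {media_premium:.1%}")               # 51.8%
print(f"non-LTO, non-DAT drives: ${other_drives}M")             # $26M
print(f"DAT media per drive dollar: {dat_ratio:.2f}")           # 0.65
```

For the market as a whole, each drive dollar comes with about $1.52 of media. For DAT it is about $0.65, which is what makes the DAT numbers stand out.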

8 mm and QIC bring up the rear. Somebody bought over a million units of AIT media and over $16M of QIC media.

The StorageMojo take
The long tail of tape is longer than I’d thought. There must be ancient systems out in retail or OEM equipment that use the media. Military, too.

But why that 25% drop in the overall tape market? I’d need more time series data to draw any firm conclusions, but here’s what I’d look at:

  • The Great Recession. The overall slowing in business and capital expenditures is a piece of that. But the world economy did not decline 25%, thank goodness, so that can’t be the full cause.
  • D2D. Data de-duplication is aimed at making disk competitive with tape. Looks like 2009 was the year it took a byte out of tape.
  • Tape capacity growth. The LTO folks have been increasing LTO tape capacity at a rate near that of disk. More data, fewer tapes. Disks, of course, wear out, so the replacement market is huge.
  • Drive cost. At $3-$4k for an LTO 5 drive and $125 per 3 TB (compressed) tape, the use of tape is moving upmarket, which means smaller volumes.

Some people love what tape does. But others don’t: I haven’t seen a new tape or disk-based camcorder introduced in over a year. Everyone is going to flash.

$1.5B markets don’t die overnight – even dropping 25% a year. Tape will be around for a long time to come.
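
A back-of-the-envelope projection shows how slow that death would be. This is a sketch under two loud assumptions: the 25% decline repeats every year, and $100M is the (arbitrary) point at which the market is called negligible. Neither comes from the report.

```python
def years_until_below(start, floor, annual_decline):
    """Count whole years of compounding decline until the market drops below floor."""
    years, size = 0, start
    while size >= floor:
        size *= 1 - annual_decline
        years += 1
    return years, size

# $1.58B market, hypothetical $0.10B floor, 25% decline every year
years, size = years_until_below(start=1.58, floor=0.10, annual_decline=0.25)
print(years)            # 10
print(round(size, 3))   # 0.089
```

Even at a pace as brutal as 2009's, it takes a decade for the market to shrink below $100M.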

Courteous comments welcome, of course. I kicked off DLT for DEC back in 1991 and have always wondered why Quantum just rolled over for LTO instead of fighting. Oh well.


Room at the top

by Robin Harris on Wednesday, 9 June, 2010

Kaminario has introduced the world’s fastest SAN storage, the K2. If time is money, this is for you.

DRAM
Kaminario’s K2 is fast because DRAM, not disk, is the primary storage. DRAM’s low latency, high bandwidth and durability break the tight link between capacity and performance that disks and flash impose. No need for excess capacity to ensure enough IOPS, bandwidth or service life.

The product
Kaminario is a software company. However, they configure customer systems and install the software to order. No home-baked integration here.

The basic hardware unit is a Dell blade server. The blade servers are either I/O directors or data nodes. The Dell server chassis is a passive box – no active components on the backplane – but some customers opt for dual chassis for redundancy out of caution.

I/O directors
The I/O directors use 8Gb Fibre Channel to servers and 10Gig Ethernet to data nodes. The company says they can saturate both due to proprietary software optimizations.

Using FC switches, each I/O director can talk to multiple servers. Each I/O director can handle 150,000 random IOPS.

K2 architecture - courtesy Kaminario


Data nodes
Each data node supports up to 288 GB of ECC DRAM. All the data nodes have battery backup and 2 disks for de-staging data to persistent storage. Background de-staging during idle time reduces backup times during power failures.

The minimum config is 2 I/O directors and 4 data nodes with 500 GB of capacity. That’s 300,000 IOPS. They’ve been tested to 10 nodes and 1.5 million random read/write IOPS with support for 16 nodes – and double the IOPS – reportedly coming soon.

Under the covers
The I/O directors are clustered so when 1 fails the others pick up the load. The switched back end 10Gig Ethernet enables all I/O directors to access all data nodes.

The replication default is 2 copies of all data on different blades. Plus copies on disk.
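
The quoted figures hang together if you run the numbers. The sketch below assumes IOPS scale linearly with the number of I/O directors, that the "10 nodes" in the tested configuration means 10 I/O directors, and that usable capacity is simply raw DRAM divided by the number of copies. All three are my reading, not Kaminario's statement.

```python
IOPS_PER_DIRECTOR = 150_000   # per I/O director, as quoted
DRAM_PER_NODE_GB = 288        # maximum per data node, as quoted
COPIES = 2                    # default replication in DRAM

def aggregate_iops(directors):
    """Total random IOPS, assuming linear scaling across directors."""
    return directors * IOPS_PER_DIRECTOR

def usable_dram_gb(data_nodes):
    """DRAM left for unique data once every block is stored COPIES times."""
    return data_nodes * DRAM_PER_NODE_GB / COPIES

print(aggregate_iops(2))    # 300000  (minimum config)
print(usable_dram_gb(4))    # 576.0   (minimum config, fully populated)
print(aggregate_iops(10))   # 1500000 (tested config, if it means 10 directors)
```

The 576 GB result sits a little above the quoted 500 GB. The difference could be metadata overhead or nodes shipped with less than the maximum DRAM, but that is a guess.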

All this runs on standard Dell blade servers. No specialized, low-volume RAID controllers or power-hungry disk shelves.

Software
The secret sauce is the software. Kaminario doesn’t say much about how they do what they do. In any high-performance cluster maintaining metadata coherence across nodes is one of the tough problems.

They did say they maintain hash tables that enable very short updates to all I/O directors after writes. I suspect they have also implemented a low latency backend update protocol. Metadata serving is distributed across the cluster.

They must also have some creative ways to max out FC links. I’d like to know more.

Management
With storage this fast they say you need little tuning. Lay LUNs across the data nodes and fasten your seatbelt. The software includes optimizations, like pseudo-random block layout to minimize contention, automatic load balancing and demand-based block replication.
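
Kaminario has not published its layout algorithm, so the following is only a generic sketch of what pseudo-random block placement can look like. It hashes a (LUN, block number) pair to pick a primary data node and a second, different node for the replica. The function name, the use of SHA-256, and the LUN naming are all hypothetical.

```python
import hashlib
from collections import Counter

def place_block(lun: str, block: int, num_nodes: int):
    """Return (primary, replica) data-node numbers for one block."""
    if num_nodes < 2:
        raise ValueError("need at least 2 nodes to hold 2 copies")
    digest = hashlib.sha256(f"{lun}:{block}".encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % num_nodes
    # an offset in 1..num_nodes-1 guarantees the replica lands on a different node
    offset = 1 + int.from_bytes(digest[8:16], "big") % (num_nodes - 1)
    replica = (primary + offset) % num_nodes
    return primary, replica

# Spread 10,000 consecutive blocks of one LUN over 4 data nodes and count copies
load = Counter()
for block in range(10_000):
    primary, replica = place_block("lun0", block, num_nodes=4)
    load[primary] += 1
    load[replica] += 1
print(sorted(load.items()))
```

Because placement depends only on the hash, any I/O director can compute where a block lives without consulting a central map, and consecutive blocks scatter across nodes instead of piling onto one.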

If your app calls for it you can tune chunk sizes and set replication policies. Kaminario says K2 is much easier to manage than typical high-performance storage – you don’t have to worry about disk-induced issues like stride.

Management is kept out of the data path on a dedicated GigE network.

Support
Kaminario says they have designed the product and their organization to provide mission-critical Enterprise support. The visible elements, from configuration control and software installation to phone home and remote diagnostics, back that up.

Who needs this?
If you are hammering a few TB of data for stock trading, real-time business intelligence or TLA government work, this could be the ticket.

Pricing
If you have to ask. . . .

Kaminario has a unique approach: pay for performance:

. . . we price the solution based on the customer IOPS and capacity needs, so basically the way we present such a platform price is by $/GB/IOPS.

I *think* small configs start around $200k. For the performance market price is something like #7 on the list. The first 3 are performance/availability – 2 sides of the same coin, really.

This removes SPEC shadow puppetry between application requirements and storage performance. Of course, you have to know what performance you want. But anyone who’s performance tuning high-end arrays will know that.

The StorageMojo take
Kaminario is opening a new niche at the performance end of the market.

The current Big Storage vendors claim that they too can do a million IOPS. And they can, for millions. A price that makes a few TB of DRAM look cheap.

Since high-end disk – ≈$1/GB retail – makes up 5-10% of the cost of a high-end array, replacing disk with DRAM might be expected to double the cost of an array. But K2 does away with all the low-volume kit – controllers, shared cache, disk packaging and more – and replaces it with high-volume blade hardware. That lowers costs a lot.

Kaminario has opened a new niche: hyper-performance data storage. While a few TB doesn’t sound like much, it is more text than all but the world’s largest libraries place on miles of shelves.

The data arms race has kicked up another few notches. It is more competition for the big iron arrays where they least expected it: at the high-end of the market.

Courteous comments welcome, of course.
