StorPool’s new distributed storage software

by Robin Harris on Tuesday, 16 September, 2014

It was obvious in 2006 that Google’s clean-sheet GFS would revolutionize massive storage. The problem has been taking Google’s concepts and scaling them down to less than warehouse scale.

A number of companies have tried – Nutanix is probably the latest – and there’s a new entrant. StorPool offers distributed block and object storage designed to be very efficient at handling common business requirements for VMs, containers and bulk storage.

StorageMojo spoke to founders Boyan Ivanov and Boyan Krosnov a couple of weeks ago about what they are shipping today. To achieve that StorPool has done some things differently.

  • StorPool started with a clean sheet and has rebuilt the entire storage stack.
  • Own on-disk format.
  • End-to-end data integrity with 64-bit checksums for each storage sector.
  • No metadata servers to slow down operations.
  • Changes to TCP to improve network efficiency.
  • Applications can be run on the storage servers, as the storage software uses only 10-15% of system CPU and RAM.
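
To make the per-sector checksum idea concrete, here is a minimal sketch. StorPool's actual on-disk format and checksum algorithm aren't described here, so the 4 KB `SECTOR_SIZE`, the use of BLAKE2b truncated to 64 bits, and all function names are assumptions for illustration only.

```python
import hashlib

SECTOR_SIZE = 4096  # assumed sector size in bytes; not stated in the article


def sector_checksum(sector: bytes) -> bytes:
    """Return a 64-bit (8-byte) checksum of one sector (stand-in: BLAKE2b)."""
    return hashlib.blake2b(sector, digest_size=8).digest()


def write_sectors(data: bytes) -> list[tuple[bytes, bytes]]:
    """Split data into sectors and keep each sector next to its checksum."""
    sectors = [data[i:i + SECTOR_SIZE] for i in range(0, len(data), SECTOR_SIZE)]
    return [(s, sector_checksum(s)) for s in sectors]


def read_sectors(stored: list[tuple[bytes, bytes]]) -> bytes:
    """Re-verify every sector on read; raise if any sector was corrupted."""
    out = bytearray()
    for index, (sector, checksum) in enumerate(stored):
        if sector_checksum(sector) != checksum:
            raise IOError(f"checksum mismatch in sector {index}")
        out += sector
    return bytes(out)
```

The point of checking on every read is that corruption introduced anywhere between the write and the read (disk, controller, network) is caught before the data reaches the application.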

Of course, they also did things others have done because they work.

  • Shared nothing architecture for maximum scalability.
  • In-service rolling upgrades.
  • Snapshots, clones, thin provisioning, QoS, synchronous replication.
  • SSD support for performance.
  • Runs on commodity hardware.

While StorPool supports blocks, it also supports objects with a 32MB object size that is striped across whatever pool you place the data in.

On a small system of 6 servers, 12 SSDs, 30 hard drives and a 10GigE network they’ve measured 2,700 MB/s sequential reads and 1,500 MB/s sequential writes. Random 4k I/O reached 170,000 IOPS on reads and 66,000 IOPS on writes.

Resources reserved for the storage system – total across 6 servers: 48 GB RAM; and 12 CPU cores.

Thanks to the shared-nothing architecture performance increases are essentially linear as you add servers, SSDs and disks.
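
For a rough feel of what those numbers mean per server, the arithmetic can be sketched as below. The measured figures are the ones quoted above; the `projected` function simply takes the linear-scaling claim at face value, so its output is an extrapolation, not a measurement.

```python
# Figures quoted above for the 6-server test system.
SERVERS = 6
MEASURED = {
    "seq_read_mb_s": 2700,
    "seq_write_mb_s": 1500,
    "rand_read_iops": 170_000,
    "rand_write_iops": 66_000,
}
RESERVED_RAM_GB = 48
RESERVED_CORES = 12


def per_server(value: float, servers: int = SERVERS) -> float:
    """Average share of a cluster-wide figure contributed by one server."""
    return value / servers


def projected(value: float, target_servers: int, servers: int = SERVERS) -> float:
    """Linear extrapolation to a larger cluster (assumes perfect scaling)."""
    return value * target_servers / servers


if __name__ == "__main__":
    for name, value in MEASURED.items():
        print(f"{name}: {per_server(value):,.0f} per server, "
              f"{projected(value, 12):,.0f} projected at 12 servers")
    print(f"reserved per server: {per_server(RESERVED_RAM_GB):.0f} GB RAM, "
          f"{per_server(RESERVED_CORES):.0f} cores")
```

That works out to 450 MB/s of sequential read and 8 GB of RAM plus 2 cores reserved per server.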

No fancy GUIs here. If you aren’t comfortable with the command line you’d best move on.

There’s a short but detailed video demo on YouTube.

StorPool offers flexible acquisition options. You can buy a perpetual license or a month-to-month license. They have a free trial as well.

A nice wrinkle: pricing is based on the number of disks, not their capacity. So load up on the new 8TB drives.

The StorageMojo take
StorPool isn’t trying to be all things to all people. Simply economical, good-performance scale-out bulk storage with the flexibility to be used as an array or a converged infrastructure.

StorPool is currently targeting service providers, cloud vendors and academics. If you like your hardware vendor but want more flexible storage, StorPool may be just the ticket.

What’s sobering is that while GFS is over 10 years old, we’re only now getting to the point where enterprises are embracing modern storage technology. That’s good news for StorPool and this market because it means most of the growth is still ahead of them.

Courteous comments welcome, of course.


Friday hike blogging: Brins Mesa

by Robin Harris on Friday, 12 September, 2014

With family visiting I only got out once this week: a 3-hour hike on the Brins Mesa, Soldiers Pass, Cibola and Jordan Trails loop. It’s a favorite: bracing vertical; much variety; not too many tourists (usually); and, of course, fabulous vistas.

We’re just coming to the end of Arizona’s monsoon season, which has been – thankfully – especially rainy, although we’ve avoided the drenchings that the Phoenix area has seen. What I like about summer and winter are the clouds that dapple the rocks, as in this view:

Click to enlarge.

This was taken on top of the mesa at close to 5100 ft. altitude, looking west with the rising sun behind. The leafless and blackened trees in the foreground are relics of the 2006 Brins Mesa fire, the area’s last major fire until this year’s Slide Rock fire.


Optimizing erasure-coded storage for latency and cost

by Robin Harris on Friday, 12 September, 2014

Erasure coded (EC) storage has achieved remarkable gains over current RAID arrays in fault-tolerance and storage efficiency, but the knock against it is performance. Sure, it’s highly available and cheap, but it’s slo-o-w.

Advanced erasure codes – those beyond traditional RAID5 and RAID6 – require a lot more compute cycles to work their magic than the parity calculations RAID uses. With the slowdown in CPU performance gains, waiting for Moore’s Law to rescue us will take years.

But in a recent paper, Joint Latency and Cost Optimization for Erasure-coded Data Center Storage, researchers Yu Xiang and Tian Lan of George Washington University and Vaneet Aggarwal and Yih-Farn R. Chen of Bell Labs tackle the problem with promising results.

3 faces of storage
The paper focuses on understanding the tradeoffs through a joint optimization of erasure coding, chunk placement and scheduling policy.

They built a test bed using the Tahoe open-source, distributed filesystem based on the zfec erasure coding library. Twelve storage nodes were deployed as virtual machines in an OpenStack environment distributed across 3 states.

Taking a set of files, they divided each file i into k_i fixed-size chunks and then encoded it using an (n_i, k_i) MDS erasure code, so that any k_i of the n_i stored chunks are enough to rebuild the file. A subproblem is chunk placement across the infrastructure to provide maximum availability and minimum latency.
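
To see what an (n, k) MDS code buys you, here is the simplest possible example: single XOR parity, which is a (k+1, k) MDS code. This is not the Reed-Solomon-style coding zfec provides and not the authors' code; the function names are made up for illustration.

```python
from typing import Optional


def split_into_chunks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size chunks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


def encode(data: bytes, k: int) -> list[bytes]:
    """(k+1, k) code: k data chunks plus one XOR parity chunk."""
    chunks = split_into_chunks(data, k)
    return chunks + [xor_chunks(chunks)]


def recover(chunks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild the single missing chunk (marked None) from the other k."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    repaired = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        repaired[missing[0]] = xor_chunks(present)
    return repaired
```

With k = 4 this stores 5 chunks for every 4 chunks of data (25% overhead) and survives the loss of any one chunk. Codes with n - k > 1 tolerate more failures but need heavier math than XOR, which is where the compute cost comes from.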

The researchers then modeled various probabilistic scheduling schemes and their impact on queue length and the upper bound of latency.
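
A rough sketch of what probabilistic scheduling means in this setting: each read request is sent to k of the n nodes holding a file's chunks, with the k-node subset drawn from a fixed probability distribution. The uniform policy and the function names below are illustrative assumptions; the paper optimizes these probabilities rather than fixing them.

```python
import random
from itertools import combinations


def uniform_subset_policy(nodes, k):
    """Give every k-node subset equal probability (an illustrative choice)."""
    subsets = [frozenset(c) for c in combinations(nodes, k)]
    return {s: 1.0 / len(subsets) for s in subsets}


def node_marginals(policy):
    """Probability that a request touches each node: sum over subsets containing it."""
    marginals = {}
    for subset, p in policy.items():
        for node in subset:
            marginals[node] = marginals.get(node, 0.0) + p
    return marginals


def schedule(policy, rng):
    """Draw the set of k nodes that will serve one request."""
    subsets = list(policy)
    weights = [policy[s] for s in subsets]
    return rng.choices(subsets, weights=weights, k=1)[0]


if __name__ == "__main__":
    policy = uniform_subset_policy(["a", "b", "c", "d"], k=2)
    print(node_marginals(policy))
    print(sorted(schedule(policy, random.Random(0))))
```

The per-node marginals always sum to k, since every request touches exactly k nodes. Shifting probability away from slow or heavily loaded nodes is the lever that changes queue lengths and therefore latency.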

Joint latency – cost minimization
The 3 key control variables are the erasure coding scheme, chunk placement and scheduling probabilities. However, optimizing these without considering cost is a ticket to irrelevance.

It’s not an easy problem, as the paper’s pages of math attest. But one graph shows what is possible with their JLCM algorithm:

Graph courtesy of the authors.

The StorageMojo take
If CPUs were getting faster as they used to we could wait a few years for high-performance erasure-coded storage. But unless Intel puts optimized EC co-processors on its chips – similar to its GPUs – we’ll have to do something else.

EC storage faces a higher bar than earlier innovations. Even pathetic RAID controllers could outperform a single disk. Similarly, early flash controllers could as well, thanks to flash performance.

EC storage, however, is slower than even disk-based arrays. But the financial and availability benefits of cracking this particular nut are huge.

The paper offers a valuable perspective on moving EC storage forward. Let’s hope someone takes this opportunity and runs with it.

Courteous comments welcome, of course. What would it take to accelerate EC performance?


HGST & Amplidata to co-develop “ultra-dense” storage

by Robin Harris on Monday, 8 September, 2014

Amplidata announced this morning that Western Digital Capital has made a $10m investment.

HGST, a wholly owned subsidiary of Western Digital Corp., has selected Amplidata’s Himalaya software to jointly develop a family of ultra-dense storage solutions to address the rapidly growing demand to store data in public and private cloud data centers. . . .

The companies will partner to create solutions that will dramatically improve the storage economics for the Exabyte-scale needs of the world’s largest businesses and will be available in the market during the first half of 2015.

The StorageMojo take
Wow. Didn’t see that coming. But I like it.

WD getting into the high-scale storage business? The storage landscape really has changed. Time was when a drive vendor would never – well, except for Seagate’s flirtation with Xiotech – have competed with customers.

But the lure of vertical integration – and the margins – has worked its magic. When software eats everything the barriers to entry are lowered. And when new apps are speaking native S3 it’s way easier to look at a rack full of disks as a really big disk drive.

Instantiation? Two possibilities spring to mind.

  1. HGST integrates the Amplidata software with their smart disks (see Seagate’s Kinetic vision shipping – but not from Seagate).
  2. They stay with Amplidata’s controller model to produce something that looks like what Amplidata is offering today.

One is more difficult. Two is less disruptive. My money’s on 1.

I did a video for Amplidata a couple of years ago on their architecture that interested people will enjoy.

Courteous comments welcome, of course. What do you think?


Friday hike blogging: Teacup Trail

by Robin Harris on Friday, 5 September, 2014

This view of Coffeepot Rock is from Teacup Trail. This is a more heavily traveled area as it’s close to town, the trails are easy – except for Thunder Mt. Trail, where a friend died last year – and the scenery is great. Enjoy!

Click to enlarge


Deep file analytics

by Robin Harris on Friday, 5 September, 2014

A new storage market is being born. Will it survive?

As infrastructure continues to adjust to a data-centric world, the ability to manage data – not just storage – is poised to become a must-have capability. Traditionally, of course, data management has meant databases. But file data – confusingly called unstructured data – is by far the predominant business data type today.

Reading the tea leaves
The recent launch of DataGravity – which performs deep file inspection in its array – and a discussion with Quaddra Software’s John Howarth and Marc Farley suggest a mini-wave of activity in this area.

This is not a new idea. I talked to a startup a decade ago that proposed to do the same thing across an entire LAN – and then went nowhere.

ZL Technologies is 15 years old, prospering in the enterprise archive space. Their web site says:

ZL Technologies’ Unified Archive® utilizes a unique, unified architecture that breaks down data siloes in favor of one robust, centralized repository for managing all enterprise unstructured data and performing records management, eDiscovery, and compliance functions.

The thread
All three companies handle both metadata and content. Users can sort based on types of files – PDFs, MP3s – as well as content – social security or credit card numbers – in those files.
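
As a toy illustration of content-based inspection (not how any of these vendors do it), the sketch below scans text for US Social Security number patterns and for card-like digit runs that pass the Luhn checksum. The regular expressions and function names are assumptions chosen for the example.

```python
import re

# US SSN written as 3-2-4 digits with dashes.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers; ignores non-digits."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, list[str]]:
    """Return SSN-like strings and Luhn-valid card-like strings found in text."""
    findings = {"ssn": SSN_RE.findall(text), "card": []}
    for match in CARD_RE.finditer(text):
        if luhn_valid(match.group()):
            findings["card"].append(match.group())
    return findings
```

The Luhn check matters because long digit runs such as order numbers are common in business files; pattern matching alone would flag far too many of them.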

The use cases are similar as well. For legal discovery, DataGravity and ZL handle it, but Quaddra is looking to empower others to deliver that service. Chargeback is another common use.

An open question of growing importance is the data generated outside of corporate storage: tweets; Facebook entries; IMs; voicemails; and other types and formats that may not exist today. And you thought email was hard.

The StorageMojo take
What makes business markets take flight? I like the pain theory: when enough people feel the pain AND that pain is high relative to other pains, then people seek relief.

We’re reaching that point with deep file analytics. E-discovery is one driver. Sheer volume of data is another, which, in part, drives chargeback.

The bottom line: with commodity and virtualized compute, cheaper networks, security issues and the newly visible costs of storage – thanks, AWS! – the pain of storage, which was always there, is now more visible and higher relative to other pains.

There will be backing and filling over the next 5 years, but deep file analytics is here to stay.

Courteous comments welcome, of course. Agree or disagree as you like – just tell me your reasons.


Infinidat: 21st century enterprise storage arrives

by Robin Harris on Wednesday, 3 September, 2014

Who would have thought, at this late date, that an upstart would appear to challenge high-end EMC, NetApp, HDS and IBM arrays with a fundamentally superior product? But when you don’t have an installed base – and a cash cow – to protect, you can go big.

That upstart is Infinidat. Read on and see if you agree.

As Nimble Storage has shown, founders on their 4th or 5th startup can move much faster than newbies. Storage is just different, and enterprise storage is the toughest part of the storage market.

That’s why the founding team’s deep experience at EMC, IBM, NetApp and XIV is important. They know the customer requirements and they know the technology.

They wrote their first code in 2008 and won their first customer in 2011. They have over 100PB installed in their QA lab – key to producing enterprise-ready storage, and an investment no VC-backed firm would make.

They have earned over 90 patents, but are also big contributors to open source projects. They currently have systems in production in the US, UK, Europe, Israel and China.

Why you haven’t heard of them
Infinidat is self-funded. The founders have succeeded with prior arrays – Symmetrix and XIV among them – and haven’t needed or accepted venture capital.

They’ve been focused on winning strategic customers with a small, elite sales force. Also, they’ve been production-constrained and are looking to build a factory in the US.

Why you will hear of them
Infinidat’s flagship product is the InfiniBox, a petabyte-scale enterprise array.

The InfiniBox has a number of advanced features:

  • 99.99999% uptime. Seven 9s of availability: turn it on and it never stops serving I/O over its useful life.
  • 2PB usable capacity. In a single 42U rack. Or start with 300, 1,000 or 1,500TB.
  • Natively supports block, file and object. No bolt-on NAS gateways.
  • FICON, FC, Ethernet. Integrates with OpenStack, VMware and others through native interfaces.
  • Active/active/active controller triplet. Infiniband mesh connectivity for maximum performance – with no switches. All controllers see all drives.
  • Massive caches. Up to 1.2TB of DRAM cache and up to 48TB of flash secondary cache per controller.
  • Automatic real-time data movement between DRAM, flash and disk.
  • 15 minute drive rebuilds. No limping along for hours after a drive failure. All data is protected against two drive failures.
  • End-to-end data authentication. Checksums ensure data integrity.
  • Balanced drive loads. No hot spots, thanks to advanced data layout.
  • Snapshots with no performance impact. No table locking during snaps virtually eliminates snap overhead.
  • HTML5 GUI. Simple system management.
  • 3 years 24/7 support included in base price.
  • Base price includes all software and system updates. No pricey extras.
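
To put the seven 9s figure in perspective, the quick calculation below converts an availability percentage into allowed downtime per year. The function name and the 365.25-day year are choices made for this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, pct in [("five 9s", 99.999), ("seven 9s", 99.99999)]:
        print(f"{label}: {downtime_seconds_per_year(pct):.2f} seconds per year")
```

Seven 9s allows roughly 3.2 seconds of downtime per year, against a bit over 5 minutes for the five 9s many arrays advertise.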

Amazingly, all of this is done with NO custom hardware. All hardware is off the shelf – which is reflected in the pricing. Which is the next topic.

Of course, these features – and many more not listed here – are going to cost you.

Less. A lot less.

Street price, all in, with all software, 3 years maintenance, all system updates: 1/3rd to 1/2 of EMC.

A fraction of current enterprise arrays.

The StorageMojo take
The InfiniBox is the single most exciting product StorageMojo has seen in 10 years. Cloud IaaS and all-flash arrays are flanking attacks on current enterprise arrays, while InfiniBox is a frontal assault. It applies everything the industry has learned in the last decade to the problem of enterprise storage.

The availability, performance, management, density and pricing reflect the benefits of a modern architecture. The enormous investment – 100PB in QA! – behind Infinidat reflects the success of the founders with earlier iterations of enterprise storage.

Infinidat means business. If you’re looking at a new high-end array, take a moment to check out Infinidat. There is nothing else like it on the market: enterprise availability and performance; unbeatable density and efficiency; and excellent all-in pricing. All this thanks to a fully modern architecture and built by a deeply experienced team.

Courteous comments welcome, of course. I couldn’t go into all the details here, but ask questions and I’ll try to answer them.


Labor Day 2014

September 1, 2014

The brutal struggles of Capital and Labor in the 1800s may seem far away in 2014, but they continue to this day. Now there’s less blood, but a good deal more money. Why do we have a Labor Day? Wikipedia says: Labor Day in the United States is a holiday celebrated on the first Monday […]


Friday hike blogging: Airport Mesa

August 29, 2014

Often named as one of the most scenic airports in America, the airport sits on a mesa high above the town. There’s a trail that circles the mesa and – more important for me – I can walk to it. According to my iPhone app, the hike is 7.22 miles with 1788 feet of vertical […]


The network crunch: DAS and SAN

August 28, 2014

Several years ago an Intel briefer promised me $50 10Gb Ethernet ports. The shocker: prices have dropped little in the last 8 years – well more than a decade in Internet time. I don’t look back as often as I should. But a note from a ZDNet reader prompted some retrospection and research into network […]


Why are array architectures so old?

August 25, 2014

25 years ago I was working on DEC’s earliest RAID array. When I look at today’s “high-end” arrays, it’s shocking how little architectural change the big iron arrays have embraced. The industry is ripe for disruption, only part of which is coming from cloud vendors. 21st century problems demand 21st century architectures. Here’s a list […]


Friday hike blogging: clouds!

August 23, 2014

There’s a reason Arizona doesn’t have Daylight Saving Time: we have as much daylight as we can stand. With over 300 sunny days a year – and the sunniest are often the hottest – it’s a relief to see the sun set as the perceived temps drop 10-15°F. Despite that, summer is my favorite season […]


DataGravity launches

August 20, 2014

The co-founder of Equallogic, Paula Long, is heading up the new startup DataGravity. Their system takes advantage of active/active controllers to bring deep storage inspection to small and medium businesses. What they do DataGravity brings a new level of information awareness to network storage arrays, enabling important new capabilities within the array at no extra […]


Friday hike blogging: Mormon Canyon

August 15, 2014

After several days where I couldn’t get out I left at 6am for my favorite hike, the Cibola-Jordan-Soldiers Pass-Brins Mesa loop. Big decision: clockwise or counter-clockwise? Clockwise is gentler with an uphill bias until reaching the highest point – 5138 feet – after 4.5 miles. Counter-clockwise gets the bulk of the vertical done in the […]


Scale Computing: infrastructure made simple

August 15, 2014

Google and Amazon have armies of PhDs to design, manage and diagnose their scale-out systems. Few small to medium sized businesses do – nor should they – but they should still have the advantages of scale-out infrastructure. Imagine infrastructure that comes in a box with no costly VMware licenses, great support and good scalability. That […]
