It was obvious in 2006 that Google’s clean-sheet GFS would revolutionize massive storage. The problem has been taking Google’s concepts and scaling them down to less than warehouse scale.
A number of companies have tried – Nutanix is probably the best-known recent example – and now there's a new entrant. StorPool offers distributed block storage designed to handle common business requirements for VMs, containers and bulk storage efficiently.
StorageMojo spoke with founders Boyan Ivanov and Boyan Krosnov a couple of weeks ago about what they are shipping today. To deliver that efficiency, StorPool has done some things differently.
- StorPool started with a clean sheet and has rebuilt the entire storage stack.
- Own on-disk format.
- End-to-end data integrity, with a 64-bit checksum for each storage sector (see the sketch after this list).
- No metadata servers to slow down operations.
- Changes to TCP to improve network efficiency.
- Applications can run on the storage servers, since the storage stack uses only 10-15% of system CPU and RAM.
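To make the per-sector checksum idea concrete, here is a minimal sketch. The FNV-1a hash and the 4 KiB sector size are my assumptions for illustration; StorPool doesn't publish its exact algorithm, only that it keeps a 64-bit checksum per sector.

```python
# Illustrative per-sector 64-bit checksumming, in the spirit of
# end-to-end integrity checks. FNV-1a and the 4 KiB sector size
# are assumptions, not StorPool's documented design.

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
SECTOR_SIZE = 4096  # assumed sector size

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def checksum_sectors(blob: bytes) -> list[int]:
    """One 64-bit checksum per sector; verified again on every read."""
    return [fnv1a_64(blob[i:i + SECTOR_SIZE])
            for i in range(0, len(blob), SECTOR_SIZE)]
```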
Of course, they also did things others have done because they work.
- Shared nothing architecture for maximum scalability.
- In-service rolling upgrades.
- Snapshots, clones, thin provisioning, QoS and synchronous replication (a toy model of thin provisioning follows this list).
- SSD support for performance.
- Runs on commodity hardware.
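Of those features, thin provisioning is worth a quick illustration: capacity is promised up front but only allocated when blocks are actually written. A toy model of the idea, not StorPool's code:

```python
class ThinVolume:
    """Toy model of a thin-provisioned volume: blocks are allocated
    only on first write; unwritten blocks read back as zeros."""

    def __init__(self, size_blocks: int, block_size: int = 4096):
        self.size_blocks = size_blocks
        self.block_size = block_size
        self.blocks: dict[int, bytes] = {}  # only written blocks stored

    def write(self, index: int, data: bytes) -> None:
        assert 0 <= index < self.size_blocks and len(data) == self.block_size
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        assert 0 <= index < self.size_blocks
        return self.blocks.get(index, b"\x00" * self.block_size)

    def allocated_bytes(self) -> int:
        return len(self.blocks) * self.block_size

# A 1 TiB volume that has seen one 4 KiB write consumes 4 KiB:
vol = ThinVolume(size_blocks=2**28)
vol.write(0, b"x" * 4096)
assert vol.allocated_bytes() == 4096
```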
Performance
On a small system – 6 servers, 12 SSDs, 30 hard drives and a 10 GigE network – they've measured 2,700 MB/s sequential reads and 1,500 MB/s sequential writes, plus 170,000 random 4K read IOPS and 66,000 write IOPS.
Resources reserved for the storage system, totaled across the 6 servers: 48 GB of RAM and 12 CPU cores.
Thanks to the shared-nothing architecture, performance increases are essentially linear as you add servers, SSDs and disks.
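As a back-of-envelope illustration of what "essentially linear" implies, here is a trivial projection from the six-server numbers above. The strictly linear scaling is an idealized assumption, not a benchmark; real clusters give up something to network and replication overhead.

```python
# Projection from the measured 6-server figures, assuming the
# near-linear scaling claimed above (an idealization).
BASE_SERVERS = 6
BASE_READ_IOPS = 170_000
BASE_WRITE_IOPS = 66_000

def projected_iops(servers: int) -> tuple[float, float]:
    scale = servers / BASE_SERVERS
    return BASE_READ_IOPS * scale, BASE_WRITE_IOPS * scale

# Doubling to 12 identical servers projects to roughly
# 340,000 read / 132,000 write IOPS under this assumption:
assert projected_iops(12) == (340_000.0, 132_000.0)
```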
Management
No fancy GUIs here. If you aren’t comfortable with the command line and a JSON API you’d best move on.
There’s a short but detailed video demo on YouTube.
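Since everything is driven through a JSON API, automation is straightforward. The sketch below is hypothetical: the port, URL path, method name and payload are my inventions to show the shape of such scripting, not StorPool's documented API.

```python
import json
import urllib.request

def api_call(host: str, method: str, params: dict) -> dict:
    # Hypothetical JSON-over-HTTP request shape; the real API will
    # differ in endpoint, authentication and method names.
    body = json.dumps({"method": method, "params": params}).encode()
    req = urllib.request.Request(
        f"http://{host}:8080/api",  # port and path are assumptions
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Illustrative only, e.g. creating a 100 GiB volume:
# api_call("storage-node-1", "volumeCreate",
#          {"name": "vol1", "size": 100 * 2**30})
```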
Pricing
StorPool offers flexible acquisition options. You can buy a perpetual license or a month-to-month license. They have a free trial as well.
A nice wrinkle: pricing is based mostly on the number of disks and only partly on their capacity. So load up on the new 8TB drives.
The StorageMojo take
StorPool isn't trying to be all things to all people. It's simply economical, good-performance, scale-out bulk storage with the flexibility to be used as an array or as converged infrastructure.
StorPool is currently targeting service providers, cloud vendors and enterprises. If you like your hardware vendor but want more flexible storage, StorPool may be just the ticket.
What’s sobering is that while GFS is over 10 years old, we’re only now getting to the point where enterprises are embracing modern storage technology. That’s good news for StorPool and this market because it means most of the growth is still ahead of them.
Courteous comments welcome, of course. Update: I got some details wrong in the 1st draft of this post. They've been corrected above. Sorry! End update.
One of the biggest advantages of GFS is that Google owns the code, so it doesn’t have to worry about interruptions in the product’s life cycle. How does StorPool mitigate this concern?
Hmm. StorPool doesn't really explain how it handles classic cluster problems such as split brain, or recovery from large failures that would actually interrupt service. Judging from the video and documentation, it's actually closer to Lustre or OrangeFS – but with built-in replication and without explicit metadata servers, which probably means every node serves metadata, à la OrangeFS/PVFS2 – than to Gluster, Ceph and friends.
Dear Emmanuel,
Thank you for this question and apologies for the late response.
In order to operate, StorPool requires quorum, i.e. at least half of the expected nodes plus one. Without quorum no storage operations will take place, in order to guarantee data consistency and prevent split brain. In a split-brain situation, none of the sub-clusters will come up until at least half of the expected nodes plus one are fully connected through at least one broadcast domain.
There are also more complex cases handled by StorPool, e.g. partial connectivity over two separate networks when redundant network connections are used.
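The quorum rule is simple enough to state in a few lines of code. This is a minimal sketch of the arithmetic described above, not StorPool's implementation:

```python
def has_quorum(expected_nodes: int, connected_nodes: int) -> bool:
    # Quorum = at least half of the expected nodes, plus one:
    # floor(expected / 2) + 1. Integer division gives the floor.
    return connected_nodes >= expected_nodes // 2 + 1

# In a 6-node cluster, quorum is 4. A clean 3/3 network split leaves
# neither side with quorum, so neither side serves I/O - no split brain.
assert has_quorum(6, 4) and not has_quorum(6, 3)
```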
OrangeFS (formerly PVFS) is a parallel file system. The difference is that StorPool is a high-performance distributed block storage system: clients get access to block devices, which can be used by anything designed for a block device – filesystems, certain databases – or given directly to virtual machines.
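To illustrate what "block device" means for a client, here is a minimal Python sketch of raw reads and writes against a volume. The device path is hypothetical, and root privileges plus an existing volume are assumed; a filesystem, database or hypervisor would sit on top of exactly this interface.

```python
import os

BLOCK = 4096
data = b"hello block storage".ljust(BLOCK, b"\x00")

# /dev/storpool/vol1 is a hypothetical device path for illustration.
fd = os.open("/dev/storpool/vol1", os.O_RDWR)
try:
    os.pwrite(fd, data, 0)                  # write one block at offset 0
    assert os.pread(fd, BLOCK, 0) == data   # read it back and verify
finally:
    os.close(fd)
```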