Yay!
Parascale (parallel scale) launched its beta Virtual Storage Network this week. I’ve done some consulting for them, so I won’t pretend to be objective. I’m a big fan of software-based storage clusters no matter who makes them.
GFS-like is more accurate
You can read about VSN architecture here. AFAIK it is the first GFS-type software-only storage product intended for enterprises.
Here’s a diagram from their architecture page.
It should be pretty solid for a beta. Blue Coat, which used to be CacheFlow, has been using it for 18 months as a FIFO buffer to stage backups. They set it up and it has worked flawlessly.
The StorageMojo take
With EMC edging into storage clusters next year, the credibility of the concept will take a giant leap forward. Parascale is well-positioned to take advantage of the interest, especially if, as I suspect, someone buys them.
Comments welcome, as always. If you decide to test VSN, let me know how it goes.
Update: This post was not my finest hour: I got Parascale’s acronym wrong – it is Virtual Storage Network, not Virtual File Network. Memory like a sieve. And many have taken me to task for calling VSN “. . . the first software-only commercial storage product.” I was trying to point to the GFS-style architecture for the enterprise and did not word it well. So I re-worded it.
In researching some of the suggestions from the comments I noticed that not every vendor talks about their architecture. Interesting. End update.
“AFAIK it is the first software-only commercial storage product” – You’re wrong. The Archivas ArC Storage Archive came before this, and is/was hardware-agnostic, though mainly a WORM archive system. It too could run on commodity hardware from various vendors, and still can under HDS’s ownership. It also scales high and fast, and doesn’t suffer from the requirement of a control node. It supports WebDAV, NFS, CIFS, HTTP, and NDMP for file ingestion.
Love your site. I was wondering: why are you a fan of software-based systems? And do you think I can run a database on top of a VSN mount?
Henry, good catch. The Archivas web site is still up here if you want to learn more. As you noted, Archivas could run on commodity hardware, just like Parascale.
Todd, I like software-based systems because hardware is a commodity. When you create custom hardware you also create low-volume, high-cost components whose economics go from bad to worse. If you *need* to do it, then go for it. But data is getting cooler and the requirement for specialized high-performance hardware is shrinking relative to the market.
No, I don’t think I’d try to run a database on VSN for a while, but Moore’s Law continues to work and I think future versions of VSN will be able to handle lower-end requirements with an NFS-speaking DB.
Robin
What about Lustre from Cluster File Systems (which was recently bought by Sun)? It’s a software-only solution that sounds similar to Parascale. Lustre is used by some of the largest supercomputers in the world.
It has a really good networking layer which supports RDMA transfers across many different network types (IB, Myrinet, 10GigE, GigE, Mellanox).
A single Lustre client can achieve ~500MB/sec pretty easily over IB with tons of CPU left over for computation.
Ibrix does the software-only-on-commodity-storage thing also, and they’ve been shipping product for a longer time. I’d recommend everyone give them a good long look. Ibrix has some great technology for this kind of storage.
That said, these guys do sound interesting. I might give them a test.
Robin, I don’t see any difference between a “software-only” filesystem that is priced per spindle (Parascale) and an “appliance” that is software running on commodity hardware (Isilon) and is priced per dozen spindles.
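Here’s a toy cost/GB comparison to make that concrete. Every figure in it is an invented placeholder, not a quote from Parascale, Isilon, or anyone else:

```python
# Toy cost/GB comparison: a per-spindle software license on a
# commodity server vs. an appliance priced per dozen spindles.
# All prices are invented placeholders, not vendor quotes.

GB_PER_SPINDLE = 750  # a typical SATA drive of the era

def cost_per_gb(hw_cost: float, license_cost: float, spindles: int) -> float:
    # Total acquisition cost divided by raw capacity.
    return (hw_cost + license_cost) / (spindles * GB_PER_SPINDLE)

# Software-only: 12-bay commodity server plus a $300/spindle license.
sw = cost_per_gb(hw_cost=6_000, license_cost=12 * 300, spindles=12)

# Appliance: one bundled price for a 12-spindle box.
appliance = cost_per_gb(hw_cost=12_000, license_cost=0, spindles=12)

print(f"software-only: ${sw:.2f}/GB  appliance: ${appliance:.2f}/GB")
# With these placeholders the two land in the same ballpark, which is
# the point: the packaging label alone doesn't change the economics.
```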
Wes, Isilon and LeftHand both do the software on commodity hardware thing – although Isilon adds bits such as NVRAM cards for caching – and AFAIK neither is available without buying the associated hardware.
Blake, I’m trying to learn a bit more about Ibrix. It isn’t clear from their web site but it seems they only sell through resellers, so again, you can’t just buy the SW and stick it on handy servers.
Robin
Actually there are quite a few cluster file systems that are software-only. GPFS can run on pretty much any hardware, although IBM prefers you run it on AIX and on IBM hardware. GFS, when it was Sistina, was hardware-agnostic, and still is under Red Hat’s rule. Of course there is Lustre. PVFS2 is out there and will run on any hardware platform as well. LaScala is Windows-based and can run on any hardware… where should I stop?
Actually there are “fewer” appliance-based models than open models out there. As to “who came first?”, that is debatable.
As far as Ibrix goes, it can run on almost any hardware, although it is my understanding they have a “support” matrix. That used to be very common with Rackable until some customer environments didn’t work out so well. As for Lustre, I’ve seen the BEST performance out of Lustre configurations. I agree completely that they are heavy into the national lab space; one of the DOE labs is seeing between 15-20GB/s aggregate throughput over IB with their Lustre deployment.
Based on XFS, similar to Rackable’s clusterfs and SGI’s own CXFS. One issue with XFS is that Red Hat moved to 4K stacks on i386, and the combination of XFS/LVM often results in odd crashes; the recommendation is to recompile the kernel with 8K stacks. On 64-bit I’ve been very happy with it and would like clustered storage. But I would rather have one open-source version with for-sale management than lots of small private implementations.
There is also the GlusterFS file system, scalable to petabytes.
Ibrix requires a control node as well. PolyServe is sold as software only, but metadata sharing grows fast as (GFS cluster) nodes grow. Exanet is SW-only if used on approved platforms 🙂 (IBM or Supermicro, etc.)
I hear you that software-only should be more economical as the hardware commoditizes, and all of the above benefit from more cores, more cache, and PCI-E/X for more HBAs, so a 2U server is nice vs. a 1U appliance. But we’re finding appliances like Reldata and OnStor (although not true GFS) are still less $$ on a cost/GB basis up to a certain speed requirement.
Exanet’s throughput per dollar is impressive; Google it if you need lots of I/O.
I really don’t understand your excitement. Stiven already mentioned IBM’s GPFS, and I just wanted to make a little correction: this is not just a cluster file system, it’s a parallel file system. Today it runs on Linux (not only IBM iron) and AIX. As far as I know, it will be ported to other OSes in the future.
http://www-03.ibm.com/systems/clusters/software/gpfs/index.html
It has many interesting features like ILM, data and metadata mirroring, snapshots…
Hmm… CN brings back memories of the PVFS manager from PVFS 1 (hang on… I see the point: metadata is not stored directly on the CN), and the MDS from Lustre. Both suffer the same problem: high-volume concurrent access of small files (metadata-intensive operations), because the CN becomes the bottleneck (see the toy sketch at the end of this comment).
I don’t see what’s so radically new in this architecture…
And as many have pointed out already, there are a bunch of software-only file systems available, and they have been for some time. What is your USP?
Am I missing something?
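Here’s a minimal back-of-the-envelope sketch of what I mean. Both constants are invented numbers for illustration, not measurements of Parascale, PVFS, or Lustre:

```python
# Toy model of the control-node (CN) bottleneck described above.
# Both constants are invented; they are not product measurements.

CN_METADATA_OPS_PER_SEC = 20_000  # opens/creates a single CN can service
NODE_STREAM_MBS = 100             # streaming MB/s per data node

def large_file_mbs(data_nodes: int) -> int:
    # Streaming I/O pays one cheap metadata lookup, then goes
    # client-to-data-node direct, so it scales with node count.
    return data_nodes * NODE_STREAM_MBS

def small_file_mbs(file_kb: int = 4) -> float:
    # Every small-file access hits the CN first, so throughput is
    # capped by CN ops/sec regardless of how many data nodes exist.
    return CN_METADATA_OPS_PER_SEC * file_kb / 1024

for n in (4, 16, 64):
    print(f"{n:3d} data nodes: streaming ~{large_file_mbs(n):5d} MB/s, "
          f"4KB files ~{small_file_mbs():.0f} MB/s (flat)")
```

With these numbers, streaming throughput grows from 400 to 6,400 MB/s as nodes are added, while the small-file rate stays pinned around 78 MB/s, which is exactly the old PVFS manager / Lustre MDS story.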