A regular reader writes:
We’re facing a customer problem that requires a distributed global namespace, with support for disconnected operations, geographically appropriate distribution of data, and block-level caching for frequently accessed parts of remote files (when not geographically distributed to the local area).
I think, perhaps, that Atmos is about the only thing commercially like that. Everything else seems to be Researchville. . . .
Well, you know where to look? Standard file protocol access is required (namely, CIFS), at least at the branch offices.
My first question was about the application. In response he said the app is a:
. . . version of Photoshop, more or less. They pan around a lot, generating 20 MB/s in requests per user. Often there are about 10 users. The app is a Windows app, and faces a CIFS entity. There is a workgroup of folks at each branch, so a concentrator would work out nicely. We’re looking at Riverbed, et al., but Riverbed is an incomplete solution. We’d like to anticipate user demand and push out probable information during off hours. The final bit of the puzzle is that their files are quite large.
Let’s get this straight:
- Large file sizes – the panning suggests something like Google Maps, but extra rich.
- Not a lot of users, but if they are all working you could see 200 MB/sec in aggregate bandwidth across 10 files (see the quick arithmetic after this list).
- The app’s users are distributed, and given broadband limitations it is desirable that the data be local to users – wherever they are.
- A standard file access protocol.
Did I miss anything?
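A quick sanity check on that aggregate-bandwidth point, assuming (my assumption, not the reader’s) a fairly generous 100 Mbit/s branch WAN link:

```python
# Back-of-the-envelope check with illustrative numbers:
# 10 concurrent users, each pulling ~20 MB/s while panning.
users = 10
per_user_mb_s = 20                       # MB/s per user while panning
aggregate_mb_s = users * per_user_mb_s   # 200 MB/s at peak

# Assume a 100 Mbit/s branch WAN link, i.e. ~12.5 MB/s.
wan_link_mb_s = 100 / 8

print(f"aggregate demand: {aggregate_mb_s} MB/s")
print(f"WAN can cover:    {wan_link_mb_s / aggregate_mb_s:.0%}")   # ~6%
```

Even a fat pipe covers only a few percent of peak demand, which is why the hot data has to already be at the branch – cached or pre-pushed.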
The StorageMojo take
Atmos would fit, I think – but is it shipping? Haven’t heard squat about it since the announce. I can think of a couple of cluster-based storage systems that could also do the job as outlined, but I’m wondering how you would tackle this problem.
Courteous comments welcome, of course.
Looks like Isilon’s cup of tea: global namespace, CIFS, large files, high aggregate bandwidth. A cluster per branch, with the synchronization they supply, seems like a solution. Recent additions of Ocarina & Cisco’s dedupe could make geographic distribution more efficient.
Atmos is shipping, is in production at customer sites, and the Cloud Infrastructure Group is actively hiring.
Anyone interested should contact their local EMC rep.
And if you’re going to EMC World this year, Robin, the CIG team will be present.
An F5 ARX unit could work with existing storage. Load balance local storage and replicate to remote storage.
http://www.f5.com/products/arx-series/
How about Caringo’s CAStor solution? CAStor offers solid storage that is easy to (pre-emptively) replicate. It can store huge files in unlimited quantities on generic hardware without the need for separate backup. IMHO CAStor offers a very good solution for these demanding, distributed sites.
Data stored on CAStor-managed storage nodes can be streamed directly from the cloud, but it can also be served out as a CIFS (or NFS) share. The Caringo fileserver is – by design – clustered, extremely scalable, and meant to provide a global (because database-driven) namespace. Aggregate bandwidth is simply achieved by deploying more parallel systems.
Built by the minds behind Centera (and for that matter also parts of Atmos), this solution is not residing in Researchville but is in production at numerous sites in the US and Europe.
Pre-emptive, low-impact replication; a standard file access protocol (CIFS) with scalable connection capacity; and, thanks to generic hardware, a pleasant price point: this makes CAStor a very good and recession-proof alternative to the usual suspects.
Bycast has many mission-critical distributed imaging deployments doing exactly this. From realtime high-resolution cardiology cine angiography to 2GB multi-slice radiology image retrieval and processing, StorageGRID is time-proven with over eight years of commercial deployments in hundreds of organizations all over the world.
StorageGRID provides a global namespace, industry-standard CIFS, NFS and HTTPS data access, scalability to hundreds of PBs, tens of GB/s throughput, and hundreds of distributed sites, each of which can operate together as a single grid or as separate islands.
Data is intelligently stored and managed based on powerful information lifecycle management rules, and when accessed, is routed on-demand or pre-fetched to remote sites according to explicit hints or access patterns. High-performance multi-tier caching provides extremely high local throughput, and file system access can even be tightly integrated into GPFS clusters to provide supercomputer-level scalable file system performance.
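To make the pre-fetch idea concrete, here is an illustrative sketch – not StorageGRID’s actual API – of the kind of rule described above: stage a copy at a site when an explicit hint or recent access patterns suggest it will be needed there.

```python
# Hypothetical prefetch policy, for illustration only.
from collections import defaultdict

access_counts = defaultdict(int)   # (site, file_id) -> recent access count
pinned = set()                     # (site, file_id) pairs hinted by an admin or app

def record_access(site, file_id):
    access_counts[(site, file_id)] += 1

def should_prefetch(site, file_id, threshold=3):
    """Stage a copy at `site` if it is hinted, or has been hot there recently."""
    return (site, file_id) in pinned or access_counts[(site, file_id)] >= threshold

def off_hours_push(sites, catalog, copy_to):
    """Run overnight: push probable files out to each branch's local tier."""
    for site in sites:
        for file_id in catalog:
            if should_prefetch(site, file_id):
                copy_to(site, file_id)   # fetch from the nearest site holding a copy
```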
For WAN links connecting sites, StorageGRID intelligently monitors topology and computing resources to determine how to satisfy each user request with the highest throughput and lowest latency. Data movers have been optimized to eliminate the protocol inefficiencies of CIFS, and to provide 95%+ link utilization even over high-latency links.
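The link-utilization claim is easier to see with a quick bandwidth-delay calculation (illustrative numbers, not Bycast’s):

```python
# Why naive CIFS struggles on high-latency links, in one calculation.
link_mbit_s = 100           # WAN link speed
rtt_s = 0.08                # 80 ms round trip, e.g. cross-country

# Bandwidth-delay product: bytes that must be "in flight" to keep the pipe full.
bdp_bytes = (link_mbit_s * 1_000_000 / 8) * rtt_s
print(f"in-flight data needed: {bdp_bytes / 1024:.0f} KB")       # ~977 KB

# Older CIFS dialects issue roughly one 60 KB read at a time and wait for the
# reply, so naive utilization is about request_size / BDP:
print(f"naive utilization: {60 * 1024 / bdp_bytes:.0%}")          # ~6%
# Pipelining and read-ahead in the data movers are what close that gap toward 95%+.
```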
Your reader’s requirements sound very much like those of today’s modern distributed medical imaging customer: the need to rapidly acquire, store, and distribute high-resolution imagery for processing and analysis to people spread over a large geographic area. We’d be glad to explain how StorageGRID is designed and proven to solve exactly this problem space, and to introduce your reader to some of the world’s largest healthcare organizations that run their digital workflow on StorageGRID.
Kamiel Str: There’s no one at Caringo who built any part of Atmos unless they also wrote Patrick Eaton’s thesis for him.
Andrey:
Tell me more regarding the proposed Isilon solution. Please be aware that Bycast, Caringo, and Atmos all provide a transparent global namespace. A local user does not have to be aware of the locality of the files. Rather, they request access to the file and are redirected to the closest available copy, no matter where it is in the enterprise. Location awareness is a significant system liability, as locations can be fragile.
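For clarity, a minimal sketch of that redirection behaviour (hypothetical names, not any vendor’s code): the client asks for a file by name only, and the system picks the nearest site holding a copy.

```python
# Illustrative nearest-replica selection.
replica_sites = {                       # file -> sites currently holding a copy
    "scans/patient42.tif": ["nyc", "london", "tokyo"],
}
site_latency_ms = {"nyc": 12, "london": 85, "tokyo": 160}   # from this client

def open_file(name):
    sites = replica_sites.get(name)
    if not sites:
        raise FileNotFoundError(name)
    # The client never specifies a location; the namespace picks the closest copy.
    return min(sites, key=lambda s: site_latency_ms[s])

print(open_file("scans/patient42.tif"))    # -> "nyc"
```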
Based on the description, Nirvanix could address this quickly and at low cost, without any capex, and have a solution up and running in a matter of hours.
1) CloudNAS CIFS virtual appliance installed at the customer site onto an existing x86 Windows or Linux server, and images copied into the cloud.
2) CloudNAS is installed at all remote sites with the same account credentials. This gives remote sites a single global mount-point that they can all share. All file replication is handled by the Nirvanix platform.
3) As users access the application, the application attaches to the CloudNAS CIFS interface and pulls down the files from the most local node. Files could then be kept in the local CloudNAS cache, and subsequent reads would be local to provide high performance to the end users.
4) Assuming 1 TB of files that are viewed, total cost: $250.00 – $300.00 per month.
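As a rough check on that figure, assuming a cloud-storage rate of about $0.25–0.30 per GB-month (my assumption; check Nirvanix’s actual pricing):

```python
# Simple cost arithmetic for ~1 TB stored, at an assumed per-GB rate.
stored_gb = 1024
for rate in (0.25, 0.30):
    print(f"${stored_gb * rate:,.2f}/month at ${rate}/GB-month")
# -> $256.00 and $307.20, in line with the $250-$300 quoted above.
```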