The video folks have an interesting set of problems: large capacity needs; major bandwidth; time-critical collaboration; lots of metadata; and more. Like budgets. I do some video production myself and empathize.

They are today where most of us will be in 10 years: lots of large files; local and remote sharing; processor- and bandwidth-intensive operations; large archives of wanted but rarely accessed files. Today high-end video folks are working at 2K, 4K and, sometimes, 8K resolutions – and 10 years from now I wouldn’t be surprised if home users were too.

What prompts this is a note I received from, well, I’ll let him introduce himself.

I have a boutique post-production company and I’m a filmmaker. We are small, under a dozen, but swell to a few times that size with freelancers on a project-by-project basis. Because we work with very high resolution media, we need a lot of space and very high throughput to each user. . . . [W]e’re all working with 2K and 4K media (300 and 1,200 MB/s respectively to EACH user) and 3D animation rendering. . . . We use a mix of Linux, Windows, and OS X clients. In total, we could easily make use of 100TB+ right now, and we’d prefer to stop archiving everything to tape and deleting it, and instead migrate it to another tier of storage but keep it in one global namespace, with tape just for disaster recovery. We also need security administration.

I can’t find a storage system that does all this. DataDirect Networks seems to be the high-end storage du jour in my industry, and supposing I’m willing to finance that big-ticket brand, they still don’t have a file system answer. They’re suggesting StorNext or CXFS, and I know the multi-user scalability and expansion limitations well (can anybody say “forklift”?).

The closest I’ve come is Lustre. It seems like it would fit the bill nicely, especially since we’re savvy to integrate in-house, except that it is Linux only, and NFS/CIFS gateways don’t seem like a great idea. I keep hearing they’re working on at least a Windows client, but who knows when it will be ready?

Can you help at all? What have I overlooked? Doesn’t anyone make what I’m looking for?

Short answer to last question:
No.

Longer answer:
No. But there are workarounds.

For those new to video, here’s an abbreviated chart of some video rates in megabytes per second:
[Chart: video data rates in MB/s. Adapted from Integrity Data Systems, which offers the whole chart. Aspect ratios and frame rates left out.]
Update: Larry Jordan, a writer and trainer in video editing, graciously wrote to let me know that the above data rates are uncompressed – and that most production houses would use compressed data. The amount of compression varies based on the codec as Larry explains in this informative post. End update.
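
If you want to sanity-check the chart, the arithmetic behind uncompressed rates is simple: width × height × bits per pixel × frame rate. Here’s a quick sketch – the formats below are illustrative picks, not the chart’s exact rows, though the 2K/4K film numbers land right around the 300 and 1,200 MB/s our correspondent quotes:

```python
# Uncompressed video data rate = width x height x bits-per-pixel x fps.
# The formats below are illustrative examples, not the chart's exact rows.

def data_rate_mbs(width, height, bits_per_pixel, fps):
    """Uncompressed stream rate in megabytes per second."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 1_000_000

examples = [
    ("SD NTSC, 32-bit RGB",    720,  486, 32, 29.97),
    ("HD 1080p, 10-bit 4:2:2", 1920, 1080, 20, 24),
    ("2K film, 10-bit RGB",    2048, 1556, 30, 24),
    ("4K film, 10-bit RGB",    4096, 3112, 30, 24),
]

for name, w, h, bpp, fps in examples:
    print(f"{name}: {data_rate_mbs(w, h, bpp, fps):,.0f} MB/s")
```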

Issue 1: Interconnects
GigE won’t even handle 32-bit RGB standard def video. And when you get into HD video it gets hairier fast. Trunk multiple GigE links? 10GbE? 4x Infiniband? FC? eSATA or PCI-e direct attached storage?
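
A little arithmetic shows the squeeze. Compare nominal wire speeds against those stream rates, remembering that real-world file-protocol throughput runs well below wire speed and that an edit session usually moves several streams at once – sources in, renders out. The 60% efficiency figure below is my rough assumption, not a measurement:

```python
# Rough link-vs-stream headroom check. Wire speeds are nominal;
# EFFICIENCY is my assumed usable fraction after protocol overhead.

EFFICIENCY = 0.6  # assumption: usable fraction of wire speed over NFS/CIFS

links_mbs = {         # nominal wire speed in MB/s
    "GigE":       125,
    "4Gb FC":     400,
    "10GbE":     1250,
    "4x IB QDR": 4000,
}

streams_mbs = {"2K": 300, "4K": 1200}  # uncompressed, per the chart

for link, wire in links_mbs.items():
    usable = wire * EFFICIENCY
    fits = {name: int(usable // rate) for name, rate in streams_mbs.items()}
    print(f"{link}: ~{usable:,.0f} MB/s usable, "
          f"{fits['2K']} x 2K and {fits['4K']} x 4K streams")
```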

Issue 2: Virtualization
A single address space is a wonderful thing. You’ll need a software layer that clusters multiple boxes. You’ll also probably want to build an archive infrastructure that is distinct from your higher performance working set storage, but some vendors will disagree.
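
The working set/archive split can be simpler than it sounds. Here’s a toy sketch of the idea: migrate cold files to the archive tier and leave a symlink behind so everything stays visible in one namespace. The paths and the 90-day threshold are made up, and a real HSM product adds locking, partial recall and integrity checking:

```python
# Toy tiering sketch: move files untouched for 90 days from the fast
# working-set tier to a cheaper archive tier, leaving a symlink behind
# so the namespace stays whole. Paths and threshold are assumptions.
import os
import shutil
import time

FAST_TIER    = "/mnt/fast"      # hypothetical working-set mount
ARCHIVE_TIER = "/mnt/archive"   # hypothetical archive mount
COLD_AFTER   = 90 * 86400       # seconds: untouched for 90 days = cold

def migrate_cold_files():
    now = time.time()
    for dirpath, _, filenames in os.walk(FAST_TIER):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                continue  # already migrated
            if now - os.path.getatime(src) < COLD_AFTER:
                continue  # still hot
            dst = os.path.join(ARCHIVE_TIER, os.path.relpath(src, FAST_TIER))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)   # copy to archive, remove original
            os.symlink(dst, src)    # stub keeps the namespace intact

migrate_cold_files()
```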

Likely software suspects include IBRIX, Parascale, Caringo, MatrixStore, Bycast and Permabit.

On the combined HW/SW side there’s Panasas and Isilon. Something tells me there are some other options, like HP’s Extreme Data Storage 9100, that are also applicable.

Lustre is not a product I would recommend since it was designed for HPC, a market where PhDs work as sysadmins. Sun may have tamed it since they bought it, but it is a non-trivial piece of software.

Come one, come all
StorageMojo readers are invited to offer their 2¢ worth. Architecting is non-trivial, especially if money is an object.

Update:
Our interlocutor wrote in to add some detail:

Thanks for the response. Here are some answers:

– We can manage expensive interfaces like 10GigE and Infiniband QDR. We’ve been paying for dual-channel 4Gb FC for the past few years, after all. I just want to also allow standard Gigabit connections to the cheap seats without a lot of complexity. So I guess the jargon for that would be “multiprotocol” switching?

– The large namespace might be a luxury. The fact is that jobs come in one of three general sizes, and we could have volumes of those sizes waiting to take on new jobs as they come in, so at least there is one namespace per job. As you said, capacity is cheap…

– Truth is, I am pretty savvy, but other than that we have a lot of desktop power users, not sysadmin types. I contract some people with steady part-time work, but it has been our business model to try to keep as many of our full-time people on the creative and producing side as possible, and not in support/administration.

The one thing I don’t understand is what you say about Infiniband not being so great when there’s lots of node churn?

I know what you mean about DAS, but I think I’ve ruled out distributing the data through push/pull from a central repository. The fact is jobs just move too fast through here for that, and we often have about two seconds’ notice that we need to bring a certain job’s data to System X, Y or Z to do work on it. It’s very dynamic.

I see some brands in your blog post I haven’t checked on yet.

What turned me onto Lustre is that Frantic Films in London has deployed it. They’re the only ones AFAIK.
End update.

The StorageMojo take
Some thoughts on the infrastructure issues.

Capacity is cheap, network bandwidth is expensive. Raw SATA disk is less than $0.10/GB. 10GbE switch ports are over a grand apiece. Infiniband is better from a price/performance perspective, but not as friendly for networks where there is much node churn – unless that’s been fixed in the last few years.
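
Run the numbers on our correspondent’s 100 TB and the point makes itself. The seat count and port price below are rough assumptions; the disk $/GB is from above:

```python
# Back-of-envelope: capacity cost vs. network cost.
# Client count and port price are assumptions; disk $/GB is from the post.
raw_disk_per_gb = 0.10    # $/GB raw SATA
port_cost       = 1_000   # $ per 10GbE switch port, low end
capacity_tb     = 100     # our correspondent's stated need
clients         = 24      # assumed seat count

disk_cost    = capacity_tb * 1_000 * raw_disk_per_gb
network_cost = clients * port_cost

print(f"100 TB of raw SATA:   ${disk_cost:,.0f}")
print(f"{clients} x 10GbE ports: ${network_cost:,.0f}+")
```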

Direct attached storage will give you the best performance – especially with 4K. The new PCI-e attached arrays from JMR and others can offer up to 4,000 MB/sec bandwidth. Stripe across 4 of those and you’ll be able to handle 8K.
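
The 8K arithmetic, assuming roughly 4x the pixel count – and therefore 4x the data rate – of 4K film:

```python
# Does a 4-way stripe of PCI-e arrays cover uncompressed 8K?
# 8K is assumed here to be ~4x the pixel count (and rate) of 4K film.
rate_4k_mbs  = 1200             # ~1,200 MB/s per 4K stream
rate_8k_mbs  = 4 * rate_4k_mbs  # ~4,800 MB/s
array_mbs    = 4000             # per PCI-e attached array, per the post
stripe_width = 4

aggregate = array_mbs * stripe_width
print(f"8K stream: ~{rate_8k_mbs:,} MB/s; 4-way stripe: {aggregate:,} MB/s")
print(f"headroom for {aggregate // rate_8k_mbs} simultaneous 8K streams")
```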

Transaction processing is well on its way to niche status, like mainframes and hierarchical databases that once ruled the earth. It is a big file world out there and the files are getting bigger every year.

Courteous comments welcome, of course. I’ve done work for many of these folks – but not all – at one time or another.