Is Parascale new or old?
There were many good reader questions about Parascale’s announcement. Even though I’ve done some work for them I didn’t know the answers so I invited their CTO, Cameron Bahar, to respond. He sent me a text only email, which I’ve decorated with some HTML to improve readability.

CTO Cameron Bahar:

Hi Robin,

We are delighted by the interest shown in both the file management challenges that Parascale seeks to address…and in our newly-announced solution. Your readers bring up many important issues, especially in regards to how existing solutions compare to Parascale. Permit me to try to group these questions into categories and to highlight how Parascale is different.

HPC solutions High Performance Computing (HPC) solutions are typically implemented with kernel code and employ custom client-side software to achieve high bandwidth. For example, Lustre has been successful at many national labs as mentioned in one post. Parascale is targeting a different market. Parascale is all about industry standards. We support NFS, HTTP, and FTP protocols because we don’t expect our customers to recompile their applications. We want our software to be simple to use, as well as to scale in capacity and bandwidth for our target digital content applications.

Archival solutions. Several companies, including Archivas, have delivered archival systems. These solutions are generally WORM (write once read many) systems and disallow updates to existing files. By comparison, Parascale is POSIX-compliant and designed to support large read/write bandwidth—not always a requirement for archiving. Finally, if a large vendor has acquired these technologies (e.g. HDS-Archivas), they’re usually shipped as a rack of pre-installed appliances, limiting choice of hardware provider and hardware configuration.

Clustered file systems. Shared-disk clustered file systems such as Red Hat GFS have the characteristics of traditional distributed file systems such as tight cache coherency, distributed lock management, symmetric topology. Scalability of these file systems is generally limited to 16 or 32 nodes due to heavy cache coherency traffic and message passing between nodes.

Members of our engineering team have written several clustered file systems in previous undertakings. From that experience we elected to adopt a very different architecture for Parascale. For starters, we elected to adopt a loosely-coupled architecture for scalability. Further, we chose not to write a new file system. File systems are very delicate (as we know by having written them in the past) and they take 5-7 years to fully stabilize and stop corrupting data. We simply aggregate existing file systems to present a “virtual file system” layer to clients/applications over standard protocols.

Appliances versus software. NAS appliances are ideal for many markets, like SMBs and enterprise workgroups, that need simplicity of installation and for which scalability in volume and bandwidth are not key requirements. Appliances generally employ hardware highly-customized for serving files, including hardware features like NVRAM to boost write-performance and RAID controllers for data redundancy.

Parascale seeks to solve a different problem, that for management of large digital content repositories. Think of video on demand, photo archives, medical imaging, seismic data, and genomics data. Don’t fault us for being inappropriate as secondary storage for an RDBMS. We didn’t design Parascale for block storage because many excellent products already address this market.

We’ve constrained our solution to run as an application (with no kernel code) on industry-standard servers, as qualified only by Red Hat. We want our customers to enjoy the very latest advances in server hardware (motherboards, processors, memory, disks) available from Dell, HP and others. And we want our customers to be able to buy servers from their “regular hardware vendor”

Parascale’s software-only solution lets our customers to tune the disk capacity, CPU, RAM, I/O and network bandwidth independently—as required by the application at hand. Growth can be incremental—one disk drive or server at a time. You never have to discard hardware or licenses. Another useful benefit of a software-only solution is that other applications can coexist on the Parascale storage nodes, allowing data mining, trans-coding, encryption, or compression on the servers where the data resides. This is not possible with closed appliances.

What qualifies as “software-only” file storage solution? Our perspective is, first, that the software has to support standard network file access protocols like NFS, HTTP, or CIFS. You can store files in an RDBMS, but that doesn’t make it a software-only file management solution. Second, the disk drives must be direct-attached to the servers. Shared disk distributed or parallel filesystems (over SAN) are software products, but don’t qualify because they require specialized SAN hardware on the back end.

Finally, because all our engineering resources are focused on software, we’ve been able to innovate (with patents to prove it) and to deliver features like transparent, automated file migration (to eliminate server hot spots) and replication (to raise read bandwidth). And our roadmap promises a lot more innovation to follow!

Asked another way, where does Parascale fit in the market? Choose us if:

  • You want industry-standard hardware (e.g. because you want to run applications on the storage nodes, or because you have corporate hardware standards).
  • You need more bandwidth than one server/head can provide.
  • You need the benefits of data mobility across servers (e.g. migration to balance data and eliminate hot spots, replication to increase read bandwidth, smart load balancing to optimize system performance).

Lastly, Parascale aspires to be new and modern in its business model. When our product goes production, we plan to allow you to download our software to try it out at no cost. We’re confident you’ll like it. Our pricing is per-spindle, so you never have to deploy or pay for storage capacity before you need it. And if a drive fails, replace it with a new drive in the manufacturers’ current sweet-spot; we’re not trying to make money on advances by the disk drive manufacturers.

Hope I’ve addressed some of the questions posted. I applaud the thoughtful discussion that your post has prompted.



Comments welcome, of course.