<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StorageMojo &#187; Cloud computing &amp; storage</title>
	<atom:link href="http://storagemojo.com/category/cloud-computing-storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Mon, 21 May 2012 22:16:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Cleversafe: massive storage, massive patents</title>
		<link>http://storagemojo.com/2012/05/07/cleversafe-massive-storage-massive-patents/</link>
		<comments>http://storagemojo.com/2012/05/07/cleversafe-massive-storage-massive-patents/#comments</comments>
		<pubDate>Mon, 07 May 2012 15:15:54 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Marketing]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2674</guid>
		<description><![CDATA[Spoke to Chris Gladwin, founder and CEO of Cleversafe at NAB 2012. Cleversafe had stopped communicating a few years ago &#8211; usually a bad sign &#8211; so an update was long overdue. When last heard from, Cleversafe had an ISP/MSP target market, offered an open-source version of their software, and focused on safely archiving confidential [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Spoke to Chris Gladwin, founder and CEO of <a href="http://cleversafe.com/overview/how-cleversafe-works" target="_blank">Cleversafe</a> at NAB 2012. Cleversafe had stopped communicating a few years ago &#8211; usually a bad sign &#8211; so an update was long overdue.</p>
<p>When last heard from, Cleversafe had an ISP/MSP target market, offered an open-source version of their software, and focused on safely archiving confidential data on public networks. No more.</p>
<p><strong>A 50PB order</strong><br />
CEOs are professional optimists. But Chris&#8217;s story was good.</p>
<p>Their 1st order was 100TB. The 2nd, 50 petabytes. That is a lot of boxes to rack.</p>
<p>Now Cleversafe focuses on multi-petabyte orders. They can handle 5-7 such orders a year.</p>
<p><strong>Patents</strong><br />
But when they aren&#8217;t installing petabytes of disk, they&#8217;re writing patents. Hundreds of them.</p>
<p>Chris thought they were up to 268 patent applications. The USPTO shows 21 granted patents, including 7,904,475 <i>Virtualized data storage vaults on a dispersed data storage network</i>, 7,853,710 <i>Methods and devices for controlling the rate of a pull protocol</i>, 7,844,712 <i>Hybrid open-loop and closed-loop erasure-coded fragment retrieval process</i>, 7,818,518 <i>System for rebuilding dispersed data</i>, 7,818,430 <i>Methods and systems for fast segment reconstruction</i>, and 7,574,579 <i>Metadata management system for an information dispersed storage system</i>, along with another 181 applications yet to be granted.</p>
<p>At a conservative $25k per patent in legal and filing fees and lost engineering time, that&#8217;s $6.7 million. If there&#8217;s another startup as aggressive on patents, I haven&#8217;t heard of it.</p>
<p><strong>The StorageMojo take</strong><br />
It seems that most, if not all, of Cleversafe&#8217;s business comes from the US intelligence community, not commercial users. Otherwise we&#8217;d see reference sites and more interest from top-tier VCs.</p>
<p>Regardless, Cleversafe&#8217;s strategy of massive orders, massive patents and limited fulfillment is unlike any other in the industry. Is the limited fulfillment due to a complex product &#8211; the GPFS of scale-out storage &#8211; or a limited market?</p>
<p>As many HPC-focused firms have found, it can be difficult to shift from extremely specialized high-end government customers to commercial users. Companies that have, like <a href="http://panasas.com/" target="_blank">Panasas</a>, have had to work to keep their products general purpose, avoiding the honey-trap of fascinating but one-off designs.</p>
<p>Cleversafe&#8217;s pivot from the commercial market and open-source may reflect a 1st mover disadvantage: too early to the commercial market, they&#8217;ve been co-opted by the government market. But the bigger concern is whether or not that massive patent portfolio will stall development of better high-scale storage systems.</p>
<p>Cleversafe&#8217;s exit strategy seems to have almost as much to do with patents as it does with building a business. Are they the first storage company hoping for a buyout by a patent troll?</p>
<p><strong>Courteous comments welcome, of course.</strong> I&#8217;ve recently done work for Panasas and am working with a company &#8211; Amplidata &#8211; being sued by Cleversafe.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/05/07/cleversafe-massive-storage-massive-patents/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The network is choking our storage</title>
		<link>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/</link>
		<comments>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 17:03:08 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SAN, FC]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2533</guid>
		<description><![CDATA[Amazon Web Services architect James Hamilton has been posting on network issues for over a year and researching them much longer. As Ethernet becomes the de facto SAN technology, his views become more relevant to the larger storage market. Critique Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Amazon Web Services architect James Hamilton has been <a href="http://perspectives.mvdirona.com/2011/10/01/ChangesInNetworkingSystems.aspx" target="_blank">posting</a> on network issues for over a year and researching them much longer. As Ethernet becomes the <i>de facto</i> SAN technology, his views become more relevant to the larger storage market.</p>
<p><strong>Critique</strong><br />
Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; the dominance of a single player, Cisco; the closed technology; and the heavy vertical integration. All antithetical to the dynamics that have driven server costs down so successfully in the last 20 years.</p>
<p>These are issues the storage industry knows too well. But Mr. Hamilton is more concerned about the waste the current high-cost industry structure causes.</p>
<p>Waste?</p>
<p><strong>Workload placement</strong><br />
The cost of network bandwidth leads to network over-subscription. Networks are configured as tree topologies: the further you move from the end nodes, the worse the over-subscription.</p>
<p>As described in the 2009 Microsoft Research paper <a href="http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf" target="_blank">VL2: A Scalable and Flexible Data Center Network</a>:</p>
<blockquote><p>
. . . the capacity between different branches of the tree is typically over-subscribed by factors of 1:5 or more, with paths through the highest levels of the tree oversubscribed by factors of 1:80 to 1:240. This limits communication between servers to the point that it fragments the server pool — congestion and computation hot-spots are prevalent even when spare capacity is available elsewhere.
</p></blockquote>
<p>This throttles data center performance by limiting server-to-server bandwidth, fragmenting resources and reducing network utilization. The latter reflects the redundant paths needed in case of switch failure: ≈50% or more of costly data center bandwidth goes unused.</p>
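<p>To make those ratios concrete, here is a minimal sketch of the worst-case bandwidth a server sees when every server below an over-subscribed link transmits at once. The 1Gb/s NIC speed and the function name are illustrative assumptions, not figures from the VL2 paper.</p>

```python
def worst_case_mbps(nic_mbps, oversubscription):
    """Per-server bandwidth when all servers share an over-subscribed uplink.

    oversubscription is the N in a 1:N ratio, e.g. 80 for 1:80.
    """
    return nic_mbps / oversubscription


NIC_MBPS = 1000  # assumed 1Gb/s server NIC (hypothetical)

# Ratios quoted in the VL2 excerpt above
for ratio in (5, 80, 240):
    print(f"1:{ratio} -> {worst_case_mbps(NIC_MBPS, ratio):.1f} Mb/s per server")
```

<p>Under that assumption, a server crossing the top of the tree at 1:240 is left with only a few Mb/s in the worst case, which is why workloads end up pinned to particular racks.</p>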
<p>As might be expected, big Internet data centers like Amazon&#8217;s have complex and unpredictable workloads. They need lots of bandwidth between all servers all the time.</p>
<p><strong>A solution</strong><br />
The VL2 paper describes an experimental solution to these problems that includes <i>location-specific</i> and <i>application-specific</i> addressing, multi-path traffic load balancing and a novel directory design that efficiently handles lookups and updates to network mappings.</p>
<p>In a 75-node test cluster the design moved 2.75TB of data in 395 seconds &#8211; 94% of maximum network bandwidth &#8211; at a fraction of the cost of current enterprise networks. The paper calculates that a cloud-service scale network with no over-subscription could be built with commodity switches at <strong>1/14th the cost</strong> of a traditional data center Ethernet.</p>
<p>Whoa!</p>
<p><strong>The StorageMojo take</strong><br />
VC and engineering dollars follow high-growth markets. What Google, Amazon and Microsoft want, they get. With the rapid growth of public cloud services the network over-subscription problem will get solved. </p>
<p>Merchant silicon from Broadcom, Intel and Marvell is making a tried-and-true Moore&#8217;s Law attack on hardware cost. The protocol stack is tougher, but several open-source industry initiatives are under way with support from major companies. Progress will be slower than hoped, but within 3 years we&#8217;ll have a viable stack to build on.</p>
<p>Where does this leave the networking industry? That depends on where you sit.</p>
<p>Cisco will be the biggest loser, because they&#8217;ve been the biggest winner with the current model. They may need to pull an IBM and move big into services if they want to stick around. Ironically, Cisco&#8217;s UCS product line &#8211; which bakes in the tree-structured network &#8211; has further motivated broader industry action.</p>
<p>The rest of the industry can go after this emerging market with a lower-GM business model. Not all of them will, but it will be a critical success factor. </p>
<p>The big winner will be storage. Scale-out storage relies on spraying data across multiple racks for maximum availability, utilization and performance. Cheaper, faster, better scale-out networks will only drive storage demand.</p>
<p>For most of us this is an academic problem today. Lightly used systems &#8211; such as for backup and archiving &#8211; don&#8217;t see Amazon&#8217;s problems. But in 5 years this will be common even outside the public cloud providers.</p>
<p>Just as IT users have benefited from Google&#8217;s push on energy efficiency and much more, they will also benefit from much lower cost and more scalable networks.</p>
<p><strong>Courteous comments welcome, of course.</strong> I can&#8217;t help but continue to marvel at how dumb Cisco&#8217;s UCS has turned out to be. It&#8217;s a gift that keeps on giving.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Storage @VMworld 2011</title>
		<link>http://storagemojo.com/2011/09/12/storage-vmworld-2011/</link>
		<comments>http://storagemojo.com/2011/09/12/storage-vmworld-2011/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 16:53:32 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2519</guid>
		<description><![CDATA[VMworld is the best storage show I&#8217;ve seen in years. VMware&#8217;s severe storage problems leave users hungry for solutions &#8211; and your friendly neighborhood storage industry is happy to oblige. It&#8217;s almost as if VMware were owned by a storage company. Flash everywhere Fusion-io, Nimble Storage, Nimbus Data, Avere, Pure and more were talking about [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>VMworld is the best storage show I&#8217;ve seen in years. VMware&#8217;s severe storage problems leave users hungry for solutions &#8211; and your friendly neighborhood storage industry is happy to oblige.</p>
<p>It&#8217;s almost as if VMware were owned by a storage company.</p>
<p><strong>Flash everywhere</strong><br />
<a href="http://www.fusionio.com/" target="_blank">Fusion-io</a>, <a href="http://www.nimblestorage.com/" target="_blank">Nimble Storage</a>, <a href="http://www.nimbusdata.com/" target="_blank">Nimbus Data</a>, <a href="http://www.averesystems.com/" target="_blank">Avere</a>, <a href="http://www.purestorage.com/" target="_blank">Pure</a> and more were talking about how well flash supports VMware. Fixes VDI boot storms, deduped VMDKs, I/O bound servers and much more.</p>
<p><strong>Pure Storage</strong><br />
Here is <a href="http://www.purestorage.com/" target="_blank">Pure&#8217;s</a> Matt Kixmoeller giving a nifty demo in this 50 second video:</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/7_7ps2ci8tk?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/7_7ps2ci8tk?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Not exactly sure what those thousand VMs were doing. Maybe Pure will comment.</p>
<p><strong>Falconstor</strong><br />
I lost track of <a href="http://www.falconstor.com/" target="_blank">Falconstor</a> due to their OEM focus and sprawling product line. New CEO Jim McNiel has refocused the company &#8211; with the help of former Cheyenne teammates &#8211; on backup, business continuity/DR, dedup and virtualization.</p>
<p>Their clustered Network Storage Server turns all of Fstor&#8217;s products into tin-wrapped software suitable for channel partners. Takeaway: forget what you knew about them; they are a new company.</p>
<p><strong><a href="http://www.virsto.com/" target="_blank">Virsto</a></strong><br />
While the release of their storage hypervisor for VMware makes them seem like a new company, Virsto has been shipping product for over a year, but on Hyper-V, not VMware. Microsoft lost interest in server virtualization and Virsto moved on.</p>
<p>Their product is a virtual appliance that:</p>
<blockquote><p>
. . . runs in each host, creating a transparent virtual storage layer that is thin provisioned, fully cluster-aware, supports very rapid snapshot and clone creation, and scales to support tens of thousands of high performance snapshots and clones.</p>
<p>Virsto . . . decouple[s] application performance from any dependence on the rotational latencies and seek times of underlying disk associated with random writes. All random writes are sequentialized and written directly to a transparent logging device . . . and then asynchronously de-staged to primary storage. . . .
</p></blockquote>
<p>Net/net: high performance virtual storage regardless of underlying physical storage. Virsto offers a free trial &#8211; if you try it, let me know how it works.</p>
<p><strong>But wait! There&#8217;s more!</strong><br />
Cloud-related products from <a href="http://www.storsimple.com/" target="_blank">StorSimple</a>, <a href="http://amax.com/default.asp" target="_blank">AMAX</a> and <a href="http://raidundant.com/v2/" target="_blank">Raidundant</a> continue to pick at the problem of how/when/where cloud integrates with the enterprise.</p>
<p><strong>The StorageMojo take</strong><br />
Many cool products and ideas. The storage problems of many virtual machines are not unlike those of earlier time-shared virtual memory systems, but the scale is much greater. </p>
<p>And when the scale is greater the problem is fundamentally different. As virtualization grows we&#8217;ll need to see more creative answers beyond deduplication and flash.</p>
<p><strong>Courteous comments welcome, of course.</strong> Message to SNIA: storage networking is passé. Time to retool for the world of virtual machines, noSQL databases, scale-out storage and flash-enabled architectures. New name would be a start.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/09/12/storage-vmworld-2011/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Beta hunt: Java Platform as a Service</title>
		<link>http://storagemojo.com/2011/08/05/beta-hunt-java-platform-as-a-service/</link>
		<comments>http://storagemojo.com/2011/08/05/beta-hunt-java-platform-as-a-service/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 23:33:23 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2500</guid>
		<description><![CDATA[Running &#8211; or planning to run &#8211; some bigtime Java apps? A Silicon Valley startup named Cumulogic is looking for a few good beta testers to help them wring out their Java PaaS. Their ideal tester can use a Java PaaS running on either vSphere, Eucalyptus or Cloud.com private clouds or Amazon&#8217;s EC2. They currently [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Running &#8211; or planning to run &#8211; some bigtime Java apps? A Silicon Valley startup named Cumulogic is looking for a few good beta testers to help them wring out their Java PaaS.</p>
<p>Their ideal tester can use a Java PaaS running on either vSphere, Eucalyptus or Cloud.com private clouds <i>or</i> Amazon&#8217;s EC2. They currently support Nginx 0.8, Apache 2.x, Tomcat 7.x, JBoss 4.x-6.x, MySQL 5.x and MongoDB, as well as connections to an Oracle database.</p>
<p>They know that a lot of people run WebLogic or WebSphere and have those on their roadmap, but this is where they are today. If you have a WebSphere cloud license they can support that.</p>
<p>Their control panel seemed reasonably complete in the demo I saw, but that&#8217;s the point of demos, isn&#8217;t it? If everything worked perfectly it wouldn&#8217;t be a beta.</p>
<p><strong>The StorageMojo take</strong><br />
PaaS is a sweet spot: apps + data + computes all in one place. Is Java on PaaS an even sweeter spot? You tell me.</p>
<p>But the beta tester has an important role because development teams don&#8217;t know what they don&#8217;t know. The model user they have in mind is, in my experience, rarely seen in the wild. </p>
<p>It is the rawest market feedback dev teams get. If Java is important to you please <a href="http://cumulogic.com/epaaswebsite/default/registerAction?action=register" target="_blank">register for the beta</a>. Click around and you can find out more about them.</p>
<p>If you do beta it I&#8217;d love to hear about your experience in the comments.</p>
<p><strong>Courteous comments welcome, of course.</strong>  </p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/08/05/beta-hunt-java-platform-as-a-service/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The per-slot cost metric</title>
		<link>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/</link>
		<comments>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 20:21:00 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Management]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2466</guid>
		<description><![CDATA[Commenters on the last post &#8211; Open source storage array &#8211; helped crystallize an idea that&#8217;s been lurking for years: comparing disk storage hardware on per-slot price. The Backblaze box, which costs about $50/slot, got a comment that said, in effect, &#8220;it doesn&#8217;t have the features of a $200/slot box.&#8221; Good! But the comment raised [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Commenters on the last post &#8211; <a href="http://storagemojo.com/2011/07/20/open-source-storage-array/" target="_blank">Open source storage array</a> &#8211; helped crystallize an idea that&#8217;s been lurking for years: comparing disk storage hardware on per-slot price. The Backblaze box, which costs about $50/slot, got a comment that said, in effect, &#8220;it doesn&#8217;t have the features of a $200/slot box.&#8221; Good! </p>
<p>But the comment raised an interesting point: since we all use the same disks from the same few &#8211; and soon to be fewer &#8211; manufacturers, isn&#8217;t the cost of the tin we wrap them in a key metric? Let&#8217;s call it PSC &#8211; Per Slot Cost.</p>
<p>Some advantages:</p>
<ul>
<li><strong>Focus on value-add.</strong> We know how many disk slots there are in a storage system. We know how much disks cost. Therefore, the per-slot price tells us what the vendor&#8217;s value-add per disk is &#8211; or what we&#8217;re supposed to think it is.</li>
<li><strong>Increases pricing contrast.</strong> Disk costs are typically 10-15% of the price of a mid-to-high end array. The number of disk slots in those arrays varies, as do individual disk capacities. These variables obscure what the vendor is asking for their value-add.</li>
<li><strong>Cleaner comparisons.</strong> As a corollary to the previous point, PSC makes it easier to compare architecturally similar systems &#8211; SAS vs SAS, hybrid SSD/SATA systems, RAID 6 systems &#8211; whose hardware cost structures should be similar.</li>
<li><strong>Focus on software value.</strong> Since most storage systems &#8211; even high-end systems &#8211; run on commodity hardware, the biggest price variable is in software. Isn&#8217;t that where we <i>should</i> focus?</li>
</ul>
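<p>As a rough sketch of the arithmetic, PSC can be computed by backing the disk cost out of the system price and dividing by the slot count. The function name and the sample prices are hypothetical, chosen only to illustrate the calculation.</p>

```python
def per_slot_cost(system_price, disk_cost_total, slots):
    """Price of everything except the disks, divided by the number of disk slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return (system_price - disk_cost_total) / slots


# Hypothetical array: $100,000 price, 48 slots, $12,000 of disks
# (12% of the price, inside the 10-15% range cited above).
psc = per_slot_cost(100_000, 12_000, 48)
print(f"PSC: ${psc:,.0f} per slot")
```

<p>Run against two quotes for architecturally similar systems, the same function gives a like-for-like number even when slot counts and drive capacities differ.</p>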
<p><strong>The cloud storage angle</strong><br />
PSC should be useful for market segmentation. Instead of dumping arrays into entry-level price buckets &#8211; such as $75-$100k or $/GB &#8211; the PSC should track with the value of the stored data. </p>
<p>Expect to see segments range from Bulk (the Backblaze segment) to Heavy Transactional (traditional big iron) with yet-to-be-named segments between. But the most important use for PSC is in highly-scalable architectures in the public vs private cloud storage arena. </p>
<p>Cloud architectures are distinguished by the fact that the larger they scale, the lower their PSC. This is partly a function of economic necessity &#8211; who can afford 2 dozen PB of Symm? &#8211; and largely due to their use of software-based object replication instead of RAID. </p>
<p>When your storage is cheap, you can afford triple replication. And when you have massive numbers of boxes &#8211; and at least 2 data centers &#8211; you can have strong disaster tolerance. So large-scale cloud suppliers have motive and opportunity to reduce PSC. </p>
<p>The private cloud space is where the calculus gets interesting. Many observers dismiss the private cloud concept because they can&#8217;t possibly compete with Amazon, Microsoft and Google on scale or cost, including PSC. </p>
<p><strong>The StorageMojo take</strong><br />
There is a private cloud market because there are other issues, such as network latency, and the commercialization of high-scale software such as Hadoop, that make it possible for any focused billion-dollar company to build a competitive cloud infrastructure. The hardware is already a commodity, and many of the improvements that Google 1st pushed, such as more efficient power supplies, are now widely available.</p>
<p>The bigger issue for competitive private clouds is the enterprise IT mindset that lacks the skills to specify and manage them. This is where PSC comes in: it allows CFOs to compare their costs to best-in-breed cloud providers in a simple way.</p>
<p>PSC is just a metric, not <i>the</i> metric. The big guys are optimizing things &#8211; like power distribution &#8211; that won&#8217;t move the needle for smaller players. </p>
<p>But if you use commodity hardware then you should focus on the software. And since every big player is already running on commodity hardware &#8211; a Good Thing, BTW &#8211; let&#8217;s focus on getting software that delivers business value. To the extent that PSC helps decision-makers do that, it will help the industry shift the focus from things like $/GB to a higher-level discussion.</p>
<p><strong>Courteous comments welcome, of course.</strong> I just paid $250 per slot for an array with 1 controller, 1 fan and 1 Thunderbolt connection to my 1 desktop. Yes, I could have done better &#8211; if I didn&#8217;t want Thunderbolt. So PSC doesn&#8217;t trump all.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Open source storage array</title>
		<link>http://storagemojo.com/2011/07/20/open-source-storage-array/</link>
		<comments>http://storagemojo.com/2011/07/20/open-source-storage-array/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 00:37:44 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2458</guid>
		<description><![CDATA[Most business files are only opened a few times, yet remain valuable enough to keep on line, just in case. That cold data is normally stored on high-performance, high-price NAS boxes at $$/GB. Why? 2 years ago Backblaze, an online backup provider, open-sourced their storage pod design: 45 drives in a box (see Build a [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Most business files are only opened a few times, yet remain valuable enough to keep on line, just in case. That cold data is normally stored on high-performance, high-price NAS boxes at $$/GB.</p>
<p>Why?</p>
<p>2 years ago <a href="http://www.backblaze.com/" target="_blank">Backblaze</a>, an online backup provider, open-sourced their storage pod design: 45 drives in a box (see <a href="http://www.zdnet.com/blog/storage/build-a-raid-6-array-for-100tb/603" target="_blank">Build a RAID 6 array for $100/TB</a>). Now they&#8217;re back with v2: 45 3TB drives in a box with higher performance.</p>
<p>Backblaze now has over 16PB of storage pods in production.<br />
<a href="http://storagemojo.com/wp-content/uploads//2011/07/backblaze_computer_room.jpg"><img src="http://storagemojo.com/wp-content/uploads//2011/07/backblaze_computer_room.jpg" alt="" title="backblaze_computer_room" width="470" height="337" class="aligncenter size-full wp-image-2460" /></a><br />
<strong>Now for the good news</strong><br />
Backblaze isn&#8217;t in the box building business. They designed the storage pod for their backup business and released the plans out of the goodness of their hearts and for the free publicity.</p>
<p>I&#8217;ve thought that this could be a viable business for someone who <i>doesn&#8217;t</i> want to be the next NetApp or Isilon. Someone happy to build and ship boxes on a cost-plus basis to people who understand and can support a fault-tolerant software layer above the box, but who don&#8217;t have time to chase down miscellaneous hardware from vendors who prefer to sell in bulk.</p>
<p>That vendor has emerged: <a href="http://protocase.com/products/index.php?e=Backblaze" target="_blank">Protocase</a>, the quick-turn enclosure shop that builds Backblaze&#8217;s enclosures.</p>
<p>I spoke to Protocase co-founder Doug Milburn &#8211; a PhD in mechanical engineering &#8211; today. Protocase will announce a complete just-add-drives storage pod: assembled, tested and software loaded box. Look for it in 2-4 weeks, priced at ≈$6k. With another $5500 for 3TB drives, it will come in at less than $90 per raw TB. </p>
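<p>The per-terabyte figure can be checked with a few lines of arithmetic. This sketch uses the approximate prices quoted above and assumes the 45-drive, 3TB configuration of the v2 pod; the variable names are illustrative.</p>

```python
POD_PRICE = 6000          # approximate price of the pod without drives
DRIVE_PRICE_TOTAL = 5500  # approximate cost of a full set of 3TB drives
DRIVES = 45               # assumed: same drive count as the Backblaze v2 pod
TB_PER_DRIVE = 3

raw_tb = DRIVES * TB_PER_DRIVE
cost_per_raw_tb = (POD_PRICE + DRIVE_PRICE_TOTAL) / raw_tb
print(f"{raw_tb}TB raw at ${cost_per_raw_tb:.2f} per raw TB")
```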
<p>Why no drives? That&#8217;s the lion&#8217;s share of the cost and also the fastest to decline in price. They don&#8217;t need the inventory exposure and tech savvy shoppers can probably do better anyway. BTW, Backblaze has had good experience with the Hitachi HDS5C3030ALA630 drive.</p>
<p><strong>The StorageMojo take</strong><br />
This will help energize the private cloud market by reducing the entry price. Amazon and Google don&#8217;t use NetApp or EMC. Why should you?</p>
<p>And the savings over renting cloud storage can be substantial as this Backblaze chart suggests:<br />
<a href="http://storagemojo.com/wp-content/uploads//2011/07/backblaze_pb_cost.jpg"><img src="http://storagemojo.com/wp-content/uploads//2011/07/backblaze_pb_cost.jpg" alt="" title="backblaze_pb_cost" width="470" height="368" class="aligncenter size-full wp-image-2461" /></a><br />
True, Amazon provides many more services, but if you need petabytes for mini-bucks, this is hard to beat.</p>
<p><strong>Courteous comments welcome, of course.</strong> Read about the v2 storage pod at the Backblaze <a href="http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets" target="_blank">blog post</a>. Or get the shorter version in my ZDNet post <a href="http://www.zdnet.com/blog/storage/build-a-135tb-array-for-7384/1453" target="_blank">&#8220;Build a 135TB array for $7,384&#8221;</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/07/20/open-source-storage-array/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Iron Mountain punts digital storage</title>
		<link>http://storagemojo.com/2011/05/10/iron-mountain-punts-digital-storage/</link>
		<comments>http://storagemojo.com/2011/05/10/iron-mountain-punts-digital-storage/#comments</comments>
		<pubDate>Tue, 10 May 2011 19:31:27 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2370</guid>
		<description><![CDATA[Iron Mountain plans to either sell or shutter its digital archiving, eDiscovery and online backup and recovery solutions. Why? Investors driving management IM has been under pressure from a couple of funds that invested in the company and want board representation. The investors see a company that has built a national brand in a traditionally [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Iron Mountain plans to either sell or shutter its digital archiving, eDiscovery and online backup and recovery solutions. Why?</p>
<p><strong>Investors driving management</strong><br />
IM has been under pressure from a couple of funds that invested in the company and want board representation. The investors see a company that has built a national brand in a traditionally fragmented industry, but whose results have been hurt by &#8211; among other projects &#8211; its digital storage business.</p>
<p>As one of them said in an <a href="http://edgar.sec.gov/Archives/edgar/data/904495/000119312511061572/ddfan14a.htm" target="_blank">SEC filing</a>:</p>
<blockquote><p>
Management has done a commendable job building the Company’s North American Physical business, which currently represents 69% of 2010 Revenues and 86% of 2010 OIBDA. This business has an industry-leading market position, 44% margins and a sustainable, recurring revenue stream. . . .</p>
<p>Over the last decade, the Company has spent $2.7bn on its weak International Physical and ailing Worldwide Digital businesses, <strong>neither of which has come close to earning the margins or returns</strong> on capital of the North American Physical business. . . . </p>
<p>To us it hardly seems surprising that IRM is not competitively positioned against technology titans like EMC or Google in the digital marketplace.
</p></blockquote>
<p>(emphasis added) (OIBDA = operating income before depreciation &#038; amortization)</p>
<p><strong>Update:</strong> Well, that didn&#8217;t take long. According to wide-awake reader Dave &#8211; see comments &#8211; IM has reached a &#8220;definitive agreement&#8221; to sell its digital business. Note to new owners: good luck! <strong>End update.</strong></p>
<p><strong>A long and winding road</strong><br />
IM acquired a developer of PC data protection, Connected, in November 2004. Since then they&#8217;ve bought LiveVault, DigiGuard, Anamnis, Accutrac, RMS Services, Stratify and, a year ago, Mimosa Systems.</p>
<p>In all, IM spent hundreds of millions on software companies. And, evidently, they spent a lot more on hardware: in a 2002 <a href="http://www.thefreelibrary.com/Iron+Mountain+Places+EMC+Storage+Infrastructure+At+Core+of+Digital...-a081797205" target="_blank">press release</a> from EMC, IM&#8217;s CIO notes</p>
<blockquote><p>
. . . the advanced architecture of EMC systems and software makes it cost-efficient to grow our information infrastructure. Our EMC information storage capacity has nearly tripled in the past three years, yet the costs associated with managing this information have essentially stayed the same. . . .</p>
<p>For their Digital Archives division, EMC Celerra networked-attached storage systems will provide a high availability environment for Iron Mountain&#8217;s File Shares and Home Services.
</p></blockquote>
<p>Today&#8217;s commodity scale-out storage was only in the planning stages in 2002. But selecting EMC &#8211; a premium-priced provider &#8211; suggests that IM got off on the wrong foot and never recovered.</p>
<p><strong>Pricing tells a tale</strong><br />
Looking at IM&#8217;s pricing is suggestive. They were charging a minimum of $0.35/GB/mo for Connected BackUp &#8211; 2-3x Amazon &#8211; plus the cost of initial setup and other server and client software charges. </p>
<p><strong>The StorageMojo take</strong><br />
At those rates IM should have been profitable using anything. Therefore they were losing business to more cost-effective suppliers and earning anemic profits on the &#8211; heavily discounted? &#8211; business they did win.</p>
<p>It is no accident that Amazon and Google engineered their own storage. IM lacked the expertise to do it themselves and ended up in a cost bind.</p>
<p>It is a cautionary tale for would-be cloud storage providers: either get your costs competitive with AWS or get a clear differentiation. Most, like Nirvanix, Nexenta, Zetta and Atmos, are going with the latter. </p>
<p><strong>Courteous comments welcome, of course.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/05/10/iron-mountain-punts-digital-storage/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Amazon&#8217;s EBS outage</title>
		<link>http://storagemojo.com/2011/04/29/amazons-ebs-outage/</link>
		<comments>http://storagemojo.com/2011/04/29/amazons-ebs-outage/#comments</comments>
		<pubDate>Fri, 29 Apr 2011 17:26:37 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2360</guid>
		<description><![CDATA[Amazon&#8217;s outage was caused by a failure of the underlying storage &#8211; the Elastic Block Storage. Here&#8217;s what they learned. EBS The Elastic Block Store (EBS) is a distributed and replicated storage optimized for consistent and low latency I/O from EC2 instances. EBS runs on clusters that store data and serve requests and a set [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Amazon&#8217;s outage was caused by a failure of the underlying storage &#8211; the Elastic Block Store. Here&#8217;s what they learned.</p>
<p><strong>EBS</strong><br />
The Elastic Block Store (EBS) is a distributed, replicated block store optimized for consistent, low-latency I/O from EC2 instances. EBS runs on clusters that store data and serve requests, plus a set of control services that coordinate and propagate I/Os.</p>
<p>Each EBS cluster consists of EBS nodes where data is replicated and I/Os are served. Nodes are connected by 2 networks: a primary high-bandwidth network for traffic between the EBS nodes and EC2 server instances; and a slower replication network intended as a backup and for reliable internode communication.</p>
<p>Newly written data is replicated ASAP. An EBS node searches the cluster for a node with enough capacity, connects to it and replicates the data, usually in milliseconds.</p>
<p>If a node loses connectivity to the node it is replicating to, it assumes the other node has failed and looks for another node to replicate the data to. In the meantime it holds onto all data until it can confirm the data is replicated.</p>
<p><strong>The outage</strong><br />
During a network change on April 21 to upgrade primary network capacity, a mistake occurred: the primary network data traffic was shifted to the slower secondary network.</p>
<p>The secondary network couldn&#8217;t handle the traffic, which isolated many nodes in the cluster. Having lost contact with the nodes they were replicating to, the isolated EBS nodes sought new nodes, but the few remaining nodes were quickly overwhelmed in a retry storm.</p>
<p>The now degraded secondary network then slammed the coordinating control services. Because the control services were configured with a long timeout, the retry requests backed up and the control services suffered thread starvation. </p>
<p>Once a large number of I/O requests were backed up, the control services could no longer service them and began to fail I/O requests from other Amazon availability zones. Within two hours the Amazon team had identified this issue and disabled all new <code>create volume</code> requests in the cluster. </p>
<p>But then another bug kicked in.</p>
<p>A <a href="http://en.wikipedia.org/wiki/Race_condition" target="_blank">race condition</a> in EBS caused nodes to fail when closing a large number of replication requests. Because there were so many replication requests, the race condition caused even more EBS nodes to fail, re-creating the need to replicate even more data, and again the control services were overwhelmed.</p>
<p><strong>Recovery</strong><br />
The Amazon team got control of the replication storms in about 12 hours. Then the problem was recovering customer data.</p>
<p>Amazon optimizes its systems to protect customer data. When a node fails it is not reused until its data is replicated.</p>
<p>But since so many nodes had failed, the only way to ensure no customer data was lost was by adding more physical capacity &#8211; no easy chore &#8211; but that wasn&#8217;t all.</p>
<p>The replication mechanisms had been throttled to control the storm, so adding physical capacity also meant delicate management of the many queued replication requests. It took the team 2 days to implement a process.</p>
<p><strong>Amazon Relational Database Service</strong><br />
The Amazon Relational Database Service (RDS) uses EBS for database and log storage. RDS can be configured to operate within a single Amazon zone or replicated across multiple zones. Customers with a single-zone RDS were quite likely to be affected, but 2.5% of multi-zone RDS customers were affected as well due to another bug.</p>
<p><strong>Lessons learned</strong><br />
The network upgrade process will be further automated to prevent a similar mistake. But the more important issue is to keep a cluster from entering a replication storm. One factor is to increase the amount of free capacity in each EBS cluster.</p>
<p>Retry logic will be changed as well, to back off sooner and to focus on reestablishing connections before retrying further. And of course, the race condition bug will be fixed.</p>
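<p>Amazon has not published its retry code, so the following is only a generic Python sketch of the idea behind backing off: capped exponential backoff with optional jitter. The function name and the <code>base</code> and <code>cap</code> parameters are hypothetical, chosen for illustration.</p>

```python
import random

def backoff_delays(attempts, base=0.1, cap=30.0, rng=None):
    """Return the wait (in seconds) before each successive retry.

    The ceiling doubles with every attempt until it reaches `cap`.
    If `rng` is given, each wait is drawn uniformly from [0, ceiling]
    so that many nodes do not retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        if rng is not None:
            delays.append(rng.uniform(0, ceiling))
        else:
            delays.append(ceiling)
    return delays

# Deterministic ceilings, then a jittered schedule from a seeded generator.
print(backoff_delays(5, base=1.0, cap=8.0))
print(backoff_delays(5, base=1.0, cap=8.0, rng=random.Random(42)))
```

<p>The point of the cap and the jitter is that a large population of nodes that all lose their peers at once spreads its retries out instead of hammering the few healthy nodes together.</p>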
<p>Finally, Amazon has learned it must improve the isolation between zones. They will tune timeout logic to prevent thread exhaustion, increase control services awareness of zone loads and, finally, move more control services into each EBS cluster.</p>
<p><strong>The StorageMojo take</strong><br />
Data center opponents of cloud computing will point with alarm to this incident to make the case that they are still needed. But they forget that today&#8217;s enterprise gear is reliable only because of the many failures that led to better error handling.</p>
<p>While painful for the affected, the Amazon team&#8217;s response shows a level of openness and transparency that few enterprise infrastructure vendors ever display. Of course, that is due to the public nature of these large cloud failures; nevertheless the outcome is commendable.</p>
<p>But the battle is not only between large public clouds and private enterprise infrastructures, but between architectures. Traditionally, enterprise infrastructures have focused on increasing MTBF &#8211; Mean Time Between Failures. Cloud architectures, on the other hand, have focused on fast MTTR &#8211; Mean Time To Repair.</p>
<p>What can be scaled up can also be scaled down. Not every application is suitable for public cloud hosting. But small-scale, commodity-based, self-managing infrastructures are very doable. They are the bigger threat to the large proprietary hardware vendors of today.</p>
<p><strong>Courteous comments welcome, of course.</strong> I speculated in <a href="http://www.zdnet.com/blog/storage/amazons-experience-fault-tolerance-and-fault-finding/1354" target="_blank">Amazon&#8217;s experience: fault tolerance and fault finding</a> about the cause of the failure, but I was wrong. A failure precipitated by a network upgrade? Way-y-y too simple.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/04/29/amazons-ebs-outage/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Megastore</title>
		<link>http://storagemojo.com/2011/04/20/googles-megastore/</link>
		<comments>http://storagemojo.com/2011/04/20/googles-megastore/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 16:50:29 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Information Management]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2349</guid>
		<description><![CDATA[Megastore handles over 3 billion writes and 20 billion reads daily on almost 8 PB of primary data across many global data centers. In a paper by Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd, Vadim Yushprakh titled Megastore: Providing Scalable, Highly Available Storage [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Megastore handles over 3 billion writes and 20 billion reads daily on almost 8 PB of primary data across many global data centers. </p>
<p>In a paper by Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd and Vadim Yushprakh titled <a href="http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf" target="_blank">Megastore: Providing Scalable, Highly Available Storage for Interactive Services</a>, Google engineers describe how it works. From the abstract:</p>
<blockquote><p>
Megastore is a storage system developed to meet the requirements of today&#8217;s interactive online services. Megastore blends the scalability of a NoSQL data store with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high-availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between data centers.
</p></blockquote>
<p><strong>The mission</strong><br />
Support Internet apps such as Google&#8217;s AppEngine. </p>
<ul>
<li>Scale to millions of users</li>
<li>Responsive despite Internet latencies to impatient users</li>
<li>Easy for developers</li>
<li>Fault resilience from drive failures to data center loss and everything in between</li>
<li>Low-latency synchronous replication to distant sites</li>
</ul>
<p><strong>The how</strong><br />
Scale by partitioning the data store and replicating each partition separately, providing full ACID semantics within partitions but limited consistency guarantees across them. Offer some traditional database features if they scale with tolerable latency.</p>
<p>The key assumptions are that data for many apps can be partitioned, for example by user, and that a selected set of DB features can make developers productive.</p>
<p><strong>Availability and scale</strong><br />
To achieve availability and global scale the designers implemented two key architectural features:</p>
<ul>
<li>For availability, a synchronous, fault-tolerant log replicator optimized for long-distance links</li>
<li>For scale, data partitioned into small databases each with its own replicated log</li>
</ul>
<p>Rather than implement a master/slave or optimistic replication strategy, the team decided to use Paxos, a consensus algorithm that does not require a master, with a novel extension. A single Paxos log would soon become a bottleneck with millions of users, so each partition gets its own replicated Paxos log.</p>
<p>Data is partitioned into entity groups which are synchronously replicated over a wide area while the data itself is stored in NoSQL storage. ACID transaction records within the entities are replicated using Paxos.</p>
<p>For transactions across entities, the synchronous replication requirement is relaxed and an asynchronous message queue is used. Thus it&#8217;s key that entity group boundaries reflect application usage and user expectations.</p>
<p><strong>Entities</strong><br />
An e-mail account is a natural entity. But defining other entities is more complex.</p>
<p>Geographic data lacks natural granularity. For example, the globe is divided into non-overlapping entities. Changes across these geographic entities use (expensive) two-phase commits.</p>
<p>The design problem: entities large enough to make two-phase commits uncommon but small enough to keep transaction rates low.</p>
<p>Each entity has a root table and may have child tables. Each child table has a single root table. Example: a user&#8217;s root table may have each of the user&#8217;s photo collections as a child. Most applications find natural entity group boundaries.</p>
<p><strong>API</strong><br />
The insight driving the API is that the big win is scalable performance rather than a rich query language. Thus a focus on controlling physical locality and hierarchical layouts.</p>
<p>For example, joins are implemented in application code. Queries specify scans or lookups against particular tables and indexes. Therefore, the application needs to understand the data schema to perform well.</p>
<p><strong>Replication</strong><br />
Megastore uses Paxos to manage synchronous replication. But in order to make Paxos practical despite high latencies the team developed some optimizations:</p>
<ul>
<li><strong>Fast reads.</strong> Current reads are usually from local replicas since most writes succeed on all replicas.</li>
<li><strong>Fast writes.</strong> Since most apps repeatedly write from the same region, the initial writer is granted priority for further replica writes. Using local replicas and reducing write contention for distant replicas minimizes latency.</li>
<li><strong>Replica types.</strong> In addition to full replicas Megastore has 2 other replica types:
<ul>
<li><i>Witness replicas</i>. Witnesses vote in Paxos rounds and store the write-ahead log but do not store entity data or indexes, which keeps storage costs low. They are also tiebreakers when there aren&#8217;t enough full replicas to form a quorum.</li>
<li><i>Read-only replicas</i> are the inverse: nonvoting replicas that contain full snapshots of the data. Their data may be slightly stale but they help disseminate the data over a wide area without slowing writes.</li>
</ul>
</li>
</ul>
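<p>To make the voting roles concrete, here is a simplified Python sketch of a majority check in which full and witness replicas vote and read-only replicas do not. It is not Megastore code: the <code>Replica</code> class, the <code>kind</code> labels and <code>has_write_quorum</code> are hypothetical names, and real Paxos involves far more than counting reachable voters.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    name: str
    kind: str               # "full", "witness" or "read_only" (labels assumed here)
    reachable: bool = True

VOTING_KINDS = {"full", "witness"}

def has_write_quorum(replicas):
    """True if a strict majority of the voting replicas is reachable."""
    voters = [r for r in replicas if r.kind in VOTING_KINDS]
    reachable = sum(1 for r in voters if r.reachable)
    return 2 * reachable > len(voters)

# Two full replicas, one witness and one read-only replica; one full replica is down.
replicas = [
    Replica("dc-a", "full"),
    Replica("dc-b", "full", reachable=False),
    Replica("dc-c", "witness"),
    Replica("dc-d", "read_only"),
]
print(has_write_quorum(replicas))
```

<p>In this example the witness supplies the second vote, so writes can proceed with one full replica down, while the read-only replica never counts toward the majority.</p>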
<p><strong>Architecture</strong><br />
What does Megastore look like in practice? Here&#8217;s an example. </p>
<p><a href="http://storagemojo.com/wp-content/uploads//2011/04/megastore_arch.png"><img src="http://storagemojo.com/wp-content/uploads//2011/04/megastore_arch.png" alt="" title="megastore_arch" width="460" height="310" class="aligncenter size-full wp-image-2350" /></a></p>
<p>A Megastore client library is installed on the app server. It implements Paxos and other algorithms such as read replica selection. The app server has a local replica written to a local <a href="http://storagemojo.com/2006/09/07/googles-bigtable-distributed-storage-system-pt-i/" target="_blank">BigTable</a> instance.</p>
<p>A <i>coordinator server</i> tracks a set of entity groups and observes all Paxos writes. The coordinator is simpler than BigTable and serves local reads.</p>
<p>Concurrent with writing local data to BigTable and the coordinator, the Megastore library is also writing to a second full replica: a replication server and a second coordinator. The stateless replication servers handle the writes to the remote BigTable while the lower latency coordinator handles any reads from the remote replica.</p>
<p>Failures may leave writes abandoned or in an uncertain state. The replication servers scan for incomplete writes and offer no-op values via Paxos to complete them.</p>
<p><strong>Availability</strong><br />
As coordinator servers do most local reads, their availability is critical to maintaining Megastore&#8217;s performance. The coordinators use an out-of-band protocol to track other coordinators and use Google&#8217;s Chubby distributed lock service to obtain remote locks. If a coordinator loses a majority of its locks, it will consider all entities in its purview to be out of date until the locks are regained and the coordinator is current.</p>
<p>There are a variety of network and race conditions that can affect coordinator availability. The team believes the simplicity of the coordinator architecture and its light network traffic make the availability risks acceptable.</p>
<p><strong>Performance</strong><br />
Because Megastore is geographically distributed, application servers in different locations may initiate writes to the same entity group simultaneously. Only one of them will succeed and the other writers will have to retry.</p>
<p>For apps that write only a few times per second per entity group, e-mail for example, contention is insignificant. </p>
<p>For multiuser applications with higher write requirements developers can shard entity groups more finely or batch user operations into fewer transactions. Fine-grained advisory locks and sequencing transactions are other techniques to handle higher write loads.</p>
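<p>The one-winner-per-write behavior can be pictured with a toy Python sketch of a single entity group&#8217;s log. This is a single-process stand-in with no Paxos and no replication; the class and method names are hypothetical.</p>

```python
class EntityGroupLog:
    """Toy stand-in for one entity group's log (single process, no Paxos)."""

    def __init__(self):
        self.entries = []

    def try_append(self, expected_position, value):
        """Append only if the caller saw the latest log position; otherwise reject."""
        if expected_position != len(self.entries):
            return False
        self.entries.append(value)
        return True

log = EntityGroupLog()

# Two writers in different locations both read the log at position 0.
seen_by_a = len(log.entries)
seen_by_b = len(log.entries)

a_won = log.try_append(seen_by_a, "write from A")               # A claims position 0
b_won = log.try_append(seen_by_b, "write from B")               # B is rejected
b_retry = log.try_append(len(log.entries), "write from B")      # B re-reads and retries

print(a_won, b_won, b_retry, log.entries)
```

<p>The more writers target one group, the more often the retry path is taken, which is why sharding entity groups more finely or batching operations helps at higher write rates.</p>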
<p><strong>The real world</strong><br />
Megastore has been deployed for several years and more than 100 production applications use it today. The paper provides these figures on availability and average latencies.</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2011/04/megastore_availability_dist.png"><img src="http://storagemojo.com/wp-content/uploads//2011/04/megastore_availability_dist.png" alt="" title="megastore_availability_dist" width="416" height="327" class="aligncenter size-full wp-image-2351" /></a><br />
<a href="http://storagemojo.com/wp-content/uploads//2011/04/megastore_avg_latencies.png"><img src="http://storagemojo.com/wp-content/uploads//2011/04/megastore_avg_latencies.png" alt="" title="megastore_avg_latencies" width="418" height="343" class="aligncenter size-full wp-image-2352" /></a></p>
<p>The high availability of the system architecture creates a nice-to-have problem: small transient errors on top of persistent uncorrected problems can cause much larger problems. </p>
<p>Fault tolerance makes finding underlying faults more difficult. The price of fault tolerance is eternal vigilance.</p>
<p>As the architecture diagram suggests, Megastore doesn&#8217;t manage BigTable. Developers must optimize the storage for their app.</p>
<p><strong>The StorageMojo take</strong><br />
As Brewer&#8217;s <a href="http://en.wikipedia.org/wiki/CAP_theorem" target="_blank">CAP theorem</a> showed, a distributed system can&#8217;t provide consistency, availability and partition tolerance to all nodes at the same time. But this paper shows that by making smart choices we can get darn close as far as human users are concerned.</p>
<p>If Microsoft Office &#8211; or an open-source analog &#8211; could plug into a productized version of Megastore this could become popular for private cloud implementations: LAN performance in the office and global availability on the road. What&#8217;s not to like?</p>
<p>But whether that happens or not, the paper demonstrates again the value of Internet-scale infrastructure thinking. Enterprise vendors would never have developed Megastore, but now that we&#8217;ve seen it work we can begin applying its principles to smaller scale problems.</p>
<p><strong>Courteous comments welcome, of course.</strong> If this overview intrigues you, I urge you to read the entire paper as there are some interesting pieces I&#8217;ve left out.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/04/20/googles-megastore/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A local sandbox for cloud storage</title>
		<link>http://storagemojo.com/2010/12/03/a-local-sandbox-for-cloud-storage/</link>
		<comments>http://storagemojo.com/2010/12/03/a-local-sandbox-for-cloud-storage/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 16:20:10 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Enterprise]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2218</guid>
		<description><![CDATA[Talked to a startup the other day that looks interesting &#8211; Zettar. It wasn&#8217;t the name that caught my attention. Object storage is big in clouds. But objects aren&#8217;t compatible with standard apps: Powerpoint expects files, not objects. Cloudstores like Amazon and Azure aren&#8217;t compatible with each other. Lock-in. And you can&#8217;t run an internal [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Talked to a startup the other day that looks interesting &#8211; <a href="http://www.zettar.com/zettar/" target="_blank">Zettar</a>. It wasn&#8217;t the name that caught my attention.</p>
<p>Object storage is big in clouds. But objects aren&#8217;t compatible with standard apps: PowerPoint expects files, not objects.</p>
<p>Cloudstores like Amazon and Azure aren&#8217;t compatible with each other. Lock-in. </p>
<p>And you can&#8217;t run an internal cloud that is compatible with them either &#8211; they won&#8217;t sell you their software. </p>
<p>Double-secret lock-in. </p>
<p>Zettar founder Chin Fang&#8217;s idea is 2-fold.</p>
<ul>
<li>A virtual file system that front-ends several cloud services.</li>
<li>Software that duplicates server-side cloud storage services.</li>
</ul>
<p><strong>The pitch:</strong></p>
<blockquote><p>
The Zettar ZCloud Virtual Appliance (ZCloud) enables you to setup an Amazon S3 sandbox instantly on any computer, even a tiny netbook. Developers and QA engineers can use it to prototype, analyze, test, and stage a cloud application locally, before rolling it out onto Amazon Web Services (AWS). Thus, ZCloud can improve, simplify and speed up your development, and minimize AWS S3 development costs.
</p></blockquote>
<p><strong>The StorageMojo take</strong><br />
Seems like a good idea. And Chin&#8217;s team did a good job on their web site.</p>
<p>Domesticating cloud storage will be a continuing process for the next 10 years. Zettar&#8217;s ZCloud is a good start.</p>
<p>If you try out their software please tell us how it goes. I&#8217;m sure others would be interested.</p>
<p><strong>Courteous comments welcome, of course.</strong> Don&#8217;t confuse Zettar with <a href="http://www.zetta.net/index.php" target="_blank">Zetta</a>, a cloud enterprise NAS provider that just raised another $11.5 million. I like them too.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2010/12/03/a-local-sandbox-for-cloud-storage/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Objectively speaking: the future of objects</title>
		<link>http://storagemojo.com/2010/10/18/objectively-speaking-the-future-of-objects/</link>
		<comments>http://storagemojo.com/2010/10/18/objectively-speaking-the-future-of-objects/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 23:08:20 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2184</guid>
		<description><![CDATA[One infrastructure to rule them all discussed the emerging enterprise need for a single, scalable file storage infrastructure. But what infrastructure? Some background to this is last year&#8217;s Cloud Quadrant and this year&#8217;s Why private clouds are part of the future. Block and file For decades direct-attached block-based storage was the only option. The &#8217;80s [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><a href="http://storagemojo.com/2010/09/10/one-infrastructure-to-rule-them-all/" target="_blank">One infrastructure to rule them all</a> discussed the emerging enterprise need for a single, scalable file storage infrastructure. But what infrastructure?</p>
<p>Some background to this is last year&#8217;s <a href="http://storagemojo.com/2009/09/28/the-cloud-quadrant/" target="_blank">Cloud Quadrant</a> and this year&#8217;s <a href="http://storagemojo.com/2010/02/05/why-private-clouds-are-part-of-the-future/" target="_blank">Why private clouds are part of the future</a>. </p>
<p><a href="http://storagemojo.com/wp-content/uploads//2010/10/Cloud-quadrant-plain-diagram.jpg"><img src="http://storagemojo.com/wp-content/uploads//2010/10/Cloud-quadrant-plain-diagram.jpg" alt="" title="Cloud quadrant plain diagram" width="480" height="464" class="aligncenter size-full wp-image-2188" /></a></p>
<p><strong>Block and file</strong><br />
For decades direct-attached block-based storage was the only option. The &#8217;80s introduced file-based storage. Much of storage systems growth in the last 15 years has been in file servers.</p>
<p>New systems, be they video, sensor or social, are producing massive collections of files at an accelerating rate. The rapid development of lower cost mobile computing devices – smartphones, iPads, netbooks and Android tablets – means that content consumption and production will be a major source of file growth. The long tail of content demand means that the variety of online content will grow &#8211; especially as the cost of storage declines.</p>
<p><strong>Private cloud</strong><br />
The larger issue is the need to keep this fast-growing information online for years, despite rapid change in the underlying storage, network and computing infrastructures. File data must become independent of our storage and server choices. </p>
<p>As stores grow, data migration becomes less feasible. Rip &#8216;n replace gives way to in-place upgrades. </p>
<p>Achieving <i>that</i> means moving to an object storage paradigm. How do we know this will happen? Because it already has. </p>
<p>Object stores at Google and Amazon Web Services are already among the largest storage infrastructures in the world. AWS alone stores over 100 billion objects today. Hundreds of millions of people use object storage every day &#8211; and don&#8217;t even know it.</p>
<p><strong>What is object storage? </strong><br />
Object storage instantiations vary in detail and supported features. However, all object storage has two key characteristics:<br />
	–Individual objects are accessed by a global handle. The handle may, for example, be a hash, a key or something like a URL.<br />
	–Extended metadata. The extended metadata content goes beyond that of traditional file systems and may include additional security and content validation as well as presentation, decompression or other information relating to the content, production or value of the enclosed file.</p>
<p>Like files, objects contain data. But they lack key features that would make them files. They don’t have:<br />
	-Hierarchy. Not only are all objects created equal, they all remain at the same level. You can’t put one object inside another.<br />
	-Names. At least, not human-type names like Claudia_Schiffer or 2006_Taxes.</p>
<p>A user-facing component provides those missing elements. You decide which files belong in which folders. You give the files names. You decide which users have access to which files and what those users can do with those files. </p>
<p>Those choices are embedded in the object metadata so they can be presented as you have organized them. But if you have the object&#8217;s handle you can access it directly.</p>
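<p>A minimal sketch of that split, assuming a content hash as the handle and an in-memory dictionary as the store. The <code>ObjectStore</code> and <code>FolderView</code> names are hypothetical, made up for illustration, and don&#8217;t describe any vendor&#8217;s API:</p>

```python
import hashlib


class ObjectStore:
    """Flat store: every object sits at the same level, keyed by a handle."""

    def __init__(self):
        self._objects = {}  # handle -> (data, metadata)

    def put(self, data, metadata):
        # Assumption: the handle is the SHA-256 hash of the content.
        handle = hashlib.sha256(data).hexdigest()
        self._objects[handle] = (data, dict(metadata))
        return handle

    def get(self, handle):
        # Anyone holding the handle can fetch the object directly.
        return self._objects[handle]


class FolderView:
    """User-facing layer that supplies the names and hierarchy objects lack."""

    def __init__(self, store):
        self._store = store
        self._paths = {}  # "folder/name" -> handle

    def save(self, path, data, owner):
        # The user's choices (name, folder, owner) travel in the metadata.
        handle = self._store.put(data, {"path": path, "owner": owner})
        self._paths[path] = handle
        return handle

    def open(self, path):
        data, _metadata = self._store.get(self._paths[path])
        return data
```

<p>Saving <code>taxes/2006_Taxes</code> through the <code>FolderView</code> returns a handle, and <code>ObjectStore.get</code> with that handle fetches the same bytes without ever touching a folder or a name.</p>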
<p>All objects look alike. Some are bigger and some are smaller, but until we get them dressed and named, they aren’t files. Yet they are a lot closer to files than blocks are. Which means that if you choose to manage objects you no longer have to worry about blocks.</p>
<p>Essentially then, objects are files with an address &#8211; instead of a pathname &#8211; and extra metadata. That is unlike distributed file systems, where the metadata is stored in a metadata server that keeps track of the location of the data on the storage servers.</p>
<p>Some file storage systems are built on object storage repositories. Legacy APIs make file access a requirement for many applications, but URL-style access through HTTP is more flexible in the long run.</p>
<p><strong>Crossing the implementation chasm</strong><br />
While the economics of objects are obvious at scale, they are less compelling at the beginning of a typical enterprise project. It is easier to buy another file server than to worry about long-term architecture. </p>
<p>Here&#8217;s a rough diagram of the relative scalability of storage options:</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2010/10/cloud_quadrant_object.jpg"><img src="http://storagemojo.com/wp-content/uploads//2010/10/cloud_quadrant_object.jpg" alt="" title="cloud_quadrant_object" width="485" height="452" class="aligncenter size-full wp-image-2190" /></a></p>
<p>When under-12-month paybacks are expected, who will buy an object storage infrastructure? The simple answer is that as object stores become better known and startup costs are reduced, more companies will buy them. Archives will be the first market. The longer answer is that as public cloud projects are brought inside, object stores will receive them. </p>
<p><strong>The StorageMojo take</strong><br />
As organizations amass large file collections, the economies of scale and management for object storage will become apparent. Savvy architects will add commodity-based scale-out object storage to their tool kit. </p>
<p>HDS, NetApp and HP have recently added modern object stores to their product lines. And rumor has it EMC will too, either by getting Atmos to work or by buying Isilon. </p>
<p><strong>Courteous comments welcome, of course.</strong> Still don&#8217;t like the name object, but I&#8217;ll get over it.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2010/10/18/objectively-speaking-the-future-of-objects/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Calling all grad students</title>
		<link>http://storagemojo.com/2010/10/04/calling-all-grad-students/</link>
		<comments>http://storagemojo.com/2010/10/04/calling-all-grad-students/#comments</comments>
		<pubDate>Mon, 04 Oct 2010 12:20:12 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2160</guid>
		<description><![CDATA[The friendly folks at Scality have put up $100,000 to encourage open source development of useful cloud storage bits. It&#8217;s open to anyone, not just grad students. Yup, it&#8217;s corporate self-interest at work &#8211; Scality sells object-based cloud storage software &#8211; but they&#8217;re taking an enlightened approach. The resulting code will be open source and [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>The friendly folks at <a href="http://www.scality.com/" target="_blank">Scality</a> have put up $100,000 to encourage open source development of useful cloud storage bits. It&#8217;s open to anyone, not just grad students.</p>
<p>Yup, it&#8217;s corporate self-interest at work &#8211; Scality sells object-based cloud storage software &#8211; but they&#8217;re taking an enlightened approach. The resulting code will be open source and is intended to work with a variety of object-based cloud storage services &#8211; such as S3 &#8211; not just theirs. </p>
<p><strong>Real money</strong><br />
The defined projects have bonuses of $2,000 to $10,000. The projects include a <a href="http://scop.scality.com/2010/09/gallery-3-sd_gallery.html" target="_blank">Gallery plugin</a> for the </p>
<blockquote><p>
. . . full replacement of the underlying filesystem based storage of content/objects with object storage using the REST interface . . . .
</p></blockquote>
<p>There&#8217;s a <a href="http://scop.scality.com/2010/09/wordpress-plugin.html" target="_blank">WordPress plugin</a> project to add an object storage backend to the popular CMS. The $10,000 prize goes for a <a href="http://scop.scality.com/2010/09/kvm-virtualization-storage-engine-sd_linuxkvm.html" target="_blank">KVM virtualization storage engine</a> for the Linux Kernel-based Virtual Machine that provides:</p>
<blockquote><p>
. . . block level storage volumes that can be attached to KVM virtual machines. The solution should not require any central node, for example, no central meta-data server and provide a completely stateless operation model.
</p></blockquote>
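<p>One way to read the &#8220;no central node&#8221; requirement is that every client must be able to compute where a block lives instead of asking a metadata server. Here&#8217;s a rough sketch of that idea using rendezvous hashing. It is not Scality&#8217;s design or the bounty&#8217;s spec: the block size, the key format and the <code>block_key</code> and <code>pick_node</code> names are all assumptions for illustration.</p>

```python
import hashlib

BLOCK_SIZE = 4096  # bytes per block; an illustrative choice, not from the bounty


def block_key(volume_id, byte_offset):
    """Return the object key for the block that contains byte_offset."""
    block_index = byte_offset // BLOCK_SIZE
    return f"{volume_id}/{block_index:012d}"


def pick_node(key, nodes):
    """Rendezvous hashing: score every node for this key, take the highest.

    Every client computes the same answer from the key and the node list
    alone, so no central lookup table is needed.
    """
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)
```

<p>Any client that knows the node list gets the same node for the same block, regardless of the order it holds the list in.</p>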
<p>Here&#8217;s the <a href="http://scop.scality.com/scop-drops-bounty-list.html" target="_blank">list of defined projects</a>.</p>
<p><strong>Even better</strong><br />
And almost ⅔ of the money remains uncommitted. If you have an idea for domesticating object storage in the cloud &#8211; propose it.</p>
<p><strong>The StorageMojo take</strong><br />
Objects are the future of large-scale storage. If cutting edge stuff gets your heart pumping, this is a good place to start.</p>
<p>Or just collect some cash and move on. Your choice.</p>
<p>Feel free to ask questions in the comments. I&#8217;ll ping the Scality guys to get answers.</p>
<p><strong>Courteous comments welcome, of course.</strong> I&#8217;ve done some work for Scality and like the team. I&#8217;m also wondering why Amazon hasn&#8217;t done something like this.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2010/10/04/calling-all-grad-students/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Cloud&#8217;s app killer</title>
		<link>http://storagemojo.com/2010/08/05/clouds-app-killer%e2%80%a8%e2%80%a8/</link>
		<comments>http://storagemojo.com/2010/08/05/clouds-app-killer%e2%80%a8%e2%80%a8/#comments</comments>
		<pubDate>Fri, 06 Aug 2010 04:23:35 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2105</guid>
		<description><![CDATA[Concall today with Bryan Cantrill, the smart guy behind DTrace. DTrace was the engine behind Oracle&#8217;s (formerly Sun&#8217;s) Fishworks server and application monitor. DTrace has also been incorporated into OS X. Bryan left Oracle last week and started Monday at Joyent, the cloud infrastructure provider, as VP of engineering. Why? Bryan is an instrumentation geek. He [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Concall today with Bryan Cantrill, the smart guy behind <a href="http://en.wikipedia.org/wiki/DTrace" target="_blank">DTrace</a>. DTrace was the engine behind <strike>Sun&#8217;s</strike> Oracle&#8217;s Fishworks server and application monitor. DTrace has also been incorporated into OS X.</p>
<p>Bryan left Oracle last week and started Monday at <a href="http://www.joyent.com/" target="_blank">Joyent</a>, the cloud infrastructure provider, as VP of engineering. Why?</p>
<p>Bryan is an instrumentation geek. He really wants to know what&#8217;s going on. Instrumentation in the cloud is the next big challenge.</p>
<p>That makes sense: there are so many moving parts that understanding and resolving performance and availability issues will be critical to the widespread adoption of cloud. </p>
<p><strong>Tech epiphanies</strong><br />
Bryan described 3 technology epiphanies that he&#8217;s enjoyed. The 1st was when he saw Java for the first time back in 1995. The 2nd was when he saw a Ruby on Rails video about deploying a web app.</p>
<p>His 3rd epiphany came recently when he saw something called node.js. Developed by Ryan Dahl it turns the JavaScript paradigm on its head: node.js runs on the server, not the client.</p>
<p><strong>Latency bubbles</strong><br />
We know that server I/O latency can kill performance. It&#8217;s even worse in the cloud.</p>
<p>A single bad drive can hose a server if the app is holding locks. What if you have a webpage that relies on five different Web services, or as many Amazon pages do, 150 services?</p>
<p>You need an infrastructure that is resilient in the face of long latency while maintaining high throughput. Bryan says that most failures are not hard failures but are latency bubbles that cascade out and lock up the rest of the infrastructure.</p>
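<p>To make the latency bubble concrete, here&#8217;s a small sketch of a page that fans out to several services and gives each one a deadline, so a slow service degrades the page instead of stalling it. It uses Python&#8217;s <code>asyncio</code> rather than node.js, and the service names, delays and the <code>render_page</code> function are made up for illustration:</p>

```python
import asyncio


async def call_service(name, delay):
    """Stand-in for a remote call; delay simulates the service's latency."""
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def call_with_deadline(name, delay, deadline):
    try:
        return await asyncio.wait_for(call_service(name, delay), timeout=deadline)
    except asyncio.TimeoutError:
        # Degrade this one piece instead of letting it hold up the page.
        return f"{name}:timed-out"


async def render_page(delays, deadline):
    """Call every service concurrently; results come back in input order."""
    calls = [call_with_deadline(name, delay, deadline)
             for name, delay in delays.items()]
    return await asyncio.gather(*calls)


if __name__ == "__main__":
    services = {"cart": 0.0, "ads": 0.0, "reviews": 0.5, "search": 0.0, "recs": 0.0}
    print(asyncio.run(render_page(services, deadline=0.05)))
```

<p>With those numbers the slow <code>reviews</code> call is cut off at the deadline while the other four return normally.</p>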
<p>Ryan took Google&#8217;s V8 JavaScript engine and extended it so you can handle long-latency events without locking up the server.</p>
<p>Ryan does a fine job <a href="http://www.youtube.com/watch?v=F6k8lTrAE2g" target="_blank">introducing node.js</a> in a 1 hour Google Tech Talk last week. He outlined how to build a server that can handle 10,000 or more users. His goal with node.js was to make it easy to write high-performance servers.</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2010/08/nodejs_architecture1.jpg"><img src="http://storagemojo.com/wp-content/uploads//2010/08/nodejs_architecture1.jpg" alt="" title="nodejs_architecture" width="470" height="404" class="aligncenter size-full wp-image-2113" /></a></p>
<p>There is an arms race out there for performance – Google, Apple, Mozilla, Opera, Microsoft – to win the hearts and eyeballs of hundreds of millions of consumers. Fickle consumers.</p>
<p>Node.js only exposes nonblocking asynchronous interfaces to the programmer. It has very few abstractions. Its power lies in the fact that it moves you away from interfaces, like synchronous I/O, that you shouldn&#8217;t use.</p>
<p>You don&#8217;t have to worry about some event completing and taking over while you&#8217;re in the middle of something else. Each node.js instance is a single thread. If you want to do more work, you start multiple node.js instances and let the kernel do the load balancing.</p>
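<p>The single-thread model is easy to try out. This sketch is an analogy in Python&#8217;s <code>asyncio</code>, not node.js itself, and the <code>handle_request</code>, <code>serve</code> and <code>run_demo</code> names are hypothetical. It runs many simulated requests on one thread and shows that the waits overlap instead of queuing up:</p>

```python
import asyncio
import threading
import time


async def handle_request(request_id, io_delay):
    # A non-blocking wait: the event loop runs other handlers meanwhile.
    await asyncio.sleep(io_delay)
    return request_id, threading.get_ident()


async def serve(request_count, io_delay):
    tasks = [handle_request(i, io_delay) for i in range(request_count)]
    return await asyncio.gather(*tasks)


def run_demo(request_count=100, io_delay=0.05):
    """Return (requests handled, distinct threads used, elapsed seconds)."""
    started = time.perf_counter()
    results = asyncio.run(serve(request_count, io_delay))
    elapsed = time.perf_counter() - started
    thread_ids = {thread_id for _, thread_id in results}
    return len(results), len(thread_ids), elapsed
```

<p>Every handler reports the same thread ID, and the elapsed time stays close to one I/O delay rather than the sum of all of them. A blocking call inside a handler would stall every other request, which is why exposing only asynchronous interfaces matters.</p>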
<p>Memory isolation is enforced at the process boundary. The kernel manages it, not the coder. That&#8217;s a good thing.</p>
<p><strong>The StorageMojo take</strong><br />
Latency is the app killer of the cloud. The current cloud focus on write once/read never apps reflects that.</p>
<p>The fight against latency proceeds on many fronts: storage; network; CPU; and software. <a href="http://www.asankya.com/" target="_blank">Asankya</a> and others have good ideas for reducing Internet latency. Flash architectures are undergoing rapid evolution. Multicore and multiprocessor servers are attacking throughput.</p>
<p>Node.js is a big step in the right direction. Removing the dependencies that synchronous I/O creates means a more resilient and higher-performance infrastructure. Ryan reports that a Japanese website is already running several hundred thousand users on node.js instances.</p>
<p>As for Bryan, he&#8217;ll bring the same intelligence and energy to Joyent that he brought to DTrace and Fishworks. Expect more great things.</p>
<p><strong>Courteous comments welcome, of course.</strong> <strong>Update:</strong> The other smart guys behind DTrace are the redoubtable <a href="http://blogs.sun.com/ahl/category/DTrace" target="_blank">Adam Leventhal</a> and Mike Shapiro.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2010/08/05/clouds-app-killer%e2%80%a8%e2%80%a8/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A cloud app for the masses</title>
		<link>http://storagemojo.com/2010/07/16/a-cloud-app-for-the-masses/</link>
		<comments>http://storagemojo.com/2010/07/16/a-cloud-app-for-the-masses/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 22:23:24 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2088</guid>
		<description><![CDATA[Cloud computing gets a bad rap because it can&#8217;t replace corporate data centers for mission critical apps. But new computing paradigms never do that: it is the new capabilities they enable that drive adoption. Case in point: transcoding. Why? Anyone who shoots video soon discovers that changing from, say, AVCHD to an editing-friendly codec and [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Cloud computing gets a bad rap because it can&#8217;t replace corporate data centers for mission critical apps. But new computing paradigms never do that: it is the new capabilities they enable that drive adoption. Case in point: transcoding.</p>
<p><strong>Why?</strong><br />
Anyone who shoots video soon discovers that changing from, say, AVCHD to an editing-friendly codec and then to H.264 for distribution takes a lot of compute cycles. Conversion from one codec to another is called <i>transcoding</i>. It is the price we pay for high quality compressed content. </p>
<p>Compression and format conversion are necessary because highly compressed video &#8211; the kind most camcorders shoot &#8211; isn&#8217;t easy to edit. And the stuff that&#8217;s easy to edit has large files that chew up bandwidth and storage.</p>
<p>So we transcode. Add to that the number of formats we use &#8211; ranging from iPhones to flash to SD and 1080p &#8211; and transcoding is a major CPU cycle sink.</p>
<p>Fortunately, transcoding can be a highly parallel operation. A frame &#8211; or a series of frames &#8211; can be divided and split among multiple cores and CPUs.</p>
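<p>Here&#8217;s a rough sketch of that divide-and-reassemble pattern. The <code>transcode_chunk</code> function is a stand-in that only relabels frame numbers, not a real encoder, and the sketch ignores the fact that real codecs usually need chunks split at keyframe boundaries:</p>

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frame_count, chunk_size):
    """Return (start, end) frame ranges covering the clip; end is exclusive."""
    return [(start, min(start + chunk_size, frame_count))
            for start in range(0, frame_count, chunk_size)]


def transcode_chunk(frame_range):
    """Placeholder for an encoder call: relabels each frame in the range."""
    start, end = frame_range
    return [f"h264-frame-{i}" for i in range(start, end)]


def transcode_parallel(frame_count, chunk_size, executor):
    chunks = split_into_chunks(frame_count, chunk_size)
    # executor.map keeps input order, so chunks reassemble in sequence.
    results = executor.map(transcode_chunk, chunks)
    return [frame for chunk in results for frame in chunk]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        frames = transcode_parallel(frame_count=10, chunk_size=3, executor=pool)
    print(len(frames), frames[0], frames[-1])
```

<p>The executor is passed in so the same code can run on a thread pool, a process pool or, with a suitable wrapper, a fleet of cloud instances. For CPU-bound encoding in Python a process pool would be the realistic choice.</p>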
<p><strong>Where?</strong><br />
Where can you find a lot of CPUs for a quick job? Right, the cloud. Which is why there are a number of online services that front-end Amazon Web Services to provide transcoding.</p>
<p>I spoke to the CEO of startup <a href="http://zencoder.com/" target="_blank">Zencoder</a>, Jon Dahl, to learn more.</p>
<p><strong>Zencoder</strong><br />
Zencoder is a transcoding service provider that uses Amazon as a cloud provider. The Zencoder team has developed transcoding infrastructure for several startups and finally decided to build a general-purpose service.</p>
<p>While they use open source software in their stack &#8211; as do most transcoding providers &#8211; their major value-add is in a high-performance scalable interface. Handling 100,000 concurrent transcodes is non-trivial.</p>
<p>They also look out for problems common in transcoding such as audio/video getting out of sync and aspect ratio distortion. They can transcode 1080p faster than real time. And they&#8217;ve licensed the proprietary formats as well.</p>
<p>Amazon offers Linux as a service and a file service. S3&#8217;s files are limited to 5 GB, but that isn&#8217;t a problem for Zencoder: customers can specify input and output locations, bypassing Amazon storage.</p>
<p>Also they don&#8217;t transcode Mac ProRes &#8211; Final Cut Pro&#8217;s preferred editing format &#8211; today. But they do handle QuickTime movies. </p>
<p><strong>The StorageMojo take</strong><br />
So the glass house doesn&#8217;t want to outsource cloud infrastructure. Who cares? They&#8217;re the last to adopt new technology anyway.</p>
<p>It is apps like transcoding that drive the business. In 5 years much, perhaps most, transcoding will be cloud-based.</p>
<p>Before the digital video craze in the last 5 years there wasn&#8217;t much demand for transcoding. But today, with HD video smartphones, millions are producing videos that they want to share and save.</p>
<p>Your smartphone won&#8217;t have the cycles to do it, but the cloud does. Expect transcoding vendors to add new features, such as noise-reduction or sharpening.</p>
<p>Business units are discovering the power of short videos to inform, train, persuade and excite. All at a fraction of the cost of 4-color brochures.</p>
<p>The outlook for storage vendors is mixed. Yes, much more storage will be sold &#8211; but cost-conscious cloud managers will be buying it. And as more new services develop on the cloud, consumers will be as hazy about &#8220;local&#8221; and &#8220;cloud&#8221; as they are about &#8220;memory&#8221; and &#8220;disk&#8221; today. Branding nightmare, but that&#8217;s where those petabytes will be.</p>
<p><strong>Courteous comments welcome, of course.</strong> </p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2010/07/16/a-cloud-app-for-the-masses/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

