<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StorageMojo &#187; Architecture</title>
	<atom:link href="http://storagemojo.com/category/architecture/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Fri, 20 Jan 2012 06:10:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Learning from customers</title>
		<link>http://storagemojo.com/2011/12/07/learning-from-customers/</link>
		<comments>http://storagemojo.com/2011/12/07/learning-from-customers/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 20:05:07 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2563</guid>
		<description><![CDATA[EMC&#8217;s Chuck Hollis blogged about The Vendor Beating a couple of months ago. The unspoken question in the post is &#8220;how do we understand what customers are telling us?&#8221; He writes As an employee of a large IT vendor, I&#8217;ve been at the receiving end of a reasonable number of vendor beatings. Occasionally it&#8217;s richly [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>EMC&#8217;s Chuck Hollis <a href="http://chucksblog.emc.com/chucks_blog/2011/09/the-vendor-beating.html" target="_blank">blogged</a> about <i>The Vendor Beating</i> a couple of months ago. The unspoken question in the post is &#8220;how do we understand what customers are telling us?&#8221;</p>
<p>He writes</p>
<blockquote><p>
As an employee of a large IT vendor, I&#8217;ve been at the receiving end of a reasonable number of vendor beatings.</p>
<p><i>Occasionally it&#8217;s richly deserved</i>. But, sometimes, it&#8217;s masking a deeper set of issues that have very little to do any vendor whatsoever.
</p></blockquote>
<p>Unhappy customers, like unhappy families, are all unhappy in their own way. This customer appeared to be overstaffed, under-skilled and poorly managed.</p>
<p><strong>Interpretation</strong><br />
Interpreting customer complaints and behavior is hard. When companies can&#8217;t decipher what customers want &#8211; which is usually what the company <i>isn&#8217;t</i> selling &#8211; it is easy and dangerous to tune them out. </p>
<p>Customers can tell you things about your company and products that you can&#8217;t directly discover for yourself, but what customers say may be different from what they think. And both are influenced by the customer&#8217;s context, which can include company politics, prior vendor experiences, knowledge deficits and employee level.</p>
<p><strong>Diagnosis</strong><br />
Steve Jobs once said that customers don&#8217;t know what they want until you show it to them. Customers know what would improve the current product in the current use case, but they can&#8217;t imagine bringing multiple novel technologies to bear on a much broader problem.</p>
<p>Tablet computers flopped for years until the iPad crystalized the market. Everyone saw the tablet problems: thick; heavy; slow; clunky UI; poor battery life; and, thanks to low volumes, cost. Incremental improvements &#8211; faster processors, more RAM, larger disks &#8211; didn&#8217;t help.</p>
<p>Tablets required a deep rethinking and application of several novel technologies &#8211; flash, gestures, CNC case milling, an app store and an energy-efficient OS &#8211; to create a compelling user experience. </p>
<p>The iPad illustrates the problem of listening to customers: they describe symptoms and suggest fixes, but can&#8217;t articulate the underlying problem, which is how the use case differs from desktop and notebook PCs. That requires an act of imagination, not transcription.</p>
<p><strong>The StorageMojo take</strong><br />
In Chuck&#8217;s post an EMC presales engineer identified the root cause of the customer&#8217;s pain:</p>
<blockquote><p>
. . . the database environment had grown willy-nilly over the years &#8212; it wasn&#8217;t laid out well, the queries weren&#8217;t particularly well written, and so on.</p>
<p>Sure, there were things we could do on the storage side (e.g. faster storage, better layouts, etc.), but it was a bigger issue than just storage performance.
</p></blockquote>
<p>But the larger question is: with high-speed and high-capacity SSDs, why isn&#8217;t this customer moving to an infrastructure that doesn&#8217;t need this fancy tuning? EMC can&#8217;t manage the fight between DBAs and storage admins, but they could be making it less contentious.</p>
<p>From within the EMC ecosystem the solution is clear: more training, professional services and faster gear. But from the outside the question is: who is building &#8220;it just works&#8221; high performance storage? </p>
<p><strong>Courteous comments welcome, of course.</strong> I admire Tucci&#8217;s innovative EMC business model: outbid everyone else for chasm-crossing companies; give them global distribution and support; and watch the bucks roll in. It may not be innovative <i>technically</i> but it is innovative.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/12/07/learning-from-customers/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>How fault tolerant are SANs?</title>
		<link>http://storagemojo.com/2011/11/07/how-fault-tolerant-are-sans/</link>
		<comments>http://storagemojo.com/2011/11/07/how-fault-tolerant-are-sans/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 16:11:36 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[SAN, FC]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2548</guid>
		<description><![CDATA[Reader Kyle asks a good question: SANs are advertised up the wazoo as having lots of internal redundancy such as redundant power, redundant controllers, etc. I&#8217;ve spent enough time with redundancy to know that having two pieces of hardware often doesn&#8217;t cut it. I was wondering what the real story is from someone who has [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Reader Kyle asks a good question:</p>
<blockquote><p>
SANs are advertised up the wazoo as having lots of internal redundancy such as redundant power, redundant controllers, etc. I&#8217;ve spent enough time with redundancy to know that having two pieces of hardware often doesn&#8217;t cut it. I was wondering what the real story is from someone who has spent a lot of time in the storage space. Do complete SAN failures really pretty much *never* happen or are they just on the rare side? If so what are the common points of failure? Perhaps people, the OS, non-redundant hardware parts?
</p></blockquote>
<p>Please, SAN folks, tell StorageMojo readers your experience. In the meantime, here&#8217;s </p>
<p><strong>The StorageMojo take</strong><br />
Kyle asks 2 questions: how reliable and available are the individual <i>devices</i> that make up a SAN and how reliable and available is the <i>system</i> &#8211; the SAN as a whole.</p>
<p>Redundancy is aimed at ensuring availability. But redundancy raises the component count, so you also see more component failures. </p>
<p>Failures of redundant components shouldn&#8217;t affect availability &#8211; assuming, that is, that failures are not correlated. That assumption turned out not to be true of RAID arrays, making them less available than advertised.</p>
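<p>A minimal sketch of why correlation matters, assuming a deliberately simple model: each component has the same availability, and a single shared fault (hypothetical probability <code>p_common</code>) takes out every redundant copy at once. The numbers are illustrative, not measurements from any real array.</p>

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant components when failures are independent."""
    return 1.0 - (1.0 - a) ** n


def with_common_cause(a: float, n: int, p_common: float) -> float:
    """Same system, but a shared fault with probability p_common downs all n copies."""
    return (1.0 - p_common) * parallel_availability(a, n)


single = 0.99  # hypothetical availability of one controller

independent = parallel_availability(single, 2)
correlated = with_common_cause(single, 2, p_common=0.001)

print(f"one component:        {single:.4%}")
print(f"two, independent:     {independent:.4%}")
print(f"two, with shared bug: {correlated:.4%}")
```

<p>Under these assumptions even a small shared failure mode dominates: it caps the pair&#8217;s availability no matter how many copies are added.</p>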
<p>How much redundancy is enough? Customers generally prefer triple redundancy if they can afford it, partly for availability and partly for performance: losing ⅓ of system performance is less painful than losing ½. But for the moonshots, NASA chose quintuple redundancy on critical systems.</p>
<p>Yet I&#8217;d guess that most are more concerned about SAN <i>system</i> availability &#8211; which includes not only what we consider SAN hardware, but also the server-side HBAs, drivers and management software. It is here that the nastiest bugs lurk: untestable interactions between applications, drivers, firmware and architecture that bite us hard &#8211; and crash entire SANs.</p>
<p>But what is <i>your</i> experience, gentle reader? Many of us would like to know. </p>
<p><strong>Courteous comments welcome, of course.</strong> <strong>Update</strong>: Bayesian analysis is the best tool to evaluate system-level availability, as noted in this <a href="http://www.youtube.com/watch?v=UzTlN5qwzco" target="_blank">StorageMojo video</a>. Sadly, the tool referred to is no longer online. Anyone want to take a whack at a new one?</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/11/07/how-fault-tolerant-are-sans/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Ask StorageMojo: 80,000 mailboxes need help</title>
		<link>http://storagemojo.com/2011/11/02/ask-storagemojo-80000-mailboxes-need-help/</link>
		<comments>http://storagemojo.com/2011/11/02/ask-storagemojo-80000-mailboxes-need-help/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 16:00:28 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[NAS, IP, iSCSI]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2543</guid>
		<description><![CDATA[A StorageMojo reader has a problem. Can you help? Our mail hub (80,000+ mailboxes) is virtualized with vSphere 4.1 with Red Hat Enterprise Linux 5 x64 and Dovecot 2.0 [an open source IMAP/POP3 email server for Linux/UNIX-like systems]. We are using HP LeftHand Networks P4300 iSCSI storage in a &#8220;network RAID10 setup of RAID10 storage&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>A StorageMojo reader has a problem. Can you help?</p>
<blockquote><p>
Our mail hub (80,000+ mailboxes) is virtualized with vSphere 4.1 with Red Hat Enterprise Linux 5 x64 and <a href="http://dovecot.org/index.html" target="_blank">Dovecot 2.0</a> [an open source IMAP/POP3 email server for Linux/UNIX-like systems]. We are using HP LeftHand Networks P4300 iSCSI storage in a &#8220;network RAID10 setup of RAID10 storage&#8221; for Dovecot indexes and multiple &#8220;network RAID1 of RAID5 storage&#8221; for actual mailboxes.</p>
<p>This is my take: our Dovecot indexes are getting hammered with lots of small I/O requests, about 8,000 IOPS continuously during 8-hour working days, 75% writes. Indexes are fairly small (50 GB) and expected to grow to 100-150 GB, but need a lot of random I/O. We need real-time replication in storage (LeftHand is ok for us) and we think that SSD should shine in this situation. Bandwidth is not a problem (200-300 megabits of index traffic), but we need more IOPS.</p>
<p>The problem is the indexes, but our total mailbox capacity is expected to grow to 6 TB compressed using zlib compression in Dovecot.</p>
<p>We want to buy a storage appliance with the following requirements:</p>
<ul>
<li>vSphere 4.1 &#038; 5 certified storage, VAAI enabled (if possible)</li>
<li>iSCSI (1 Gbps)</li>
<li>High number of IOPS (at least 12,000+, most of them writes)</li>
<li>Small size (200 GB)</li>
<li>Fault tolerant (RAID, battery-backed write cache, power supply, fans, multiple gigabit uplinks, synchronous replication)</li>
<li>Cheap (less than $30k the full setup)</li>
</ul>
<p>We want to buy at the beginning of 2012. Any product that fits?
</p></blockquote>
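<p>A quick back-of-the-envelope check on the reader&#8217;s numbers, assuming &#8220;200-300 megabits&#8221; means megabits per second and taking the midpoint of 250. The function names are illustrative only.</p>

```python
def avg_io_bytes(mbit_per_s: float, iops: float) -> float:
    """Average bytes moved per I/O at a given traffic level and I/O rate."""
    return mbit_per_s * 1_000_000 / 8 / iops


def required_mbit_per_s(iops: float, io_bytes: float) -> float:
    """Traffic needed to sustain an I/O rate at a given average I/O size."""
    return iops * io_bytes * 8 / 1_000_000


# Today: about 8,000 IOPS at an assumed 250 Mbit/s (midpoint of 200-300).
io_size = avg_io_bytes(250, 8_000)
print(f"average I/O size: {io_size:.0f} bytes")

# Target: 12,000 IOPS at the same average I/O size.
target = required_mbit_per_s(12_000, io_size)
print(f"traffic at 12,000 IOPS: {target:.0f} Mbit/s")
```

<p>Under those assumptions the average I/O is roughly 4 KB and the 12,000 IOPS target still fits within a single 1 Gbps iSCSI link, which supports the reader&#8217;s point that the constraint is random I/O, not bandwidth.</p>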
<p><strong>The StorageMojo take</strong><br />
I suspect price will be the most significant limiter. But the reader only needs index storage, not the whole shooting match. He&#8217;s pretty happy with LeftHand for mailbox storage.</p>
<p>But if we can solve both problems for him, why not? If he should relax some constraint, feel free to suggest it.</p>
<p>He&#8217;ll be watching the comments, so if you have questions please ask them. I&#8217;ll be following the comments as well.</p>
<p><strong>Courteous comments welcome, of course.</strong> His email was edited for clarity.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/11/02/ask-storagemojo-80000-mailboxes-need-help/feed/</wfw:commentRss>
		<slash:comments>47</slash:comments>
		</item>
		<item>
		<title>The network is choking our storage</title>
		<link>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/</link>
		<comments>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 17:03:08 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SAN, FC]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2533</guid>
		<description><![CDATA[Amazon Web Services architect James Hamilton has been posting on network issues for over a year and researching them much longer. As Ethernet becomes the de facto SAN technology, his views become more relevant to the larger storage market. Critique Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Amazon Web Services architect James Hamilton has been <a href="http://perspectives.mvdirona.com/2011/10/01/ChangesInNetworkingSystems.aspx" target="_blank">posting</a> on network issues for over a year and researching them much longer. As Ethernet becomes the <i>de facto</i> SAN technology, his views become more relevant to the larger storage market.</p>
<p><strong>Critique</strong><br />
Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; the dominance of a single player, Cisco; the closed technology; and the heavy vertical integration. All antithetical to the dynamics that have driven server costs down so successfully in the last 20 years.</p>
<p>These are issues the storage industry knows too well. But Mr. Hamilton is more concerned about the waste the current high-cost industry structure causes.</p>
<p>Waste?</p>
<p><strong>Workload placement</strong><br />
The cost of network bandwidth leads to network over-subscription. Networks are configured as tree topologies: the further you move from end nodes, the worse the over-subscription. </p>
<p>As described in the 2009 Microsoft Research paper <a href="http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf" target="_blank">VL2: A Scalable and Flexible Data Center Network</a>:</p>
<blockquote><p>
. . . the capacity between different branches of the tree is typically oversubscribed by factors of 1:5 or more, with paths through the highest levels of the tree oversubscribed by factors of 1:80 to 1:240. This limits communication between servers to the point that it fragments the server pool — congestion and computation hot-spots are prevalent even when spare capacity is available elsewhere.
</p></blockquote>
<p>This throttles data center performance by limiting server-to-server bandwidth, fragmenting resources and reducing network utilization. The latter reflects the redundant paths needed in case of switch failure: ≈50% or more of costly data center bandwidth goes unused.</p>
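<p>To put the quoted ratios in perspective, here is a small sketch of the worst case, assuming each server has a 1 Gbit/s NIC (an assumption, not a figure from the paper) and every server behind an oversubscribed link transmits at once.</p>

```python
def worst_case_mbit(nic_mbit: float, oversubscription: float) -> float:
    """Per-server bandwidth when all servers behind a 1:N link send at once."""
    return nic_mbit / oversubscription


NIC_MBIT = 1000  # assumed 1 Gbit/s server NIC

for ratio in (5, 80, 240):
    share = worst_case_mbit(NIC_MBIT, ratio)
    print(f"1:{ratio:<3} oversubscription -> {share:6.1f} Mbit/s per server")
```

<p>At 1:240 a server that could push a gigabit to its rack neighbor gets only a few megabits to a server across the data center, which is why workload placement ends up dictated by the network.</p>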
<p>As might be expected, big Internet data centers like Amazon&#8217;s have complex and unpredictable workloads. They need lots of bandwidth between all servers all the time.</p>
<p><strong>A solution</strong><br />
The VL2 paper describes an experimental solution to these problems that includes <i>location-specific</i> and <i>application-specific</i> addressing, multi-path traffic load balancing and a novel directory design that efficiently handles lookups and updates to network mappings.</p>
<p>In a 75-node test cluster the design moved 2.75TB of data in 395 seconds &#8211; 94% of maximum network bandwidth &#8211; at a fraction of the cost of current enterprise networks. The paper calculates that a cloud-service scale network with no over-subscription could be built with commodity switches at <strong>1/14th the cost</strong> of a traditional data center Ethernet.</p>
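<p>For a sense of scale, the test figures can be turned into an average data rate. This sketch assumes decimal terabytes (10<sup>12</sup> bytes) and simply averages over the whole run. It is not the same measure as the 94% figure, so the two should not be compared directly.</p>

```python
def aggregate_gbit_per_s(terabytes: float, seconds: float) -> float:
    """Average aggregate rate in Gbit/s, assuming decimal terabytes."""
    return terabytes * 1e12 * 8 / seconds / 1e9


NODES = 75

aggregate = aggregate_gbit_per_s(2.75, 395)
per_node = aggregate / NODES

print(f"aggregate: {aggregate:.1f} Gbit/s")
print(f"per node:  {per_node:.2f} Gbit/s")
```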
<p>Whoa!</p>
<p><strong>The StorageMojo take</strong><br />
VC and engineering dollars follow high-growth markets. What Google, Amazon and Microsoft want, they get. With the rapid growth of public cloud services the network over-subscription problem will get solved. </p>
<p>Merchant silicon from Broadcom, Intel and Marvell is making a tried-and-true Moore&#8217;s Law attack on hardware cost. The protocol stack is tougher, but several open-source industry initiatives are under way with support from major companies. Progress will be slower than hoped, but within 3 years we&#8217;ll have a viable stack to build on.</p>
<p>Where does this leave the networking industry? That depends on where you sit.</p>
<p>Cisco will be the biggest loser, because they&#8217;ve been the biggest winner with the current model. They may need to pull an IBM and move big into services if they want to stick around. Ironically, Cisco&#8217;s UCS product line &#8211; which bakes in the tree-structured network &#8211; has further motivated broader industry action.</p>
<p>The rest of the industry can go after this emerging market with a lower-GM business model. Not all of them will, but it will be a critical success factor. </p>
<p>The big winner will be storage. Scale-out storage relies on spraying data across multiple racks for maximum availability, utilization and performance. Cheaper, faster, better scale-out networks will only drive storage demand.</p>
<p>For most of us this is an academic problem today. Lightly used systems &#8211; such as for backup and archiving &#8211; don&#8217;t see Amazon&#8217;s problems. But in 5 years this will be common even outside the public cloud providers.</p>
<p>Just as IT users have benefited from Google&#8217;s push on energy efficiency and much more, they will also benefit from much lower cost and more scalable networks.</p>
<p><strong>Courteous comments welcome, of course.</strong> I can&#8217;t help but continue to marvel at how dumb Cisco&#8217;s UCS has turned out to be. It&#8217;s a gift that keeps on giving.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>RAMCloud is the new flash</title>
		<link>http://storagemojo.com/2011/10/05/ramcloud-is-the-new-flash/</link>
		<comments>http://storagemojo.com/2011/10/05/ramcloud-is-the-new-flash/#comments</comments>
		<pubDate>Thu, 06 Oct 2011 01:03:30 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2529</guid>
		<description><![CDATA[Sometimes in the midst of the endless tweaking needed to maximize storage performance one just wants to say &#8220;screw it! Put everything in RAM!&#8221; And that&#8217;s just what RAMCloud does. Disk is the new tape, flash the new disk, DRAM the new flash. RAMCloud is a research paper (pdf) and an open software project. The [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Sometimes in the midst of the endless tweaking needed to maximize storage performance one just wants to say &#8220;screw it! Put everything in RAM!&#8221; And that&#8217;s just what RAMCloud does.</p>
<p><strong> Disk is the new tape, flash the new disk, DRAM the new flash.</strong><br />
RAMCloud is a <a href="http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf" target="_blank">research paper</a> (pdf) and an <a href="http://fiz.stanford.edu:8081/display/ramcloud/Home" target="_blank">open software project</a>. The goal is enterprise-class availability with every bit of active data stored in DRAM, not disk or flash, for maximum performance. It is a key-value object store today, though as pure software that could change.</p>
<p>It&#8217;s the brainchild of John Ousterhout, a Stanford prof who invented Tcl back in the 80s at Berkeley. </p>
<p><strong>Isn&#8217;t DRAM volatile and costly?</strong><br />
Right on both counts, grasshopper, so RAMCloud isn&#8217;t a 1 for 1 disk-style architecture. No Google FS-style triple replication here, or RAID-style erasure coding.</p>
<p>Instead RAMCloud uses <i>buffered logging</i>:</p>
<blockquote><p>
. . . a single copy of each object is stored in DRAM of a primary server and copies are kept on the disks of two or more backup servers; each server acts as both primary and backup. However, the disk copies are not updated synchronously during write operations. Instead, the primary server updates its DRAM and forwards log entries to the backup servers, where they are stored temporarily in DRAM.
</p></blockquote>
<p>Instead of working around crashes &#8211; using multiple object copies as scale-out storage does &#8211; RAMCloud recovers lost data from the DRAM logs or disk drives to replicate the lost data at high speed. That&#8217;s possible because all the log data is in DRAM or spread across many disks. </p>
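<p>The write path described above can be sketched in a few lines. This is a toy, single-process illustration, not RAMCloud&#8217;s code: the class and method names are invented, the &#8220;disk&#8221; is a Python list, and recovery replays one backup&#8217;s log, whereas the real system recovers from many backups in parallel.</p>

```python
class Backup:
    """Holds log entries in DRAM first, then writes them to disk later."""

    def __init__(self):
        self.dram_buffer = []  # entries received but not yet on disk
        self.disk = []         # stand-in for the backup's disk

    def append(self, entry):
        self.dram_buffer.append(entry)

    def flush(self):
        """Asynchronous in a real system; here it is called explicitly."""
        self.disk.extend(self.dram_buffer)
        self.dram_buffer.clear()

    def log(self):
        """Full log in write order: flushed entries, then buffered ones."""
        return self.disk + self.dram_buffer


class Primary:
    """Keeps the single DRAM copy and forwards log entries to backups."""

    def __init__(self, backups):
        self.dram = {}
        self.backups = backups

    def write(self, key, value):
        self.dram[key] = value
        for backup in self.backups:
            backup.append((key, value))  # no synchronous disk write


def recover(backup):
    """Rebuild a lost primary's table by replaying a backup's log."""
    table = {}
    for key, value in backup.log():
        table[key] = value  # later entries overwrite earlier ones
    return table


backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("a", 1)
primary.write("a", 2)
primary.write("b", 3)
backups[0].flush()     # some entries reach disk
primary.write("c", 4)  # this one is still only in backup DRAM

recovered = recover(backups[0])
print(recovered == primary.dram)
```

<p>The point of the design is visible in <code>write</code>: the caller never waits on a disk, yet every update exists in more than one machine&#8217;s memory before the write returns.</p>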
<p>In a recent paper (<a href="http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud-recovery.pdf" target="_blank">Fast Crash Recovery in  RAMCloud</a>) (pdf) Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum (co-founder of VMware) go into more detail on this critical feature. </p>
<p>The key elements are:</p>
<ul>
<li><strong>Scale.</strong> Servers scatter their backup data across all other servers so thousands of disks can serve the recovery.</li>
<li><strong>Log-structure. </strong> Reduces complexity and offers high performance.</li>
<li><strong>Randomization.</strong> Many decisions need to be made in a large cluster. Rather than relying on CPU-, time- and bandwidth-consuming determinism, injecting randomization speeds decisions with less overhead.</li>
<li><strong>Dynamic tablets.</strong> The key-value store tracks resource usage within a single table and ensures that no single partition is too large for fast restores.</li>
</ul>
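<p>The scale and randomization points can be illustrated together. The sketch below assumes plain uniform random placement of log segments, with hypothetical backup names and segment counts. The paper&#8217;s actual placement policy may be more refined than this.</p>

```python
import random
from collections import Counter


def scatter_segments(num_segments, backups, replicas=2, seed=0):
    """Assign each log segment to `replicas` distinct, randomly chosen backups."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return {seg: rng.sample(backups, replicas) for seg in range(num_segments)}


backups = [f"backup-{i}" for i in range(60)]  # hypothetical cluster
placement = scatter_segments(600, backups)

# Count how many segment copies each backup ended up holding.
load = Counter(name for chosen in placement.values() for name in chosen)

print(f"backups used: {len(load)} of {len(backups)}")
print(f"copies per backup: min {min(load.values())}, max {max(load.values())}")
```

<p>No coordinator has to track global state to make these choices, and because the copies land on many machines, many disks can read in parallel when a server has to be rebuilt.</p>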
<p>DRAM is volatile so the log replication data is spread to other servers on other racks for redundancy before being committed to disk. Still, total system write throughput is limited by the disk write speed, whose limits are a key reason people are moving from disks. Flash drives may help, but other techniques, such as log truncation and sharding, make it possible to get good performance from several thousand SATA drives.</p>
<p>How good? The team reports that in a 60 node cluster they recover 35GB in 1.6 seconds. With more nodes larger partitions should be restored even faster. Scale is good.</p>
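<p>Some quick arithmetic on those figures shows why scattering matters. The 100 MB/s single-disk rate below is an assumed round number for comparison, not a figure from the paper.</p>

```python
def recovery_rate_gb_per_s(gigabytes: float, seconds: float) -> float:
    """Aggregate recovery rate across the whole cluster."""
    return gigabytes / seconds


def single_disk_seconds(gigabytes: float, disk_mb_per_s: float) -> float:
    """Time to read the same data back from one disk."""
    return gigabytes * 1000 / disk_mb_per_s


rate = recovery_rate_gb_per_s(35, 1.6)
per_node = rate / 60
one_disk = single_disk_seconds(35, 100)  # assumed 100 MB/s disk

print(f"aggregate: {rate:.1f} GB/s")
print(f"per node:  {per_node * 1000:.0f} MB/s")
print(f"one disk:  {one_disk:.0f} s")
```

<p>Each node contributes only a few hundred MB/s, but together they finish in under two seconds what a single disk at the assumed rate would need minutes to read.</p>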
<p><strong>Lights out!</strong><br />
Power failures wipe all the data in DRAM. The obvious defense is to avoid failures: combine battery backup with diesel generator sets. Power ride-through will handle interruptions into the hundreds of milliseconds.</p>
<p>But who is going to trust that? That&#8217;s why future commercial implementations will insist on logging to stable storage, such as flash SSDs.</p>
<p>They&#8217;re getting cheaper fast &#8211; faster than DRAM &#8211; which will make this a common approach. </p>
<p><strong>Cost</strong><br />
Professor Ousterhout kindly sent a short note about cost, correctly noting that</p>
<blockquote><p>
. . . if you measure cost/operation, DRAM is roughly 100x cheaper than disk, since a disk can only perform about 100-200 operations/second.  This is why RAMCloud makes sense for data-intensive applications. . . .
</p></blockquote>
<p>While you and I might find that persuasive, too many enterprises don&#8217;t. The deep conservatism of the storage culture &#8211; both figuratively and literally &#8211; makes cost a good excuse to stay with the tried and true, and easy to explain to CFOs. </p>
<p>The good news for the company I hope he is starting is that the primacy of $/GB is slowly eroding as customers see the system level savings from fast storage. SSD vendors and companies like TMS and Kaminario are breaking trail for RAMCloud.</p>
<p><strong>The StorageMojo take</strong><br />
Make no mistake: RAMCloud is a research project, not a commercial product, years and million$ away from commercial application. But the concept is promising.</p>
<p>Imagine a world where data layout doesn&#8217;t matter, where apps are optimized for sub-millisecond storage, where 100 byte I/Os are faster and just as efficient as 8KB I/Os. The architectural implications are huge and would take a decade or more to absorb.</p>
<p>RAMCloud raises the thorny issue of tiering: getting hot data on the hot storage and everything else off to disk. There are OK answers for tiering but nothing insanely great. </p>
<p>RAMCloud shows we&#8217;re far from the end of the line in what storage can do. Faster, better, arguably cheaper: 2 out of 3 ain&#8217;t bad.</p>
<p><strong>Courteous comments welcome, of course.</strong> A shorter version of this post appeared on <a href="http://www.zdnet.com/blog/storage/ramcloud-puts-everything-in-dram/1546" target="_blank">ZDNet</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/05/ramcloud-is-the-new-flash/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NoSQL in the metadata engine room</title>
		<link>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/</link>
		<comments>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 18:59:44 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2525</guid>
		<description><![CDATA[One more datapoint and we&#8217;ll have a trend: NoSQL databases managing metadata. It&#8217;s obvious in retrospect: use a scalable big data tool to handle scale-out metadata. Maybe not a requirement today, but surely will be with even bigger data tomorrow. Metadata is a fraction of the user data set, but it gets hammered much more. [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>One more datapoint and we&#8217;ll have a trend: NoSQL databases managing metadata. It&#8217;s obvious in retrospect: use a scalable big data tool to handle scale-out metadata. Maybe not a requirement today, but surely will be with even bigger data tomorrow.</p>
<p>Metadata is a fraction of the user data set, but it gets hammered much more. As more metadata is found useful the hammering will get more insistent.</p>
<p><strong>Nutanix</strong><br />
<a href="http://www.nutanix.com/" target="_blank">Nutanix</a>, whose CTO and co-founder, Mohit Aron, was a developer of the Google File System, uses MapReduce. Nutanix achieves its scale through its distributed-metadata, masterless architecture &#8211; powered by MapReduce jobs that run in the background.</p>
<p><strong>Druva</strong><br />
<a href="http://www.druva.com/" target="_blank">Druva</a>, a backup company for mobile devices, also uses a NoSQL database to manage storage metadata. They say they&#8217;ve found that NoSQL scales over an order of magnitude better than relational in similar applications.</p>
<p><strong>Somebody else</strong><br />
A company that shall remain nameless is porting Hadoop to their backend. The customer won&#8217;t be able to access Hadoop for their work &#8211; it is strictly for the system&#8217;s internal use.</p>
<p>It is a proof of concept so it isn&#8217;t a 3rd data point, but they see the potential advantages. Call it data point 2½. </p>
<p><strong>The StorageMojo take</strong><br />
Small advances are the building blocks of disruption. RAID made it possible to build available storage using cheap disks. Consumer adoption of PCs made disks even cheaper. Moore&#8217;s Law made RAID controllers cheaper and faster, or faster and more capable. </p>
<p>A virtuous circle of disruption.</p>
<p>The basic architecture of scale-out storage systems &#8211; purpose-built software on clustered commodity hardware &#8211; has been stable. But this is the beginning of scale-out storage 2.0: taking scale-out technology developed for users and incorporating it into the storage infrastructure itself.</p>
<p>These ideas are bubbling up among the latest startups and among the establishment players. At some point the old RAID architectures will be well and truly broken, able to compete in smaller and smaller niches until the revenue can&#8217;t justify more investment. </p>
<p>Of course vendors have been making RAID controllers out of servers for years now, and those servers can run any software they want. But at some point the explicit and implicit assumptions in the old architecture crash into current realities &#8211; either in cost, development time, feature completeness or management overhead &#8211; and then we move on.</p>
<p><strong>Courteous comments welcome, of course.</strong> I learned about Nutanix at the last <a href="http://techfieldday.com/" target="_blank">Tech Field Day</a> &#8220;The Independent IT Influencer Event&#8221; which paid for my travel expenses to Silicon Valley.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nimble Storage architecture video</title>
		<link>http://storagemojo.com/2011/08/03/nimble-storage-architecture-video/</link>
		<comments>http://storagemojo.com/2011/08/03/nimble-storage-architecture-video/#comments</comments>
		<pubDate>Wed, 03 Aug 2011 23:26:15 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Information Management]]></category>
		<category><![CDATA[SOHO/SMB]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2483</guid>
		<description><![CDATA[I sat down with Nimble Storage co-founder and VP of engineering Varun Mehta to discuss their architecture &#8211; and shoot some video. Varun has been part of several Valley success stories &#8211; NetApp, Sun, Data Domain &#8211; and has a first hand perspective on disruptive technologies. Varun and co-founder Umesh Maheshwari &#8211; a brilliant architect [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>I sat down with <a href="http://www.nimblestorage.com/" target="_blank">Nimble Storage</a> co-founder and VP of engineering Varun Mehta to discuss their architecture &#8211; and shoot some video. Varun has been part of several Valley success stories &#8211; NetApp, Sun, Data Domain &#8211; and has a first-hand perspective on disruptive technologies.</p>
<p>Varun and co-founder Umesh Maheshwari &#8211; a brilliant architect and a very nice guy &#8211; designed the Nimble product that he discusses. Take 4 minutes to learn more about <i>Innovations in Storage Architecture at Nimble Storage</i>:</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/KxQVmSe_o3M?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/KxQVmSe_o3M?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Or you can see it in HD on <a href="http://www.youtube.com/watch?v=KxQVmSe_o3M" target="_blank">YouTube</a>.</p>
<p><strong>The StorageMojo take</strong><br />
The Nimble guys have great technology, but they&#8217;ve also put together a compelling value proposition: collapse 3 time-consuming and complex workflows &#8211; primary storage, backup and archiving &#8211; into 1 appliance. Include all the needed software, price it well, target under-served mid-sized companies and you have a recipe for another Valley success. </p>
<p>The tech trends they&#8217;re riding will only get better. But the business trends are in their favor as well. SMBs today have many TB of data and little staff to manage it &#8211; or capital to invest. With Congress ensuring that America operates well below capacity for years to come, the times favor thrifty solutions like Nimble&#8217;s.</p>
<p><strong>Courteous comments welcome, of course.</strong><br />
Nimble bought my time for this video, but I made all editorial decisions.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/08/03/nimble-storage-architecture-video/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A cluster-based dedup appliance</title>
		<link>http://storagemojo.com/2011/07/28/a-cluster-based-dedup-appliance-2/</link>
		<comments>http://storagemojo.com/2011/07/28/a-cluster-based-dedup-appliance-2/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 22:53:41 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Enterprise]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2476</guid>
		<description><![CDATA[Quantum announced a new deduplication appliance series &#8211; the DXi 6701 and 6702 &#8211; that claims exceptional scalability. Why? Because it uses technology from Quantum&#8217;s StorNext cluster file system. Scale out Quantum says the units grow from 8 to 80TB of usable RAID 6 capacity with no subtractions for landing areas, hidden reserves or multiplication. [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Quantum announced a new deduplication appliance series &#8211; the DXi 6701 and 6702 &#8211; that claims exceptional scalability. Why? Because it uses technology from Quantum&#8217;s StorNext cluster file system.</p>
<p><strong>Scale out</strong><br />
Quantum says the units grow from 8 to 80TB of usable RAID 6 capacity with no subtractions for landing areas, hidden reserves or multiplication. And they say they&#8217;re fast: 5.8TB/hr using VTL or OST; 5TB/hr for NAS; all dedup at wire speed.</p>
<p>The only difference between the 2 models is that one has 1Gig Ethernet and the other 10Gig. All the software is included in the price: NAS, OST, VTL, tape support, replication and client side dedup option.</p>
<p>List prices start at $56k. Quantum sells through the channel, so that&#8217;s a maximum.</p>
<p><strong>The StorageMojo take</strong><br />
The DXi 670x is a good example of the power and economy of scale-out vs scale-up. The cluster file system technology underlying it enables the 10x capacity expansion with high performance and low cost.</p>
<p>With a scale-up approach, hardware volumes would be lower and hardware and software qualification and support costs higher. That Quantum owns the underlying technology makes their job that much easier. </p>
<p>Quantum&#8217;s market power is a fraction of that of EMC&#8217;s Data Domain. But their architecture&#8217;s advantages in performance, flexibility and cost point to a larger trend.</p>
<p>The problem with Quantum&#8217;s marketing is that they only play the price/performance card. Important, no doubt, but by ignoring the fundamental advantages of their scale-out architecture, they let the competition sidestep its long-term problem: it doesn&#8217;t scale.</p>
<p>Winning in the development lab is only part of the battle. Helping customers appreciate &#8211; and making competitors react to &#8211; the differences is needed to win in the market. </p>
<p><strong>Courteous comments welcome, of course.</strong> </p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/07/28/a-cluster-based-dedup-appliance-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The per-slot cost metric</title>
		<link>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/</link>
		<comments>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 20:21:00 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Management]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2466</guid>
		<description><![CDATA[Commenters on the last post &#8211; Open source storage array &#8211; helped crystallize an idea that&#8217;s been lurking for years: comparing disk storage hardware on per-slot price. The Backblaze box, which costs about $50/slot, got a comment that said, in effect, &#8220;it doesn&#8217;t have the features of a $200/slot box.&#8221; Good! But the comment raised [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Commenters on the last post &#8211; <a href="http://storagemojo.com/2011/07/20/open-source-storage-array/" target="_blank">Open source storage array</a> &#8211; helped crystallize an idea that&#8217;s been lurking for years: comparing disk storage hardware on per-slot price. The Backblaze box, which costs about $50/slot, got a comment that said, in effect, &#8220;it doesn&#8217;t have the features of a $200/slot box.&#8221; Good! </p>
<p>But the comment raised an interesting point: since we all use the same disks from the same few &#8211; and soon to be fewer &#8211; manufacturers, isn&#8217;t the cost of the tin we wrap them in a key metric? Let&#8217;s call it PSC &#8211; Per Slot Cost.</p>
<p>Some advantages:</p>
<ul>
<li><strong>Focus on value-add.</strong> We know how many disk slots there are in a storage system. We know how much disks cost. Therefore, the per-slot price tells us what the vendor&#8217;s value-add per disk is &#8211; or what we&#8217;re supposed to think it is.</li>
<li><strong>Increases pricing contrast.</strong> Disk costs are typically 10-15% of the price of a mid-to-high end array. The number of disk slots in those arrays varies, as do individual disk capacities. These variables obscure what the vendor is asking for their value-add.</li>
<li><strong>Cleaner comparisons.</strong> As a corollary to the previous point, PSC makes it easier to compare architecturally similar systems &#8211; SAS vs SAS, hybrid SSD/SATA systems, RAID 6 systems &#8211; whose hardware cost structures should be similar.</li>
<li><strong>Focus on software value.</strong> Since most storage systems &#8211; even high-end systems &#8211; run on commodity hardware, the biggest price variable is in software. Isn&#8217;t that where we <i>should</i> focus?</li>
</ul>
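<p>To make the arithmetic concrete, here is a minimal Python sketch of PSC as read here: the system price minus the disks, divided by the slot count. The function name and every price in it are hypothetical, chosen only so the bulk example lands near the $50/slot figure mentioned above. They are not vendor quotes.</p>

```python
def per_slot_cost(system_price, slots, disk_price, populated_slots=None):
    """Return the non-disk price per slot (PSC as interpreted in this sketch)."""
    if populated_slots is None:
        populated_slots = slots          # assume every slot ships with a disk
    non_disk_price = system_price - disk_price * populated_slots
    return non_disk_price / slots


# Hypothetical numbers, for illustration only.
bulk_box = per_slot_cost(system_price=8_100, slots=45, disk_price=130)
midrange = per_slot_cost(system_price=60_000, slots=24, disk_price=300)
disk_share = 300 * 24 / 60_000           # disks as a fraction of the array price

print(f"bulk box: ${bulk_box:.0f}/slot")
print(f"midrange: ${midrange:.0f}/slot, disks are {disk_share:.0%} of the price")
```

<p>With these made-up inputs the bulk box comes out at $50 per slot and the midrange array at $2,200 per slot, with disks at 12% of the array price.</p>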
<p><strong>The cloud storage angle</strong><br />
PSC should be useful for market segmentation. Instead of dumping arrays into buckets by entry-level price &#8211; such as $75-$100k &#8211; or by $/GB, segment them by PSC, which should track with the value of the stored data. </p>
<p>Expect to see segments range from Bulk (the Backblaze segment) to Heavy Transactional (traditional big iron) with yet-to-be-named segments between. But the most important use for PSC is in highly-scalable architectures in the public vs private cloud storage arena. </p>
<p>Cloud architectures are distinguished by the fact that the larger they scale, the lower their PSC. This is partly a function of economic necessity &#8211; who can afford 2 dozen PB of Symm? &#8211; and largely due to their use of software-based object replication instead of RAID. </p>
<p>When your storage is cheap, you can afford triple replication. And when you have massive numbers of boxes &#8211; and at least 2 data centers &#8211; you can have strong disaster tolerance. So large-scale cloud suppliers have motive and opportunity to reduce PSC. </p>
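<p>A rough sketch of why cheap slots make triple replication affordable: divide the cost of a populated slot by the fraction of it that holds unique data. The prices, the 12-disk RAID 6 group and the function names are assumptions for illustration, not measurements from any vendor.</p>

```python
def usable_fraction_replication(copies):
    """Share of raw capacity holding unique data with N full copies."""
    return 1 / copies


def usable_fraction_raid6(disks_per_group):
    """RAID 6 gives up two disks per group to parity."""
    return (disks_per_group - 2) / disks_per_group


def cost_per_usable_slot(psc, disk_price, usable_fraction):
    """Cost of one populated slot, scaled up by the redundancy overhead."""
    return (psc + disk_price) / usable_fraction


# Hypothetical inputs: a bulk cloud box versus a midrange RAID 6 array.
cloud = cost_per_usable_slot(50, 130, usable_fraction_replication(3))
array = cost_per_usable_slot(2_200, 300, usable_fraction_raid6(12))

print(f"triple-replicated bulk slot: ${cloud:,.0f} per usable slot")
print(f"RAID 6 array slot:           ${array:,.0f} per usable slot")
```

<p>Under these assumptions the triple-replicated slot costs about $540 per usable slot against about $3,000 for the array, even though replication keeps only a third of its raw capacity.</p>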
<p>The private cloud space is where the calculus gets interesting. Many observers dismiss the private cloud concept because they can&#8217;t possibly compete with Amazon, Microsoft and Google on scale or cost, including PSC. </p>
<p><strong>The StorageMojo take</strong><br />
There is a private cloud market because other issues, such as network latency, matter too, and because the commercialization of high-scale software such as Hadoop makes it possible for any focused billion-dollar company to build a competitive cloud infrastructure. The hardware is already a commodity, and many of the improvements that Google 1st pushed, such as more efficient power supplies, are now widely available.</p>
<p>The bigger issue for competitive private clouds is an enterprise IT mindset, and a lack of the skills needed to specify and manage them. This is where PSC comes in: it allows CFOs to compare their costs to best-in-breed cloud providers in a simple way.</p>
<p>PSC is just a metric, not <i>the</i> metric. The big guys are optimizing things &#8211; like power distribution &#8211; that won&#8217;t move the needle for smaller players. </p>
<p>But if you use commodity hardware then you should focus on the software. And since every big player is already running on commodity hardware &#8211; a Good Thing, BTW &#8211; let&#8217;s focus on getting software that delivers business value. To the extent that PSC helps decision-makers do that, it will help the industry shift the focus from things like $/GB to a higher-level discussion.</p>
<p><strong>Courteous comments welcome, of course.</strong> I just paid $250 per slot for an array with 1 controller, 1 fan and 1 Thunderbolt connection to my 1 desktop. Yes, I could have done better &#8211; if I didn&#8217;t want Thunderbolt. So PSC doesn&#8217;t trump all.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/07/25/the-per-slot-cost-metric/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>It&#8217;s time for primary data compression</title>
		<link>http://storagemojo.com/2011/07/05/its-time-for-primary-data-compression/</link>
		<comments>http://storagemojo.com/2011/07/05/its-time-for-primary-data-compression/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 22:05:03 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Disk]]></category>
		<category><![CDATA[Enterprise]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2451</guid>
		<description><![CDATA[Deduplication has been accepted as an enterprise-class compression technology. Is it time for data compression to be a standard feature of primary storage? I&#8217;ve been doing some work for Nimble Storage a cool Valley startup. Talking to co-founder Varun Mehta, he mentioned that Nimble&#8217;s storage/backup/archive appliance does data compression on all data, all the time. [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Deduplication has been accepted as an enterprise-class compression technology. Is it time for data compression to be a standard feature of primary storage?</p>
<p>I&#8217;ve been doing some work for <a href="http://www.nimblestorage.com/" target="_blank">Nimble Storage</a>, a cool Valley startup. Co-founder Varun Mehta mentioned that Nimble&#8217;s storage/backup/archive appliance does data compression on all data, all the time.</p>
<p>That&#8217;s right, primary storage on Nimble&#8217;s box is <strong>always compressed</strong>. Not only that, all their performance numbers are quoted with compressed data. </p>
<p>They aren&#8217;t kidding.</p>
<p><strong>Venerable compression</strong><br />
Data compression is one of the oldest computer storage technologies around. Bell Labs mathematician Claude Shannon published <a href="http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html" target="_blank">A Mathematical Theory of Communication</a> in 1948 which, among other things, laid out the math behind compression.</p>
<blockquote><p>
The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the redundancy. The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%.
</p></blockquote>
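<p>Shannon&#8217;s definitions translate almost directly into code. The sketch below uses only single-symbol frequencies, so it will understate the redundancy of real English, which Shannon&#8217;s 50% figure estimates using structure over about eight letters. The function names are mine, not Shannon&#8217;s.</p>

```python
import math
from collections import Counter


def relative_entropy(text, alphabet_size):
    """First-order entropy of `text` divided by the maximum, log2(alphabet_size)."""
    counts = Counter(text)
    total = len(text)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(alphabet_size)


def redundancy(text, alphabet_size):
    """One minus the relative entropy, per the definition quoted above."""
    return 1 - relative_entropy(text, alphabet_size)


# A uniform source has no redundancy; a skewed one does.
print(redundancy("abcd", alphabet_size=4))
print(redundancy("aaab", alphabet_size=2))
```

<p>The uniform four-symbol string has zero redundancy, while the skewed two-symbol string has a redundancy of about 0.19. That is the headroom a compressor has to work with when it encodes into the same alphabet.</p>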
<p>Inline compression has been part of every enterprise tape drive for decades. The algorithms &#8211; Lempel-Ziv was big 20 years ago &#8211; have been tuned to a fare-thee-well.</p>
<p>Compression is as thoroughly wrung out as any technology in the data center. </p>
<p>So why don&#8217;t we use it everywhere, like Nimble?</p>
<p><strong>Not about capacity</strong><br />
The doubling of capacity from compression is not the big win. The larger benefit is that it more than doubles the internal bandwidth of the array &#8211; because bandwidth is more expensive than capacity.</p>
<p>And bandwidth is more important than capacity. As John von Neumann noted in his <a href="http://www.virtualtravelog.net/entries/2003-08-TheFirstDraft.pdf" target="_blank">First Draft of a Report on the EDVAC</a> (pdf):</p>
<blockquote><p>
This result deserves to be noted. It shows in a most striking way where the real difficulty, the main bottleneck, of an automatic very high speed computing device lies: At the memory.
</p></blockquote>
<p>Varun reports that Nimble&#8217;s comdec operates at wire speed on a multicore CPU, no ASIC or FPGA required. It must increase latency, but given Nimble&#8217;s focus on full stripe writes the increase in bandwidth must more than make up for it.</p>
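<p>The bandwidth argument can be sketched with Python&#8217;s standard <code>zlib</code> module, which implements DEFLATE, a Lempel-Ziv descendant. This is a stand-in and not Nimble&#8217;s algorithm. The sample data and the 100 MB/s link are assumptions, and the sketch ignores CPU time and added latency.</p>

```python
import zlib


def effective_bandwidth(raw_mb_per_s, data):
    """Logical MB/s delivered if `data` crosses the link in compressed form."""
    compressed = zlib.compress(data)
    ratio = len(data) / len(compressed)
    return raw_mb_per_s * ratio, ratio


# Hypothetical, highly repetitive sample; real data will compress less.
sample = b"INSERT INTO orders VALUES (42, 'widget', 'shipped');\n" * 2000
bandwidth, ratio = effective_bandwidth(100, sample)

# Compression is lossless: the round trip returns the original bytes.
round_trip_ok = zlib.decompress(zlib.compress(sample)) == sample

print(f"compression ratio {ratio:.1f}:1, effective {bandwidth:.0f} MB/s")
```

<p>Whatever ratio the data achieves, the same physical link or disk stripe carries that many times more logical data, which is the point about bandwidth rather than capacity.</p>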
<p><strong>The StorageMojo take</strong><br />
Since it is possible to perform wire-speed compression/decompression with a commodity CPU, why not everywhere? </p>
<p>Will RAID controllers stumble reconstructing compressed data? Is compressed data more prone to corruption? Is bandwidth so cheap that we don&#8217;t need more?</p>
<p>I don&#8217;t think so, but I&#8217;m open to dissenting opinions. With disk capacity growth slowing, comdec everywhere is a good way to increase performance, reduce $/GB and have something new to show customers.</p>
<p><strong>Courteous comments welcome, of course.</strong> StorageMojo dove into this 5 years ago in <a href="http://storagemojo.com/2006/04/27/25x-data-compression-made-simple/" target="_blank">25x data compression made simple</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/07/05/its-time-for-primary-data-compression/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>De-dup: too much of good thing?</title>
		<link>http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/</link>
		<comments>http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 18:52:40 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2434</guid>
		<description><![CDATA[A post last month in ACM&#8217;s Queue raised a disturbing point around block-level deduplication in flash SSDs: it could hose your file system. De-dup is a Good Thing, right? Researchers found that at least 1 Sandforce SSD controller &#8211; the SF1200 &#8211; does block-level deduplication by default. Many file systems write critical metadata to multiple [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>A post last month in <a href="http://queue.acm.org/detail.cfm?id=1985003" target="_blank">ACM&#8217;s Queue</a> raised a disturbing point around block-level deduplication in flash SSDs: it could hose your file system.</p>
<p><strong>De-dup is a Good Thing, right?</strong><br />
Researchers found that at least 1 SandForce SSD controller &#8211; the SF-1200 &#8211; does block-level deduplication by default. Many file systems write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates those blocks, leaving your file system with only 1 copy? </p>
<p>Yup, corruption of 1 block could wipe out your entire file system. And since all the &#8220;copies&#8221; point to the same corrupted block, there&#8217;s no way to recover. </p>
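<p>A toy model shows the failure mode. This is not how the SF-1200 or any real controller is implemented. It is a minimal content-addressed store, with hypothetical class and method names, in which identical blocks share one physical copy.</p>

```python
import hashlib


class DedupStore:
    """Toy block store: identical blocks are kept as a single physical copy."""

    def __init__(self):
        self.physical = {}   # fingerprint -> block bytes
        self.logical = {}    # logical block address -> fingerprint

    def write(self, lba, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        self.physical.setdefault(fingerprint, data)
        self.logical[lba] = fingerprint

    def read(self, lba):
        return self.physical[self.logical[lba]]

    def corrupt_physical(self, lba, garbage):
        """Simulate media corruption of the physical block behind `lba`."""
        self.physical[self.logical[lba]] = garbage


store = DedupStore()
superblock = b"fs-metadata-v1"
store.write(0, superblock)        # primary copy
store.write(8192, superblock)     # the "backup" the file system believes is separate
store.corrupt_physical(0, b"\x00" * len(superblock))

print(len(store.physical))                 # how many physical copies exist
print(store.read(8192) == superblock)      # is the backup still good?
```

<p>Both logical addresses resolve to the same physical block, so corrupting it takes the &#8220;backup&#8221; down with the primary.</p>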
<p>Most Unix superblock-based FSs and ZFS could be pooched by loss of a single block. NTFS also mirrors critical metafile info and could be vulnerable as well.</p>
<p>To be fair, AFAIK no one has reported this failure in the wild, so it is conjecture today. That said, it may have happened to people who didn&#8217;t realize what went wrong.</p>
<p>But in the world of storage, if something can happen it will, usually at the worst possible time.  Have you seen a total data loss on an otherwise functioning SSD?</p>
<p><strong>The StorageMojo take</strong><br />
I&#8217;ve made calls to a number of vendors to get their responses, including SandForce, Intel, Texas Memory Systems and OCZ. With any luck we&#8217;ll soon have a 1st pass on who does what to your data. </p>
<p>Don&#8217;t panic: not all SSD controllers do this. Texas Memory Systems controllers don&#8217;t, partly because they don&#8217;t use MLC flash and partly because minimizing capacity use and maximizing data availability are conflicting goals, and they chose availability over capacity.</p>
<p>Also note that the SF-1200 is offered as a consumer-grade controller. It isn&#8217;t clear what SandForce does with the rest of their line, but their site does repeatedly reference their &#8220;DuraWrite&#8221; technology, which appears to include block-level dedup. </p>
<p>Just last week StorageMojo recommended faster adoption of SSDs in the enterprise &#8211; and still does. But this once again underlines the need for mirroring. The sooner we find these issues, the sooner they&#8217;ll be fixed.</p>
<p>Watch the comments for vendor info, and I&#8217;ll update this post with more info if and when it develops. </p>
<p><strong>Update:</strong> Here is the SandForce response:</p>
<blockquote><p>
In the recent article by David Rosenthal he mentions a conversation with Kirk McKusik and the ZFS team at Sun Microsystems (Oracle). That conversation explains why it is critical that meta data not be lost or corrupted. He goes on to say that &#8220;If the stored metadata gets corrupted, the corruption will apply to all copies, so recovery is impossible.&#8221;</p>
<p>SandForce employs a feature called DuraWrite which enables flash memory to last longer through innovative patent pending techniques. Although SandForce has not disclosed the specific operation of DuraWrite and its 100% lossless write reduction techniques, the concept of deduplication, compression, and data differencing is certainly related. Through all the years of development and OEM testing with our SSD manufacturers and top tier storage users, there has not been a single reported failure of the DuraWrite engine. There is no more likelihood of DuraWrite loosing data than if it was not present.</p>
<p>We completely agree that any loss of metadata is likely to corrupt access to the underlying data. That is why SandForce created RAISE (Redundant Array of Independent Silicon Elements) and includes it on every SSD that uses a SandForce SSD Processor. All storage devices include ECC protection to minimize the potential that a bit can be lost and corrupt data. Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block. </p>
<p>This combination of ECC and RAISE protection provides a resulting UBER of 10^-29 virtually eliminates the probabilities of data corruption. This combined protection is much higher than any other currently shipping SSD or HDD solution we know about. The fact that ZFS stores up to three copies of the metadata and optionally can replicate user data is not an issue. All data stored on a SandForce Driven SSD is viewed critical and protected with the highest level of certainty.
</p></blockquote>
<p>Readers: how does that sound to you?<br />
<strong>End update.</strong><br />
<strong>Update 2:</strong> Oddly enough, the SandForce web site specifies the SF-1200 controller at</p>
<blockquote><p>
ECC Recovery: Up to 24 bytes correctable per 512-byte sector<br />
Unrecoverable Read Errors: Less than 1 sector per 10<sup>16</sup> bits read
</p></blockquote>
<p>which is about where many enterprise disk drives are spec&#8217;d &#8211; and quite a bit less than 10<sup>-29</sup>. Hmm-m.<br />
<strong>End update 2.</strong></p>
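<p>For a sense of scale, here is the spec-sheet number turned into data volume and time. It treats one unreadable sector per 10<sup>16</sup> bits as roughly a 10<sup>-16</sup> rate, which is a simplification, and the 250 MB/s sustained read rate is an assumption of mine.</p>

```python
BITS_PER_ERROR = 10**16          # spec sheet: under 1 unreadable sector per 10^16 bits read


def petabytes_between_errors(bits_per_error=BITS_PER_ERROR):
    """Data read, in decimal petabytes, per expected unreadable sector."""
    return bits_per_error / 8 / 10**15


def days_between_errors(read_mb_per_s, bits_per_error=BITS_PER_ERROR):
    """Days of nonstop reading at `read_mb_per_s` before one expected error."""
    seconds = bits_per_error / 8 / (read_mb_per_s * 10**6)
    return seconds / 86_400


pb = petabytes_between_errors()
days = days_between_errors(read_mb_per_s=250)   # 250 MB/s is an assumed rate
gap = 29 - 16                                   # orders of magnitude between the two figures

print(f"{pb} PB per expected error, about {days:.0f} days at 250 MB/s")
print(f"the two published figures differ by {gap} orders of magnitude")
```

<p>That works out to 1.25 PB read per expected unreadable sector, or roughly 58 days of nonstop reading at the assumed rate.</p>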
<p><strong>Update 3:</strong><br />
Spoke to James Myers of Intel. He said that no current Intel SSD uses any form of compression, including dedup. He also cautioned against making too much of the risk: after all, you&#8217;d have to have an unrecoverable read error AND it would have to be that 1 critical block. Perhaps, he suggested, file systems that do use multiple copies of critical FS metadata could slightly alter the copies to eliminate the possibility of deduplication.<br />
<strong>End update 3.</strong></p>
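<p>The suggestion to slightly alter each copy could look like the sketch below: prefix every replica with its copy number so no two replicas are byte-identical. The helper names are hypothetical and no file system is claimed to do exactly this. A controller that compresses or does data differencing could still share most of the bytes, so this only defeats whole-block dedup.</p>

```python
import hashlib


def fingerprint(block):
    """Stand-in for whatever hash a deduplicating device might use."""
    return hashlib.sha256(block).hexdigest()


def tagged_copy(metadata, copy_index):
    """Prefix a one-byte copy number so each replica has distinct content."""
    return bytes([copy_index]) + metadata


def untag(block):
    """Split a tagged replica back into (copy number, original metadata)."""
    return block[0], block[1:]


metadata = b"fs-metadata-v1"
copies = [tagged_copy(metadata, i) for i in range(3)]
fingerprints = {fingerprint(c) for c in copies}

print(len(fingerprints))            # distinct fingerprints across the replicas
print(untag(copies[2]))             # the original metadata is still recoverable
```

<p>Three replicas produce three distinct fingerprints, so a block-level deduplicator keyed on content would store them separately.</p>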
<p><strong>Courteous comments welcome, of course.</strong> TMS has been advertising on StorageMojo for a couple of years. </p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Can flash SSDs be trusted?</title>
		<link>http://storagemojo.com/2011/06/20/can-flash-ssds-be-trusted/</link>
		<comments>http://storagemojo.com/2011/06/20/can-flash-ssds-be-trusted/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 22:15:18 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2404</guid>
		<description><![CDATA[IT pros are always skeptical about new technology. Is it surprising that flash SSD&#8217;s are getting the gimlet eye? The big worry seems to be endurance. Nobody wants to buy an expensive SSD and have it fail after a year on the job. But IT infrastructures are designed to manage endurance failures. LTO tape, for [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>IT pros are always skeptical about new technology. Is it surprising that flash SSDs are getting the gimlet eye?</p>
<p>The big worry seems to be endurance. Nobody wants to buy an expensive SSD and have it fail after a year on the job.</p>
<p>But IT infrastructures are designed to manage endurance failures. LTO tape, for example, is specified for a few hundred head passes. Yet tape is the paragon of data persistence.</p>
<p>Hard drive failure rates aren&#8217;t low enough for any of us to consider storing important data on one without backup. So why are IT pros so skittish about flash SSDs?</p>
<p><strong>Experience</strong><br />
Or rather, lack of experience. Flash SSDs are evolving rapidly, with new generations arriving every 12 to 18 months.</p>
<p>It takes time for experience with new models to percolate. In the meantime, bad experiences with earlier generation drives continue to circulate.</p>
<p>Vendor secrecy about failure rates and modes doesn&#8217;t help. Until the Bianca Schroeder/Google/CMU <a href="http://storagemojo.com/2007/02/19/googles-disk-failure-experience/" target="_blank">disk drive studies</a> were released 4 years ago, we had no independent large-scale reliability data.</p>
<p>I hope it won&#8217;t take 20 years before we get that information on SSDs. How about it, vendors?</p>
<p><strong>Reliability</strong><br />
SSDs may turn out to be more reliable than hard drives, but I won&#8217;t believe it until I see independent data. The lack of moving parts is a plus, but about half the failures in disk drives come from the electronics, not the spinning bits. SSDs have most of the same electronics.</p>
<p><strong>SSD equivalent of a head disk assembly</strong><br />
Plane failures are a major trouble spot. Each die consists of two planes. These planes are prone to sudden failure, wiping out half the data on a die.</p>
<p>Most chip carriers contain multiple stacked dies, so a plane failure will remove anywhere from a quarter to an eighth of the chip&#8217;s total storage. Most flash controllers lay out the data in ways similar to a RAID array to guard against data loss.</p>
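<p>Both points can be sketched in a few lines. The first function reproduces the quarter-to-an-eighth arithmetic. The second shows single-parity XOR recovery, the simplest RAID-like layout. Real controllers use their own undisclosed schemes, so treat this as an illustration of the idea only.</p>

```python
from functools import reduce


def fraction_lost(stacked_dies, planes_per_die=2):
    """Share of a chip package lost when one plane fails."""
    return 1 / (stacked_dies * planes_per_die)


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


planes = [b"AAAA", b"BBBB", b"CCCC"]        # data striped over three planes
parity = xor_blocks(planes)                 # stored on a fourth plane

# Plane 1 fails: XOR the survivors with the parity to rebuild it.
rebuilt = xor_blocks([planes[0], planes[2], parity])

print(fraction_lost(2), fraction_lost(4))   # two and four stacked dies
print(rebuilt)
```

<p>Two stacked dies give four planes, so one failure costs a quarter of the package. Four dies give eight planes and an eighth. With one parity plane, any single failed plane can be rebuilt from the rest.</p>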
<p><strong>What to look for</strong><br />
Since Maxtor&#8217;s well-deserved demise we&#8217;ve had reasonable parity between disk drives and disk drive vendors. But that is not the case with the still-maturing flash drive market.</p>
<p>Storage Newsletter recently <a href="http://www.storagenewsletter.com/news/flash/90-ssd-manufacturers-in-the-world-document" target="_blank">published</a> a list of 85 SSD vendors, most of which few of us have heard of. Many are focused on the embedded systems market, but the list is also long because the SSD market&#8217;s barriers to entry are low: buy a controller chip; buy flash on the spot market; gen up a PC board and <i>voilà</i>, you are in the SSD market.</p>
<p>But flash that ends up on the spot market at rock-bottom prices is often marginal. The big buyers, like Apple, get first dibs on the best.</p>
<p>SSDs made with spot-market flash and a no-name &#8211; USB thumb drive? &#8211; controller will have a lot more problems. Which is to say that in today&#8217;s SSD market brand names count.</p>
<p>One thing to look for is a guarantee of total write capacity. Another is a statement of how much over-provisioning the drive has. </p>
<p>Even better: a five-year guarantee such as Seagate popularized with disks and that Intel just started offering on one of its SSD lines.</p>
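<p>Those two numbers are easy to put to work. The sketch below converts a write-capacity guarantee into years at a given daily write load and expresses over-provisioning as a percentage of user capacity. All inputs are hypothetical, and the lifetime estimate ignores write amplification inside the drive.</p>

```python
def years_of_life(total_write_capacity_tb, host_writes_gb_per_day):
    """Years until the guaranteed write capacity is used up (decimal TB and GB)."""
    days = total_write_capacity_tb * 1000 / host_writes_gb_per_day
    return days / 365


def over_provisioning_pct(raw_gb, user_gb):
    """Spare flash as a percentage of the capacity the user can address."""
    return (raw_gb - user_gb) / user_gb * 100


# Hypothetical drive: 73 TB write guarantee, 40 GB written per day,
# 128 GB of raw flash exposed as 100 GB.
life = years_of_life(total_write_capacity_tb=73, host_writes_gb_per_day=40)
spare = over_provisioning_pct(raw_gb=128, user_gb=100)

print(f"{life:.1f} years of writes, {spare:.0f}% over-provisioning")
```

<p>With these assumed numbers the guarantee covers five years of writes, which happens to line up with a five-year warranty, and the drive carries 28% over-provisioning.</p>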
<p><strong>The StorageMojo take</strong><br />
I have been as skeptical as anyone on SSDs &#8211; read some of my earliest posts &#8211; but the time for skepticism has passed. Of course, perform careful evals on any new IT product. But the best flash SSDs are ready for the enterprise today.</p>
<p>And here&#8217;s an even more radical conclusion: the best consumer SSDs are ready for the enterprise as well. Using any SATA drives in your enterprise?</p>
<p>The key: how is the SSD architected into the system? If it is a storage tier, the data has to be protected just like a RAID array. If it is a cache you have more flexibility &#8211; as long as the data is also on disk.</p>
<p>Yes, it&#8217;s more difficult to separate the wheat from the chaff in the SSD market today. But quality products are available.</p>
<p><strong>Courteous comments welcome, of course.</strong><br />
I started thinking about this as a result of a research project I did a few months ago. Leading-edge storage managers with workloads that would benefit enormously from flash SSDs weren&#8217;t seriously evaluating them. Big surprise. What do you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/06/20/can-flash-ssds-be-trusted/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Webinar Q&amp;A: flash SSD performance &amp; reliability</title>
		<link>http://storagemojo.com/2011/06/07/webinar-qa-flash-ssd-performance-reliability/</link>
		<comments>http://storagemojo.com/2011/06/07/webinar-qa-flash-ssd-performance-reliability/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 17:02:03 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[SOHO/SMB]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2387</guid>
		<description><![CDATA[I was surprised by the number of questions at last week&#8217;s webinar &#8211; many more than we could get to &#8211; so I&#8217;m answering a few here. Performance Q: Can Robin talk about performance and how does flash help solve I/O bottleneck? NAND flash is very good at random reads, and a good SSD can [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>I was surprised by the number of questions at last week&#8217;s webinar &#8211; many more than we could get to &#8211; so I&#8217;m answering a few here.</p>
<p><strong>Performance</strong><br />
<i>Q: Can Robin talk about performance and how does flash help solve I/O bottleneck?</i></p>
<p>NAND flash is very good at random reads, and a good SSD can handle thousands per second, compared to a disk drive&#8217;s 150-500 (for a high-end drive). That&#8217;s one reason arrays are popular: they provide higher random I/O performance because multiple heads are seeking. But that&#8217;s also why capacity utilization is so low: the disks come with more capacity than most applications use.</p>
<p>So not only are you buying multiples of the most expensive disks, but then you only use a fraction of their capacity. This is why flash SSDs are predicted to kill the high-end drives even though SSDs cost much more per GB: a single SSD can eliminate 6-10 hard drives. That is a major cost saving.</p>
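<p>The arithmetic behind that consolidation claim can be sketched in a few lines. This is a rough illustration, not a sizing tool: the 3,000 IOPS workload, the 600 GB data set and the 300 GB drive size are made-up numbers, and only the 150-500 random reads per second range comes from the answer above.</p>

```python
import math


def disks_needed(target_iops, iops_per_disk):
    """Drives required to hit a random-I/O target by adding spindles."""
    return math.ceil(target_iops / iops_per_disk)


def capacity_utilization(used_gb, n_disks, gb_per_disk):
    """Fraction of the purchased capacity the application actually uses."""
    return used_gb / (n_disks * gb_per_disk)


# Hypothetical workload, for illustration only.
target_iops = 3000   # assumed random-read requirement
iops_per_disk = 400  # inside the 150-500 range quoted for disk drives
used_gb = 600        # assumed size of the hot data set
gb_per_disk = 300    # assumed high-end drive capacity

n = disks_needed(target_iops, iops_per_disk)
print(n, "disks to reach", target_iops, "IOPS")
print("capacity used:", capacity_utilization(used_gb, n, gb_per_disk))
```

<p>With per-disk figures between 300 and 500 IOPS, the same hypothetical workload needs 6 to 10 drives, which is the range one SSD is said to replace, and most of the capacity on those drives sits idle.</p>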
<p><i>Q: So the low cost assumes that the cache is read only, otherwise it needs to be RAID? That comes off as misleading.</i></p>
<p>While flash SSDs are much better at random reads than they are at random writes, they still beat several high-end disks at writes. </p>
<p>Since most workloads are 80%-95% reads, an SSD that can handle 1,000 writes per second can handle a lot of work. Disks are still the most cost-effective solution for large sequential workloads because their performance is close to SSDs and they are so much cheaper. </p>
<p><strong>Reliability</strong><br />
<i>Q: What are Robin&#8217;s thoughts on the reliability of SSDs? We have seen failure rates of over 10% on drives less than two months old.</i></p>
<p>Flash SSD reliability today is all over the map. As flash SSD technology matures, I&#8217;d expect to see drive reliability rates converge. Five years ago disk reliability was fairly similar across vendors, with the glaring exception of Maxtor. </p>
<p>That said, it&#8217;s useful to recognize that there&#8217;s a lot more design and sourcing variability in SSDs. If someone uses the cheapest parts &#8211; and there are plenty available &#8211; they can offer good specs but highly variable reliability. </p>
<p>If they leave out too much redundancy they&#8217;ll have a cost advantage but will be more vulnerable to chip and plane failures. The market will eventually settle on similar specs for each application, but we&#8217;re years away from that.</p>
<p><i>Q: You mentioned 10,000 writes and failure can begin, what is that in years?</i></p>
<p>Like so many storage specs, that 10k write spec for MLC flash is a statistical one that can be improved upon by more robust ECC, as this chart from SNIA shows:</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2011/06/ecc_flash_reliability.jpg"><img src="http://storagemojo.com/wp-content/uploads//2011/06/ecc_flash_reliability.jpg" alt="" title="ecc_flash_reliability" width="480" height="343" class="aligncenter size-full wp-image-2388" /></a></p>
<p>But the most important way to improve upon it is by increasing the capacity of the SSD. Double the size of the SSD and you double the total write capacity. </p>
<p>As to what that is in years, the industry is still figuring out how to spec that. The best vendor spec I&#8217;ve seen so far has been from Intel &#8211; 5 years at 20 GB of writes per day. </p>
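<p>For a feel of how those numbers relate, here is a back-of-envelope sketch. The 80 GB and 160 GB drive sizes and the write amplification factor of 10 are assumptions chosen for illustration, not vendor figures; only the 10k write spec and the 5 years at 20 GB per day come from the answers above.</p>

```python
def rated_total_writes_gb(gb_per_day, years):
    """Total host writes implied by a 'GB per day for N years' spec."""
    return gb_per_day * 365 * years


def raw_write_capacity_gb(capacity_gb, write_cycles):
    """Idealized total write capacity: every cell written write_cycles times."""
    return capacity_gb * write_cycles


def years_of_life(capacity_gb, write_cycles, gb_per_day, write_amplification=1.0):
    """Years until the idealized write capacity is used up.

    write_amplification is an assumed factor for the extra internal writes
    a drive performs per host write; real values depend on the drive and workload.
    """
    usable = raw_write_capacity_gb(capacity_gb, write_cycles) / write_amplification
    return usable / (gb_per_day * 365)


# The spec quoted above: 5 years at 20 GB of writes per day.
print(rated_total_writes_gb(20, 5), "GB of host writes over the rated life")

# Doubling the capacity doubles the idealized total write capacity.
print(raw_write_capacity_gb(80, 10_000), raw_write_capacity_gb(160, 10_000))

# Hypothetical 80 GB drive, 10k cycles, 20 GB/day, assumed amplification of 10.
print(round(years_of_life(80, 10_000, 20, write_amplification=10), 2), "years")
```

<p>The 5-year, 20 GB per day spec works out to 36,500 GB of host writes, far below the idealized cycle limit, which suggests how much margin a vendor leaves for write amplification and cell variability.</p>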
<p><strong>Courteous comments welcome, of course.</strong> I enjoyed the webinar &#8211; a new experience for me &#8211; and not just because I got paid. The crew at Nimble was a pleasure to work with. Here&#8217;s a <a href="http://www.nimblestorage.com/resources/robin-harris-ssd-webinar/" target="_blank">link</a> to the webinar. </p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/06/07/webinar-qa-flash-ssd-performance-reliability/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Flash and the re-architecting of storage</title>
		<link>http://storagemojo.com/2011/05/17/flash-and-the-re-architecting-of-storage/</link>
		<comments>http://storagemojo.com/2011/05/17/flash-and-the-re-architecting-of-storage/#comments</comments>
		<pubDate>Tue, 17 May 2011 18:44:11 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[SOHO/SMB]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2374</guid>
		<description><![CDATA[Jim Gray&#8217;s comment that disk is the new tape is truer today than it was 8 years ago. We&#8217;ve been adding caches, striping disks, modifying applications and performing other unnatural acts to both reduce and accommodate random reads and writes to disk. Flash changes the calculus of 20 years of storage engineering. Flash gives us [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Jim Gray&#8217;s comment that <a href="http://queue.acm.org/detail.cfm?id=864078" target="_blank">disk is the new tape</a> is truer today than it was 8 years ago. We&#8217;ve been adding caches, striping disks, modifying applications and performing other unnatural acts to both reduce and accommodate random reads and writes to disk.</p>
<p>Flash changes the calculus of 20 years of storage engineering. Flash gives us abundant random reads &#8211; something hard drives are poor at &#8211; and reasonable random writes to whatever hot data we choose. </p>
<p>In a feverish burst of design and investment we&#8217;ve tried flash everywhere in the storage stack: disks; PCI cards; motherboards; controllers; built-in tiering; and appliances.  These products have been focused on enterprise datacenters or very targeted applications where the cost of flash was justifiable.</p>
<p>But clarity is emerging. It isn&#8217;t so much where you put the flash as what you ask the flash to do. There are three requirements:</p>
<ul>
<li>Valuable data. Flash is an order of magnitude more costly than disk. </li>
<li>Often accessed. If not, leave it on disk.</li>
<li>Enables new functionality and/or lowers cost. If it doesn&#8217;t, why bother?</li>
</ul>
<p><strong>The buyer&#8217;s burden</strong><br />
These requirements frame a basic point: optimizing for flash requires a systems level approach. Adding flash can make current architectures go faster, but that isn&#8217;t the big win.</p>
<p>Buyers looking for an economic edge must make a cognitive leap: <i>the old ways are no longer best</i>. Flash enables efficiencies and capabilities in smaller systems that only costly enterprise gear had a few years ago.</p>
<p><strong>Tiering</strong><br />
Tiered flash solutions are the most common approach today. Tiering software has improved in recent years, making the movement of data between flash and disk safe, fast and granular. </p>
<p>We’ve at least started to see interest in the midsize enterprise, like the <a href="http://www.equallogic.com/products/default.aspx?id=9511" target="_blank">EqualLogic hybrid SAS/SSD</a> array in VDI deployments.</p>
<p><strong>Metadata and cache</strong><br />
The best fit for flash today is metadata and caching. These best meet the requirements for value, access and functionality.</p>
<p>Once metadata is freed from disk constraints we can combine it with caching to build high-performance systems on commodity hardware. The win for innovators is to design new metadata structures and caching algorithms for flash. </p>
<p>They can design the (write) data layouts to take best advantage of the physics of disk and flash. <a href="http://storagemojo.com/2010/11/08/jack-be-nimble/" target="_blank">Nimble Storage’s CASL architecture</a>, which combines a large flash cache with full-stripe writes, is one example.</p>
<p>Flash is also an important enabler for low-cost de-duplication because it&#8217;s cheaper to keep block metadata &#8211; fingerprints or hash codes &#8211; in flash than it is in RAM. Some vendors are encouraging the use of de-duplicated storage for midrange primary storage, enabled by flash indexes or caches that make it feasible to reconstruct files on-the-fly. </p>
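<p>To make the fingerprint idea concrete, here is a toy sketch of block-level de-duplication. It is not any vendor's design: the <code>DedupStore</code> class, its fixed block size and the choice of SHA-256 are all assumptions for illustration. The <code>index</code> dictionary stands in for the block metadata that would live in flash, and the <code>blocks</code> list stands in for unique blocks on disk.</p>

```python
import hashlib


class DedupStore:
    """Toy fixed-block de-duplicating store (illustrative only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.index = {}   # fingerprint -> position in self.blocks (the metadata)
        self.blocks = []  # unique block payloads (stand-in for disk)

    def write(self, data):
        """Store data and return its recipe: the ordered list of fingerprints."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).digest()
            if fp not in self.index:      # new content: store the block once
                self.index[fp] = len(self.blocks)
                self.blocks.append(block)
            recipe.append(fp)             # duplicates cost only a fingerprint
        return recipe

    def read(self, recipe):
        """Reconstruct the original data on the fly via index lookups."""
        return b"".join(self.blocks[self.index[fp]] for fp in recipe)


store = DedupStore(block_size=4)
recipe = store.write(b"aaaabbbbaaaa")
print(len(recipe), "blocks written,", len(store.blocks), "stored")
print(store.read(recipe))
```

<p>Every read is a string of random index lookups, one per block, which is why the speed of the medium holding the index matters so much for primary storage.</p>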
<p><strong>The StorageMojo take</strong><br />
Shaking off the effects of 50 years of disk-based limitations isn&#8217;t easy. Our disk-based orthodoxy is ingrained in architectures and our thinking. </p>
<p>But buyers face a difficult job: evaluating architectures and algorithms to choose  products for eval. A shortcut: look for architectures that collapse existing storage stovepipes to reduce cost, total data stored and operational complexity. The three are related and offer the big wins. </p>
<p>In the last 10 years raw disk capacity costs have dropped to less than a tenth of what they were, but the costs of traditional storage systems haven&#8217;t. The culprits: operating costs; storage network infrastructure costs; and capacity requirements that have risen faster than management productivity. </p>
<p>The flood of data continues to rise, but cost and complexity don&#8217;t have to rise with it. We can do better &#8211; and we are.</p>
<p><strong>Courteous comments welcome, of course.</strong> I&#8217;ve been working with Nimble Storage lately and like what they&#8217;ve done.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/05/17/flash-and-the-re-architecting-of-storage/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

