<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StorageMojo &#187; Future Tech</title>
	<atom:link href="http://storagemojo.com/category/future-tech/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Mon, 21 May 2012 22:16:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Violin&#8217;s clean-sheet architecture</title>
		<link>http://storagemojo.com/2012/04/11/violins-clean-sheet-architecture/</link>
		<comments>http://storagemojo.com/2012/04/11/violins-clean-sheet-architecture/#comments</comments>
		<pubDate>Wed, 11 Apr 2012 20:29:23 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Disk]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2637</guid>
		<description><![CDATA[Over 3 years ago StorageMojo saw that Violin Memory was &#8220;. . . on the winning architectural track.&#8221; Well, it took a lot of time and money, but Violin is making good on that early promise. StorageMojo&#8217;s enthusiasm was kindled by Violin&#8217;s unique architecture. Here&#8217;s a short video that shows how Violin&#8217;s architecture addresses key [...]]]></description>
			<content:encoded><![CDATA[<p>Over <a href="http://storagemojo.com/2009/01/04/the-top-storage-stories-of-2008/" target="_blank">3 years ago</a> StorageMojo saw that <a href="http://www.violin-memory.com/" target="_blank">Violin Memory</a> was &#8220;. . . on the winning architectural track.&#8221; Well, it took a lot of time and money, but Violin is making good on that early promise.</p>
<p>StorageMojo&#8217;s enthusiasm was kindled by Violin&#8217;s unique architecture. Here&#8217;s a short video that shows how Violin&#8217;s architecture addresses key problems with flash:</p>
<p><iframe width="425" height="349" src="http://www.youtube.com/embed/L2VibZhNFbE?hl=en&#038;fs=1" frameborder="0" allowfullscreen></iframe></p>
<p>Full screen mode recommended.</p>
<p><strong>The StorageMojo take</strong><br />
The industry is still in the early days of digesting the implications of fast persistent solid state storage. We&#8217;ve built up 50 years of cruft to deal with disk&#8217;s many issues. It will take a few more years for flash&#8217;s new options to ripple through the entire storage, server and application stack.</p>
<p>Take, for example, failover. If all apps and monitoring software could declare a failure in 10 seconds rather than, say, a minute, how much smoother would major apps run? How much better would the perception of system uptime and response times be?</p>
<p>There are many other possibilities &#8211; what about metadata? &#8211; that flash and its successor technologies will affect. I&#8217;ll be offering more detail in my keynote at the <a href="http://techfieldday.com/2012/ssss12/" target="_blank">Solid State Storage Symposium</a> on Wednesday, April 25 in Silicon Valley. S4 is free and you can <a href="http://ssss12.eventbrite.com/" target="_blank">register here</a>.</p>
<p><strong>Courteous comments welcome, of course.</strong> The other flash company I liked in 2009 was Fusion-io, and they&#8217;ve done OK. And yes, Violin paid StorageMojo to produce the video white paper, but the opinions are my own.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/04/11/violins-clean-sheet-architecture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tintri responds on SSD arrays</title>
		<link>http://storagemojo.com/2012/03/20/tintri-responds-on-ssd-arrays/</link>
		<comments>http://storagemojo.com/2012/03/20/tintri-responds-on-ssd-arrays/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 23:27:50 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2618</guid>
		<description><![CDATA[StorageMojo offered its soapbox to any vendors willing to weigh in on the question of whether enterprise arrays should be built from flash SSDs or not. Ed Lee, architect at Tintri, formerly of Data Domain and a Berkeley Ph.D, elected to respond. It is a long piece but rich in insight. Tintri produces hybrid disk/flash [...]]]></description>
			<content:encoded><![CDATA[<p>StorageMojo offered its soapbox to any vendors willing to weigh in on the question of whether enterprise arrays should be built from flash SSDs or not. Ed Lee, architect at <a href="http://www.tintri.com/products/technology/" target="_blank">Tintri</a>, formerly of Data Domain and a Berkeley Ph.D., elected to respond. It is a long piece but rich in insight.</p>
<p>Tintri produces hybrid disk/flash SSD appliances optimized for virtual environments, not Symm-killers. They use SSDs in their products, as do other folks like <a href="http://www.nimblestorage.com/" target="_blank">Nimble Storage</a>. </p>
<p>No money changed hands between Tintri and StorageMojo or related entities. My accountant is weeping in the next room.</p>
<p><strong>Begin Tintri&#8217;s response:</strong></p>
<blockquote><p>
<strong>Outside the SSD Box: More than Faster Disk</strong><br />
Robin Harris of StorageMojo, in his recent article &#8220;<a href="http://storagemojo.com/2012/03/05/are-ssd-based-arrays-a-bad-idea/" target="_blank">Are SSD-based arrays a bad idea?</a>&#8221;, and Matt Kixmoeller of Pure, in his response &#8220;<a href="http://www.purestorage.com/blog/the-ssd-is-key-to-economic-flash-arrays/" target="_blank">The SSD is Key to Economic Flash Arrays</a>&#8221;, present interesting perspectives on whether or not SSDs are the best technology for building flash-based arrays. Robin argues that by rethinking how flash can be packaged outside the SSD box, you can achieve better performance, reliability, cost and flexibility. And these observations are supported by the experience of existing flash-based storage vendors who have developed their own custom flash modules and packaging. Matt argues that SSDs provide an industry-standard product that requires less investment to leverage, better economies of scale, and rapid improvement in technology. These are also very valid points, especially for startups with limited time and capital.</p>
<p><strong>Latency</strong><br />
Taking latency as a point for comparison, flash-based storage vendors using custom packaging often quote IO latencies in the tens of microseconds versus SSD latencies of low hundreds of microseconds. While this is a notable difference, software and interfaces can also add overhead, and the final latency seen at the subsystem level may differ by only a factor of two to four. Server-side flash products can avoid more of the software and interface overhead and provide better latencies – but may require rewriting applications to capitalize on this advantage. Keep in mind that hard disk latencies can easily reach tens of milliseconds under even moderate load. ALL of these flash-based products have latencies hundreds of times lower than disk.</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2012/03/Bottleneck-no-longer-storage.png"><img src="http://storagemojo.com/wp-content/uploads//2012/03/Bottleneck-no-longer-storage.png" alt="" title="Bottleneck no longer storage" width="500" height="351" class="aligncenter size-full wp-image-2619" /></a></p>
<p>In short, most of the performance improvement comes from simply replacing hard disk with some form of flash. This immediately shifts the performance bottleneck from storage to some other component in your system. As a result, you won’t be able to take full advantage of flash performance without also optimizing the performance of the rest of your infrastructure, and ultimately rewriting your applications as well.</p>
<p>The above phenomenon explains why replacing your hard disk with flash often speeds up your applications by only a factor of two to three rather than ten or a hundred. Congratulations! You’ve just moved the bottleneck from storage to some other component of your system. By Amdahl’s Law, further improving only storage performance has diminishing returns. So while custom packaging does provide significant advantages in latency, most applications are unlikely to benefit until the rest of the computing ecosystem is optimized to take full advantage of flash.</p>
<p>To take a closer look at SSD latencies, I ran the following simple experiment:</p>
<ol>
<li>Erase an MLC SSD so that no logical blocks are mapped to flash, and then issue small random reads.</li>
<li>Overwrite the entire SSD so that all logical blocks are mapped, and issue the same small random reads as in step 1.</li>
</ol>
<p>The idea here is to measure the software and protocol overheads of accessing flash packaged as SSD separately from accessing the data on the SSD. Reads with no blocks mapped had latencies of around 70 μs, while reads with all blocks mapped had latencies of 250 μs. In this case software and protocol overhead accounted for only about 28% (70 of 250 μs) of the overall IO latency, indicating that SSDs may still have significant room for improving latency.</p>
<p><strong>Form factor</strong><br />
Another important issue discussed by both Robin and Matt is the relative cost of flash packaged in SSD versus non-SSD form factors. Robin argues that an SSD costs significantly more $/GB than the underlying flash, while Matt argues that non-SSD packaging is expensive to develop, and SSDs provide useful flash management functions as well as hot-swap capability. It’s certainly true that developing custom packaging has a high up-front cost, although this is likely balanced by lower unit costs. But as Robin points out, there are also standard packaging options available for non-SSD form factor flash, which may make custom packaging for non-SSD flash unnecessary.</p>
<p>A very important point to keep in mind when thinking about commercially available SSD vs. non-SSD form factors is that SSDs are designed as a substitute for disk, while non-SSD form factors are often designed as substitutes for memory. This means that SSDs focus primarily on reducing $/GB (flash’s greatest weakness vs. disk), while non-SSDs focus on reducing $/IOPS (flash’s greatest weakness vs. DRAM). This explains why SSD is currently much cheaper on a $/GB basis than PCIe flash, while PCIe flash designed as memory expansion is cheaper on a $/IOPS basis than SSD. This is not to say that you can’t build a non-SSD form factor that has lower $/GB than SSD, just that the primary application for these non-SSD form factors today is usually not as a replacement for disk.</p>
<p>Whether flash in SSD versus non-SSD form factors is better for use in storage subsystems in the long run primarily depends on the relative volumes of these products, and the feature and price sensitivity of the applications these products serve. At this point the ‘winning’ form-factor seems hard to predict. So as a flash subsystem vendor, it seems desirable to keep your options open and ensure that your technology will work well with a variety of packaging options.</p>
<p><strong>More than just a faster disk</strong><br />
But flash is about more than just performance and packaging. Flash enables much more than just a faster, denser replacement for disk. With flash, we can finally remove a key mechanical barrier to scaling not only storage systems, but computing systems in general. Going forward, CPU, network and storage can now all scale with improvements in semiconductor technology. When transistors replaced vacuum tubes, we got more than just compact radios; we got simpler, more powerful computing systems. Similarly, flash is a catalyst that will enable far greater levels of automation and functionality for storage and computing systems than is possible today.</p>
<p>I tend to think of the value of new technology as the product of its simplicity times the functionality it offers. It&#8217;s clear why functionality is important, but why is simplicity so important? Technology that is simple to use will be used more often, to solve more problems, in less time. As a result, simplicity has a compounding effect on value:</p>
<p>Value = Simplicity * Functionality</p>
<p>How does one measure simplicity? One way is to list the basic steps it takes to perform a task and how long each step takes. One to three steps is good, four to six is manageable, and anything resembling a twelve-step program will likely require written directions and a significant amount of focus. Note that in assessing the simplicity and functionality of a technology, one must do it in the context of the job that needs to be done. For example, a chainsaw has great features for cutting down trees but not for giving haircuts.</p>
<p>A common problem with many general purpose storage products when applied to applications such as virtualization is that they require executing long lists of steps to get anything done – and most of the features are not directly applicable to virtualization. Paradoxically, many of the features that try to make these products better suited to the application end up making the products more complex – resulting in little improvement in overall value. Kind of like adding too many tools to a Swiss army knife until you have so many that the attachments start to stick and rub against each other.</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2012/03/Swiss-Army-Giant-Knife.jpg"><img src="http://storagemojo.com/wp-content/uploads//2012/03/Swiss-Army-Giant-Knife.jpg" alt="" title="Swiss Army Giant Knife" width="500" height="350" class="aligncenter size-full wp-image-2620" /></a></p>
<p><strong>Flash as a catalyst</strong><br />
Flash eliminates a key mechanical barrier to scaling computing systems and is 400 times faster than disk. To keep things in perspective, the speed of sound is “only” 250 times faster than walking! If I could get to work at supersonic speeds, I would no doubt save a lot of time each year. But would I do no more with such an ability? Similarly, is flash just a faster replacement for disk? Will it make no significant difference in the way storage is managed and used? We obviously don’t think so. Flash will greatly increase the value of storage by improving both the simplicity and functionality of enterprise storage products. But these gains will not come easily or without their own set of problems.</p>
<p>An obvious way flash promotes simplicity is by eliminating performance bottlenecks, but as flash enables denser storage systems, many of those gains will turn into quality-of-service problems. A more significant way flash promotes value is by providing a better building block for constructing storage systems: flash promotes simplicity by enabling higher levels of automation, and it allows more powerful functionality to be implemented.</p>
<p>Flash will fragment the enterprise storage market. The general-purpose storage systems of today will be supplanted by new flash-based products that are far simpler and more powerful for the specific application areas that they target. This will amplify the simplicity and power that flash already makes possible, and further accelerate the fragmentation of the storage market. This is precisely what happened in the 1980s when advances in networking technology caused a shift from centralized computing to networked computing – and in the process fragmented the direct attached storage market into ones based on networked storage technology. Over time, the networked storage markets consolidated into the current general-purpose storage market dominated by a few major vendors. And so the cycle is repeating itself.</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2012/03/market_fragmentation.jpg"><img src="http://storagemojo.com/wp-content/uploads//2012/03/market_fragmentation.jpg" alt="" title="market_fragmentation" width="500" height="400" class="aligncenter size-full wp-image-2621" /></a></p>
<p>We are at the start of a new technological shift. A shift that is made possible by flash and one that will disrupt the existing enterprise storage market. Just as transistors enabled new products such as personal computers and smart phones, flash will enable simple, intelligent and fast enterprise storage systems. In turn, this will lead to much higher value for end users, but only if we think outside the storage box and treat flash as more than just a faster, denser disk.
</p></blockquote>
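<p>Two quantitative points in Ed Lee's response, the Amdahl's Law argument and the mapped versus unmapped read experiment, reduce to simple arithmetic. Here is a minimal Python sketch of both. It is an editorial illustration, not Tintri's code. The 400x flash-vs-disk factor and the 70 μs / 250 μs latencies come from the response; the storage shares of run time (50%, 60%, 90%) are assumed values chosen only for illustration.</p>

```python
def amdahl_speedup(storage_fraction: float, storage_speedup: float) -> float:
    """Overall speedup when only the storage share of run time gets faster (Amdahl's Law)."""
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")
    if storage_speedup <= 0:
        raise ValueError("storage_speedup must be positive")
    # Non-storage time is unchanged; storage time shrinks by storage_speedup.
    new_time = (1.0 - storage_fraction) + storage_fraction / storage_speedup
    return 1.0 / new_time


def overhead_share(unmapped_read_us: float, mapped_read_us: float) -> float:
    """Share of read latency attributed to software and protocol overhead.

    Follows the experiment in the response: reads of unmapped blocks never touch
    flash, so their latency approximates the overhead component.
    """
    return unmapped_read_us / mapped_read_us


if __name__ == "__main__":
    FLASH_VS_DISK = 400  # factor quoted in the response
    for fraction in (0.5, 0.6, 0.9):  # assumed storage shares of run time
        print(f"storage {fraction:.0%} of run time -> {amdahl_speedup(fraction, FLASH_VS_DISK):.2f}x")
    print(f"overhead share: {overhead_share(70, 250):.0%}")  # 28%
```

<p>With storage at 60% of run time, a 400x faster device yields only about a 2.5x overall speedup, consistent with the "factor of two to three" in the response. Storage has to dominate run time before the overall gain approaches 10x.</p>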
<p><strong>The StorageMojo take</strong><br />
For the record, the original post wasn&#8217;t looking at hybrid solutions, although SSDs can obviously help legacy designs stay competitive for a few years without replacing all their disks. For folks like Tintri and Nimble, who want to speed up disk storage while keeping it affordable, SSDs make sense. Why engineer a small part of your system when an off-the-shelf solution will suffice?</p>
<p>But for high-end transactional SAN storage I still don&#8217;t see how SSDs are the right way to go. But I&#8217;m expecting more responses, so stay tuned.</p>
<p><strong>Courteous comments welcome, of course.</strong> I&#8217;m working on a post that reflects directly on Ed&#8217;s comment about SSD latency. You&#8217;ll like it.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/03/20/tintri-responds-on-ssd-arrays/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>NAND&#8217;s dimming future</title>
		<link>http://storagemojo.com/2012/02/29/nands-dimming-future/</link>
		<comments>http://storagemojo.com/2012/02/29/nands-dimming-future/#comments</comments>
		<pubDate>Wed, 29 Feb 2012 20:21:16 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2593</guid>
		<description><![CDATA[Another StorageMojo Best paper, The Bleak Future of NAND Flash Memory, presented at this year&#8217;s FAST &#8217;12 conference, quantifies flash&#8217;s declining reliability, endurance, and performance as density increases. Researchers Laura M. Grupp and Steven Swanson from the UCSD Non-volatile Systems Lab and John D. Davis of Microsoft Research collected data from 45 flash chips from [...]]]></description>
			<content:encoded><![CDATA[<p>Another StorageMojo Best Paper, <a href="http://cseweb.ucsd.edu/users/swanson/papers/FAST2012BleakFlash.pdf" target="_blank">The Bleak Future of NAND Flash Memory</a>, presented at this year&#8217;s FAST &#8217;12 conference, quantifies flash&#8217;s declining reliability, endurance, and performance as density increases.</p>
<p>Researchers Laura M. Grupp and Steven Swanson from the UCSD Non-volatile Systems Lab and John D. Davis of Microsoft Research collected data from 45 flash chips from 6 manufacturers. Using that empirical data they predict the performance and cost characteristics of future SSDs. </p>
<p><strong>Faster better cheaper or slower worse cheaper?</strong><br />
While NAND flash is produced with semiconductor processes, smaller feature sizes don&#8217;t lead to faster performance or greater reliability. As NAND features shrink, so does the number of trapped electrons that store information, leaving less margin to distinguish stored values.</p>
<p><strong>Figures of merit</strong><br />
The research found that performance, program/erase endurance, energy efficiency, and data retention time all got worse with feature shrink.</p>
<p>Based on past performance, the team derived equations to describe how changes in feature size have affected key specs. They looked at SLC, MLC and TLC and feature sizes scaled from 72 nm to 6.5 nm (the consensus smallest feature size published in the International Technology Roadmap for Semiconductors, or ITRS), and assumed a fixed silicon budget for flash storage.</p>
<p><strong>Key results</strong></p>
<ul>
<li><strong>Latency.</strong> MLC write latency will double over time. Triple-level cell (TLC) writes will grow to over 2.5 ms, noticeably reducing their performance advantage over disk writes.</li>
<li><strong>Bandwidth.</strong> Small &#8211; 512B &#8211; read bandwidth and all writes decline by up to 50% over time. The impact is greatest on high-performance SLC flash.</li>
<li><strong>IOPS.</strong> MLC flash I/O rates will drop by almost half.</li>
</ul>
<p>Flash may be the new disk in a few years.</p>
<p><strong>The StorageMojo take</strong><br />
One important qualifier is that for the purposes of their modeling the team constrained the number of chips in the hypothetical future devices whose performance they predicted. While fine for isolating the impact of future chip shrinks, it ignores the potential of much greater parallelism for managing these changes.</p>
<p>Bandwidth drops by half? Double the number of chips.</p>
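<p>The parallelism argument is simple arithmetic if bandwidth scaled perfectly linearly across chips. This Python sketch assumes exactly that, ignoring controller, channel and host-interface limits; the 480 MB/s target and 40 MB/s per-chip figures are hypothetical, not numbers from the paper.</p>

```python
import math


def chips_needed(target_mb_s: float, per_chip_mb_s: float) -> int:
    """Number of flash chips needed to reach a target aggregate bandwidth.

    Assumes ideal linear scaling across chips (no controller, channel, or
    host-interface bottleneck), which real SSDs only approximate.
    """
    if target_mb_s <= 0 or per_chip_mb_s <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(target_mb_s / per_chip_mb_s)


# Hypothetical figures, not from the paper: 480 MB/s target, 40 MB/s per chip today.
today = chips_needed(480, 40)
after_shrink = chips_needed(480, 40 / 2)  # per-chip bandwidth halves after future shrinks
print(today, after_shrink)  # 12 24
```

<p>The catch is cost: twice the chips means twice the silicon, which is exactly the budget the paper held fixed.</p>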
<p>But if something can&#8217;t go on forever, it won&#8217;t. NAND flash will soon enter an end-of-life crisis for computer applications that need performance. That&#8217;s why ReRAM (resistance RAM) looks to be a good bet for replacing computer flash &#8211; not mobile device flash &#8211; over the next decade.</p>
<p><strong>Courteous comments welcome, of course.</strong> A version of this post was published on ZDNet last week.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/02/29/nands-dimming-future/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Virtualizing storage controllers</title>
		<link>http://storagemojo.com/2012/02/28/virtualizing-storage-controllers/</link>
		<comments>http://storagemojo.com/2012/02/28/virtualizing-storage-controllers/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 15:30:56 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2587</guid>
		<description><![CDATA[A hardware storage controller is an expensive guarantee that you&#8217;re using old technology to handle your most important data. Hardware specs are frozen early in the typical 18-24 month development cycle so by the time you get your &#8220;new&#8221; controller it is already 2 years old. But it may not have to be that way. [...]]]></description>
			<content:encoded><![CDATA[<p>A hardware storage controller is an expensive guarantee that you&#8217;re using old technology to handle your most important data. Hardware specs are frozen early in the typical 18-24 month development cycle, so by the time you get your &#8220;new&#8221; controller it is already 2 years old.</p>
<p>But it may not have to be that way. In <a href="http://static.usenix.org/event/fast12/tech/full_papers/Ben-Yehuda2-2-12.pdf" target="_blank">Adding Advanced Storage Controller Functionality via Low-Overhead Virtualization</a> researchers Muli Ben-Yehuda, Michael Factor, Eran Rom, Avishay Traeger, Eran Borovik and Ben-Ami Yassour of IBM Research–Haifa wanted to find out if virtualized storage controller features are feasible.</p>
<p>Short answer: with some tweaking, yes.</p>
<p>The big question is overhead. Storage controllers are typically in the data path, so latency, as well as compute efficiency on out-of-date processors, are real concerns.</p>
<p>Unlike the gateway approach of virtual storage appliances (VSA), the team ran the VMs directly on storage controllers using the Linux KVM hypervisor.</p>
<p><strong>Overhead</strong><br />
The team identified 3 sources of performance overhead:</p>
<ul>
<li><strong>Base.</strong> System work such as virtual memory management or process switching.</li>
<li><strong>External communication.</strong> Important if a new function is layered on top of the storage system, such as a file server.</li>
<li><strong>Internal communication.</strong> Virtual machine coordination and communication with the hardware controller.</li>
</ul>
<p><strong>Reducing overhead</strong><br />
Different techniques are used to limit each type of overhead.</p>
<p><strong>Base.</strong> They statically allocate CPU cores to the guest to ensure sufficient resources. Memory is also statically allocated to the VM to reduce translation overheads.</p>
<p><strong>External.</strong> Device assignment is the highest-performing approach, as it eliminates hypervisor intervention for physical events. This requires assigning the network device directly to the guest using an <a href="http://www.intel.com/content/www/us/en/pci-express/pci-sig-sr-iov-primer-sr-iov-technology-paper.html" target="_blank">SR-IOV</a> (single root I/O virtualization) enabled adapter, which allows the guest to send requests directly to the device.</p>
<p><strong>Internal communication.</strong> To reduce internal communication overhead, they modified KVM’s block driver to poll instead of relying on interrupts. This gives a fast, exit-less, zero-copy transport.</p>
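<p>Polling versus interrupts can be illustrated with a toy ring buffer. This Python sketch is a generic illustration of a consumer that repeatedly checks shared indices instead of waiting for a notification; it is not the authors' KVM modification, and <code>PolledRing</code>, <code>submit</code>, <code>poll</code> and <code>drain</code> are hypothetical names. Polling burns CPU cycles even when nothing is pending, which is one reason it pairs naturally with statically allocated cores.</p>

```python
from typing import Any, List, Optional


class PolledRing:
    """Toy single-producer/single-consumer ring drained by polling (hypothetical).

    The consumer checks the head/tail counters in a loop instead of sleeping
    until an interrupt-style notification arrives. Both sides live in one
    Python process here; a real exit-less transport shares memory between
    guest and host.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._slots: List[Optional[Any]] = [None] * capacity
        self._capacity = capacity
        self._head = 0  # total requests consumed so far
        self._tail = 0  # total requests published so far

    def submit(self, request: Any) -> bool:
        """Producer side: publish a request; return False if the ring is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = request
        self._tail += 1  # advancing the tail is the only signal; no interrupt is raised
        return True

    def poll(self) -> Optional[Any]:
        """Consumer side: return the next pending request, or None if the ring is empty."""
        if self._head == self._tail:
            return None
        index = self._head % self._capacity
        request = self._slots[index]
        self._slots[index] = None
        self._head += 1
        return request


def drain(ring: PolledRing, max_spins: int) -> List[Any]:
    """Busy-poll the ring for a bounded number of iterations, collecting requests.

    A dedicated polling core would spin indefinitely; max_spins only bounds the demo.
    """
    handled: List[Any] = []
    for _ in range(max_spins):
        request = ring.poll()
        if request is not None:
            handled.append(request)
    return handled


if __name__ == "__main__":
    ring = PolledRing(capacity=4)
    for lba in (10, 11, 12):
        ring.submit(("read", lba))
    print(drain(ring, max_spins=8))  # [('read', 10), ('read', 11), ('read', 12)]
```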
<p><strong>Results</strong></p>
<blockquote><p>
By using these techniques, we show no measurable difference in network latency between bare metal and virtualized I/O and under 5% difference in throughput. For internal communication, micro-benchmarks show 6.6μs latency overhead, read throughput of 357K IOPS, and write throughput of 284K IOPS; roughly seven times better than a base KVM implementation. In addition, an I/O intensive filer workload running in KVM incurs less than 0.4% runtime performance overhead compared to bare metal integration.
</p></blockquote>
<p>That sounds pretty good.</p>
<p><strong>The StorageMojo take</strong><br />
While the static assignments may reduce flexibility, the win is updating storage functionality on the fly. But are there viable use cases? The arc of controller history suggests there are.</p>
<p>The earliest disk drives were directly controlled by the host CPU. Over the decades, that control and much other functionality migrated to controllers and to disks. Lately that trend has slowed because of large investments in existing standards.</p>
<p>This paper shows that it is possible to migrate more functionality to controllers without lengthy development cycles, enabling architects to make different tradeoffs. </p>
<p>For example, big data requires big pipes, and big pipes are expensive. If volume-reducing preprocessors could be added to file servers, existing bandwidth could be optimized. </p>
<p>More importantly, it suggests that by virtualizing the controller&#8217;s applications, the underlying hardware can be updated more frequently. To be fair, that&#8217;s not what the authors suggested, but it certainly seems possible based on their work.</p>
<p><strong>Courteous comments welcome, of course.</strong> Jeff Darcy of Red Hat has his own list of favorite papers from FAST &#8217;12 <a href="http://hekafs.org/blog" target="_blank">here</a>.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/02/28/virtualizing-storage-controllers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Doubling flash write performance through retention relaxation</title>
		<link>http://storagemojo.com/2012/02/27/doubling-flash-write-performance-through-retention-relaxation/</link>
		<comments>http://storagemojo.com/2012/02/27/doubling-flash-write-performance-through-retention-relaxation/#comments</comments>
		<pubDate>Mon, 27 Feb 2012 15:07:22 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2582</guid>
		<description><![CDATA[FAST &#8211; File and Storage Technology &#8211; is a must-see conference for StorageMojo, and I&#8217;ll be reviewing several Best Papers from FAST &#8217;12 . While most emerging technology is developed in private company labs, FAST is where much of the first publicly available research is published. Case in point, a StorageMojo Best Paper of FAST [...]]]></description>
			<content:encoded><![CDATA[<p>FAST &#8211; File and Storage Technologies &#8211; is a must-see conference for StorageMojo, and I&#8217;ll be reviewing several Best Papers from FAST &#8217;12. While most emerging technology is developed in private company labs, FAST is where much of the first publicly available research is published.</p>
<p>Case in point, a StorageMojo Best Paper of FAST &#8217;12: <a href="http://static.usenix.org/events/fast12/tech/full_papers/Liu.pdf" target="_blank">Optimizing NAND Flash-Based SSDs via Retention Relaxation</a> by Ren-Shuo Liu and Chia-Lin Yang of National Taiwan University, and Wei Wu of Intel. NAND engineers have known for years that it is possible to speed up writes by allowing for shorter retention, but this paper quantifies the process.</p>
<p>Data retention was a theme of several papers. Disk drives don&#8217;t care if an update needs to last a minute or a year, but flash does. </p>
<p><strong>NAND retention</strong><br />
NAND flash writes are spec&#8217;d &#8211; by JEDEC &#8211; for one year of retention. But relaxing that retention requirement can be beneficial.</p>
<ul>
<li><strong>Speed.</strong> Writes can be 1.8 to 5.7x faster, depending on how long the data is to be kept.</li>
<li><strong>SSD architecture.</strong> The need for overprovisioning and other choices is a direct result of incoming data rates and flash write speeds. Faster writes might also allow less aggressive garbage collection.</li>
<li><strong>ECC.</strong> As feature sizes shrink and NAND cells get flakier, the ECC overhead required to achieve a year&#8217;s retention grows. Single-error-correcting codes used to suffice. Now we need 24-error-correcting codes, and the arms race continues.</li>
</ul>
<p>These advantages are meaningless if most writes need to be retained for more than, say, 2 weeks. The authors looked at a number of workload traces and found that for all but one of them, at least 50% of the writes were retained for 1 week or less. For active enterprise workloads the percentage is likely to be over 75%.</p>
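<p>The trace analysis above boils down to measuring, for each write, how long its data lives before it is overwritten. Here is a minimal sketch of one way to compute that, assuming a time-sorted trace of (timestamp, logical block address) writes. The <code>Write</code> record and function names are hypothetical, not the paper&#8217;s tooling. Writes never overwritten before the trace ends are conservatively counted as long-lived.</p>

```python
from dataclasses import dataclass

WEEK_S = 7 * 24 * 3600  # one week, in seconds


@dataclass
class Write:
    t: float  # timestamp in seconds (trace assumed sorted by time)
    lba: int  # logical block address written


def retention_times(trace):
    """For each write, how long its data lived before being overwritten.
    Writes still live at the end of the trace get None."""
    last = {}  # lba -> index of the most recent write to that lba
    lifetimes = [None] * len(trace)
    for i, w in enumerate(trace):
        prev = last.get(w.lba)
        if prev is not None:
            lifetimes[prev] = w.t - trace[prev].t
        last[w.lba] = i
    return lifetimes


def fraction_short_lived(trace, threshold=WEEK_S):
    """Fraction of all writes overwritten within `threshold` seconds."""
    if not trace:
        return 0.0
    lifetimes = retention_times(trace)
    short = sum(1 for d in lifetimes if d is not None and d <= threshold)
    return short / len(trace)
```

Counting still-live writes as long-lived biases the result downward, so a short trace understates how much data could use relaxed retention.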
<p><strong>What happens when the time is up?</strong><br />
The authors propose that the Flash Translation Layer keep track of how long each block remains unchanged. When &#8211; and if &#8211; it reaches the threshold, a background process rewrites the data with the standard 1-year retention.</p>
<p>It is feasible to differentiate between host writes and background writes &#8211; garbage collection, for example &#8211; and to write them differently. Long-term writes would get stronger ECC, while host writes would skip the costly encoding that stronger ECC requires.</p>
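<p>The bookkeeping described above can be sketched as a toy FTL, assuming a relaxed retention window of one week and treating &#8220;relaxed&#8221; and &#8220;standard&#8221; as two program modes. Class, method and parameter names are hypothetical. The <code>margin</code> that refreshes data somewhat before the relaxed deadline is an assumption, not something the paper specifies here. Overwritten data resets its timer, so only data that outlives the window pays for a second write.</p>

```python
import time

RELAXED_S = 7 * 24 * 3600      # hypothetical relaxed retention target: 1 week
STANDARD_S = 365 * 24 * 3600   # standard 1-year retention


class RetentionAwareFTL:
    """Toy FTL: host writes use a fast, relaxed-retention program mode;
    a background scan rewrites blocks that outlive the relaxed window."""

    def __init__(self, relaxed_s=RELAXED_S, margin=0.8, clock=time.time):
        self.relaxed_s = relaxed_s
        self.margin = margin    # assumption: refresh before the deadline
        self.clock = clock
        self.blocks = {}        # lba -> (data, written_at, retention_s)

    def host_write(self, lba, data):
        # Fast path: an overwrite also resets the block's age.
        self.blocks[lba] = (data, self.clock(), self.relaxed_s)

    def read(self, lba):
        return self.blocks[lba][0]

    def background_refresh(self):
        """Rewrite relaxed blocks nearing their deadline in the standard
        (stronger-ECC) mode. Returns the refreshed LBAs."""
        now = self.clock()
        refreshed = []
        for lba, (data, t0, ret) in list(self.blocks.items()):
            if ret == self.relaxed_s and now - t0 >= self.margin * ret:
                self.blocks[lba] = (data, now, STANDARD_S)
                refreshed.append(lba)
        return refreshed
```

Passing a fake <code>clock</code> makes the aging logic testable without waiting a week.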
<p>Yes, there is overhead in managing the fast blocks and rewriting long-term data. But the added performance appears to make that a small price to pay.</p>
<p><strong>The StorageMojo take</strong><br />
The paper presents a strong case for relaxing retention requirements to improve performance. As future generations of flash become less reliable and slower we&#8217;ll need this and other techniques to improve &#8211; or at least maintain &#8211; performance.</p>
<p>Many performance enhancement schemes require unrealistic levels of intelligence about application or system behavior to be effective. But this is within the realm of practical implementation.</p>
<p>The retention issue is a fair example of being handed a lemon and making lemonade. Or offering another degree of freedom to system architects. </p>
<p>In fact, some vendors are already exploring this possibility. If it extends the useful life of flash for a few years it will be well worth the engineering effort.</p>
<p><strong>Courteous comments welcome, of course.</strong> A somewhat analogous process for disks is the concept of shingled writes, an area UCSC has been working in. Will disk vendors pick it up?</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2012/02/27/doubling-flash-write-performance-through-retention-relaxation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The network is choking our storage</title>
		<link>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/</link>
		<comments>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 17:03:08 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud computing & storage]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[SAN, FC]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2533</guid>
		<description><![CDATA[Amazon Web Services architect James Hamilton has been posting on network issues for over a year and researching them much longer. As Ethernet becomes the de facto SAN technology, his views become more relevant to the larger storage market. Critique Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon Web Services architect James Hamilton has been <a href="http://perspectives.mvdirona.com/2011/10/01/ChangesInNetworkingSystems.aspx" target="_blank">posting</a> on network issues for over a year and researching them much longer. As Ethernet becomes the <i>de facto</i> SAN technology, his views become more relevant to the larger storage market.</p>
<p><strong>Critique</strong><br />
Part of Mr. Hamilton&#8217;s concern is the structure of the networking industry: the high margins; the dominance of a single player, Cisco; the closed technology; and the heavy vertical integration. All antithetical to the dynamics that have driven server costs down so successfully in the last 20 years.</p>
<p>These are issues the storage industry knows too well. But Mr. Hamilton is more concerned about the waste the current high-cost industry structure causes.</p>
<p>Waste?</p>
<p><strong>Workload placement</strong><br />
The cost of network bandwidth leads to network over-subscription. Networks are configured as tree topologies: the further you move from end nodes, the worse the over-subscription.</p>
<p>As described in the 2009 Microsoft Research paper <a href="http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf" target="_blank">VL2: A Scalable and Flexible Data Center Network</a>:</p>
<blockquote><p>
. . . the capacity between different branches of the tree is typically oversubscribed by factors of 1:5 or more, with paths through the highest levels of the tree oversubscribed by factors of 1:80 to 1:240. This limits communication between servers to the point that it fragments the server pool — congestion and computation hot-spots are prevalent even when spare capacity is available elsewhere.
</p></blockquote>
<p>This throttles data center performance by limiting server-to-server bandwidth, fragmenting resources and reducing network utilization. The last of these reflects the redundant paths needed in case of switch failure: ≈50% or more of costly data center bandwidth goes unused.</p>
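<p>Over-subscription is simple arithmetic: the bandwidth that can arrive at a switch from below divided by the bandwidth it can send upward. A minimal sketch with hypothetical port counts shows how modest-looking ratios at each tier reach the 1:80 end of the range quoted above. It assumes per-tier ratios compound multiplicatively on the way to the top of the tree, which is a simplification.</p>

```python
import math


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Bandwidth that can arrive from below divided by bandwidth available
    upward. 1.0 means no over-subscription; 5.0 is the quote's '1:5'."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def path_oversubscription(tier_ratios):
    """Worst case along a path to the top of the tree, treating each
    tier's ratio as compounding multiplicatively (a simplification)."""
    return math.prod(tier_ratios)


# Hypothetical tiers: 40 x 1 Gbps servers under 2 x 10 Gbps ToR uplinks,
# then aggregation and core layers at 4:1 and 10:1.
tor = oversubscription(40, 1, 2, 10)            # 2.0
print(path_oversubscription([tor, 4.0, 10.0]))  # 80.0
```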
<p>As might be expected, big Internet data centers like Amazon&#8217;s have complex and unpredictable workloads. They need lots of bandwidth between all servers all the time.</p>
<p><strong>A solution</strong><br />
The VL2 paper describes an experimental solution to these problems that includes <i>location-specific</i> and <i>application-specific</i> addressing, multi-path traffic load balancing and a novel directory design that efficiently handles lookups and updates to network mappings.</p>
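<p>The split between application-specific and location-specific addresses, plus multi-path load balancing, can be sketched in a few lines. This is a toy model, not VL2&#8217;s implementation: a directory maps a server&#8217;s fixed application address to the location address of the switch currently hosting it, and each flow picks an intermediate switch by hashing its 5-tuple, so a flow&#8217;s packets stay on one path while different flows spread across all of them. All addresses and names are hypothetical; the paper&#8217;s real data path runs in switches and host agents, which this does not model.</p>

```python
import hashlib


class Directory:
    """Maps application addresses (fixed, even when a service moves) to
    location addresses (where the server currently sits)."""

    def __init__(self):
        self._aa_to_la = {}

    def update(self, aa, la):
        self._aa_to_la[aa] = la  # e.g. after a VM migrates

    def lookup(self, aa):
        return self._aa_to_la[aa]


def pick_intermediate(flow, intermediates):
    """Hash the flow's 5-tuple to choose an intermediate switch: one flow
    always takes the same path (no reordering), many flows spread out."""
    key = "|".join(map(str, flow)).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return intermediates[h % len(intermediates)]


def route(directory, flow, intermediates):
    """Path as [intermediate switch, destination location address]."""
    dst_aa = flow[1]
    return [pick_intermediate(flow, intermediates), directory.lookup(dst_aa)]


# flow = (src_aa, dst_aa, protocol, src_port, dst_port), all hypothetical
d = Directory()
d.update("10.0.0.5", "20.0.1.1")
path = route(d, ("10.0.0.9", "10.0.0.5", 6, 40000, 80), ["int-a", "int-b"])
```

Because only the directory entry changes when a server moves, applications keep using the same address.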
<p>In a 75-node test cluster the design moved 2.75TB of data in 395 seconds &#8211; 94% of maximum network bandwidth &#8211; at a fraction of the cost of current enterprise networks. The paper calculates that a cloud-service scale network with no over-subscription could be built with commodity switches at <strong>1/14th the cost</strong> of a traditional data center Ethernet.</p>
<p>Whoa!</p>
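<p>A back-of-envelope check on the test-cluster numbers, assuming decimal units (1 TB = 10<sup>12</sup> bytes): moving 2.75TB in 395 seconds works out to roughly 55.7 Gbps of aggregate goodput, or about 0.74 Gbps per node. The 94% figure is measured against the paper&#8217;s own definition of maximum achievable bandwidth, which this sketch does not try to reproduce.</p>

```python
def aggregate_gbps(terabytes, seconds):
    """Average aggregate goodput in Gbps, assuming 1 TB = 10**12 bytes."""
    bits = terabytes * 10**12 * 8
    return bits / seconds / 10**9


total = aggregate_gbps(2.75, 395)  # roughly 55.7 Gbps across the cluster
per_node = total / 75              # roughly 0.74 Gbps per node
print(f"{total:.1f} Gbps total, {per_node:.2f} Gbps per node")
```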
<p><strong>The StorageMojo take</strong><br />
VC and engineering dollars follow high-growth markets. What Google, Amazon and Microsoft want, they get. With the rapid growth of public cloud services the network over-subscription problem will get solved. </p>
<p>Merchant silicon from Broadcom, Intel and Marvell is making a tried-and-true Moore&#8217;s Law attack on hardware cost. The protocol stack is tougher, but several open-source industry initiatives are under way with support from major companies. Progress will be slower than hoped, but within 3 years we&#8217;ll have a viable stack to build on.</p>
<p>Where does this leave the networking industry? That depends on where you sit.</p>
<p>Cisco will be the biggest loser, because they&#8217;ve been the biggest winner with the current model. They may need to pull an IBM and move big into services if they want to stick around. Ironically, Cisco&#8217;s UCS product line &#8211; which bakes in the tree-structured network &#8211; has further motivated broader industry action.</p>
<p>The rest of the industry can go after this emerging market with a lower-gross-margin business model. Not all of them will, but that model will be a critical success factor.</p>
<p>The big winner will be storage. Scale-out storage relies on spraying data across multiple racks for maximum availability, utilization and performance. Cheaper, faster, better scale-out networks will only drive storage demand.</p>
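<p>&#8220;Spraying data across multiple racks&#8221; usually means placing each replica of a chunk on a different rack, so a rack-level failure (a top-of-rack switch or power feed) never takes out every copy. A minimal sketch, assuming three copies and a hypothetical rack-to-nodes map. Every replica written this way crosses the aggregation layer, which is exactly where over-subscription bites.</p>

```python
import random


def place_replicas(nodes_by_rack, copies=3, rng=random):
    """Pick `copies` nodes, each on a different rack, so losing any one
    rack leaves the other copies reachable."""
    racks = [r for r, nodes in nodes_by_rack.items() if nodes]
    if len(racks) < copies:
        raise ValueError("need at least as many non-empty racks as copies")
    chosen = rng.sample(racks, copies)
    return [(r, rng.choice(nodes_by_rack[r])) for r in chosen]


cluster = {"r1": ["n1", "n2"], "r2": ["n3"], "r3": ["n4", "n5"], "r4": []}
placement = place_replicas(cluster, 3, random.Random(0))
```

Passing a seeded <code>random.Random</code> makes placement reproducible for testing.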
<p>For most of us this is an academic problem today. Lightly used systems &#8211; such as for backup and archiving &#8211; don&#8217;t see Amazon&#8217;s problems. But in 5 years this will be common even outside the public cloud providers.</p>
<p>Just as IT users have benefited from Google&#8217;s push on energy efficiency and much more, they will also benefit from much lower cost and more scalable networks.</p>
<p><strong>Courteous comments welcome, of course.</strong> I can&#8217;t help but continue to marvel at how dumb Cisco&#8217;s UCS has turned out to be. It&#8217;s a gift that keeps on giving.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/20/the-network-is-choking-our-storage/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NoSQL in the metadata engine room</title>
		<link>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/</link>
		<comments>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 18:59:44 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2525</guid>
		<description><![CDATA[One more datapoint and we&#8217;ll have a trend: NoSQL databases managing metadata. It&#8217;s obvious in retrospect: use a scalable big data tool to handle scale-out metadata. Maybe not a requirement today, but surely will be with even bigger data tomorrow. Metadata is a fraction of the user data set, but it gets hammered much more. [...]]]></description>
			<content:encoded><![CDATA[<p>One more datapoint and we&#8217;ll have a trend: NoSQL databases managing metadata. It&#8217;s obvious in retrospect: use a scalable big data tool to handle scale-out metadata. It may not be a requirement today, but it surely will be with even bigger data tomorrow.</p>
<p>Metadata is a fraction of the user data set, but it gets hammered much more. As more metadata is found useful the hammering will get more insistent.</p>
<p><strong>Nutanix</strong><br />
<a href="http://www.nutanix.com/" target="_blank">Nutanix</a>, whose CTO and co-founder, Mohit Aron, was a developer of the Google File System, uses MapReduce. Nutanix achieves its scale through distributed metadata and a masterless architecture &#8211; powered by MapReduce jobs that run in the background.</p>
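<p>The article doesn&#8217;t say which background jobs Nutanix runs. As a hypothetical illustration of the kind of work MapReduce suits in a metadata layer, here is a sketch that counts references to storage extents across metadata shards to find unreferenced extents that garbage collection could reclaim. The shard layout and all names are assumptions, and the map calls run sequentially here rather than in parallel.</p>

```python
from collections import defaultdict
from itertools import chain


def map_shard(shard):
    """Map phase for one metadata shard (hypothetical layout): the extents
    it stores, plus per-file lists of referenced extents.
    Emit (extent, 0) per stored extent and (extent, 1) per reference."""
    for extent in shard["extents"]:
        yield extent, 0
    for refs in shard["files"].values():
        for extent in refs:
            yield extent, 1


def reduce_refcounts(pairs):
    """Shuffle + reduce phase: sum the reference counts per extent."""
    counts = defaultdict(int)
    for extent, n in pairs:
        counts[extent] += n
    return dict(counts)


def unreferenced_extents(shards):
    """Extents stored somewhere but referenced by no file: GC candidates."""
    pairs = chain.from_iterable(map_shard(s) for s in shards)
    counts = reduce_refcounts(pairs)
    return sorted(e for e, c in counts.items() if c == 0)
```

In a real cluster each map call would run on the node holding its shard, and only the small (extent, count) pairs would cross the network.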
<p><strong>Druva</strong><br />
<a href="http://www.druva.com/" target="_blank">Druva</a>, a backup company for mobile devices, also uses a NoSQL database to manage storage metadata. They say they&#8217;ve found that NoSQL scales over an order of magnitude better than relational databases in similar applications.</p>
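<p>One common reason key-value NoSQL stores scale out well is that they partition data by a hash of the key, so metadata lookups spread across nodes with no central index. Druva&#8217;s actual database isn&#8217;t named here; the sketch below is a generic consistent-hashing ring of the kind Dynamo-style stores use, with hypothetical node names. Adding a node only moves the keys that land on the new node&#8217;s arcs of the ring.</p>

```python
import bisect
import hashlib


def _h(key):
    """Stable 64-bit hash of a string (MD5 used for spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing: each node owns many small arcs (virtual nodes)
    of a hash ring; a key belongs to the next point clockwise."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def owner(self, key):
        i = bisect.bisect(self._points, _h(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["meta1", "meta2", "meta3"])
node = ring.owner("/vol1/projects/report.doc")
```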
<p><strong>Somebody else</strong><br />
A company that shall remain nameless is porting Hadoop to their backend. The customer won&#8217;t be able to access Hadoop for their work &#8211; it is strictly for the system&#8217;s internal use.</p>
<p>It is a proof of concept, so it isn&#8217;t a third data point, but they see the potential advantages. Call it data point 2½.</p>
<p><strong>The StorageMojo take</strong><br />
Small advances are the building blocks of disruption. RAID made it possible to build available storage using cheap disks. Consumer adoption of PCs made disks even cheaper. Moore&#8217;s Law made RAID controllers cheaper and faster, or faster and more capable. </p>
<p>A virtuous circle of disruption.</p>
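<p>The RAID trick that made cheap disks available is parity. In single-parity RAID (RAID 5 style) the parity block is the byte-wise XOR of the data blocks, so any one lost block is the XOR of the parity and the survivors. A minimal sketch, assuming equal-length blocks held in memory:</p>

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity(data_blocks):
    """Parity block for one stripe."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing data block of a stripe."""
    return xor_blocks(surviving_blocks + [parity_block])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(stripe)
recovered = rebuild([stripe[0], stripe[2]], p)  # equals stripe[1]
```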
<p>The basic architecture of scale-out storage systems &#8211; purpose-built software on clustered commodity hardware &#8211; has been stable. But this is the beginning of scale-out storage 2.0: taking scale-out technology developed for users and incorporating it into the storage infrastructure itself.</p>
<p>These ideas are bubbling up among the latest startups and among the establishment players. At some point the old RAID architectures will be well and truly broken, able to compete only in ever-smaller niches until the revenue can&#8217;t justify more investment.</p>
<p>Of course vendors have been making RAID controllers out of servers for years now, and those servers can run any software they want. But at some point the explicit and implicit assumptions in the old architecture crash into current realities &#8211; either in cost, development time, feature completeness or management overhead &#8211; and then we move on.</p>
<p><strong>Courteous comments welcome, of course.</strong> I learned about Nutanix at the last <a href="http://techfieldday.com/" target="_blank">Tech Field Day</a> (&#8220;The Independent IT Influencer Event&#8221;), which paid for my travel expenses to Silicon Valley.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/10/03/nosql-in-the-metadata-engine-room/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>StorageMojo at NAB &#8217;11</title>
		<link>http://storagemojo.com/2011/04/10/storagemojo-at-nab-11/</link>
		<comments>http://storagemojo.com/2011/04/10/storagemojo-at-nab-11/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 03:52:12 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2343</guid>
		<description><![CDATA[CES is a lot of fun, but my favorite toy trade show is the National Association of Broadcasters (NAB) convention. I arrive tomorrow and return Wednesday and hope &#8211; on the way back &#8211; to walk across the new bridge that is 900 feet above the Colorado River at Hoover Dam. I do video work [...]]]></description>
			<content:encoded><![CDATA[<p>CES is a lot of fun, but my favorite <strike>toy</strike> trade show is the National Association of Broadcasters (NAB) convention. I arrive tomorrow and return Wednesday and hope &#8211; on the way back &#8211; to walk across the new bridge that is 900 feet above the Colorado River at Hoover Dam.</p>
<p>I do video work today and did FM broadcasting decades ago. Today we&#8217;re all narrowcasting, but digital has made the technology so much better &#8211; and cheaper! &#8211; than what broadcasters had 10 years ago that the possibilities are endless.</p>
<p><strong>Pre-show expectations</strong><br />
Rumor has it that Apple will announce the newest version of Final Cut Studio &#8211; my preferred editing platform &#8211; on Tuesday. I&#8217;m also looking for any and all Thunderbolt peripherals, although the pre-show PR hasn&#8217;t mentioned any. I hope Promise and LaCie have something to show, and perhaps Sony as well.</p>
<p>Object storage should be more visible this year as well. And where will USB 3.0 show up &#8211; if it shows up anywhere &#8211; in pro gear?</p>
<p><strong>The StorageMojo take</strong><br />
There&#8217;s something about pro gear &#8211; even though I can&#8217;t afford 98% of it and wouldn&#8217;t use it to best advantage if I could &#8211; that fascinates. Built into all the features and specs is deep knowledge of the technology and its limitations. </p>
<p>There are people who spend $5k for a microphone to record a single instrument. Their ears can discern the differences between equally high-end mics and they know how to mix them to get the sound they want. </p>
<p>Then we listen to all that painstaking work through crummy little earbuds. Oh well!</p>
<p>If you have something to share please contact me through the comments. I&#8217;m looking for cool stuff.</p>
<p><strong>Courteous comments welcome, of course.</strong> </p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/04/10/storagemojo-at-nab-11/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>StorageMojo @Big Data NYC next week</title>
		<link>http://storagemojo.com/2011/03/15/storagemojo-big-data-nyc-next-week/</link>
		<comments>http://storagemojo.com/2011/03/15/storagemojo-big-data-nyc-next-week/#comments</comments>
		<pubDate>Tue, 15 Mar 2011 22:10:42 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2329</guid>
		<description><![CDATA[The younger StorageMojo analysts have heard that New York is a den of sin and depravity and can&#8217;t wait to try some. With new Wrangler jeans and polished silver bolos they are ready to par-tay with some big city gals. Yee-ha! Should I tell them that it will be 95% male? Nah. The event is [...]]]></description>
			<content:encoded><![CDATA[<p>The younger StorageMojo analysts have heard that New York is a den of sin and depravity and can&#8217;t wait to try some. With new Wrangler jeans and polished silver bolos they are ready to par-tay with some big city gals. Yee-ha!</p>
<p>Should I tell them that it will be 95% male? Nah. </p>
<p>The event is GigaOm&#8217;s <a href="http://bigdata2011vip.eventbrite.com/" target="_blank">Structure Big Data 2011</a> conference on Wednesday the 23rd. Haven&#8217;t been to it before, but I was overdue to see what Mr. Malik has cooked up.</p>
<p><strong>The StorageMojo take</strong><br />
With the gradual slowing of improvement in hardware and big plans for massive data collection, the world of storage is looking at accelerating change. Drive and CPU vendors won&#8217;t be doing all the heavy lifting. </p>
<p>That means that architecture will become even more critical to successful products. Lots of room for creativity and innovation because there is a ready audience that needs something better than what they have today.</p>
<p>Looking forward to meeting new people and learning about new technologies and markets. I&#8217;ll have some time Thursday morning too, if anyone is eager to bend my ear. Leave a comment with your preferred means of contact.</p>
<p><strong>Courteous comments welcome, of course.</strong>  </p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/03/15/storagemojo-big-data-nyc-next-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Show StorageMojo some love</title>
		<link>http://storagemojo.com/2011/02/28/show-storagemojo-some-love-2/</link>
		<comments>http://storagemojo.com/2011/02/28/show-storagemojo-some-love-2/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 06:22:52 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2314</guid>
		<description><![CDATA[Update: The survey is now closed. Thanks to everyone who responded! I&#8217;ll have more later on the results. End update. StorageMojo would like your help. TechnoQWAN LLC, StorageMojo&#8217;s publisher, is a research and analysis firm. An IT supplier has retained us to help plan their internal storage strategy. You&#8217;ll be anonymous, unless you choose [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update:</strong> The survey is now closed. Thanks to everyone who responded!</p>
<p>I&#8217;ll have more later on the results. <strong>End update.</strong></p>
<p>StorageMojo would like your help. </p>
<p>TechnoQWAN LLC, StorageMojo&#8217;s publisher, is a research and analysis firm. An IT supplier has retained us to help plan their internal storage strategy.</p>
<p>You&#8217;ll be anonymous, unless you choose not to be. This is a research project, not a marketing Trojan horse.</p>
<p><strong>How you can help</strong><br />
Please donate 5 to 8 minutes to complete a survey. You&#8217;ll help keep StorageMojo the independent source of storage coolness it has always strived to be.</p>
<p>We&#8217;d like to get a couple of hundred respondents RSN. Can you do it now?</p>
<p>Here&#8217;s the link to the <a href="https://www.surveygizmo.com/s3/476397/ecdedba37108" target="_blank">Internal Storage Survey</a>.</p>
<p>And please pass the link on to likely friends and colleagues. The more the merrier!</p>
<p><strong>The StorageMojo take</strong><br />
After we&#8217;ve looked at the data I plan to write about what I&#8217;ve found interesting in the results. You&#8217;ll get to learn something about the rest of the StorageMojo community.</p>
<p>And I&#8217;ll get to learn some more about you.</p>
<p>If, perchance, you&#8217;re passionate about the topic, you&#8217;ll be able to volunteer for a more in-depth discussion. Trust me: the sponsor can make a real difference in the servers you buy in a couple of years. </p>
<p><strong>Courteous comments welcome, of course.</strong> Completed surveys even more so! BTW, QWAN = Quality Without A Name.</p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/02/28/show-storagemojo-some-love-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Great work at FAST &#8217;11</title>
		<link>http://storagemojo.com/2011/02/17/great-work-at-fast-11/</link>
		<comments>http://storagemojo.com/2011/02/17/great-work-at-fast-11/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 06:42:47 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2296</guid>
		<description><![CDATA[After a quick scan of the paper titles I wasn&#8217;t impressed. But after seeing presentations and posters I am. Here&#8217;s some I found interesting. I&#8217;ll be posting longer pieces on some of these. A Study of Practical Deduplication Full paper *Best Paper Winner* Tradeoffs in Scalable Data Routing for Deduplication Clusters Full paper Exploiting Half-Wits: [...]]]></description>
			<content:encoded><![CDATA[<p>After a quick scan of the paper titles I wasn&#8217;t impressed. But after seeing presentations and posters I am.</p>
<p>Here are some I found interesting. I&#8217;ll be posting longer pieces on some of these.</p>
<ul>
<li><strong>A Study of Practical Deduplication <a href="http://www.usenix.org/events/fast11/tech/full_papers/Meyer.pdf">Full paper</a> <em>*Best Paper Winner*</em></strong></li>
<li><strong>Tradeoffs in Scalable Data Routing for Deduplication Clusters <a href="http://www.usenix.org/events/fast11/tech/full_papers/Dong.pdf">Full paper</a></strong></li>
<li><strong>Exploiting Half-Wits: Smarter Storage for Low-Power Devices <a href="http://www.usenix.org/events/fast11/tech/full_papers/Salajegheh.pdf">Full paper</a></strong></li>
<li><strong>Reliably Erasing Data from Flash-Based Solid State Drives <a href="http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf">Full paper</a></strong></li>
<li><strong>Scale and Concurrency of GIGA+: File System Directories with Millions of Files <a href="http://www.usenix.org/events/fast11/tech/full_papers/Patil.pdf">Full paper</a></strong></li>
<li><strong>Emulating Goliath Storage Systems with David <a href="http://www.usenix.org/events/fast11/tech/full_papers/Agrawal.pdf">Full paper</a> <em>*Best Paper Winner*</em></strong></li>
</ul>
<p>An excellent conference. NetApp, EMC, Microsoft and IBM were recruiting.</p>
<p><strong>The StorageMojo take</strong><br />
We&#8217;re still learning about flash, and the research presented here is a substantial addition to our meager knowledge.</p>
<p>Microsoft tells me they&#8217;re delivering major improvements to NTFS and Windows Server later this year. I&#8217;m looking forward to that briefing.</p>
<p>And it&#8217;s always a pleasure catching up with the people who, for some reason, never come to Sedona.</p>
<p><strong>Courteous comments welcome, as always.</strong></p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/02/17/great-work-at-fast-11/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>StorageMojo @FAST &#8217;11</title>
		<link>http://storagemojo.com/2011/02/11/storagemojo-fast-11/</link>
		<comments>http://storagemojo.com/2011/02/11/storagemojo-fast-11/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 20:26:33 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Future Tech]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2290</guid>
		<description><![CDATA[It&#8217;s that time of the year again: the Usenix File and Storage Technologies conference in San Jose next Tuesday thru Thursday. The elite StorageMojo analyst SWAT unit will be there, in color coordinated Kevlar and Spandex, rappelling into the Marriott ballroom. So come by and say hello. The StorageMojo take No obvious must-reads among the [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s that time of the year again: the Usenix <a href="http://www.usenix.org/events/fast11/calendar.html" target="_blank">File and Storage Technologies</a> conference in San Jose next Tuesday thru Thursday.</p>
<p>The elite StorageMojo analyst SWAT unit will be there, in color coordinated Kevlar and Spandex, rappelling into the Marriott ballroom. So come by and say hello. </p>
<p><strong>The StorageMojo take</strong><br />
No obvious must-reads among the papers this year, so we&#8217;ll have to wait and see if lightning strikes. </p>
<p><strong>Courteous comments welcome, of course.</strong> </p>
			]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/02/11/storagemojo-fast-11/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Will algorithms leap Moore&#8217;s Wall?</title>
		<link>http://storagemojo.com/2011/01/30/will-algorithms-leap-moores-wall/</link>
		<comments>http://storagemojo.com/2011/01/30/will-algorithms-leap-moores-wall/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 00:23:49 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[Security & Public Policy]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2259</guid>
		<description><![CDATA[The performance increase in individual CPUs is slowing to a crawl. All the easy wins &#8211; higher clock speeds, wider datapaths, more DRAM, larger registers and caches, 2-4 cores &#8211; have been exploited. Doctor, is there any hope? In the recent PCAST report on Federal technological initiatives (see Fed funding for our digital future) (pdf) [...]]]></description>
			<content:encoded><![CDATA[<p>The performance increase in individual CPUs is <a href="http://storagemojo.com/2010/11/29/moores-wall-the-end-of-moores-law/" target="_blank">slowing to a crawl</a>. All the easy wins &#8211; higher clock speeds, wider datapaths, more DRAM, larger registers and caches, 2-4 cores &#8211; have been exploited.</p>
<p><strong>Doctor, is there any hope?</strong><br />
In the recent PCAST report on Federal technological initiatives (see <a href="http://storagemojo.com/2010/12/28/fed-funding-for-our-digital-future/" target="_blank">Fed funding for our digital future</a>), one sidebar suggested &#8220;Progress in Algorithms Beats Moore’s Law.&#8221;</p>
<blockquote><p>
. . . in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.</p>
<p>The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade. It’s difficult to quantify the improvement, though, because it is as much in the realm of quality as of execution time.</p>
<p>In the field of numerical algorithms, however, the improvement can be quantified. Here is just one example . . . a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms!
</p></blockquote>
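<p>The quoted arithmetic checks out, and the two factors multiply rather than add: 82 years is about 43 million minutes, and 1,000 from hardware times 43,000 from algorithms is 43 million. A quick check, assuming 365.25-day years:</p>

```python
MIN_PER_YEAR = 365.25 * 24 * 60

speedup = 82 * MIN_PER_YEAR / 1      # 82 years of work down to ~1 minute
hardware, algorithms = 1_000, 43_000

print(f"{speedup:,.0f}")             # 43,128,720
print(f"{hardware * algorithms:,}")  # 43,000,000
```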
<p><strong>The StorageMojo take</strong><br />
Let&#8217;s file this one under &#8220;Wishful thinking&#8221; along with &#8220;US housing prices will never decline.&#8221; Piecemeal enhancements of specific application areas cannot replace the generalized performance improvements we&#8217;ve seen for decades.</p>
<p>No doubt there are important algorithmic improvements to be made, and in certain problem spaces those speedups will far exceed Moore&#8217;s Law &#8211; even though the Law is about transistor count, not performance.</p>
<p>That doesn&#8217;t change the fact of computation today: the era of predictable and rapid performance improvement is over. Like a vein of rich ore that thins out, our computers will still improve, but the effort needed to do so is rising fast.</p>
<p>Cheap(er) SSDs, larger memories and caches are helping mask the performance plateau by increasing system performance, but reduced I/O latency and increased bandwidth will only take us so far. The way forward is a game of wringing out single-digit percent improvements, not the 2-3 year doubling of the last 60 years.</p>
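<p>The gap between the two regimes is easy to quantify with compound growth. Doubling every 2 to 3 years is roughly 26% to 41% improvement per year. Assuming, for illustration, that &#8220;single-digit percent&#8221; gains come at 5% per year, a doubling takes about 14 years:</p>

```python
import math


def annual_gain(doubling_years):
    """Annual improvement rate implied by doubling every `doubling_years`."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time_years(annual_rate):
    """Years needed to double at a constant annual improvement rate."""
    return math.log(2) / math.log(1 + annual_rate)


for years in (2, 3):
    print(f"doubling every {years} years = {annual_gain(years):.0%} per year")
print(f"5% per year doubles in {doubling_time_years(0.05):.1f} years")
# Prints:
# doubling every 2 years = 41% per year
# doubling every 3 years = 26% per year
# 5% per year doubles in 14.2 years
```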
<p><strong>Courteous comments welcome, of course.</strong> The professor whose work the PCAST quote refers to is Martin Grötschel of Konrad-Zuse-Zentrum in Berlin. He&#8217;s been doing <a href="http://www.zib.de/groetschel/research/Musterbiblio.html" target="_blank">brilliant work on optimization problems</a> &#8211; including the traveling salesman problem and data network design &#8211; for decades.</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/01/30/will-algorithms-leap-moores-wall/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Hyder: a flash-based scale-out database</title>
		<link>http://storagemojo.com/2011/01/24/hyder-a-flash-based-scale-out-database/</link>
		<comments>http://storagemojo.com/2011/01/24/hyder-a-flash-based-scale-out-database/#comments</comments>
		<pubDate>Mon, 24 Jan 2011 07:36:35 +0000</pubDate>
		<dc:creator>Robin Harris</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Clusters]]></category>
		<category><![CDATA[Future Tech]]></category>
		<category><![CDATA[Information Management]]></category>
		<category><![CDATA[SSD/Flash Disk]]></category>

		<guid isPermaLink="false">http://storagemojo.com/?p=2239</guid>
		<description><![CDATA[Talked to a company last week whose cloud app handles several billion transactions per month on a cluster. Sounds like SSDs could help them but how? In a paper from the latest 5th Biennial Conference on Innovative Data Systems Research (CIDR &#8217;11) researchers Philip A. Bernstein and Colin W. Reid of Microsoft and Sudipto Das [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Talked to a company last week whose cloud app handles several billion transactions per month on a cluster. Sounds like SSDs could help them, but how?</p>
<p>In a paper from the latest <a href="http://www.cidrdb.org/cidr2011/" target="_blank">5th Biennial Conference on Innovative Data Systems Research</a> (CIDR &#8217;11) researchers Philip A. Bernstein and Colin W. Reid of Microsoft and Sudipto Das of UC Santa Barbara have a suggestion: <a href="http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper2.pdf" target="_blank">Hyder – A Transactional Record Manager for Shared Flash</a> (pdf).</p>
<p>As underlying hardware changes &#8211; faster networks, large memories, multi-core CPUs and SSDs &#8211; database software architectures may change too. The <i>Hyder</i> architecture supports</p>
<blockquote><p>
. . . reads and writes on indexed records within classical multi-step transactions. It is designed to run on a cluster of servers that have shared access to a large pool of network-addressable raw flash chips. . . . Hyder uses a data-sharing architecture that scales out without partitioning the database or application.
</p></blockquote>
<p><strong>No partition scale-out</strong><br />
Today, most popular database clusters partition the database across multiple servers. Done well, this works, but at some cost. The database design is non-trivial: cross-partition transactions, cache coherence, load balancing, scaling and multi-server debugging are knotty issues that translate into higher design and operating costs.</p>
<p>Hyder eliminates partitioning, distributed programming, layers of cache, remote procedure calls and complex load balancing. All servers can read and write the entire database, so any server can execute any transaction. Load balancing is simple: direct new transactions to lightly loaded servers.</p>
<p>Each update transaction runs on one machine and writes to a shared log &#8211; so there&#8217;s no 2-phase commit. And no 2-phase <strike>commit</strike> locking, which can force performance off a cliff when workloads spike.</p>
<p>The 3 main components of Hyder are the <i>log</i>, the <i>index</i> and the <i>roll-forward algorithm</i>.</p>
<p><strong>Log</strong><br />
The log runs on multiple flash devices &#8211; chips, DIMMs or ??? &#8211; and writes multi-page log records across multiple devices with parity to enable log recovery after device failures.</p>
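<p>The paper's on-flash log layout is more involved than this. Still, the basic idea of striping a multi-page record across devices with parity can be sketched with RAID-5-style XOR parity. The function names, the single-parity scheme and the zero padding are our assumptions, not Hyder's actual format.</p>

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length chunks.
    return bytes(x ^ y for x, y in zip(a, b))

def stripe_with_parity(record: bytes, n_data: int) -> List[bytes]:
    """Split one log record into n_data equal chunks (one per device) plus an XOR parity chunk."""
    size = -(-len(record) // n_data)                 # ceiling division
    padded = record.ljust(size * n_data, b"\0")      # zero-pad so every chunk is the same length
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    return chunks + [reduce(xor_bytes, chunks)]

def recover(stripes: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one lost chunk (marked None) by XOR-ing all the survivors."""
    lost = [i for i, s in enumerate(stripes) if s is None]
    if len(lost) > 1:
        raise ValueError("single parity can rebuild only one failed device")
    rebuilt = list(stripes)
    if lost:
        rebuilt[lost[0]] = reduce(xor_bytes, [s for s in stripes if s is not None])
    return rebuilt
```

<p>Suppose the device holding one chunk fails. <code>recover()</code> rebuilds that chunk from the other chunks plus the parity. The caller then trims the padding using the record length, which a real log would presumably keep in a record header (again, an assumption).</p>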
<p>Hyder uses a <i>multi-versioned</i> database &#8211; records aren&#8217;t updated in place; each update creates a new version, and only the most recent version of a record is used &#8211; which has several useful properties:</p>
<ul>
<li>Server caches are inherently coherent since only the most recent versions of records are used.</li>
<li>Data can be read while writes are in progress.</li>
<li>Queries that can be decomposed can be run across multiple servers concurrently for a faster response time.</li>
</ul>
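<p>A toy multiversion store shows why reads don't wait on writes. A reader pinned to a snapshot keeps seeing the versions that existed when it started, while newer versions pile up beside them. This is a generic MVCC sketch with a hypothetical class name. Hyder's versions actually live in its log-structured index tree, not in a Python dict.</p>

```python
class VersionedStore:
    """Toy multiversion store: every write appends a new version; nothing is updated in place."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_seq, value), oldest first
        self.seq = 0         # sequence number of the most recent commit

    def snapshot(self) -> int:
        # A snapshot is just the commit sequence number at the moment it is taken.
        return self.seq

    def write(self, updates: dict) -> int:
        # Commit a batch of updates as new versions stamped with the next sequence number.
        self.seq += 1
        for key, value in updates.items():
            self.versions.setdefault(key, []).append((self.seq, value))
        return self.seq

    def read(self, key, snap: int):
        # Return the newest version visible at `snap`; later versions are ignored.
        for commit_seq, value in reversed(self.versions.get(key, [])):
            if commit_seq <= snap:
                return value
        return None
```

<p>Suppose a reader takes a snapshot after <code>x = 1</code> is committed. That reader still gets 1 even after a concurrent writer commits <code>x = 2</code>. A fresh snapshot sees 2. Neither side blocked the other.</p>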
<p>[This may seem like voodoo to ACIDheads. A good technical intro to multiversion concurrency control (MVCC) is <a href="http://www.rtcmagazine.com/articles/view/101612" target="_blank">Multi-core software: to gain speed, eliminate resource contention</a>.]</p>
<p>Servers run a cache update process that keeps them current with updated records. Server caches don&#8217;t have to be identical, and the cache-invalidation messages that most clusters use for cache coherency aren&#8217;t needed.</p>
<p>All log writes are idempotent appends, so if a write fails the server can simply reissue the write. The authors describe several error modes and how Hyder handles them.</p>
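<p>One common way to make an append idempotent is to tag each entry with a unique ID. A retried append then returns the original offset instead of writing a duplicate. The post doesn't say how Hyder achieves idempotence, so the ID scheme and the <code>SharedLog</code> class below are hypothetical.</p>

```python
class SharedLog:
    """Toy shared log whose appends are idempotent: each entry carries a unique ID."""

    def __init__(self):
        self.entries = []   # ordered log: list of (entry_id, payload)
        self.offsets = {}   # entry_id -> offset, used to recognise retried appends

    def append(self, entry_id: str, payload: bytes) -> int:
        # If this ID is already in the log (e.g. the write landed but the ack was lost),
        # return the original offset rather than appending a second copy.
        if entry_id in self.offsets:
            return self.offsets[entry_id]
        offset = len(self.entries)
        self.entries.append((entry_id, payload))
        self.offsets[entry_id] = offset
        return offset
```

<p>A server that times out waiting for an acknowledgement can call <code>append</code> again with the same ID. Either the first attempt failed and this one writes the entry, or it succeeded and the log is left unchanged.</p>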
<p><strong>Index</strong><br />
The index stores the database as a search tree with each node a [key, payload] pair. The tree can store, for example, a relational database. The index tree is also represented in the log.</p>
<p>Tree nodes are not updated in place. When node <i>n</i> is updated, a new copy, <i>n&#8217;</i>, is created. Then, of course, the parent node must be updated too, and so on up to the root.</p>
<p>A binary tree minimizes the number of node updates, but can be processor intensive. The optimal tree structure for Hyder is not yet resolved.</p>
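<p>The copy-on-write update described above is classic path copying. An update copies only the nodes on the path from the root to the changed node. Everything else is shared, and the old root still describes the previous version. Here is a generic persistent binary search tree with hypothetical names; Hyder's real node layout and log encoding are in the paper.</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    payload: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, payload: str) -> Node:
    """Return a new root; nodes on the search path are copied, all others are shared."""
    if root is None:
        return Node(key, payload)
    if key < root.key:
        return Node(root.key, root.payload, insert(root.left, key, payload), root.right)
    if key > root.key:
        return Node(root.key, root.payload, root.left, insert(root.right, key, payload))
    return Node(key, payload, root.left, root.right)  # update: n is replaced by a new copy n'

def lookup(root: Optional[Node], key: int) -> Optional[str]:
    # Ordinary BST search; works on any version's root.
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return None if root is None else root.payload
```

<p>Updating one key creates a new root and a new copy of every node on the path to that key. The untouched subtree is the very same object in both versions, so the old root remains a consistent snapshot for free.</p>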
<p>Garbage collection is an issue. Each node pointer includes the ID of the oldest reachable data element. An element older than every element still pointed to by a node is garbage.</p>
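<p>One way to turn that rule into code: each node caches the smallest log offset reachable beneath it. The root then tells you how much of the log prefix is dead, and with several live snapshots you take the minimum over their roots. Using log offsets as the element IDs, and the helper names, are our assumptions for illustration.</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogNode:
    key: int
    offset: int    # log offset at which this node version was written
    oldest: int    # smallest log offset reachable from this node (cached)
    left: Optional["LogNode"] = None
    right: Optional["LogNode"] = None

def make_node(key: int, offset: int, left=None, right=None) -> LogNode:
    # Cache the oldest reachable offset so GC never has to walk the subtree.
    oldest = min([offset] + [c.oldest for c in (left, right) if c is not None])
    return LogNode(key, offset, oldest, left, right)

def gc_cutoff(live_roots) -> int:
    # Log entries below this offset are unreachable from every live snapshot.
    return min(root.oldest for root in live_roots)
```

<p>Say a leaf written at offset 3 is replaced by a new copy at offset 10. Once no live snapshot still uses the old root, the cutoff moves past offset 3 and that stretch of the log can be reclaimed.</p>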
<p><strong>Roll-forward algorithm</strong><br />
This is the key process of Hyder.</p>
<p>When a record update begins, one server executes the transaction. The server is given a copy of the latest database root, a static snapshot of the entire database.</p>
<p>The updates are stored in a local cache, and after execution the after-images are gathered into an <i>intention</i> record, which is broadcast to all servers and appended to the log. The update&#8217;s readset is included in the intention record, to ensure all intentions are properly ordered, none are lost, and the offset is made known to all servers.</p>
<p>Each server can assemble a local copy of the tail of the log, which is used to determine if there are conflicting updates. The <i>meld</i> procedure manages conflicting updates.</p>
<p>Appending the intention to the database log doesn&#8217;t commit the transaction. The intention references the static snapshot of the latest database root. The meld procedure determines if any committed transactions since the snapshot conflict with the intention. </p>
<p>If none do, the transaction commits. If any do, the transaction is aborted.</p>
<p>All servers roll forward using meld and don&#8217;t message each other about committed and failed transactions. Therefore there is no lock manager and no 2-phase commit.</p>
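<p>The optimistic check that meld performs can be sketched as a deterministic pass over the log. The model is deliberately simplified. It assumes each intention carries the log offset of its snapshot, its readset and its after-images. An intention aborts if anything committed after its snapshot touched the same keys. Hyder's real meld works on the tree structures described above and is considerably more clever. The names here are hypothetical.</p>

```python
from dataclasses import dataclass

@dataclass
class Intention:
    txn_id: str
    snapshot: int        # offset of the last log entry visible to the transaction (-1 = empty DB)
    readset: frozenset
    writeset: dict       # key -> after-image

def meld(log):
    """Roll forward over the log; return (state, {txn_id: "commit" or "abort"})."""
    state = {}
    committed = []       # (offset, keys written) for each committed intention
    decisions = {}
    for offset, intent in enumerate(log):
        touched = intent.readset | frozenset(intent.writeset)
        # Conflict: an intention committed after our snapshot wrote a key we read or write.
        conflict = any(
            c_off > intent.snapshot and c_keys & touched
            for c_off, c_keys in committed
        )
        if conflict:
            decisions[intent.txn_id] = "abort"
        else:
            decisions[intent.txn_id] = "commit"
            state.update(intent.writeset)
            committed.append((offset, frozenset(intent.writeset)))
    return state, decisions
```

<p>The decisions depend only on the log's contents and order. Every server replaying the same log therefore reaches the same commits and aborts on its own, which is why no lock manager or 2-phase commit messages are needed.</p>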
<p><strong>Contention</strong><br />
Losing the lock manager and 2-phase commit should help performance unless other points of contention throttle the system. Hyder&#8217;s points of contention include appending intentions to the log, melding the log at each server, and aborting transactions.</p>
<p>Intention appends are serial, so the lower the write latency, the more appends the log can absorb. A 10 µs write latency caps the log at 1 / 10 µs = 100,000 appends per second, or 100k TPS.</p>
<p>Network latency adds to write latency. Faster switches improve append performance.</p>
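<p>The 100k figure is just the reciprocal of the latency, and network time adds directly to it. A back-of-the-envelope calculator, assuming strictly serial appends with no batching and network latency that simply adds to the flash write latency. The 10 µs network figure in the usage note is illustrative, not from the paper.</p>

```python
def max_appends_per_second(flash_write_us: float, network_us: float = 0.0) -> float:
    """Upper bound on serial intention appends: one append per (flash write + network) latency.

    Assumes no batching or pipelining of appends.
    """
    return 1_000_000 / (flash_write_us + network_us)
```

<p>With a 10 µs flash write and no network cost, the ceiling is 100,000 appends per second. Add a 10 µs network hop and it halves to 50,000, which is why faster switches matter.</p>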
<p>The abort rate depends on the number of concurrent transactions that conflict. Fast transactions reduce the probability of aborts by reducing the number of concurrent transactions. </p>
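<p>The link between speed and aborts can be made concrete. By Little's law, the number of transactions in flight is throughput times execution time, so faster transactions mean fewer concurrent rivals. The collision estimate below is a rough model of our own, not the paper's. It assumes every transaction touches the same number of keys chosen uniformly at random.</p>

```python
def in_flight(tps: float, exec_time_s: float) -> float:
    # Little's law: average concurrent transactions = arrival rate x time in system.
    return tps * exec_time_s

def abort_probability(concurrent: float, keys_per_txn: int, db_keys: int) -> float:
    """Rough chance a transaction collides with at least one concurrent transaction.

    Approximation: each of another transaction's keys independently misses our keys
    with probability (1 - keys_per_txn / db_keys).
    """
    p_no_overlap = (1 - keys_per_txn / db_keys) ** keys_per_txn  # one rival misses all our keys
    return 1 - p_no_overlap ** concurrent
```

<p>At 100,000 TPS, a 1 ms transaction has about 100 concurrent rivals. A 100 µs transaction has about 10, and its estimated abort probability drops by roughly the same factor.</p>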
<p>The worst case is a record subject to multiple updates from different servers. Detecting high-conflict transactions and serializing them by forcing them onto one server would reduce the performance hit from hot data.</p>
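<p>This post describes two routing rules. Ordinary transactions go to lightly loaded servers, since any server can run any transaction. High-conflict transactions get pinned to a single server. Both rules fit in a small dispatcher. The abort-count heuristic, the threshold and the <code>Router</code> class are our assumptions; neither the post nor this sketch says how Hyder would detect hot data.</p>

```python
from collections import Counter

class Router:
    """Hypothetical dispatcher: hot keys go to one pinned server, the rest to the least-loaded one."""

    def __init__(self, servers, hot_threshold=3):
        self.load = {s: 0 for s in servers}   # outstanding transactions per server
        self.aborts = Counter()               # abort count per key
        self.hot_threshold = hot_threshold
        self.pinned_server = servers[0]       # where high-conflict transactions are serialized

    def record_abort(self, keys):
        # Feed back keys from aborted transactions to learn which data is hot.
        self.aborts.update(keys)

    def route(self, keys) -> str:
        if any(self.aborts[k] >= self.hot_threshold for k in keys):
            server = self.pinned_server                  # serialize hot-data transactions
        else:
            server = min(self.load, key=self.load.get)   # any server can run any transaction
        self.load[server] += 1
        return server

    def done(self, server):
        self.load[server] -= 1
```

<p>Funneling hot-key transactions through one server turns would-be aborts into local queuing. Everything else still spreads across the cluster.</p>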
<p><strong>Performance</strong><br />
The authors model Hyder&#8217;s performance with a focus on the high-contention corner cases. In general, the tests show linear scaling as servers are added. </p>
<p>The problems come when the underlying hardware limits are exceeded. Increasing execution times mean more aborts and performance falls off a cliff. From the paper:</p>
<p><a href="http://storagemojo.com/wp-content/uploads//2011/01/hyder_thrashing.jpg"><img src="http://storagemojo.com/wp-content/uploads//2011/01/hyder_thrashing.jpg" alt="" title="hyder_thrashing" width="475" height="286" class="aligncenter size-full wp-image-2240" /></a></p>
<p><strong>The StorageMojo take</strong><br />
We&#8217;ve been building disk workarounds for decades. We now tend to assume those workarounds are fundamental architectural requirements rather than hacks.</p>
<p>The <i>Hyder</i> paper asks us to imagine a world where non-volatile mass storage is fast and cheap &#8211; and how we could re-architect basic systems to be faster and cheaper too.</p>
<p>The authors&#8217; conclusion is a fair assessment:</p>
<blockquote><p>
Many variations of the Hyder architecture and algorithms would be worth exploring. There may also be opportunities to use Hyder’s logging and meld algorithms with some modification in other contexts, such as file systems and middleware. We suggested a number of directions for future work throughout the paper. No doubt there are many other directions as well.
</p></blockquote>
<p><strong>Courteous comments welcome, of course.</strong> I hope to get to some of the other CIDR papers before <a href="" target="_blank">FAST &#8217;11</a> snows me under. <strong>Update:</strong> Phil Bernstein was kind enough to scan the post, and I&#8217;ve corrected one minor error. He also mentioned that the paper won the Best Paper award at the conference. Those CIDR folks have great taste in papers, don&#8217;t they?</p>
]]></content:encoded>
			<wfw:commentRss>http://storagemojo.com/2011/01/24/hyder-a-flash-based-scale-out-database/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

