<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A 1 petabyte science project</title>
	<atom:link href="http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Tue, 07 Sep 2010 17:31:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Tim</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-209341</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Mon, 24 May 2010 14:48:10 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-209341</guid>
		<description>might also want to have a look at http://openstoragepod.org -- &quot;Petascale storage for the rest of us!&quot;.</description>
		<content:encoded><![CDATA[<p>might also want to have a look at <a href="http://openstoragepod.org" rel="nofollow">http://openstoragepod.org</a> &#8212; &#8220;Petascale storage for the rest of us!&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Allen</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-209332</link>
		<dc:creator>Allen</dc:creator>
		<pubDate>Sat, 22 May 2010 23:28:06 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-209332</guid>
		<description>Check out the Gravity stuff at Infiscale. They are the makers of a ton of free open source HPC and cloud stuff with links to systems they did that have several petabytes of storage. Met a couple of them at SC09 and they had a nice demo of their Perceus running at the Intel booth. Always partial to those that show us the source and give us code ;)</description>
		<content:encoded><![CDATA[<p>Check out the Gravity stuff at Infiscale. They are the makers of a ton of free open source HPC and cloud stuff with links to systems they did that have several petabytes of storage. Met a couple of them at SC09 and they had a nice demo of their Perceus running at the Intel booth. Always partial to those that show us the source and give us code <img src='http://storagemojo.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: A newbe</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-208401</link>
		<dc:creator>A newbe</dc:creator>
		<pubDate>Tue, 02 Mar 2010 10:19:57 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-208401</guid>
		<description>Could this link possibly help?

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/</description>
		<content:encoded><![CDATA[<p>Could this link possibly help?</p>
<p><a href="http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/" rel="nofollow">http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: KD Mann</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-208338</link>
		<dc:creator>KD Mann</dc:creator>
		<pubDate>Sat, 27 Feb 2010 00:17:15 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-208338</guid>
		<description>Just a few quick thoughts here (though I&#039;m a couple months late)...

1) IBM&#039;s SONAS (Scale Out NAS): infinibanded HPC derived clusters with DDN storage on the back-end. Just saw these with my own eyes last week -- way fast, way scalable, way efficient, way cool

2) IBM&#039;s SAN Volume Controller; especially now that they&#039;re able to license per spindle instead of per terabyte. SVC&#039;s 380,000 IOPS in SPC-1 is almost twice as fast as anything else ever tested on spinning disks, and is even about 30% faster than the big SSD arrays recently tested by both IBM and TMS. All that, and you can even put your existing spindle-farm underneath it.

Finally -- wouldn&#039;t Isilon be DQ&#039;d here on performance? Isilon is all about cheap, massive capacity across a single namespace, but Isilon performance is not anywhere near the rest of the solutions discussed here.</description>
		<content:encoded><![CDATA[<p>Just a few quick thoughts here (though I&#8217;m a couple months late)&#8230;</p>
<p>1) IBM&#8217;s SONAS (Scale Out NAS): infinibanded HPC derived clusters with DDN storage on the back-end. Just saw these with my own eyes last week &#8212; way fast, way scalable, way efficient, way cool</p>
<p>2) IBM&#8217;s SAN Volume Controller; especially now that they&#8217;re able to license per spindle instead of per terabyte. SVC&#8217;s 380,000 IOPS in SPC-1 is almost twice as fast as anything else ever tested on spinning disks, and is even about 30% faster than the big SSD arrays recently tested by both IBM and TMS. All that, and you can even put your existing spindle-farm underneath it.</p>
<p>Finally &#8212; wouldn&#8217;t Isilon be DQ&#8217;d here on performance? Isilon is all about cheap, massive capacity across a single namespace, but Isilon performance is not anywhere near the rest of the solutions discussed here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-208157</link>
		<dc:creator>Jacob</dc:creator>
		<pubDate>Wed, 17 Feb 2010 01:08:43 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-208157</guid>
		<description>I&#039;ve been doing a bunch of massive capacity projects with an archival file system from FileTek.  It is not an HSM.  It&#039;s an archival file system that stores files on tape and caches them on disk.  It&#039;s not designed for crazy performance, but it is designed for enormous file counts and data integrity.  In addition to archiving files, it archives SQL databases too.  Very cool.  The next thing you need is a virtual file system to feed it:  iRODS, SRB, Nirvana, maybe even Acopia.  Drop me a line if you want references or want to learn more.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been doing a bunch of massive capacity projects with an archival file system from FileTek.  It is not an HSM.  It&#8217;s an archival file system that stores files on tape and caches them on disk.  It&#8217;s not designed for crazy performance, but it is designed for enormous file counts and data integrity.  In addition to archiving files, it archives SQL databases too.  Very cool.  The next thing you need is a virtual file system to feed it:  iRODS, SRB, Nirvana, maybe even Acopia.  Drop me a line if you want references or want to learn more.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Francis Kim</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207874</link>
		<dc:creator>Francis Kim</dc:creator>
		<pubDate>Sun, 31 Jan 2010 22:36:06 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207874</guid>
		<description>z,

I would definitely look at SSDs, since you have a mountain of metadata to churn through before you can get to your petabyte.  Of course you&#039;re talking to FIO.  They love science experiments.  Their ioDrive cards are the IOPS kings at the moment, as their pricing suggests.  One caveat: FIO&#039;s model of &quot;an ioDrive in every server&quot; is going to be at odds with your existing environment running SAM-Q on Solaris.  Better to look at stuffing a box with a number of SSDs (disk form factor, PCIe, etc.), then present them out as storage target LUNs for your SAM-Q server to use as a metadata store.  You want to disrupt your fragile HSM server as little as possible.  This way, you can remain flexible with respect to SSD adoption and take advantage of SSDs&#039; rapidly falling price/(capacity:performance) curve.</description>
		<content:encoded><![CDATA[<p>z,</p>
<p>I would definitely look at SSDs, since you have a mountain of metadata to churn through before you can get to your petabyte.  Of course you&#8217;re talking to FIO.  They love science experiments.  Their ioDrive cards are the IOPS kings at the moment, as their pricing suggests.  One caveat: FIO&#8217;s model of &#8220;an ioDrive in every server&#8221; is going to be at odds with your existing environment running SAM-Q on Solaris.  Better to look at stuffing a box with a number of SSDs (disk form factor, PCIe, etc.), then present them out as storage target LUNs for your SAM-Q server to use as a metadata store.  You want to disrupt your fragile HSM server as little as possible.  This way, you can remain flexible with respect to SSD adoption and take advantage of SSDs&#8217; rapidly falling price/(capacity:performance) curve.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kebabbert</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207638</link>
		<dc:creator>Kebabbert</dc:creator>
		<pubDate>Sun, 17 Jan 2010 23:04:28 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207638</guid>
		<description>Nice solutions, but don&#039;t forget SILENT CORRUPTION. All solutions (except ZFS) are subject to silent corruption, where your data slowly accumulates rotten bits without the hardware even telling you. If you value your data, do you want to avoid bit rot? What happens if a bit in your data suddenly changes from a &quot;1&quot; to a &quot;0&quot; - without the hardware informing you?

The big physics centre CERN did a study on this: on 3000 rack servers with hardware RAID, they found 152 instances of bit rot, where the data was altered without the hardware even knowing it! The sysadmins were not notified. CERN discovered this by using a program that wrote a known bit pattern and then compared what it read back against the expected result.

All hardware solutions have some rudimentary protection against bit rot and silent corruption, but none protects completely, except ZFS. Here are two more articles for you to read if you want to learn more about CERN and bit rot (he concludes that end-to-end checksums are needed and ordinary checksums will not do - he suggests ZFS)
https://indico.desy.de/contributionDisplay.py?contribId=65&amp;sessionId=42&amp;confId=257

ZFS is designed from scratch, to NEVER EVER trust the underlying hardware (cosmic radiation might flip a bit, power spike, bugs in BIOS, not really connected card slots, etc etc):
http://queue.acm.org/detail.cfm?id=1317400

In my opinion, ZFS protection against bit rot is THE main reason to use ZFS. Why use fast and unreliable storage? Better to use safe storage which guarantees that your bits are not altered. Read those links for more information.</description>
		<content:encoded><![CDATA[<p>Nice solutions, but don&#8217;t forget SILENT CORRUPTION. All solutions (except ZFS) are subject to silent corruption, where your data slowly accumulates rotten bits without the hardware even telling you. If you value your data, do you want to avoid bit rot? What happens if a bit in your data suddenly changes from a &#8220;1&#8221; to a &#8220;0&#8221; &#8211; without the hardware informing you?</p>
<p>The big physics centre CERN did a study on this: on 3000 rack servers with hardware RAID, they found 152 instances of bit rot, where the data was altered without the hardware even knowing it! The sysadmins were not notified. CERN discovered this by using a program that wrote a known bit pattern and then compared what it read back against the expected result.</p>
<p>All hardware solutions have some rudimentary protection against bit rot and silent corruption, but none protects completely, except ZFS. Here are two more articles for you to read if you want to learn more about CERN and bit rot (he concludes that end-to-end checksums are needed and ordinary checksums will not do &#8211; he suggests ZFS)<br />
<a href="https://indico.desy.de/contributionDisplay.py?contribId=65&amp;sessionId=42&amp;confId=257" rel="nofollow">https://indico.desy.de/contributionDisplay.py?contribId=65&amp;sessionId=42&amp;confId=257</a></p>
<p>ZFS is designed from scratch, to NEVER EVER trust the underlying hardware (cosmic radiation might flip a bit, power spike, bugs in BIOS, not really connected card slots, etc etc):<br />
<a href="http://queue.acm.org/detail.cfm?id=1317400" rel="nofollow">http://queue.acm.org/detail.cfm?id=1317400</a></p>
<p>In my opinion, ZFS protection against bit rot is THE main reason to use ZFS. Why use fast and unreliable storage? Better to use safe storage which guarantees that your bits are not altered. Read those links for more information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: NG</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207371</link>
		<dc:creator>NG</dc:creator>
		<pubDate>Tue, 29 Dec 2009 13:45:27 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207371</guid>
		<description>If you want to keep HSM/tape in your configuration, consider something like Quantum&#039;s StorNext.  It&#039;s a SAN file system that scales in performance as you add nodes, and it has some very good references.  If you want to move away from tape and stay with all disk, take a look at object-based storage solutions (this is all in addition to what has been mentioned) like Caringo; they have petabyte implementations and can demonstrate performance.

Finally, there are other ways you can reduce the cost of your infrastructure while using inexpensive SATA drives, such as deduplication and compression.  Two companies do a really good job at this independent of the file-based storage you have: Ocarina Networks and Storwize.  Ocarina is a post-process optimizer for more static content and is able to optimize images along with text files and other precompressed files.  Storwize is an inline compression engine that works best with a variety of file types, excluding precompressed ones.  By using these technologies, you can reduce the footprint of your data and the cost of the storage and its environment.

Not to forget, though: Open ZFS has been adopted by a few vendors who might be good for tier two, including Greenbytes, who added inline deduplication, and Nexenta.  Might be an interesting option.</description>
		<content:encoded><![CDATA[<p>If you want to keep HSM/tape in your configuration, consider something like Quantum&#8217;s StorNext.  It&#8217;s a SAN file system that scales in performance as you add nodes, and it has some very good references.  If you want to move away from tape and stay with all disk, take a look at object-based storage solutions (this is all in addition to what has been mentioned) like Caringo; they have petabyte implementations and can demonstrate performance.</p>
<p>Finally, there are other ways you can reduce the cost of your infrastructure while using inexpensive SATA drives, such as deduplication and compression.  Two companies do a really good job at this independent of the file-based storage you have: Ocarina Networks and Storwize.  Ocarina is a post-process optimizer for more static content and is able to optimize images along with text files and other precompressed files.  Storwize is an inline compression engine that works best with a variety of file types, excluding precompressed ones.  By using these technologies, you can reduce the footprint of your data and the cost of the storage and its environment.</p>
<p>Not to forget, though: Open ZFS has been adopted by a few vendors who might be good for tier two, including Greenbytes, who added inline deduplication, and Nexenta.  Might be an interesting option.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TimC</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207336</link>
		<dc:creator>TimC</dc:creator>
		<pubDate>Sun, 27 Dec 2009 01:19:47 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207336</guid>
		<description>Just a note on your AMS2500 concerns.  The SAS disks in an AMS are mechanically identical to their FC brethren; they just have a different backplane interface.  You should actually see BETTER concurrency because they are a point-to-point connection rather than the loop topology of FCAL.</description>
		<content:encoded><![CDATA[<p>Just a note on your AMS2500 concerns.  The SAS disks in an AMS are mechanically identical to their FC brethren; they just have a different backplane interface.  You should actually see BETTER concurrency because they are a point-to-point connection rather than the loop topology of FCAL.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Landman</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207286</link>
		<dc:creator>Joe Landman</dc:creator>
		<pubDate>Mon, 21 Dec 2009 17:20:15 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207286</guid>
		<description>Coming late with a response ... we&#039;ve had a busy month/quarter/year ...

First off, we are a vendor.  We build very nice, dense, and fast storage boxen/targets/storage clusters and systems for this sort of work. We deliver and sustain data rates that are quite good (e.g. non-marketing numbers, real, end-user-repeatable results).  We have units in production that easily supply more than 10GB/s to large HPC cluster systems, including for informatics analysis applications.

The issues I see being asked about are 1) existing metadata servers are barely able to keep up with load, 2) a need to expand the capacity (increase the density?).  I don&#039;t see the tape as an issue, even though it is listed as a problem.

Other comments seem to suggest replacing the existing infrastructure  with something new.  Choose your flavor and go forth or something like this.

If the issue is to solve #1 and #2, this is easy to do without replacing much.  The question I would have is how this would scale going forward.  If you see going from 1PB to 10PB as problematic, then a new architecture is probably not a bad idea.  More on that in a moment.

Question #2 is &quot;solvable&quot; by keeping the cost of additional nodes down.  This may be at odds with some vendor solutions (not ours).  

Question #1 is &quot;solvable&quot; by replacing the MDS in the current design with something faster.  We just benchmarked one of our JackRabbit-Flash units, with 1k random reads against 256GB of data at a sustained 180k IOPS.  This isn&#039;t a terribly expensive unit, and its flash drives obviously run circles around the 15k RPM drives.  You can&#039;t repeal physics; mechanics will not be as fast as electronics in most cases that I am aware of.

Ok.  Onto design issues.  If as you scale up, the MDS is only going to get worse (as all centralized designs will), then replacing it provides only a bandaid over the issue, and avoids solving the real problem, that of good design. #2 isn&#039;t affected as much as #1 on the design side.  Bulk data storage should be lower cost and fast.  But, if you have a single point of information flow in your scale out process, your design will eventually fall over.

So if you do plan to scale up well beyond 1PB, the centralized MDS has got to go (and any design that utilizes a centralized MDS is likely to have the same issues during scale up).  Here things like Gluster (which we sell/support/integrate into our offerings) and a few others make a great deal of sense.  You scale up as you need, with reasonable economics.

Feel free to ping me on/offline if you need to talk about these designs.  Basically, if you are not trashing your existing infrastructure, you need to have a clear conception of how much higher it can scale, and whether or not an SSD replacement will help your MDS for your planned future.  If you really do need to scale up/out, our siCluster  (info to appear soon at http://scalableinformatics.com/sicluster) product is certainly one worthy of consideration, providing some of the best end user achievable  scale-out performance we have seen on customer applications to date.</description>
		<content:encoded><![CDATA[<p>Coming late with a response &#8230; we&#8217;ve had a busy month/quarter/year &#8230;</p>
<p>First off, we are a vendor.  We build very nice, dense, and fast storage boxen/targets/storage clusters and systems for this sort of work. We deliver and sustain data rates that are quite good (e.g. non-marketing numbers, real, end-user-repeatable results).  We have units in production that easily supply more than 10GB/s to large HPC cluster systems, including for informatics analysis applications.</p>
<p>The issues I see being asked about are 1) existing metadata servers are barely able to keep up with load, 2) a need to expand the capacity (increase the density?).  I don&#8217;t see the tape as an issue, even though it is listed as a problem.</p>
<p>Other comments seem to suggest replacing the existing infrastructure  with something new.  Choose your flavor and go forth or something like this.</p>
<p>If the issue is to solve #1 and #2, this is easy to do without replacing much.  The question I would have is how this would scale going forward.  If you see going from 1PB to 10PB as problematic, then a new architecture is probably not a bad idea.  More on that in a moment.</p>
<p>Question #2 is &#8220;solvable&#8221; by keeping the cost of additional nodes down.  This may be at odds with some vendor solutions (not ours).  </p>
<p>Question #1 is &#8220;solvable&#8221; by replacing the MDS in the current design with something faster.  We just benchmarked one of our JackRabbit-Flash units, with 1k random reads against 256GB of data at a sustained 180k IOPS.  This isn&#8217;t a terribly expensive unit, and its flash drives obviously run circles around the 15k RPM drives.  You can&#8217;t repeal physics; mechanics will not be as fast as electronics in most cases that I am aware of.</p>
<p>Ok.  Onto design issues.  If as you scale up, the MDS is only going to get worse (as all centralized designs will), then replacing it provides only a bandaid over the issue, and avoids solving the real problem, that of good design. #2 isn&#8217;t affected as much as #1 on the design side.  Bulk data storage should be lower cost and fast.  But, if you have a single point of information flow in your scale out process, your design will eventually fall over.</p>
<p>So if you do plan to scale up well beyond 1PB, the centralized MDS has got to go (and any design that utilizes a centralized MDS is likely to have the same issues during scale up).  Here things like Gluster (which we sell/support/integrate into our offerings) and a few others make a great deal of sense.  You scale up as you need, with reasonable economics.</p>
<p>Feel free to ping me on/offline if you need to talk about these designs.  Basically, if you are not trashing your existing infrastructure, you need to have a clear conception of how much higher it can scale, and whether or not an SSD replacement will help your MDS for your planned future.  If you really do need to scale up/out, our siCluster  (info to appear soon at <a href="http://scalableinformatics.com/sicluster" rel="nofollow">http://scalableinformatics.com/sicluster</a>) product is certainly one worthy of consideration, providing some of the best end user achievable  scale-out performance we have seen on customer applications to date.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Brown</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207202</link>
		<dc:creator>Dave Brown</dc:creator>
		<pubDate>Wed, 16 Dec 2009 20:05:29 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207202</guid>
		<description>You mention not wanting to be joined at the hip, or hip pocket, to a large storage vendor.  Consider technology from DataCore Software that is complementary to your existing SAM-FS software.  DataCore is coming up on 12 years in the industry and has been an innovator throughout.  DataCore was the inventor of thin provisioning technology and continues to lead with things like the ability to have up to 1TB of cache in a given storage controller, and it had the first 8Gb FC target on the market.  DataCore also just released support for very large volumes.

As Rob of Xiotech mentioned, their ISE 5000 is an awesome array, and adding some DataCore SANsymphony to complement it increases your performance even more.  If you want to add SSD, you can do that today: just connect SSDs from any of the vendors to your commodity server hardware running Intel or AMD.  You can use things like STEC&#039;s 3Gb or 6Gb SAS connected SSDs, or their 4Gb FC SSD.  Pliant just released some and Intel has them as well.  With DataCore, you can pool all those and any other disk type from any vendor into as many pools and tiers as you&#039;d like.

Just as the CPU has evolved over the years from vacuum tubes to silicon to a small internal cache to now three layers of cache, look at storage in the same way, with DataCore&#039;s software being your fastest Layer 1 cache.  The software will not slow things down but speed them up, typically cutting I/O latency by up to an order of magnitude compared with a normal cached array controller.  There are many more features and capabilities of the software than I&#039;ll write about here, but if you&#039;re interested in finding a solution that offers you scalability, performance, and the ability to fit into your existing storage environment (DataCore can present storage to any Open System host), and that doesn&#039;t tie you to a monolithic stack of storage, you should look at DataCore.

Dave Brown
dave.brown@datacore.com</description>
		<content:encoded><![CDATA[<p>You mention not wanting to be joined at the hip, or hip pocket, to a large storage vendor.  Consider technology from DataCore Software that is complementary to your existing SAM-FS software.  DataCore is coming up on 12 years in the industry and has been an innovator throughout.  DataCore was the inventor of thin provisioning technology and continues to lead with things like the ability to have up to 1TB of cache in a given storage controller, and it had the first 8Gb FC target on the market.  DataCore also just released support for very large volumes.</p>
<p>As Rob of Xiotech mentioned, their ISE 5000 is an awesome array, and adding some DataCore SANsymphony to complement it increases your performance even more.  If you want to add SSD, you can do that today: just connect SSDs from any of the vendors to your commodity server hardware running Intel or AMD.  You can use things like STEC&#8217;s 3Gb or 6Gb SAS connected SSDs, or their 4Gb FC SSD.  Pliant just released some and Intel has them as well.  With DataCore, you can pool all those and any other disk type from any vendor into as many pools and tiers as you&#8217;d like.</p>
<p>Just as the CPU has evolved over the years from vacuum tubes to silicon to a small internal cache to now three layers of cache, look at storage in the same way, with DataCore&#8217;s software being your fastest Layer 1 cache.  The software will not slow things down but speed them up, typically cutting I/O latency by up to an order of magnitude compared with a normal cached array controller.  There are many more features and capabilities of the software than I&#8217;ll write about here, but if you&#8217;re interested in finding a solution that offers you scalability, performance, and the ability to fit into your existing storage environment (DataCore can present storage to any Open System host), and that doesn&#8217;t tie you to a monolithic stack of storage, you should look at DataCore.</p>
<p>Dave Brown<br />
<a href="mailto:dave.brown@datacore.com">dave.brown@datacore.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: paul</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207180</link>
		<dc:creator>paul</dc:creator>
		<pubDate>Tue, 15 Dec 2009 22:25:10 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207180</guid>
		<description>The Ocarina post prompted me to point out that the IBM Information Archive appliance supports both compression and deduplication of archived files.  This provides a 20-80% reduction in storage size.  The compression and deduplication are performed either at the client or upon ingest to the archive through up to three 8-core IBM System X servers, providing plenty of horsepower for the job.

Paul Hewitt
IBM Data Archive Institute</description>
		<content:encoded><![CDATA[<p>The Ocarina post prompted me to point out that the IBM Information Archive appliance supports both compression and deduplication of archived files.  This provides a 20-80% reduction in storage size.  The compression and deduplication are performed either at the client or upon ingest to the archive through up to three 8-core IBM System X servers, providing plenty of horsepower for the job.</p>
<p>Paul Hewitt<br />
IBM Data Archive Institute</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carter George</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207179</link>
		<dc:creator>Carter George</dc:creator>
		<pubDate>Tue, 15 Dec 2009 22:00:09 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207179</guid>
		<description>Most of the responses here have focused on your problems in the fast tier.  If I understand correctly, you’re not looking for a new file system, but for storage to make the different tiers go faster with your existing SAM-FS infrastructure.      

Having gone to the effort of implementing an HSM deployment (easier said than done), one way to take advantage of that is to optimize your tier 2.  You can get more cost-savings than just using cheap SATA disk.  For scientific data sets – such as next-gen sequencing, mass spectrometry, images that come off of many types of instrument – it is possible to integrate content-aware compression transparently in the second (SATA) tier.      

This kind of compression recognizes specific file types and can get up to 75% compression at close to wire speed performance. This won’t solve your IOPS problem for the metadata slices, but saving 75% for a petabyte of storage at that tier could help you find the money to buy the cool stuff for tier one.  If your data looks like TIFFs, SAM/BAM/SRF for genomics, or any other scientific image or coded data set, this would be worth looking into.  If it’s just alphanumeric, then generic compressors (such as those in ZFS) could be turned on.

As it happens, we do have some experience with the AMS 2500. I can’t compare it directly to the STK models, which I believe are made for Sun by LSI. We’ve found the HDS a bit difficult to get configured and ordered, but once it is in place, it’s rock solid.  Super highly available, no disparity between vendor performance claims and actual performance, and very good at using intelligent cache to get that claimed 900,000 IOPS out of an array with SAS drives.  (We’ve found it’s possible to get just as much performance with the 15K rpm SAS as with Fibre Channel drives.)

Cache is important in the HDS scheme, so you’d want to get the full cache size on offer. Finally, I am a bit surprised that HDS has not qualified any SAS form-factor SSDs in this array yet, as it would be a natural thing to do. SSD would be a good fit for high IOPS to a read-mostly metadata slice.

Although we have not used the AMS with SAM-FS, we have used it with multiple cluster file systems (Ibrix, PolyServe, Lustre) to good effect.

Carter George, VP Products, Ocarina Networks</description>
		<content:encoded><![CDATA[<p>Most of the responses here have focused on your problems in the fast tier.  If I understand correctly, you’re not looking for a new file system, but for storage to make the different tiers go faster with your existing SAM-FS infrastructure.      </p>
<p>Having gone to the effort of implementing an HSM deployment (easier said than done), one way to take advantage of that is to optimize your tier 2.  You can get more cost savings than just using cheap SATA disk.  For scientific data sets – such as next-gen sequencing, mass spectrometry, images that come off of many types of instrument – it is possible to integrate content-aware compression transparently in the second (SATA) tier.</p>
<p>This kind of compression recognizes specific file types and can get up to 75% compression at close to wire-speed performance. This won’t solve your IOPS problem for the metadata slices, but saving 75% for a petabyte of storage at that tier could help you find the money to buy the cool stuff for tier one.  If your data looks like TIFFs, SAM/BAM/SRF for genomics, or any other scientific image or coded data set, this would be worth looking into.  If it’s just alphanumeric, then generic compressors (such as those in ZFS) could be turned on.</p>
<p>As it happens, we do have some experience with the AMS 2500. I can’t compare it directly to the STK models, which I believe are made for Sun by LSI. We’ve found the HDS a bit difficult to get configured and ordered, but once it is in place, it’s rock solid.  Super highly available, no disparity between vendor performance claims and actual performance, and very good at using intelligent cache to get that claimed 900,000 IOPS out of an array with SAS drives.  (We’ve found it’s possible to get just as much performance with the 15K rpm SAS as with Fibre Channel drives.)</p>
<p>Cache is important in the HDS scheme, so you’d want to get the full cache size on offer. Finally, I am a bit surprised that HDS has not qualified any SAS form-factor SSDs in this array yet, as it would be a natural thing to do. SSD would be a good fit for high IOPS to a read-mostly metadata slice.</p>
<p>Although we have not used the AMS with SAM-FS, we have used it with multiple cluster file systems (Ibrix, PolyServe, Lustre) to good effect.</p>
<p>Carter George, VP Products, Ocarina Networks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: paul</title>
		<link>http://storagemojo.com/2009/12/08/a-1-petabyte-science-project/comment-page-1/#comment-207176</link>
		<dc:creator>paul</dc:creator>
		<pubDate>Tue, 15 Dec 2009 20:03:14 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1726#comment-207176</guid>
		<description>Z
After reading all these point solutions from vendors, I would encourage you to evaluate the IBM Smart Archive solution suite and the Information Archive platform.  

http://www-304.ibm.com/jct01003c/software/data/smart-archive/

This is a next-generation archiving solution that delivers on the promise of &quot;ILM&quot; with software integration to almost any structured or unstructured data/application in the enterprise.  The infrastructure consists of IBM 2U servers that connect over Fibre Channel to the IBM TruMAID (Massive Array of Idle Disks) storage, to your tape libraries, or to both.  The IA pools could appear to SAM-FS as another tier, but with much lower TCO than any of these other scale-out NAS or internal cloud solutions.

TruMAID provides 179TBytes per square foot using 2TB SATA drives, and a full cabinet (10 sq. ft.) with 1.79PBytes requires only 5.5kW maximum power.   

The IBM Information Archive is a complete software and hardware solution that would integrate well into your existing SAM-FS environment, giving you full protected archive features for compliance and eDiscovery requirements, and providing a very compelling TCO vs. standard NAS or cloud storage.

I am the IBM Data Archive Institute storage consultant for the western U.S.  Send me your contact information if interested and let&#039;s schedule a meeting to explore further.</description>
		<content:encoded><![CDATA[<p>Z<br />
After reading all these point solutions from vendors, I would encourage you to evaluate the IBM Smart Archive solution suite and the Information Archive platform.  </p>
<p><a href="http://www-304.ibm.com/jct01003c/software/data/smart-archive/" rel="nofollow">http://www-304.ibm.com/jct01003c/software/data/smart-archive/</a></p>
<p>This is a next-generation archiving solution that delivers on the promise of &#8220;ILM&#8221; with software integration to almost any structured or unstructured data/application in the enterprise.  The infrastructure consists of IBM 2U servers that connect over Fibre Channel to the IBM TruMAID (Massive Array of Idle Disks) storage, to your tape libraries, or to both.  The IA pools could appear to SAM-FS as another tier, but with much lower TCO than any of these other scale-out NAS or internal cloud solutions.</p>
<p>TruMAID provides 179TBytes per square foot using 2TB SATA drives, and a full cabinet (10 sq. ft.) with 1.79PBytes requires only 5.5kW maximum power.   </p>
<p>The IBM Information Archive is a complete software and hardware solution that would integrate well into your existing SAM-FS environment, giving you full protected archive features for compliance and eDiscovery requirements, and providing a very compelling TCO vs. standard NAS or cloud storage.</p>
<p>I am the IBM Data Archive Institute storage consultant for the western U.S.  Send me your contact information if interested and let&#8217;s schedule a meeting to explore further.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
