<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Building a 1.8 exabyte data center</title>
	<atom:link href="http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Fri, 12 Mar 2010 00:35:33 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: joe</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198748</link>
		<dc:creator>joe</dc:creator>
		<pubDate>Thu, 04 Dec 2008 00:55:52 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198748</guid>
		<description>Is this a joke? When you say &quot;I dont know what the boxes would cost but it would be at least several times the racks.&quot;, are you budgeting $1000 for 10 Sunfire 4540 boxes? These numbers are way, way wrong.</description>
		<content:encoded><![CDATA[<p>Is this a joke? When you say &#8220;I dont know what the boxes would cost but it would be at least several times the racks.&#8221;, are you budgeting $1000 for 10 Sunfire 4540 boxes? These numbers are way, way wrong.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Mottram</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198377</link>
		<dc:creator>Bill Mottram</dc:creator>
		<pubDate>Sat, 08 Nov 2008 18:12:16 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198377</guid>
		<description>There is a limit to how many Sun Fire 4540s can be mounted in one rack. The limit, as I understand it, is 4. This will increase the rack count significantly.</description>
		<content:encoded><![CDATA[<p>There is a limit to how many Sun Fire 4540s can be mounted in one rack. The limit, as I understand it, is 4. This will increase the rack count significantly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Kraska</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198246</link>
		<dc:creator>Joe Kraska</dc:creator>
		<pubDate>Fri, 31 Oct 2008 23:01:14 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198246</guid>
		<description>I think it&#039;s 1.6 racks per PB, formatted and usable, on the DDN storage. They also have some nice power-saving capabilities. Their on-the-fly, real-time RAID-6 rebuild capability is, however, what I find most exemplary about DDN. You basically never need to know that a RAID rebuild is going on.

--Joe.</description>
		<content:encoded><![CDATA[<p>I think it&#8217;s 1.6 racks per PB, formatted and usable, on the DDN storage. They also have some nice power-saving capabilities. Their on-the-fly, real-time RAID-6 rebuild capability is, however, what I find most exemplary about DDN. You basically never need to know that a RAID rebuild is going on.</p>
<p>&#8211;Joe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emmanuel Florac</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198211</link>
		<dc:creator>Emmanuel Florac</dc:creator>
		<pubDate>Thu, 30 Oct 2008 11:51:53 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198211</guid>
		<description>The new DataDirect 6620 may be a step in the right direction: 60 drives per 4U enclosure, with sophisticated power management (automatic drive spindown). They actually seem to sell the S2A9900+6620 enclosure setup with 1TB drives; it&#039;s 1.2PB per 2 racks with 2 RAID controllers, something like 1PiB of available storage (with parity and spares) at an average power consumption of 36.6kW. You&#039;ll have to add some servers to that setup, however.</description>
		<content:encoded><![CDATA[<p>The new DataDirect 6620 may be a step in the right direction: 60 drives per 4U enclosure, with sophisticated power management (automatic drive spindown). They actually seem to sell the S2A9900+6620 enclosure setup with 1TB drives; it&#8217;s 1.2PB per 2 racks with 2 RAID controllers, something like 1PiB of available storage (with parity and spares) at an average power consumption of 36.6kW. You&#8217;ll have to add some servers to that setup, however.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Kraska</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198165</link>
		<dc:creator>Joe Kraska</dc:creator>
		<pubDate>Sat, 25 Oct 2008 17:37:46 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198165</guid>
		<description>The exabyte data center is not far off from being real. 

We have been working a coming challenge problem for a three letter agency involving ingest and store rates of 1.5PB/day. That&#039;s about 17GB/s sustained write, 24/7/365. Individual streams can come in at 1+ GB/s.

The real humdinger of it all is that the customer prefers disk and not tape for all storage.

Neither dedup nor compression is possible with these data types (they are not duplicative, and they are already compressed).

--Joe</description>
		<content:encoded><![CDATA[<p>The exabyte data center is not far off from being real. </p>
<p>We have been working a coming challenge problem for a three letter agency involving ingest and store rates of 1.5PB/day. That&#8217;s about 17GB/s sustained write, 24/7/365. Individual streams can come in at 1+ GB/s.</p>
<p>The real humdinger of it all is that the customer prefers disk and not tape for all storage.</p>
<p>Neither dedup nor compression is possible with these data types (they are not duplicative, and they are already compressed).</p>
<p>&#8211;Joe</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198163</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Sat, 25 Oct 2008 17:11:10 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198163</guid>
		<description>I didn&#039;t see this post till today.  It is interesting given what occurred at a small regional event we went to.

We had a booth at a small event (Ohio Linux Fest: ... hey we sold a JackRabbit to an attendee from that last year, so we were hoping to replicate that success).  You can see your humble contributor in some of the pictures &lt;a href=&quot;http://scalability.org/?p=835&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.

The interesting thing was, during our booth time, someone came up to me and started grilling me on JackRabbit and &#916;V capability.  While Sun and others were there, he didn&#039;t quite look like your typical G2-gatherer.

He finally caved, indicating he was looking to build an exabyte sized data center.  I gave him a rough estimate of the number of units of anyone&#039;s storage, power consumption, costs, etc.  Not that far off Robin&#039;s.  

I had chalked this up to a somewhat ... eccentric ... person asking if it was possible.  I doubt this was a person from a three-letter agency (though curiously we are getting more hits on the sites from the &quot;Maryland procurement office&quot; these days ... hmmmm).

Now Robin, you are giving me pause to reflect upon this conversation.  They asked me similar questions.  If that person is the same one who asked you about this, and they are lurking, I am curious as to how serious they were.   Intrigued, not from a vendor perspective, but from a management perspective.  As you scale up the number of parts, your probability of failure over some interval approaches an asymptotic limit of 1.  Which means that no matter what you do, you will always have to contend with some aspect of a failed part.   This is IMO far more important than other considerations mentioned.  Some mentioned de-dup as a technology to use for this, though this begs the question as to why they think that 90 GB data sets would have duplication in them?  I would imagine that run length encoded compression may be more beneficial and far faster than de-dup. 

More to the point, I didn&#039;t get a sense from my questioner what their data sets were.  I don&#039;t know if Robin got that.  If this is imagery, and they want to avoid lossy compression, RLE and other lossless techniques could help, at the cost of processing power.   If this is genomic or similar data, you have other techniques.  De-dup doesn&#039;t quite factor into these.

It seems to me that the phased build-out approach would make the most sense.  Moreover, someone suggested slotting x4500&#039;s in and out (ok, we would prefer &lt;a href=&quot;http://jackrabbit.scalableinformatics.com&quot; rel=&quot;nofollow&quot;&gt; JackRabbits&lt;/a&gt;).   This may be feasible, though rather than adapt the robotics to handle that, mount the units vertically, and use the robotic mechanisms to slot drives in and out of the chassis.

The large tape storage folks could do this.  Then there is the question of how to have some sort of file system handle this.  You would need to envision some sort of large cache file system for handling inbound data, and some sort of distributed meta-data mapper (standard meta-data plus a directory of where the data lives).  Rather than de-dup, you would want to either dup or code the data to handle drive lossage.

Would be interesting to talk about the tech behind this.</description>
		<content:encoded><![CDATA[<p>I didn&#8217;t see this post till today.  It is interesting given what occurred at a small regional event we went to.</p>
<p>We had a booth at a small event (Ohio Linux Fest: &#8230; hey we sold a JackRabbit to an attendee from that last year, so we were hoping to replicate that success).  You can see your humble contributor in some of the pictures <a href="http://scalability.org/?p=835" rel="nofollow">here</a>.</p>
<p>The interesting thing was, during our booth time, someone came up to me and started grilling me on JackRabbit and &Delta;V capability.  While Sun and others were there, he didn&#8217;t quite look like your typical G2-gatherer.</p>
<p>He finally caved, indicating he was looking to build an exabyte sized data center.  I gave him a rough estimate of the number of units of anyone&#8217;s storage, power consumption, costs, etc.  Not that far off Robin&#8217;s.  </p>
<p>I had chalked this up to a somewhat &#8230; eccentric &#8230; person asking if it was possible.  I doubt this was a person from a three-letter agency (though curiously we are getting more hits on the sites from the &#8220;Maryland procurement office&#8221; these days &#8230; hmmmm).</p>
<p>Now Robin, you are giving me pause to reflect upon this conversation.  They asked me similar questions.  If that person is the same one who asked you about this, and they are lurking, I am curious as to how serious they were.   Intrigued, not from a vendor perspective, but from a management perspective.  As you scale up the number of parts, your probability of failure over some interval approaches an asymptotic limit of 1.  Which means that no matter what you do, you will always have to contend with some aspect of a failed part.   This is IMO far more important than other considerations mentioned.  Some mentioned de-dup as a technology to use for this, though this begs the question as to why they think that 90 GB data sets would have duplication in them?  I would imagine that run length encoded compression may be more beneficial and far faster than de-dup. </p>
<p>More to the point, I didn&#8217;t get a sense from my questioner what their data sets were.  I don&#8217;t know if Robin got that.  If this is imagery, and they want to avoid lossy compression, RLE and other lossless techniques could help, at the cost of processing power.   If this is genomic or similar data, you have other techniques.  De-dup doesn&#8217;t quite factor into these.</p>
<p>It seems to me that the phased build-out approach would make the most sense.  Moreover, someone suggested slotting x4500&#8217;s in and out (ok, we would prefer <a href="http://jackrabbit.scalableinformatics.com" rel="nofollow"> JackRabbits</a>).   This may be feasible, though rather than adapt the robotics to handle that, mount the units vertically, and use the robotic mechanisms to slot drives in and out of the chassis.</p>
<p>The large tape storage folks could do this.  Then there is the question of how to have some sort of file system handle this.  You would need to envision some sort of large cache file system for handling inbound data, and some sort of distributed meta-data mapper (standard meta-data plus a directory of where the data lives).  Rather than de-dup, you would want to either dup or code the data to handle drive lossage.</p>
<p>Would be interesting to talk about the tech behind this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nicolai Plum</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198045</link>
		<dc:creator>Nicolai Plum</dc:creator>
		<pubDate>Sat, 18 Oct 2008 20:56:58 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198045</guid>
		<description>About the &quot;OSHA won&#039;t let you put ten X45xx in a rack&quot;: the weight of the J4500 is specified at 77kg. Most racks and the datacentre floors they sit on are rated for something like 400kg including the weight of the rack, PDU, etc. I expect that Sun&#039;s 4 boxes-in-the-rack figure is based on what&#039;s generally achievable in most datacentres today.
Sure, you could get racks that will hold 800kg, and floors to hold up the rack-and-storage combo that weighs in at about 1000kg/rack all up, but they&#039;re not standard items. Might want to use racks sitting on the subfloor, run the power and network and air overhead, and talk to the architect about the building&#039;s design parameters.
You&#039;ll also still need a baby forklift to get the units into the top of the rack.
Then each rack is going to consume and emit 11kW, so heavy-duty air circulation will be required.</description>
		<content:encoded><![CDATA[<p>About the &#8220;OSHA won&#8217;t let you put ten X45xx in a rack&#8221;: the weight of the J4500 is specified at 77kg. Most racks and the datacentre floors they sit on are rated for something like 400kg including the weight of the rack, PDU, etc. I expect that Sun&#8217;s 4 boxes-in-the-rack figure is based on what&#8217;s generally achievable in most datacentres today.<br />
Sure, you could get racks that will hold 800kg, and floors to hold up the rack-and-storage combo that weighs in at about 1000kg/rack all up, but they&#8217;re not standard items. Might want to use racks sitting on the subfloor, run the power and network and air overhead, and talk to the architect about the building&#8217;s design parameters.<br />
You&#8217;ll also still need a baby forklift to get the units into the top of the rack.<br />
Then each rack is going to consume and emit 11kW, so heavy-duty air circulation will be required.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard B</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198020</link>
		<dc:creator>Richard B</dc:creator>
		<pubDate>Fri, 17 Oct 2008 15:49:16 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198020</guid>
		<description>Ewen, and others who’ve mentioned de-dup ratios, you should take vendors’ claims about the reduction they can effect with a large pinch of salt. Data Domain, EMC Avamar and others talk in the context of *backup* data - that is, repeated storage of very similar data, so they are able to point to huge reductions. You will clearly see a lot of block-level duplication in this sort of environment. When you’re looking at primary storage, as GreenBytes seem to be, and NetApp are now starting to, the type of data is all-important. If this is archived data it could be scanned images, in which case the amount of duplication could be close to zero.</description>
		<content:encoded><![CDATA[<p>Ewen, and others who’ve mentioned de-dup ratios, you should take vendors’ claims about the reduction they can effect with a large pinch of salt. Data Domain, EMC Avamar and others talk in the context of *backup* data &#8211; that is, repeated storage of very similar data, so they are able to point to huge reductions. You will clearly see a lot of block-level duplication in this sort of environment. When you’re looking at primary storage, as GreenBytes seem to be, and NetApp are now starting to, the type of data is all-important. If this is archived data it could be scanned images, in which case the amount of duplication could be close to zero.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Kraska</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198015</link>
		<dc:creator>Joe Kraska</dc:creator>
		<pubDate>Fri, 17 Oct 2008 02:13:48 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198015</guid>
		<description>BTW, the 1.8 exabyte archive: how fast does it fill up? If it takes a year, I believe that this is ~60 GB/s ... 24/7/365. So this archive may have some needs for a few fairly hefty OC links as well.


Joe.</description>
		<content:encoded><![CDATA[<p>BTW, the 1.8 exabyte archive: how fast does it fill up? If it takes a year, I believe that this is ~60 GB/s &#8230; 24/7/365. So this archive may have some needs for a few fairly hefty OC links as well.</p>
<p>Joe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Kraska</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-198014</link>
		<dc:creator>Joe Kraska</dc:creator>
		<pubDate>Fri, 17 Oct 2008 02:10:16 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-198014</guid>
		<description>Another issue with data storage at this scale will be namespace management. 1.8 exabytes divided by typical maximum volume sizes today will lead to a volume proliferation nightmare, upon which much very nasty OPEX could very easily rest. As far as who is most suited for this activity now, I&#039;d take a look, perhaps, at HP&#039;s Extreme Data Store when combined with PolyServe. You have both industry-leading density (1.6 racks per PB) as well as fairly large volumes.

This, assuming that you wish to store everything on spinning media.

BTW, I did an analysis like this very recently for a future sustained 17GB/s problem (sustained: 24/7/365). That&#039;s a 1.5PB/day write rate. Kinda big, eh? Anyway, the problem is much more achievable than it sounds if the problem is in 2013. In 2013, I would expect an 8TB SATA drive (or something equivalent) to be readily procured at or below the price of today&#039;s 1TB drives.

Others mentioned deduplication and compression, but there is no mention regarding the duplicative nature of the data or its current compression. I will say this: it sounds a bit like a content addressable storage (CAS) problem. The CAS space seems to be kind of formative to me, but some of those technologies might complement a truly gigantic archive.

Finally: I would think such an archive would be dying for ILM of some sort.

Joe.</description>
		<content:encoded><![CDATA[<p>Another issue with data storage at this scale will be namespace management. 1.8 exabytes divided by typical maximum volume sizes today will lead to a volume proliferation nightmare, upon which much very nasty OPEX could very easily rest. As far as who is most suited for this activity now, I&#8217;d take a look, perhaps, at HP&#8217;s Extreme Data Store when combined with PolyServe. You have both industry-leading density (1.6 racks per PB) as well as fairly large volumes.</p>
<p>This, assuming that you wish to store everything on spinning media.</p>
<p>BTW, I did an analysis like this very recently for a future sustained 17GB/s problem (sustained: 24/7/365). That&#8217;s a 1.5PB/day write rate. Kinda big, eh? Anyway, the problem is much more achievable than it sounds if the problem is in 2013. In 2013, I would expect an 8TB SATA drive (or something equivalent) to be readily procured at or below the price of today&#8217;s 1TB drives.</p>
<p>Others mentioned deduplication and compression, but there is no mention regarding the duplicative nature of the data or its current compression. I will say this: it sounds a bit like a content addressable storage (CAS) problem. The CAS space seems to be kind of formative to me, but some of those technologies might complement a truly gigantic archive.</p>
<p>Finally: I would think such an archive would be dying for ILM of some sort.</p>
<p>Joe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt L</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-197989</link>
		<dc:creator>Matt L</dc:creator>
		<pubDate>Wed, 15 Oct 2008 01:32:55 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-197989</guid>
		<description>What about data de-dup options like Data Domain, EMC, FalconStor and others?  Even compressed data can be de-duplicated at the block level, so that only unique data is archived.  Now admittedly I am a little green in the storage game, but what type of availability do you require for the data?  How quickly does it need to be accessed, and how will it be accessed?</description>
		<content:encoded><![CDATA[<p>What about data de-dup options like Data Domain, EMC, FalconStor and others?  Even compressed data can be de-duplicated at the block level, so that only unique data is archived.  Now admittedly I am a little green in the storage game, but what type of availability do you require for the data?  How quickly does it need to be accessed, and how will it be accessed?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Closson</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-197985</link>
		<dc:creator>Kevin Closson</dc:creator>
		<pubDate>Tue, 14 Oct 2008 15:21:01 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-197985</guid>
		<description>I didn&#039;t see the word compression anywhere in this post (did I speed read?).  Since it isn&#039;t mentioned then we are either fantasizing about 1.8 exabyte of compressed data (perhaps 6 to 8 exabytes of uncompressed data).  If the &quot;Thumpers&quot; (SunFire 45XX) were to write uncompressed data as fast as the AMD HT 2.0 interconnect could sustain it would take something like 3 years of nonstop maxed-out writing of random bytes to lay 1.8 exabyte onto disk.

Did the Soviets start talking about strapping Yuri to a lightning bolt and sending him to another solar system after his first lap around the earth?   :-)</description>
		<content:encoded><![CDATA[<p>I didn&#8217;t see the word compression anywhere in this post (did I speed read?).  Since it isn&#8217;t mentioned then we are either fantasizing about 1.8 exabyte of compressed data (perhaps 6 to 8 exabytes of uncompressed data).  If the &#8220;Thumpers&#8221; (SunFire 45XX) were to write uncompressed data as fast as the AMD HT 2.0 interconnect could sustain it would take something like 3 years of nonstop maxed-out writing of random bytes to lay 1.8 exabyte onto disk.</p>
<p>Did the Soviets start talking about strapping Yuri to a lightning bolt and sending him to another solar system after his first lap around the earth?   <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jake</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-197984</link>
		<dc:creator>Jake</dc:creator>
		<pubDate>Tue, 14 Oct 2008 14:19:06 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-197984</guid>
		<description>Fully loaded capacity is 512TB each, so yes you would need more than my original petabyte calculation.  It still is a far better choice than a DIY solution using Sun boxes to front end.</description>
		<content:encoded><![CDATA[<p>Fully loaded capacity is 512TB each, so yes you would need more than my original petabyte calculation.  It still is a far better choice than a DIY solution using Sun boxes to front end.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Harris</title>
		<link>http://storagemojo.com/2008/10/12/building-a-18-exabyte-data-center/comment-page-1/#comment-197982</link>
		<dc:creator>Robin Harris</dc:creator>
		<pubDate>Tue, 14 Oct 2008 13:47:35 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=973#comment-197982</guid>
		<description>Jake, did you slip a few decimal points? It looks like a fully expanded DS 8300 is about a thousand drives in a couple of racks. You&#039;d need over a thousand to reach 1.8 EB.

Robin</description>
		<content:encoded><![CDATA[<p>Jake, did you slip a few decimal points? It looks like a fully expanded DS 8300 is about a thousand drives in a couple of racks. You&#8217;d need over a thousand to reach 1.8 EB.</p>
<p>Robin</p>
]]></content:encoded>
	</item>
</channel>
</rss>
