<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Storage for version control</title>
	<atom:link href="http://storagemojo.com/2010/01/19/storage-for-version-control/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2010/01/19/storage-for-version-control/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Tue, 07 Feb 2012 16:02:02 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: cliff</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207801</link>
		<dc:creator>cliff</dc:creator>
		<pubDate>Wed, 27 Jan 2010 11:58:23 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207801</guid>
		<description>Robin-
First time I have made a comment.  This is a great blog and I love what you write.  I work as a senior sales engineer for Rackspace (NYSE:RAX).  These comments are my own.  It sounds like Chris might be a customer.  I would be happy to work with him directly if he wants, and you can share my email with him.

Comment for the post, feel free to edit for length-
There are really two ways to answer this, as you can see in the other comments: the hardware-based approach (NetApp) or the web-scale approach (the GitHub example is good).  Notice the difference in the level of expertise required to administer each solution.  There are tradeoffs along a couple of different axes: cost vs. complexity, and complexity vs. ease of use.

The IOPS here are really small, so the cheapest option will be to use commodity servers running NFS with a bunch of TB-sized drives, then shard the users so that each server takes X% of them, filling one server after another.  I assume the service has enough users that this can work.

cliff</description>
		<content:encoded><![CDATA[<p>Robin-<br />
First time I have made a comment.  This is a great blog and I love what you write.  I work as a senior sales engineer for Rackspace (NYSE:RAX).  These comments are my own.  It sounds like Chris might be a customer.  I would be happy to work with him directly if he wants, and you can share my email with him.</p>
<p>Comment for the post, feel free to edit for length-<br />
There are really two ways to answer this, as you can see in the other comments: the hardware-based approach (NetApp) or the web-scale approach (the GitHub example is good).  Notice the difference in the level of expertise required to administer each solution.  There are tradeoffs along a couple of different axes: cost vs. complexity, and complexity vs. ease of use.</p>
<p>The IOPS here are really small, so the cheapest option will be to use commodity servers running NFS with a bunch of TB-sized drives, then shard the users so that each server takes X% of them, filling one server after another.  I assume the service has enough users that this can work.</p>
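<p>One way to read the shard-by-user suggestion above is a lookup table that fills servers in order. Below is a minimal Python sketch, assuming a fixed per-server user cap; the class name, server names, and cap are all hypothetical.</p>
<pre>
```python
class SequentialSharder:
    """Fill servers one after another, remembering where each user lives.

    Hypothetical sketch: the per-server user cap and server names are
    assumptions, not figures from the thread.
    """

    def __init__(self, servers, users_per_server):
        self.servers = list(servers)              # NFS servers, in fill order
        self.users_per_server = users_per_server  # assumed capacity per server
        self.assignment = {}                      # user_id -> server name
        self.current = 0                          # index of the server being filled
        self.count = 0                            # users placed on that server so far

    def add_server(self, server):
        # Capacity grows by appending servers; existing users never move.
        self.servers.append(server)

    def server_for(self, user_id):
        if user_id in self.assignment:            # existing user: sticky placement
            return self.assignment[user_id]
        if self.count >= self.users_per_server:   # current server is full
            if self.current + 1 >= len(self.servers):
                raise RuntimeError("all servers are full; add another server")
            self.current += 1
            self.count = 0
        server = self.servers[self.current]
        self.assignment[user_id] = server
        self.count += 1
        return server


sharder = SequentialSharder(["nfs1", "nfs2"], users_per_server=2)
print([sharder.server_for(u) for u in ["alice", "bob", "carol", "alice"]])
# ['nfs1', 'nfs1', 'nfs2', 'nfs1']
```
</pre>
<p>A lookup table, unlike a hash of the user ID, keeps existing users in place when servers are added, at the cost of storing one entry per user.</p>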
<p>cliff</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207758</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Mon, 25 Jan 2010 16:49:21 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207758</guid>
		<description>Thanks everyone for such great advice and feedback. There are a lot of options and it helps a lot. Now we have to choose a good path :)

Once we figure it out, I will post an update.

Chris</description>
		<content:encoded><![CDATA[<p>Thanks everyone for such great advice and feedback. There are a lot of options and it helps a lot. Now we have to choose a good path <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Once we figure it out, I will post an update.</p>
<p>Chris</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207738</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Sat, 23 Jan 2010 22:01:25 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207738</guid>
		<description>There&#039;s more than &quot;a little bit&quot; of complexity involved in clustering your NAS heads if you want them to present a single namespace.  Fortunately, at 500 or even 800 IOPS you can serve that load very easily with a single pair of servers in an active/standby arrangement.  For metadata-heavy workloads, *any* cluster/parallel filesystem (I&#039;ve worked on several) will struggle to keep up with a single active server wherever one server suffices.  If it were me, I&#039;d just go buy a couple of Joe L&#039;s boxes and be done with it.  ;)</description>
		<content:encoded><![CDATA[<p>There&#8217;s more than &#8220;a little bit&#8221; of complexity involved in clustering your NAS heads if you want them to present a single namespace.  Fortunately, at 500 or even 800 IOPS you can serve that load very easily with a single pair of servers in an active/standby arrangement.  For metadata-heavy workloads, *any* cluster/parallel filesystem (I&#8217;ve worked on several) will struggle to keep up with a single active server wherever one server suffices.  If it were me, I&#8217;d just go buy a couple of Joe L&#8217;s boxes and be done with it.  <img src='http://storagemojo.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207736</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Sat, 23 Jan 2010 18:48:24 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207736</guid>
		<description>How about getting a Drobo?

http://drobo.com/products/droboelite.php</description>
		<content:encoded><![CDATA[<p>How about getting a Drobo?</p>
<p><a href="http://drobo.com/products/droboelite.php" rel="nofollow">http://drobo.com/products/droboelite.php</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeffrey W. Baker</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207721</link>
		<dc:creator>Jeffrey W. Baker</dc:creator>
		<pubDate>Fri, 22 Jan 2010 22:21:14 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207721</guid>
		<description>It sounds to me like you need to forget your storage subsystem and switch to a better VCS.  Instead of trying to use modern storage to host the crusty old Subversion, why not switch to a modern, distributed, and therefore self-parallelizing, self-backing-up VCS like Mercurial, Git, or Bzr?</description>
		<content:encoded><![CDATA[<p>It sounds to me like you need to forget your storage subsystem and switch to a better VCS.  Instead of trying to use modern storage to host the crusty old Subversion, why not switch to a modern, distributed, and therefore self-parallelizing, self-backing-up VCS like Mercurial, Git, or Bzr?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wes Felter</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207706</link>
		<dc:creator>Wes Felter</dc:creator>
		<pubDate>Thu, 21 Jan 2010 16:38:52 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207706</guid>
		<description>Chris, instead of DRBD I would suggest two servers with a dual-ported SAS JBOD for HA.</description>
		<content:encoded><![CDATA[<p>Chris, instead of DRBD I would suggest two servers with a dual-ported SAS JBOD for HA.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olivier</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207704</link>
		<dc:creator>Olivier</dc:creator>
		<pubDate>Thu, 21 Jan 2010 14:21:50 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207704</guid>
		<description>What is the ratio of &#039;dead&#039; files (files that are rarely used) to &#039;live&#039; files (files that are often requested)? The Sun 7000 series uses SSDs as a read cache, plus write SSDs that absorb the IOPS as a buffer in front of the SATA disks. If only 500 GB of your data is used frequently, you will be very happy with a Sun 7000 system with a 500 GB SSD read cache and some SSDs as a write cache to absorb the IOPS. The Sun 7000 also has built-in dedup. We use some 7410 clusters (which are overkill for your requirements), and the NFS performance is quite impressive for the price. We evaluated NetApp too, but even their best price was far higher.</description>
		<content:encoded><![CDATA[<p>What is the ratio of &#8216;dead&#8217; files (files that are rarely used) to &#8216;live&#8217; files (files that are often requested)? The Sun 7000 series uses SSDs as a read cache, plus write SSDs that absorb the IOPS as a buffer in front of the SATA disks. If only 500 GB of your data is used frequently, you will be very happy with a Sun 7000 system with a 500 GB SSD read cache and some SSDs as a write cache to absorb the IOPS. The Sun 7000 also has built-in dedup. We use some 7410 clusters (which are overkill for your requirements), and the NFS performance is quite impressive for the price. We evaluated NetApp too, but even their best price was far higher.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martin Scholl</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207702</link>
		<dc:creator>Martin Scholl</dc:creator>
		<pubDate>Thu, 21 Jan 2010 08:09:54 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207702</guid>
		<description>Hello Chris,

For failover and replication, DRBD is used at many high-profile sites with great success, and I can definitely recommend it. Used together with Heartbeat, DRBD is a great foundation for a highly available, shared-nothing block device.
Regarding the hardware: we are about to launch a storage product aimed specifically at *aaS/cloud providers.
If you would like to learn more about it, please contact me: ms [at] mystoragepod.com</description>
		<content:encoded><![CDATA[<p>Hello Chris,</p>
<p>For failover and replication, DRBD is used at many high-profile sites with great success, and I can definitely recommend it. Used together with Heartbeat, DRBD is a great foundation for a highly available, shared-nothing block device.<br />
Regarding the hardware: we are about to launch a storage product aimed specifically at *aaS/cloud providers.<br />
If you would like to learn more about it, please contact me: ms [at] mystoragepod.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Karl Katzke</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207700</link>
		<dc:creator>Karl Katzke</dc:creator>
		<pubDate>Thu, 21 Jan 2010 05:23:04 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207700</guid>
		<description>Hmn, that&#039;s REALLY not that many IOPS. I would look more closely at the Sun offerings -- the Unified Storage Systems mentioned above. Google around a bit for &#039;fishworks&#039; -- you&#039;ll get an ear/eye/mind-full. The NetApp system you have is massive overkill for what you&#039;re doing with it.</description>
		<content:encoded><![CDATA[<p>Hmn, that&#8217;s REALLY not that many IOPS. I would look more closely at the Sun offerings &#8212; the Unified Storage Systems mentioned above. Google around a bit for &#8216;fishworks&#8217; &#8212; you&#8217;ll get an ear/eye/mind-full. The NetApp system you have is massive overkill for what you&#8217;re doing with it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207699</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Thu, 21 Jan 2010 05:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207699</guid>
		<description>Thanks so much for all of the detailed responses. I sent the question in to Robin late at night after researching some options. Let me clarify some of the items:

* The VCS is SVN
* The NetApp was a 2040
* 500 IOPS is typical, but we see peaks of 800 IOPS for writes and 500 for reads during certain periods
* 3 TB is our current usage, but the NetApp would give us around 5 TB usable.
* We did some tests with dedupe on the NetApp. Interestingly, VMDK files saw about 50% savings, but data stored directly on disk saw almost none.
* We used to use GFS, which caused all sorts of problems.

One advantage we have is sharding. Our app is already set up for this, so we are considering a completely &quot;shared nothing&quot; environment with local disks. Getting away from a shared file system would be ideal. While NFS works well, I think it will eventually become a bottleneck.

If we go with local disks, we just need to figure out the failover/replication scenario. Any suggestions there would be great. So far ZFS/Nexenta looks great for its flexible management and snapshots.

Btw, I am not an architect or engineer, just trying to personally explore high level options for the biz moving forward.

Thanks again.</description>
		<content:encoded><![CDATA[<p>Thanks so much for all of the detailed responses. I sent the question in to Robin late at night after researching some options. Let me clarify some of the items:</p>
<p>* The VCS is SVN<br />
* The NetApp was a 2040<br />
* 500 IOPS is typical, but we see peaks of 800 IOPS for writes and 500 for reads during certain periods<br />
* 3 TB is our current usage, but the NetApp would give us around 5 TB usable.<br />
* We did some tests with dedupe on the NetApp. Interestingly, VMDK files saw about 50% savings, but data stored directly on disk saw almost none.<br />
* We used to use GFS, which caused all sorts of problems.</p>
<p>One advantage we have is sharding. Our app is already set up for this, so we are considering a completely &#8220;shared nothing&#8221; environment with local disks. Getting away from a shared file system would be ideal. While NFS works well, I think it will eventually become a bottleneck.</p>
<p>If we go with local disks, we just need to figure out the failover/replication scenario. Any suggestions there would be great. So far ZFS/Nexenta looks great for its flexible management and snapshots.</p>
<p>Btw, I am not an architect or engineer, just trying to personally explore high level options for the biz moving forward.</p>
<p>Thanks again.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous Sysadmin</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207695</link>
		<dc:creator>Anonymous Sysadmin</dc:creator>
		<pubDate>Thu, 21 Jan 2010 02:22:59 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207695</guid>
		<description>Ah, prices from an unnamed major North American managed hosting company that is based in Texas.  I know them well.

At the scale your reader describes, there are a variety of viable options, particularly with the fairly low 500 IOPS figure stated.  That could very easily be handled by a Dell MD3000 with 15 x 600GB 15k RPM drives, with a pair of front-end hosts acting as redundant NFS heads, connected via SAS to the array&#039;s redundant controllers.  At RAID10, you&#039;d wind up in the neighborhood of 4TB of usable space and at least a couple of thousand IOPS.  Admittedly, that&#039;s a roll-your-own solution that does have a little bit of complexity involved in clustering the NFS heads.

The next step up from there would be something like a smaller Isilon cluster.  You get NFS out of the box and scale-out capability for both IOPS and storage.  Isilon has gotten much more aggressive on pricing in recent months as well, so the price point can be very competitive.

That&#039;s my $0.02 anyway.</description>
		<content:encoded><![CDATA[<p>Ah, prices from an unnamed major North American managed hosting company that is based in Texas.  I know them well.</p>
<p>At the scale your reader describes, there are a variety of viable options, particularly with the fairly low 500 IOPS figure stated.  That could very easily be handled by a Dell MD3000 with 15 x 600GB 15k RPM drives, with a pair of front-end hosts acting as redundant NFS heads, connected via SAS to the array&#8217;s redundant controllers.  At RAID10, you&#8217;d wind up in the neighborhood of 4TB of usable space and at least a couple of thousand IOPS.  Admittedly, that&#8217;s a roll-your-own solution that does have a little bit of complexity involved in clustering the NFS heads.</p>
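<p>A back-of-envelope check of the MD3000 figures above, written as a minimal Python sketch. It assumes one of the 15 drives is kept as a hot spare (RAID10 needs an even drive count) and roughly 180 random IOPS per 15k RPM drive, a common rule of thumb rather than a measured number; the function names are hypothetical.</p>
<pre>
```python
def raid10_estimate(drives, drive_gb, iops_per_drive, hot_spares=1):
    """Return (usable GB, random IOPS the spindles can absorb) for a RAID10 set."""
    data_drives = drives - hot_spares
    data_drives -= data_drives % 2              # RAID10 needs an even number of drives
    usable_gb = (data_drives // 2) * drive_gb   # every block is stored on two drives
    disk_iops = data_drives * iops_per_drive    # all spindles can serve random I/O
    return usable_gb, disk_iops


def raid10_backend_load(read_iops, write_iops):
    # A read touches one drive; each logical write lands on both mirrors.
    return read_iops + 2 * write_iops


# Assumed: 1 hot spare out of 15 drives, about 180 random IOPS per 15k RPM drive.
usable_gb, disk_iops = raid10_estimate(drives=15, drive_gb=600, iops_per_drive=180)
peak = raid10_backend_load(read_iops=500, write_iops=800)  # Chris's stated peaks
print(usable_gb, disk_iops, peak)  # 4200 2520 2100
```
</pre>
<p>Under those assumptions the array offers about 4.2 TB usable and about 2,500 disk IOPS, against roughly 2,100 disk IOPS if the peaks Chris reported (500 reads, 800 writes) happen at the same time. Controller write cache would lower the real disk load.</p>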
<p>The next step up from there would be something like a smaller Isilon cluster.  You get NFS out of the box and scale-out capability for both IOPS and storage.  Isilon has gotten much more aggressive on pricing in recent months as well, so the price point can be very competitive.</p>
<p>That&#8217;s my $0.02 anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ragan-Kelley</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207694</link>
		<dc:creator>Jonathan Ragan-Kelley</dc:creator>
		<pubDate>Thu, 21 Jan 2010 01:36:08 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207694</guid>
		<description>Hosted version control is an embarrassingly horizontal problem. Assuming that no one repository represents a major portion of their load, simply partitioning the repositories horizontally across machines should allow them to use stupid-simple commodity storage in each node.

For a great perspective on the practical issues in scaling an entire large web + version-control service, cf. http://github.com/blog/530-how-we-made-github-fast</description>
		<content:encoded><![CDATA[<p>Hosted version control is an embarrassingly horizontal problem. Assuming that no one repository represents a major portion of their load, simply partitioning the repositories horizontally across machines should allow them to use stupid-simple commodity storage in each node.</p>
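<p>Partitioning repositories across machines needs a rule for which node owns which repository. Below is a minimal Python sketch using rendezvous (highest-random-weight) hashing, a technique swapped in here for illustration rather than anything the comment specifies; the node and repository names are hypothetical. Its useful property is that adding a node moves only the repositories the new node takes over.</p>
<pre>
```python
import hashlib


def score(node, repo):
    # Stable pseudo-random weight for a (node, repo) pair; SHA-256 is used only
    # to spread keys evenly, not for security.
    digest = hashlib.sha256(f"{node}/{repo}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(repo, nodes):
    """Rendezvous (highest-random-weight) hashing: the top-scoring node wins."""
    return max(nodes, key=lambda node: score(node, repo))


nodes = ["vcs1", "vcs2", "vcs3"]
repos = [f"repo-{i}" for i in range(1000)]
before = {r: node_for(r, nodes) for r in repos}
after = {r: node_for(r, nodes + ["vcs4"]) for r in repos}
moved = [r for r in repos if before[r] != after[r]]
# Only repos whose new top scorer is vcs4 move, so every moved repo lands there;
# with four nodes that should be roughly a quarter of them.
print(len(moved), all(after[r] == "vcs4" for r in moved))
```
</pre>
<p>As the comment notes, this assumes no single repository dominates the load; a hot repository still lands on exactly one node.</p>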
<p>For a great perspective on the practical issues in scaling an entire large web + version-control service, cf. <a href="http://github.com/blog/530-how-we-made-github-fast" rel="nofollow">http://github.com/blog/530-how-we-made-github-fast</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Uday Mohan</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207692</link>
		<dc:creator>Uday Mohan</dc:creator>
		<pubDate>Wed, 20 Jan 2010 23:34:04 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207692</guid>
		<description>ParaScale is really simple to manage. They offer a free 4 TB download that can be installed on commodity hardware of your choice running Linux. It can be installed and configured in a morning and does not need much maintenance.
Also, once you have a cloud running, you can scale it online without any service disruption. It supports thin provisioning, so you can provision a multi-TB file system and add storage only when it is needed.

$8,000/mo over the lifetime of the data can really add up. However, if you repurpose some Supermicro or Dell boxes, or purchase some new ones, you can build a storage cloud for much less.
The download is available at www.parascale.com

Uday</description>
		<content:encoded><![CDATA[<p>ParaScale is really simple to manage. They offer a free 4 TB download that can be installed on commodity hardware of your choice running Linux. It can be installed and configured in a morning and does not need much maintenance.<br />
Also, once you have a cloud running, you can scale it online without any service disruption. It supports thin provisioning, so you can provision a multi-TB file system and add storage only when it is needed.</p>
<p>$8,000/mo over the lifetime of the data can really add up. However, if you repurpose some Supermicro or Dell boxes, or purchase some new ones, you can build a storage cloud for much less.<br />
The download is available at <a href="http://www.parascale.com" rel="nofollow">http://www.parascale.com</a></p>
<p>Uday</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wes Felter</title>
		<link>http://storagemojo.com/2010/01/19/storage-for-version-control/comment-page-1/#comment-207690</link>
		<dc:creator>Wes Felter</dc:creator>
		<pubDate>Wed, 20 Jan 2010 20:09:54 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1852#comment-207690</guid>
		<description>With such a small amount of data, $/GB may not be the appropriate metric; perhaps the reader should look at $/IOPS. I also question using a cluster when a single controller could easily satisfy the requirements. Heck, if there is locality you might be able to fit this in four hard disks and three SSDs...</description>
		<content:encoded><![CDATA[<p>With such a small amount of data, $/GB may not be the appropriate metric; perhaps the reader should look at $/IOPS. I also question using a cluster when a single controller could easily satisfy the requirements. Heck, if there is locality you might be able to fit this in four hard disks and three SSDs&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>

