<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Many-cores hit the memory wall</title>
	<atom:link href="http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Fri, 19 Mar 2010 09:23:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Rex</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-199126</link>
		<dc:creator>Rex</dc:creator>
		<pubDate>Mon, 19 Jan 2009 03:02:42 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-199126</guid>
		<description>Want to dive deeper on the problems with massively multi-core system design?

David Patterson, UC Berkeley, on &quot;The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem&quot;, at a Google TechTalk a few days ago:
http://www.youtube.com/watch?v=A2H_SrpAPZU

He even touches on I/O issues and the future of flash.  It&#039;s worth watching to the end; his answers to audience questions are illuminating, too.</description>
		<content:encoded><![CDATA[<p>Want to dive deeper on the problems with massively multi-core system design?</p>
<p>David Patterson, UC Berkeley, on &#8220;The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem&#8221;, at a Google TechTalk a few days ago:<br />
<a href="http://www.youtube.com/watch?v=A2H_SrpAPZU" rel="nofollow">http://www.youtube.com/watch?v=A2H_SrpAPZU</a></p>
<p>He even touches on I/O issues and the future of flash.  It&#8217;s worth watching to the end; his answers to audience questions are illuminating, too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Memory Bandwidth Gap &#171; Permabits and Petabytes</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-199021</link>
		<dc:creator>The Memory Bandwidth Gap &#171; Permabits and Petabytes</dc:creator>
		<pubDate>Mon, 05 Jan 2009 22:18:46 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-199021</guid>
		<description>[...] at StorageMojo, Robin comments on the challenges of shared memory controllers with multi-core processors. This is actually something that&#8217;s been a big problem for regular [...]</description>
		<content:encoded><![CDATA[<p>[...] at StorageMojo, Robin comments on the challenges of shared memory controllers with multi-core processors. This is actually something that&#8217;s been a big problem for regular [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Taylor</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198921</link>
		<dc:creator>Taylor</dc:creator>
		<pubDate>Wed, 24 Dec 2008 20:34:28 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198921</guid>
		<description>Rex: I would guess, though the IEEE article does not say, that the graph shows performance as the *total* number of cores remains the same, but the number of cores per CPU goes up.  For example, the &quot;4 cores&quot; example was a simulation for a 128-node quad-core-per-node cluster, and the &quot;8 cores&quot; example was a simulation for a 64-node 8-core-per-node cluster.

So, for the stacked-memory case, things are just fine: there isn&#039;t much performance gain to be had by going multi-core beyond 8 cores / CPU, BUT the density gain is linear, and efficiency gain is likely to be a significant positive slope as well.</description>
		<content:encoded><![CDATA[<p>Rex: I would guess, though the IEEE article does not say, that the graph shows performance as the *total* number of cores remains the same, but the number of cores per CPU goes up.  For example, the &#8220;4 cores&#8221; example was a simulation for a 128-node quad-core-per-node cluster, and the &#8220;8 cores&#8221; example was a simulation for a 64-node 8-core-per-node cluster.</p>
<p>So, for the stacked-memory case, things are just fine: there isn&#8217;t much performance gain to be had by going multi-core beyond 8 cores / CPU, BUT the density gain is linear, and efficiency gain is likely to be a significant positive slope as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Links for 12 Dec 2008 - 24 Dec 2008 :: Col&#8217;s Tech Stuff</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198913</link>
		<dc:creator>Links for 12 Dec 2008 - 24 Dec 2008 :: Col&#8217;s Tech Stuff</dc:creator>
		<pubDate>Wed, 24 Dec 2008 09:01:15 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198913</guid>
		<description>[...] StorageMojo &#187; Many-cores hit the memory wall - Good read here about the problems encountered with the ever expanding number of cores on a CPU. This is exactly the reason the UltraSPARC T1/2 processors have been designed the way they have. The CPU is multithreaded so whilst we wait for memory for one thread, the others can carry on. See http://www.sun.com/products/microelectronics/pdfs/Sun-Microelectronics-WindRiver_DS.pdf for more details. [...]</description>
		<content:encoded><![CDATA[<p>[...] StorageMojo &raquo; Many-cores hit the memory wall &#8211; Good read here about the problems encountered with the ever expanding number of cores on a CPU. This is exactly the reason the UltraSPARC T1/2 processors have been designed the way they have. The CPU is multithreaded so whilst we wait for memory for one thread, the others can carry on. See <a href="http://www.sun.com/products/microelectronics/pdfs/Sun-Microelectronics-WindRiver_DS.pdf" rel="nofollow">http://www.sun.com/products/microelectronics/pdfs/Sun-Microelectronics-WindRiver_DS.pdf</a> for more details. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Jones</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198824</link>
		<dc:creator>Steve Jones</dc:creator>
		<pubDate>Fri, 12 Dec 2008 08:35:20 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198824</guid>
		<description>Grumpy
   the graph is generated from modelling - which is fine providing the assumptions are right (better than building something to find it doesn&#039;t work). The modelling was for a particular workload type and memory access pattern - it could be completely different for another workload. 

As for all servers going NUMA, that might well be (and at least it means that operating systems will be optimised for this). However, this isn&#039;t about servers - it&#039;s about multi-core chips. It&#039;s possible to go NUMA with a multi-core chip and support multiple external memory buses with the appropriate high speed interconnects in the chip&#039;s memory management, but it still amounts to providing more memory bandwidth to the chip. Faster links will play a part, but that will hit clocking limits so we will still end up with more paths creating packaging and cost issues.  The inter-chip NUMA interconnects would similarly have to be scaled up, adding even more to the packaging issues. Such a server architecture would also be NUMA at two layers, requiring more optimisation. It&#039;s also possible to come up with workload types that don&#039;t scale well on NUMA. 

However, I&#039;m more optimistic that major strides can be made in increasing  memory bandwidth than can be done with hard disks. The latter are inherently constrained by mechanical and geometric issues. It&#039;s also easier to deal with bandwidth issues than latency ones. The mismatch between increased processor speed and memory latency is one of the main factors behind the drive towards hardware multi-threaded cores.</description>
		<content:encoded><![CDATA[<p>Grumpy<br />
   the graph is generated from modelling &#8211; which is fine providing the assumptions are right (better than building something to find it doesn&#8217;t work). The modelling was for a particular workload type and memory access pattern &#8211; it could be completely different for another workload. </p>
<p>As for all servers going NUMA, that might well be (and at least it means that operating systems will be optimised for this). However, this isn&#8217;t about servers &#8211; it&#8217;s about multi-core chips. It&#8217;s possible to go NUMA with a multi-core chip and support multiple external memory buses with the appropriate high speed interconnects in the chip&#8217;s memory management, but it still amounts to providing more memory bandwidth to the chip. Faster links will play a part, but that will hit clocking limits so we will still end up with more paths creating packaging and cost issues.  The inter-chip NUMA interconnects would similarly have to be scaled up, adding even more to the packaging issues. Such a server architecture would also be NUMA at two layers, requiring more optimisation. It&#8217;s also possible to come up with workload types that don&#8217;t scale well on NUMA. </p>
<p>However, I&#8217;m more optimistic that major strides can be made in increasing  memory bandwidth than can be done with hard disks. The latter are inherently constrained by mechanical and geometric issues. It&#8217;s also easier to deal with bandwidth issues than latency ones. The mismatch between increased processor speed and memory latency is one of the main factors behind the drive towards hardware multi-threaded cores.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grumpy ol' Wes Felter</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198822</link>
		<dc:creator>Grumpy ol' Wes Felter</dc:creator>
		<pubDate>Fri, 12 Dec 2008 00:44:12 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198822</guid>
		<description>We can only assume the graph is a lie since zero supporting evidence was presented.

Nobody used Rambus RAM except Sony, so their initiatives are irrelevant.

All servers are going to be NUMA anyway, so it&#039;s a little late to worry about that.</description>
		<content:encoded><![CDATA[<p>We can only assume the graph is a lie since zero supporting evidence was presented.</p>
<p>Nobody used Rambus RAM except Sony, so their initiatives are irrelevant.</p>
<p>All servers are going to be NUMA anyway, so it&#8217;s a little late to worry about that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198813</link>
		<dc:creator>Tony</dc:creator>
		<pubDate>Wed, 10 Dec 2008 21:37:11 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198813</guid>
		<description>Chuck,
Yes, the idea of massive amounts of on-chip RAM is interesting - but remember that typical fast SRAM is 6T (requires 6 transistors per bit), so your 16 billion transistors = ~300M bytes.  I&#039;ve heard of SRAM designs with fewer transistors, but don&#039;t know if they have the same speed.  Also, even on chip, it appears it&#039;s hard to maintain the highest speed across a large memory array.</description>
		<content:encoded><![CDATA[<p>Chuck,<br />
Yes, the idea of massive amounts of on-chip RAM is interesting &#8211; but remember that typical fast SRAM is 6T (requires 6 transistors per bit), so your 16 billion transistors = ~300M bytes.  I&#8217;ve heard of SRAM designs with fewer transistors, but don&#8217;t know if they have the same speed.  Also, even on chip, it appears it&#8217;s hard to maintain the highest speed across a large memory array.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Jones</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198809</link>
		<dc:creator>Steve Jones</dc:creator>
		<pubDate>Wed, 10 Dec 2008 11:02:21 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198809</guid>
		<description>Memory bandwidth has been an encroaching problem for years. It&#039;s another of those issues which has come about because of mismatches in the characteristic periods of exponential growth. There are some things (memory capacity, disk capacity, processing power etc.) which go up on an areal density basis, whilst some are based on a single dimension (typical of interlink speeds, serial I/O speeds and so on).

There are little tweaks (wider busses, bigger caches), but these all break down eventually and the bottlenecks become very real. 

The introduction of hardware threading (really virtual CPUs) to make use of processing time otherwise lost due to memory stalls ekes out a little more (Hyper-Threading, Sun&#039;s Niagara processor - this is popping up all over the place).

However, we&#039;ve now arrived at processor architectures which have to be viewed as a system - the hardware threads on these machines are now virtual resources which are presented to the operating system as individual CPUs. The processor has to be viewed as a system with memory access like I/O to understand it properly. It&#039;s causing massive problems in understanding capacity planning figures, as reported CPU utilisation does not reflect the underlying processor resource usage figures with all these hardware threads. Only the Sun Niagara has any real measurements of underlying core resources, and those are not the ones seen by the OS.

Basically the unit of what a core computer is has shrunk back to the chip. Once the hypervisor arrives there too, there will be a very different relationship with memory.</description>
		<content:encoded><![CDATA[<p>Memory bandwidth has been an encroaching problem for years. It&#8217;s another of those issues which has come about because of mismatches in the characteristic periods of exponential growth. There are some things (memory capacity, disk capacity, processing power etc.) which go up on an areal density basis, whilst some are based on a single dimension (typical of interlink speeds, serial I/O speeds and so on).</p>
<p>There are little tweaks (wider busses, bigger caches), but these all break down eventually and the bottlenecks become very real. </p>
<p>The introduction of hardware threading (really virtual CPUs) to make use of processing time otherwise lost due to memory stalls ekes out a little more (Hyper-Threading, Sun&#8217;s Niagara processor &#8211; this is popping up all over the place).</p>
<p>However, we&#8217;ve now arrived at processor architectures which have to be viewed as a system &#8211; the hardware threads on these machines are now virtual resources which are presented to the operating system as individual CPUs. The processor has to be viewed as a system with memory access like I/O to understand it properly. It&#8217;s causing massive problems in understanding capacity planning figures, as reported CPU utilisation does not reflect the underlying processor resource usage figures with all these hardware threads. Only the Sun Niagara has any real measurements of underlying core resources, and those are not the ones seen by the OS.</p>
<p>Basically the unit of what a core computer is has shrunk back to the chip. Once the hypervisor arrives there too, there will be a very different relationship with memory.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Magda</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198807</link>
		<dc:creator>David Magda</dc:creator>
		<pubDate>Wed, 10 Dec 2008 01:27:38 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198807</guid>
		<description>In one of the videos originally released by Sun on their 7000 storage system (since pulled / replaced), they mention that the only thing limiting I/O is memory bandwidth. One of the options available is dual 10 GigE interfaces, and supposedly they can saturate that (streaming from the JBODs).

Given that the controller heads support up to 128 GB of RAM in some configurations, that&#039;s a lot of cache.</description>
		<content:encoded><![CDATA[<p>In one of the videos originally released by Sun on their 7000 storage system (since pulled / replaced), they mention that the only thing limiting I/O is memory bandwidth. One of the options available is dual 10 GigE interfaces, and supposedly they can saturate that (streaming from the JBODs).</p>
<p>Given that the controller heads support up to 128 GB of RAM in some configurations, that&#8217;s a lot of cache.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chuck McManis</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198805</link>
		<dc:creator>Chuck McManis</dc:creator>
		<pubDate>Wed, 10 Dec 2008 00:19:55 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198805</guid>
		<description>Robin

This trend was startlingly obvious to storage manufacturers because they spent more of their time flinging bits from disk to client than they did computing on those bits. One of the reasons I, among others, was thrilled with the original Opteron work from AMD was that it added more memory controllers; a four socket Opteron with each chip having its own memory controller is an impressive beast. 

The massive cache experiment that was the Itanium was also flawed in that multiple cores not only raise raw memory bandwidth requirements to feed the execution engine pipelines but also add coherence traffic, which consumes bandwidth as well (and was the death of 8 core Opteron systems which, with a simple probe/response coherence protocol, quickly lost the extra bandwidth benefit).

Dave Hitz commented that Flash was the new Disk and Disk is the new Tape; perhaps he got it wrong and DRAM is the new disk. When you have 16 billion transistors in a 45nm process why not just a couple of cores and 8GB of static RAM? 

--Chuck</description>
		<content:encoded><![CDATA[<p>Robin</p>
<p>This trend was startlingly obvious to storage manufacturers because they spent more of their time flinging bits from disk to client than they did computing on those bits. One of the reasons I, among others, was thrilled with the original Opteron work from AMD was that it added more memory controllers; a four socket Opteron with each chip having its own memory controller is an impressive beast. </p>
<p>The massive cache experiment that was the Itanium was also flawed in that multiple cores not only raise raw memory bandwidth requirements to feed the execution engine pipelines but also add coherence traffic, which consumes bandwidth as well (and was the death of 8 core Opteron systems which, with a simple probe/response coherence protocol, quickly lost the extra bandwidth benefit).</p>
<p>Dave Hitz commented that Flash was the new Disk and Disk is the new Tape; perhaps he got it wrong and DRAM is the new disk. When you have 16 billion transistors in a 45nm process why not just a couple of cores and 8GB of static RAM? </p>
<p>&#8211;Chuck</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Schulz</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198797</link>
		<dc:creator>Greg Schulz</dc:creator>
		<pubDate>Tue, 09 Dec 2008 08:37:43 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198797</guid>
		<description>Hello Robin,

Enjoyed your post.

Granted, those of us who have been around as customers, vendors, analysts, media or whatever are familiar with the decades-old I/O performance, storage capacity and processor gap, but it&#039;s far from being known by everyone in the industry.

From what I continue to see and hear from IT customers, even vendors and others particularly those who have not been in the industry for a decade or more, there is still an amazing lack of awareness of the I/O performance gap. In fact, the industry trends white paper &lt;a href=&quot;http://www.storageio.com/Reports/StorageIO_WP_080706_Cover.pdf&quot; rel=&quot;nofollow&quot;&gt;Data center performance bottlenecks and the server storage I/O performance gap&lt;/a&gt; continues to be a very popular download and point of discussion during presentations, seminars and keynote discussions.

So just how long until we hit the wall, and is it a moving wall similar to the &lt;a href=&quot;http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;articleId=9003646&amp;pageNumber=2&quot; rel=&quot;nofollow&quot;&gt;superparamagnetic barrier&lt;/a&gt; for disk drives that we were supposed to hit several years ago?

Granted, the disk drive wall has been pushed back for a while; however, will the related memory wall be pushed back before it is supposed to occur, delaying the impact? Is it an isolated case for extreme corner-case environments, a major concern for commercial computing, or simply something fun to talk about?

Cheers
Gs</description>
		<content:encoded><![CDATA[<p>Hello Robin,</p>
<p>Enjoyed your post.</p>
<p>Granted, those of us who have been around as customers, vendors, analysts, media or whatever are familiar with the decades-old I/O performance, storage capacity and processor gap, but it&#8217;s far from being known by everyone in the industry.</p>
<p>From what I continue to see and hear from IT customers, even vendors and others particularly those who have not been in the industry for a decade or more, there is still an amazing lack of awareness of the I/O performance gap. In fact, the industry trends white paper <a href="http://www.storageio.com/Reports/StorageIO_WP_080706_Cover.pdf" rel="nofollow">Data center performance bottlenecks and the server storage I/O performance gap</a> continues to be a very popular download and point of discussion during presentations, seminars and keynote discussions.</p>
<p>So just how long until we hit the wall, and is it a moving wall similar to the <a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;articleId=9003646&amp;pageNumber=2" rel="nofollow">superparamagnetic barrier</a> for disk drives that we were supposed to hit several years ago?</p>
<p>Granted, the disk drive wall has been pushed back for a while; however, will the related memory wall be pushed back before it is supposed to occur, delaying the impact? Is it an isolated case for extreme corner-case environments, a major concern for commercial computing, or simply something fun to talk about?</p>
<p>Cheers<br />
Gs</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fazal Majid</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198794</link>
		<dc:creator>Fazal Majid</dc:creator>
		<pubDate>Tue, 09 Dec 2008 00:42:22 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198794</guid>
		<description>Thanks for the JvN pointer. It&#039;s nice to see the terminology of the time.

That memory bandwidth and latency are essential is not news. That the problem is now hitting mainstream computing is. This is why Intel is long overdue with the QPI switched interconnect in Nehalem processors (AMD&#039;s Opteron has had it for years). The 8-core Mac Pro was benchmarked as no faster than the 4-core version for many workloads, so this is not a theoretical concern.

One important solution to the problem is seldom discussed - refactoring software to make it more efficient, and reduce the bloat from layers and layers of abstraction piled upon legacy code. Apple is adopting this approach with Snow Leopard, and other software vendors would do well to follow suit. Moore&#039;s law stalled a number of years ago for single-thread workloads; the sloppy coding habits of many programmers have to go, and be replaced with a newfound emphasis on performance and its necessary precondition, a ruthless culling of unnecessary features that add bloat but little value (JVMs inside the database, anyone?).</description>
		<content:encoded><![CDATA[<p>Thanks for the JvN pointer. It&#8217;s nice to see the terminology of the time.</p>
<p>That memory bandwidth and latency are essential is not news. That the problem is now hitting mainstream computing is. This is why Intel is long overdue with the QPI switched interconnect in Nehalem processors (AMD&#8217;s Opteron has had it for years). The 8-core Mac Pro was benchmarked as no faster than the 4-core version for many workloads, so this is not a theoretical concern.</p>
<p>One important solution to the problem is seldom discussed &#8211; refactoring software to make it more efficient, and reduce the bloat from layers and layers of abstraction piled upon legacy code. Apple is adopting this approach with Snow Leopard, and other software vendors would do well to follow suit. Moore&#8217;s law stalled a number of years ago for single-thread workloads; the sloppy coding habits of many programmers have to go, and be replaced with a newfound emphasis on performance and its necessary precondition, a ruthless culling of unnecessary features that add bloat but little value (JVMs inside the database, anyone?).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rex</title>
		<link>http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/comment-page-1/#comment-198792</link>
		<dc:creator>Rex</dc:creator>
		<pubDate>Mon, 08 Dec 2008 22:21:37 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=1038#comment-198792</guid>
		<description>Stacked memory won&#039;t solve the problem.  Look again at the chart in the IEEE Spectrum article.  Non-stacked (traditional) memory shows a dramatic decline in performance above 8 cores.

Stacked memory shows a leveling off -- i.e. no performance drop, but no performance gain.  What&#039;s the point of adding more cores?  Back to the drawing board to solve the memory bandwidth problem.

Massively multi-core systems have at least two major barriers to overcome -- memory constraints, and the difficulty of writing software to take advantage of MMC systems.

Generations of PhD theses and commercial research have not yet cracked the software problem for supercomputing, so don&#039;t expect major breakthroughs soon.</description>
		<content:encoded><![CDATA[<p>Stacked memory won&#8217;t solve the problem.  Look again at the chart in the IEEE Spectrum article.  Non-stacked (traditional) memory shows a dramatic decline in performance above 8 cores.</p>
<p>Stacked memory shows a leveling off &#8212; i.e. no performance drop, but no performance gain.  What&#8217;s the point of adding more cores?  Back to the drawing board to solve the memory bandwidth problem.</p>
<p>Massively multi-core systems have at least two major barriers to overcome &#8212; memory constraints, and the difficulty of writing software to take advantage of MMC systems.</p>
<p>Generations of PhD theses and commercial research have not yet cracked the software problem for supercomputing, so don&#8217;t expect major breakthroughs soon.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
