TPC-C: comparing SSD & disk

by Robin Harris on Friday, 7 August, 2009

Steve Jones of BT sent in the following, which I am publishing – with his permission – as a guest post. It has been edited and some headers added so any dodgy parts may not be his fault.

Begin the guest post:
SSD vs storage arrays
I thought it was worth doing a very quick back-of-the-envelope comparison of the Texas Memory Systems RamSan-6200 with the setup that IBM put together for their class-leading TPC-C benchmark. TPC-C is notoriously heavy on I/O and the majority of the costs are in the storage configuration.

TMS claims RamSan-6200 is capable of 5 million IOPs and 60GB/s of throughput with 100TB of RAID-protected SSD storage in a 40U rack. It lists at $4.4m.

In comparison, IBM’s TPC-C benchmark had 11,000 15K drives, all but 8 of which were (on the costed configuration) 73.4GB drives all mapped through 68 storage controllers with write cache. I think that this would occupy upwards of 30 racks and consume more than 230 kW.

The DB data space was almost all configured as RAID0 (some RAID5 on log files). Configured as RAID0, those 11,000 drives would provide about 800TB of storage. The total 60-day data storage requirement, according to the full disclosure report, is 172TB.

You might get 2 million random IOPs on a good day if you weren’t too heavy on the writes, although I/O queuing might be a problem at that density of access. According to the TPC-C costed report, the storage setup listed at $20m.
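The capacity and IOPS figures above come from simple multiplication. The sketch below assumes roughly 180 random IOPS per 15K drive. That is a rule-of-thumb figure chosen for illustration, not a number from the disclosure report. It also shows how little of the striped capacity the 172TB requirement actually uses.

```python
# Back-of-the-envelope check of the IBM TPC-C disk configuration.
# DRIVES, DRIVE_GB and REQUIRED_TB come from the text; IOPS_PER_DRIVE is an ASSUMPTION.
DRIVES = 11_000          # 15K drives in the costed configuration (approximate)
DRIVE_GB = 73.4          # capacity per drive, decimal GB
IOPS_PER_DRIVE = 180     # ASSUMED random IOPS per 15K drive on a read-heavy load
REQUIRED_TB = 172        # 60-day space requirement from the disclosure report


def raid0_capacity_tb(drives: int, drive_gb: float) -> float:
    """RAID0 stripes without redundancy, so usable capacity equals raw capacity."""
    return drives * drive_gb / 1000


def aggregate_random_iops(drives: int, iops_per_drive: int) -> int:
    """Ideal sum of per-drive IOPS; ignores queuing, controller and write penalties."""
    return drives * iops_per_drive


capacity_tb = raid0_capacity_tb(DRIVES, DRIVE_GB)
total_iops = aggregate_random_iops(DRIVES, IOPS_PER_DRIVE)
print(f"RAID0 capacity:   {capacity_tb:.0f} TB")
print(f"Random read IOPS: {total_iops / 1e6:.2f} million")
print(f"Space actually required: {REQUIRED_TB / capacity_tb:.0%} of capacity")
```

At 180 IOPS per drive the total lands just under 2 million, in line with the estimate above, while only about a fifth of the capacity is needed for data.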

Comparing the configurations
A quick and rough comparison of two 400TB configs, the 73.4GB-drive RAID0 setup vs RamSan-6200s, would give something like the following:

RamSan-6200 (theoretical)
4 racks
240 GB/s throughput
20 million random IOPs (not sure about read/write ratios)
Random access time (< 0.2ms with InfiniBand, < 0.4ms on FC; my estimates)
Power < 30kW (5TB seems to take 325W).
List price - $17.6m
Discount available ???
Note, however, that the RamSan config might be missing a few elements costed in the IBM configuration.
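The RamSan column is the single-unit vendor figures multiplied by four. Here is a minimal sketch of that scaling. It assumes the units scale linearly, with no shared switching or management overhead.

```python
# Scale the per-unit RamSan-6200 claims linearly to 400 TB.
# Linear scaling (no shared overhead between units) is an ASSUMPTION.
UNIT_TB = 100             # RAID-protected flash per 40U rack
UNIT_IOPS = 5_000_000     # vendor claim per unit
UNIT_GB_PER_S = 60        # vendor claim per unit
UNIT_PRICE_M = 4.4        # list price per unit, $m
WATTS_PER_5TB = 325       # observed power per 5 TB, from the list above

TARGET_TB = 400

units = TARGET_TB // UNIT_TB                      # racks needed
iops = units * UNIT_IOPS
gb_per_s = units * UNIT_GB_PER_S
price_m = units * UNIT_PRICE_M
power_kw = TARGET_TB / 5 * WATTS_PER_5TB / 1000   # 80 blocks of 5 TB

print(f"{units} racks, {iops / 1e6:.0f}M IOPS, {gb_per_s} GB/s, "
      f"${price_m:.1f}m list, {power_kw:.0f} kW")
```

The 26kW result is consistent with the "< 30kW" line.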

IBM
30+ racks?
220 GB/s (68 controllers, each with 8GB of cache and 8 x 4Gbps host ports, optimistically 400MB/s per FC port).
2 million random IOPs (my estimate with a read-dominated load and RAID0)
Random access time 4-5ms read, < 0.5ms write due to write cache (my estimates) before contention
Power - 230kW (my estimate - perhaps 180kW on drives, rest on 68 controllers).
List Price $20.4m
Discount available better than 65%
(TPC rules allow for available discounts to be included)

Some of those differences are about an order of magnitude or so.
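The IBM throughput and power figures can be rederived from the list, and the "order of magnitude" remark turned into explicit ratios. This sketch uses only numbers from the two lists. The one exception is the IBM read latency, where it takes the 4.5ms midpoint of "4-5ms" as an assumption. The latency ratio is therefore indicative only.

```python
# IBM figures derived from the list above, then side-by-side ratios.
CONTROLLERS = 68
HOST_PORTS = 8               # 4Gbps FC host ports per controller
MB_PER_S_PER_PORT = 400      # optimistic payload rate per 4Gbps FC port
ibm_gb_per_s = CONTROLLERS * HOST_PORTS * MB_PER_S_PER_PORT / 1000

DRIVES = 11_000
DRIVE_KW = 180               # estimate for the drives alone
watts_per_drive = DRIVE_KW * 1000 / DRIVES

# read_ms for IBM is the ASSUMED midpoint of "4-5ms"; RamSan uses the "< 0.2ms" estimate.
ibm = {"racks": 30, "iops": 2_000_000, "power_kw": 230, "read_ms": 4.5}
ramsan = {"racks": 4, "iops": 20_000_000, "power_kw": 30, "read_ms": 0.2}

advantage = {
    "floor space": ibm["racks"] / ramsan["racks"],
    "random IOPS": ramsan["iops"] / ibm["iops"],
    "power": ibm["power_kw"] / ramsan["power_kw"],
    "read latency": ibm["read_ms"] / ramsan["read_ms"],
}

print(f"IBM throughput ~{ibm_gb_per_s:.0f} GB/s, ~{watts_per_drive:.1f} W per drive")
for metric, ratio in advantage.items():
    print(f"{metric:12s} {ratio:5.1f}x in the RamSan's favour")
```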

This is the IBM TPC-C result.

Power and cooling
I’d suspect the SSD setup is going to save perhaps 1.8GWh of electricity per year, assuming the IBM storage config is using about 230kW and the RamSan-6200 about 30kW. To this needs to be added the A/C load, so more like 2.5GWh per year, plus the A/C maintenance and so on.
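The 1.8GWh figure is the 200kW difference run for a full year. This is a small sketch that assumes continuous operation at constant draw. It also backs out the cooling overhead implied by the 2.5GWh figure; that overhead is derived from the two numbers in the text, not measured.

```python
HOURS_PER_YEAR = 24 * 365     # ASSUMES continuous operation at constant draw

ibm_kw, ramsan_kw = 230, 30
saved_gwh = (ibm_kw - ramsan_kw) * HOURS_PER_YEAR / 1_000_000   # kWh -> GWh

# The 2.5 GWh figure includes air conditioning; the overhead below is inferred, not measured.
WITH_AC_GWH = 2.5
implied_cooling_overhead = WITH_AC_GWH / saved_gwh - 1

print(f"Direct saving: {saved_gwh:.2f} GWh/year")
print(f"Implied A/C overhead: {implied_cooling_overhead:.0%} of the IT load saved")
```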

I rather think the available discount structures will be less favourable on the SSD setup. However, it is interesting that the SSD setup is already looking more than cost effective against the small enterprise disk model, even before environmental costs are taken into account. If the above is typical, then even 15K 147GB enterprise drives will be killed by this sort of thing very shortly unless the prices are reduced closer to those of commodity drives (which they have room to do of course). The thing that rotating disks can’t address is the random I/O latency issue (which is why many apps are driven to 15K drives).

Where is the SSD TPC-C benchmark?
It’s interesting that nobody seems to have done a full-on SSD TPC-C benchmark, perhaps because the accountants in IBM & HP are unwilling to finance a brand new SSD setup (IBM actually used 37GB 15K drives but costed for 74GB, so I suspect even their test lab has to make the kit last). However, it must surely happen one day soon.

Of course getting SSD down to cost per GB figures closer to those of commodity drives isn’t going to happen for a long time, although if the vendors can start to exploit MLC and lower fabrication costs with devices acceptable to the enterprise, it will further push into the rotating disk market.

Getting back to the real world
Of course this particular case is a somewhat insane configuration. I don’t know of many organisations that make extensive use of large numbers of small 15K drives to maximise IOP performance.

In our case, the transactional loads are borne by much larger 15K drives (usually now 300GB) carrying mixed workloads spread across all the available disks in an array with the low and high usage spread out. We also have a lot of very large DBs where the average access density is moderate, but latency has to be low for acceptable batch and online usage. Consequently cost per GB is much lower, and transactional latency is bearable although you can pay the penalty as contention goes up. However, many of our transactional type apps are still primarily I/O bound on reads.

End of guest post

The StorageMojo take
Steve’s rough comparison – and readers are encouraged to refine it in the comments – pulls on several data center issues.

  • Economics. SSDs have always been fast, but the $/GB number shut down the conversation before it got started for most folks. But the attention given to consumer SSDs and their tradeoffs has reopened the conversation. And thanks to the econoclypse people are much more ready to listen.
  • Performance and capacity. The big difference with the flash-equipped RamSan is the capacity. It is now feasible to go all SSD for a set of major database applications – with the performance of traditional SSDs.
  • Power/cooling/floorspace. The GWh saved per year sounds significant – especially if you are provisioning a new data center or are running out of power in an existing one.
  • Availability/reliability. TPC-C doesn’t address this directly, but with 11,000 drives there would be almost daily drive failures. The system can recover from all of them, but there has to be some recovery performance hit and, more importantly, there is a non-negligible chance of human error in drive replacement. How does that factor in?
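"Almost daily drive failures" can be checked against an annualized failure rate (AFR). The AFR values below are illustrative assumptions, not figures from the benchmark report. The sketch also treats failures as independent (a Poisson process), which real drive populations do not always obey.

```python
import math

DRIVES = 11_000


def expected_failures_per_day(drives: int, afr: float) -> float:
    """Mean drive failures per day for a given annualized failure rate."""
    return drives * afr / 365


def p_at_least_one_failure(rate_per_day: float) -> float:
    """Probability of one or more failures in a day, treating failures as Poisson."""
    return 1 - math.exp(-rate_per_day)


for afr in (0.02, 0.03, 0.05):   # ASSUMED AFR range
    rate = expected_failures_per_day(DRIVES, afr)
    print(f"AFR {afr:.0%}: {rate:.2f} failures/day, "
          f"P(at least one on a given day) = {p_at_least_one_failure(rate):.0%}")
```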

Steve, thank you for this guest post.

Courteous comments welcome, of course. TMS advertises on StorageMojo.


David Magda Friday, 7 August, 2009 at 6:05 pm

It seems that just about everyone is treating SSDs and spinning rust as an either-or choice. Sun is the only company that is (publicly) using both technologies simultaneously with their “hybrid storage pools”, where they put SSDs in front of disk for read and write caching:

http://blogs.sun.com/brendan/entry/slog_screenshots
http://blogs.sun.com/brendan/entry/l2arc_screenshots

Good on Sun for doing something cool (and hopefully Oracle will continue this), but the benefits seem obvious enough now that I’m surprised no one else has picked up on the idea (which would of course be one less advantage that Sun could crow about in their product line, should the idea go mainstream).

Or am I missing something about putting these technologies “in-line” with each other?

Fazal Majid Friday, 7 August, 2009 at 6:29 pm

I have been benchmarking Oracle 11g and PostgreSQL 8.4 for a high-performance, low-latency OLTP application (the SLA mandates 30ms response at the 95th percentile). Due to Oracle’s licensing agreement I cannot publish benchmark results, but Postgres handles about 6800 transactions per second with a 1.2ms response time on the 320GB FusionIO drive (rebranded as an HP Storage Accelerator for their BladeSystem blades, in this case the Nehalem-EP Xeon E5530 blades with 8GB RAM).

No disk array can come close to this latency. In-memory databases can, but they don’t handle persistence properly (at best the approach is to replicate data to RAM on multiple systems, checkpoint to disk periodically and hope there isn’t a whole data-center power failure).

Nik Simpson Friday, 7 August, 2009 at 7:41 pm

I don’t know if he noticed Dell’s TPC-H result with four Fusion-io ioDrives last month; results are here:

http://www.tpc.org/tpch/results/tpch_result_detail.asp?id=109060201

DeltaTango Saturday, 8 August, 2009 at 12:49 pm

But the RamSan isn’t SSD but RAM? No?

Steve Jones Sunday, 9 August, 2009 at 12:16 am

I’d seen the TPC-H result, but hadn’t analysed it in comparison with a disk alternative. TPC-H has a very different I/O profile to TPC-C and it was really the transactional profile I was interested in. The low latency of these SSD drives is particularly relevant to the issues I see with many systems of that sort.

I’ve also seen what SUN are up to with SSD caches in their unified storage device. I believe all but the lowest-end appliance use very large, relatively slow disks (1TB 5,400 RPM drives at launch with plans to go to 2TB). Large enterprise arrays include large non-volatile memory (in the tens, or even hundred+ GB region), albeit at a very high cost. This large NV memory is used for cache, including such hidden things as the maintenance of pointers and maps for various replication capabilities. In many ways SUN’s use of cache is an alternative to that large NV cache, although it is organised in a fundamentally different way and has both write- and read-optimised parts.

In our experience, all these cached systems hit limits on very large databases with random access. The very good cache prospects tend to be grabbed by the database, leaving the storage array to cope with only writes and the remaining “dross”. Writes are easy to deal with (providing the ratio isn’t too high). It’s the random reads on large data sets that are the real problem.

It’s an old story with cache – the law of diminishing returns sets in as you add more, and you eventually get limited by the small proportion of misses. We have systems with 99.8%+ database cache hit rates and they still get I/O bound on random reads, as the remaining rump of 15,000 random IOPs is still the dominant factor.
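The diminishing-returns point can be made concrete. Working backwards from a 99.8% hit ratio and 15,000 residual random IOPs gives the logical read rate. The sketch then shows how many 15K drives the misses need at higher hit ratios. The per-drive IOPS figure is an assumption for illustration.

```python
import math

IOPS_PER_15K_DRIVE = 180   # ASSUMED random IOPS per 15K drive


def logical_read_rate(miss_iops: float, hit_ratio: float) -> float:
    """Logical reads/s implied by a physical miss rate at a given hit ratio."""
    return miss_iops / (1 - hit_ratio)


def physical_read_iops(logical_reads: float, hit_ratio: float) -> float:
    """Reads that miss the database cache and must go to storage."""
    return logical_reads * (1 - hit_ratio)


# 15,000 random IOPs still reach the disks at a 99.8% hit ratio.
logical = logical_read_rate(15_000, 0.998)
print(f"Implied logical read rate: {logical / 1e6:.1f} million/s")

for hit in (0.998, 0.999, 0.9995):
    misses = physical_read_iops(logical, hit)
    drives = math.ceil(misses / IOPS_PER_15K_DRIVE)
    print(f"hit ratio {hit:.2%}: {misses:,.0f} random IOPS, at least {drives} drives")
```

Each halving of the miss rate halves the drive count, but it takes ever more cache to get there. That is the diminishing return described above.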

I think the SUN box has certainly got a place for many workloads. I suspect it will make a very good general-purpose file server, and it will use its combination of SSD and large, slow disks to hit some really good price/performance points, especially with various replication capabilities built-in. However, it isn’t going to be a good device if your requirement is for a large (multi-TB) heavily randomly-accessed data store, as you’ll get I/O bound on those slow disks.

Steve Jones Sunday, 9 August, 2009 at 12:29 am

I missed the comment about the nature of the RamSan storage. Despite the name, the RamSan-6200 and its related products are all flash-based, not RAM-based. If you tried to build a 100TB storage device using DRAM then the cost, footprint and power consumption would be enormous.

Certainly there was a time when the only available solid state storage was of that type, but flash-based SSD has essentially taken over apart from small, extremely fast requirements where NV RAM may still play a part.

xfer_rdy Sunday, 9 August, 2009 at 10:40 am

I for one would like to see a full TPC-H benchmark for TMS’s 6200 for large datasets. I’d also like to see smallest dataset versus media life.

If media life is reasonable, they may have just carved out a niche for emerging multi-tenant providers.

@Fazal: I agree with you about FusionIO’s product, it is very fast. Most don’t realize it’s one of the few products that are true “packet” storage devices. Now we just have to wait 20 years until the motherboard architectures and operating systems catch up with its potential.

:)

Fazal Majid Sunday, 9 August, 2009 at 10:48 am

@David – SSDs are indeed too expensive to be used for an entire database. Sun’s ZFS hybrid storage pool technology is very cool but it is not directly relevant to OLTP databases.
Typically you will use SSDs to optimize access to critical tables and indexes, and sometimes transaction journal or redo logs (although disks are very good at sequential I/O and SSDs are best deployed for random I/O, as Steve notes).
This brings up an interesting phenomenon – SSDs’ extremely low latency reveals bottlenecks in database engines themselves, e.g. low lock granularity. You need to benchmark your system to ensure there are no priority inversion effects where a query involving both SSD and HDD holds up a higher-priority query (or one that has a more stringent latency SLA) that uses SSD exclusively.

David Garvie Wednesday, 12 August, 2009 at 6:14 am

I am continually amazed that $20M is still considered a reasonable price for ~400TB of high performance disk. As I can put that capacity (usable, RAID-5 w/ 160 hot spares) on commodity 2.5″ 10K SAS drives and controllers in 4 racks (100 TB per rack) for less than $2M list, one really wonders where all that extra cash is going.

Not an apples-to-apples comparison, to be sure, but what price/IOP are you willing to pay, what ROI are you getting for the extra $18.5M, and to what standards of support, uptime, and technology refresh timeframes are you holding your “enterprise” storage vendors for that kind of CAPEX? I sure hope it’s worth it.

-D

Robin Harris Thursday, 13 August, 2009 at 7:01 am

David,

$50/GB is steep, no doubt about it, and probably not what anyone considers reasonable – unless you are the salesman. Nonetheless, the one constant over the last decade is that for protected storage 90-95% of the cost is for all the stuff around the capacity, while the capacity itself is only 5-10% of the storage cost.
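The $50/GB is the list price divided by capacity. This small sketch puts that figure next to David's commodity alternative and the 5-10% capacity share, assuming decimal units (1TB = 1,000GB).

```python
def dollars_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Price per decimal GB."""
    return price_usd / (capacity_tb * 1000)


enterprise = dollars_per_gb(20e6, 400)   # ~$20M for ~400TB
commodity = dollars_per_gb(2e6, 400)     # David's "less than $2M" (an upper bound)

# Rule of thumb from the reply: raw capacity is only 5-10% of protected-storage cost.
capacity_only = (0.05 * enterprise, 0.10 * enterprise)

print(f"enterprise ${enterprise:.0f}/GB, commodity under ${commodity:.0f}/GB")
print(f"capacity share of the enterprise price: "
      f"${capacity_only[0]:.2f}-${capacity_only[1]:.2f}/GB")
```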

I keep wondering when that will change. Guess I have to keep wondering.

Robin

KD Mann Wednesday, 9 September, 2009 at 10:30 am

Steve, Robin:

I think there is one really simple reason why we have not seen TPC-C or TPC-E running on SSD.

If we were to see an actual, audited cost/performance number on a transactional database system, the vast majority of even really smart storage and database guys would be picking their jaws up from the floor after seeing how expensive these things are on a cost/transaction basis compared to HDD.

FYI, IBM already built a HUGE SSD-based system in Q3’08. Its configuration was identical to those that IBM uses for its TPC-C testing. It was called “Quicksilver”, and after publishing a “million IOPS” number (IOmeter), lots of us were waiting for the TPC results.

They never came. Quicksilver has not been heard from again. This silence (combined with the fact that IBM is the most prolific publisher of TPC benchmarks on the planet) speaks volumes.

I DID, however, recently see a shocker from IBM on SPC-1C/E, though almost nobody noticed. When IBM ran STEC SSDs (again, on a configuration that was identical to their TPC-C/E setups), the STEC SSDs:

(a) delivered only about 12% of STEC advertised IOPS (while HDDs usually deliver slightly more than advertised IOPS on SPC-1)
(b) cost-per-IOP was no cheaper than spinning disk (actually slightly higher when compared to Seagate 10K SFF disks)
(c) cost/GB was 135X enterprise-class spinning disk
The net here is that there is no market for Enterprise Flash SSD when dollars/IOP in REAL workloads are no cheaper than HDD, and costs/GB are more than two orders of magnitude higher than HDD.

This is EXACTLY the situation today, and an audited TPC-C/E “Cost per TPMc” result would illustrate this clearly and spell disaster for the Enterprise Flash Hype Party.

This would not be Good. EMC (and a few others) are making obscene margins on the SSDs that are already obscenely profitable for STEC (at $23,000 for 146GBytes!!!!). In this context, it’s not surprising that the overwhelming majority of sales that STEC is reporting are from EMC. This particular hype-cycle is a hugely profitable one.

It will be interesting to look at EMC’s inventory levels at the end of STEC’s fiscal year, to see how much sell-thru is really going on.

KD Mann Wednesday, 14 October, 2009 at 4:47 pm

The world’s first TPC-C result running on Flash SSD is out, courtesy of Sun and Oracle.

Well…not exactly. What we’re really seeing is the world’s first TPC-C running on a THREE-tiered storage solution, Flash SSD, 15kRPM HDDs and cheap SATA disk.

I haven’t finished sifting the ~500 pages of Full Disclosure Report, but one thing is clear so far. Even with the most write-intensive part of the workload kept far away from the SSDs (log files are striped on 384 fast spinners), the Flash SSDs didn’t even reach 3x IOPS/Disk improvement over HDD.

HDDs are reliably good for about 600 TPMc per disk in these large scale systems. In this test the 4,800 SSDs were only 2.5x better than HDD — around 1,500 TPMc/SSD.
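The per-device comparison reduces to a ratio. This sketch uses the rounded figures from the comment. The exact per-SSD number depends on the audited tpmC in the disclosure report, which is not restated here.

```python
HDD_TPMC_PER_DISK = 600    # per-HDD figure for large-scale results
SSD_TPMC_PER_SSD = 1_500   # rounded per-SSD figure for the Sun result
SSD_COUNT = 4_800

per_device_gain = SSD_TPMC_PER_SSD / HDD_TPMC_PER_DISK
hdd_equivalent = SSD_COUNT * SSD_TPMC_PER_SSD / HDD_TPMC_PER_DISK

print(f"Each SSD carries {per_device_gain:.1f}x the TPMc of an HDD")
print(f"{SSD_COUNT:,} SSDs ~ {hdd_equivalent:,.0f} HDDs at {HDD_TPMC_PER_DISK} TPMc each")
```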

Oh well…so much for replacing hundreds or dozens…or even a handful of spinning disks with a single Flash SSD.

As far as storage provisioning costs go, the 3-tier approach looks to be about 1.5x more expensive than a single tier of HDD, using Oracle and HP’s previous big machine for comparison. Of course that doesn’t include the costs of managing three islands of storage instead of one.

http://www.tpc.org/results/FDR/TPCC/Sun_T5440_TPC-C_Cluster_FDR_101109.pdf

rockmelon Monday, 19 October, 2009 at 2:04 am

KD,

I think you need to give Enterprise Flash a fair run for its money.

In these comments some 5 or 6 weeks ago, you were claiming that we would
never see an audited TPC benchmark using Flash because it would be jaw-
droppingly expensive and that Flash $/IO would be no cheaper than conventional
HDDs for real workloads. You really ought to test some Intel X25-E or Sun
SSD products (I have) before making these assertions.

Now that there is such an audited benchmark, you see fit to bash it because
it was not done with 100% Flash.

Every viable storage product has its niche …

Looking at the benchmark executive summary, I see 8.35 M$ of server storage,
of which 6.62 M$ is F5100s. (Incidentally, 2.08 M$ of server hardware
and 7.88 M$ of server software [Oracle licenses] round out the 18 M$
total system cost.)
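A quick check of the executive-summary arithmetic shows where the "18 M$" total and the roughly 80% flash share of the storage budget come from. All figures are the ones quoted above, in $ millions.

```python
# Cost components from the Sun/Oracle executive summary, in $ millions.
components_m = {
    "server storage": 8.35,
    "server hardware": 2.08,
    "server software": 7.88,   # Oracle licenses
}
F5100_M = 6.62                 # flash arrays within server storage

total_m = sum(components_m.values())
flash_share = F5100_M / components_m["server storage"]
other_storage_m = components_m["server storage"] - F5100_M

print(f"total ~{total_m:.2f} M$, flash {flash_share:.0%} of storage, "
      f"{other_storage_m:.2f} M$ of non-flash storage")
```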

I really doubt that Sun/Oracle would configure 80% of their storage budget
on Flash unless it was beneficial to performance and price/performance.

The 508 page full disclosure report may be somewhat daunting, but thankfully
the juicy parts relating to the storage configuration are detailed within the first
20 pages.

The next biggest item of the 8.35 M$ server storage was 0.95 M$ spent on
24 x ST6140 conventional disk arrays. On page 12 of the full disclosure report
we see that these were connected via 4Gb FC, two per database server node,
and held the Oracle log files. Each ST6140 consisted of 16 x 300GB SAS
drives and they were mirrored by the database. A generation ago, databases
could log to tape (log IO is sequential), so disk really is the new tape :-)
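The log tier is small enough to check by hand. This sketch confirms the drive count matches the "384 fast spinners" mentioned earlier in the thread and shows usable capacity after database mirroring, assuming decimal GB.

```python
ARRAYS = 24              # ST6140 arrays
DRIVES_PER_ARRAY = 16
DRIVE_GB = 300
ARRAYS_COST_M = 0.95     # from the executive summary

log_drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = log_drives * DRIVE_GB / 1000
mirrored_tb = raw_tb / 2            # the database mirrors the logs
cost_per_array_k = ARRAYS_COST_M * 1000 / ARRAYS

print(f"{log_drives} log drives, {raw_tb:.1f} TB raw, "
      f"{mirrored_tb:.1f} TB after mirroring, ~${cost_per_array_k:.1f}k per array")
```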

It makes economic sense to log to conventional HDDs, where the important
metrics are MB/sec/$ and MB/$.

Sun/Oracle would have put 80% of their storage $$s behind Flash and 20%
behind HDDs because it made the most sense to optimize the benchmark
metrics.

KD Mann Monday, 19 October, 2009 at 9:09 am

rockmelon,

Regarding “I think you need to give Enterprise Flash a fair run for its money.”

I’ve taken up a countervailing position vs. the prevailing hype. What’s not fair? The Flash SSD value proposition, as it has been presented, always plays on some variation of “replace tens or hundreds of spinning disks with SSD”, and “the much higher cost-per-GB of SSD is justified by much lower cost per IOP”. The basis for both the outright replacement scenario and the “Flash Tier” scenario is that cost-per-IOP is one or two orders of magnitude cheaper than HDD. It’s not.

The business case for Flash falls apart if cost/IOP is not significantly lower than HDD in…and here’s the key…real world, real application workload scenarios. Over the past several weeks, we have seen the first audited, application benchmarks utilizing Flash SSD for realistic “transactional” workloads.

None of them even demonstrates Cost/IOP parity with HDD, much less a cost advantage. In other words, Enterprise Flash Reality is at least an order of magnitude different from Enterprise Flash Hype.

– In SPC-1C/E, SSD cost/IOP was slightly higher than HDD, not lower.
– In SPC-1, SSD cost/IOP was 2x-3x higher than HDD
– In TPC-C, the three-tiered SSD/HDD/HDD setup resulted in storage cost/performance ~50% higher than a single tier of HDD.

I stand by my remark…if Sun had attempted to run the entire TPC database on a single tier of Flash, the cost/TPMc would have been at least 5x higher than HDD. Even with a three-tier setup, Flash still increases storage provisioning costs dramatically. Given that we’ve been told for so long that Flash SSD was going to reduce storage costs for I/O intensive applications — I’d say “jaw dropping” is a reasonable way to describe the size of the gap between hype and reality.

I’d add another observation. These first-ever audited benchmark results align quite perfectly with the (widely ignored) conclusions reached here, which also happen to be my own conclusions:

http://research.microsoft.com/en-us/um/people/antr/ms/ssd.pdf

rockmelon Monday, 19 October, 2009 at 8:36 pm

KD,

that would be the Microsoft paper which looked at several of their in-house
apps (notably none of them OLTP though one called “websql”) and concluded:

“Depending on the workload, the capacity/dollar of SSDs needs to improve
by a factor of 3–3000 for SSDs to be able to replace disks.”

Well 3X is not a huge leap before some of their apps become economically
viable to be deployed on SSDs.

The Storage Anarchist (Sr. Director and Chief Strategy Officer for the Symmetrix
Product Group within EMC’s Storage Division) made these comments about
that paper in http://blogs.netapp.com/shadeofblue/2009/02/emc-says-ssd-is.html

“And here’s but one flaw in Microsoft’s analysis: they presumed an expected wear-out of flash chips after 100,000 erase+write cycles – built it right into their models and calculations.

Plug in the real-world cell wear numbers of the ZeusIOPS 146GB or 300GB FC-based SSD and today’s pricing, and Microsoft’s math changes dramatically.”

The Microsoft Tech Report was published in April and considers only a single
SSD – the 32GB Memoright MR 25.2 which cost $23/GB. Here we are in
October and a better buy would be the Intel X25-E at $15/GB
http://www.cdw.com/shop/products/default.aspx?EDC=1774816

Additionally, the SLC-based X25-E offers 10X the read IOPS and 5X the
write IOPS afforded by the Memoright. Most Enterprise-focussed users
would stick with SLC, but MLC can be had for $5/GB, e.g. Intel X25-M -
http://www.cdw.com/shop/products/default.aspx?EDC=1736412

Again, with 5X the read IOPS and 10X the write IOPS afforded by the
Memoright. For the light-duty IO loads of the Microsoft servers, MLC would
probably suffice.
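These price points can be set against the Microsoft paper's "3–3000" threshold. This sketch only compares capacity per dollar, one axis of the paper's model. It ignores the IOPS differences and says nothing about which workloads sit at the 3x end of the range.

```python
MEMORIGHT_USD_PER_GB = 23.0   # SSD priced in the Microsoft report
X25_E_USD_PER_GB = 15.0       # SLC, October 2009 price quoted above
X25_M_USD_PER_GB = 5.0        # MLC
MS_LOW_END_FACTOR = 3         # low end of the paper's 3-3000 range


def capacity_per_dollar_gain(old_usd_per_gb: float, new_usd_per_gb: float) -> float:
    """How many times more GB per dollar the newer device buys."""
    return old_usd_per_gb / new_usd_per_gb


for name, price in (("X25-E (SLC)", X25_E_USD_PER_GB), ("X25-M (MLC)", X25_M_USD_PER_GB)):
    gain = capacity_per_dollar_gain(MEMORIGHT_USD_PER_GB, price)
    verdict = "meets" if gain >= MS_LOW_END_FACTOR else "falls short of"
    print(f"{name}: {gain:.2f}x capacity/dollar, {verdict} the {MS_LOW_END_FACTOR}x low end")
```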

Another interesting takeaway from the Microsoft report was:

“Of the three enterprise-class devices shown Table 4, the
Cheetah 10K disk was the best choice for all 49 volumes.”

I checked the executive summaries for the next 5 highest throughput TPC-C
publications and note that they exclusively use 15k rpm drives. Does this
mean that the benchmark engineers at IBM, HP & Fujitsu are clueless or that
the fileservers traced by Microsoft are not as demanding in IOPS as TPC-C ?

Another thing the Microsoft report acknowledges ignoring:

“The approach in this paper is to find the least-cost configuration that meets
known targets for performance (as well as capacity and fault-tolerance) rather
than maximizing performance while keeping within a known cost budget.”

What if you are trying to manage response times downwards as well?

This lack of response time consideration also feeds into their quantitative model
where this important parameter is ignored. For example, a Seagate Cheetah 15K
drive is considered good for 384 Read IOPS, but does not factor in that the
response time from the drive at this throughput is probably 30ms. I’m watching
an Intel X25-E at the moment doing a little over 24k IOPS of 4KB random
read, average queue length a little under 4 and the response time is 160-170
_micro_seconds. 60X the thruput and 180X better latency….
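Little's law offers a quick consistency check: average outstanding I/Os equal throughput times response time. This is a minimal sketch. It treats 165µs as the mean response time and uses the 30ms HDD figure as the estimate given above, not as a measurement.

```python
def outstanding_ios(iops: float, response_time_s: float) -> float:
    """Little's law (steady state): mean outstanding I/Os = throughput x response time."""
    return iops * response_time_s


ssd_queue = outstanding_ios(24_000, 165e-6)   # X25-E as observed above
hdd_queue = outstanding_ios(384, 30e-3)       # Cheetah 15K, estimated 30 ms

throughput_ratio = 24_000 / 384
latency_ratio = 30e-3 / 165e-6

print(f"SSD: ~{ssd_queue:.2f} I/Os outstanding; HDD: ~{hdd_queue:.1f}")
print(f"throughput {throughput_ratio:.1f}x, latency {latency_ratio:.0f}x")
```

The SSD result lands just under 4, matching the observed queue length. The HDD side implies about a dozen I/Os queued at a single drive, which is consistent with a drive running near saturation.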

Wake up and smell the roses!

One thing Microsoft got right was to recognize the multiple dimensions
of price performance which they show in Figure 3: IOPS/$, GB/$ and MB/s/$.

Not even the most ardent Enterprise Flash protagonist would suggest deploying
SSDs everywhere. Flash has its place, disk has its place and so does tape.
Database logging does not need IOPS, it needs Gigabytes, so disk makes
economic sense there.

I’ll finish with a rough calculation of my own. If Sun/Oracle had used
conventional disk instead of Flash, they would have needed something
like 85 racks of 160 drives. Allowing 3kW/rack and 20 square feet/rack,
going with flash saved 1700 sq.ft of floor space at 150 Watts/sq.ft. The cost
to construct that floor space at that power density would be around
$2000/sq.ft – 3.4 M$.
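The floor-space estimate chains three assumptions: rack count, area per rack, and build cost per square foot. The sketch reproduces it. It does not subtract the space the flash configuration itself occupies, and the result moves in direct proportion to the rack count.

```python
RACKS = 85
SQ_FT_PER_RACK = 20
KW_PER_RACK = 3
BUILD_USD_PER_SQ_FT = 2_000

floor_sq_ft = RACKS * SQ_FT_PER_RACK
density_w_per_sq_ft = KW_PER_RACK * 1000 / SQ_FT_PER_RACK
build_cost_m = floor_sq_ft * BUILD_USD_PER_SQ_FT / 1e6

print(f"{floor_sq_ft:,} sq ft at {density_w_per_sq_ft:.0f} W/sq ft ~ ${build_cost_m:.1f}M to build")
```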

KD Mann Tuesday, 20 October, 2009 at 8:56 am

Rockmelon,

Your comments on the Microsoft paper contain a number of errors and misinterpretations. I only have time to touch on a few of these, and encourage others here to look at the paper itself.

Regarding your assertion of inflated SSD costs in the Microsoft paper due to faulty “wear out” calculations, it appears neither you nor EMC’s “Storage Anarchist” read the paper. It says:

“However, even if we conservatively assume a high overhead of 50% (one background write for every two foreground writes), the majority of volumes have wear-out times exceeding 100 years. All volumes with the exception of one small 10GB volume have wear-out times of 5 years or more. Hence, we do not expect that wear will be a major contributor to the total cost of SSD-based storage.”

From this it’s absolutely clear that Barry Burke did not plug any STEC numbers into Microsoft’s model, wear, pricing or otherwise.

As regards the cost of the Memoright SSD vs. Intel ($23/GB vs. $15/GB), this also doesn’t change the results of the cost-benefit equation for any of the applications modeled. Far more importantly though, the “enterprise class SSD” market is typified by EMC/STEC at ~$180/GByte and by Sun at $80/GByte (FMOD landed in an F5100 socket). Intel’s SSDs are not currently part of that landscape; the only player of note who tried them, Pillar Data, recently dumped Intel for STEC.

Meanwhile…the STEC and Sun prices represent some pretty outrageous margins on SLC Flash, the kind of margins that drive hype-cycles.

As regards the performance numbers you provide for X25-E, “…60X the thruput and 180X better latency.”, these are quite similar to typical manufacturer claims, claims that reliably fall apart in real-world tests and audited benchmarks. For example, STEC claims 33K read IOPS and 17K write IOPS at 4KBytes, 250MBytes/sec. with microsecond latency, and “200x faster than HDD”.

When run against SPC-1, the $13,000 STEC units deliver only 3.4K IOPS and 28Mbytes/Sec. with response time only about 1.4x (not 180X) better than HDD. I’m wide awake, thank you, and those numbers are orders-of-magnitude worse than either you or the manufacturers are claiming…I don’t smell any roses.

The numbers Microsoft reported for their SLC example device are much closer to how these devices actually perform in real applications than the IOmeter benchmarketing numbers you and Intel quote for X25-E. Given Pillar Data’s “Oracle heritage”, I doubt that Pillar dumped Intel for STEC if Intel was delivering the goods — and we have yet to see the first audited benchmark on an Intel SSD.

Regarding “85 racks of 160 drives” and “3KW per rack (160 drives)”: as you were reviewing TPC results you might have noticed that nobody uses 3.5″ HDDs anymore. Nowadays, these are 2.5″ HDDs that use 5-7W, and 40-60 of them fit in 3RU. Your rough calculation of racks required is off by roughly 10x.

If you base the numbers on current HDD technology you’ll see that it would take more than 10 yrs for the Sun Flash SSD setup to pay for itself in energy savings. That analysis is also included in the Microsoft Research model — real-world energy savings from Flash are orders-of-magnitude smaller than the costs incurred in provisioning.

Taylor Wednesday, 21 October, 2009 at 4:34 pm

It seems pretty clear to me. Compare the #1 and #2 (or #3 or #4) TPC-C results. The IBM disk array is 68 racks full of spindles. The Sun solution is 7 racks. Storage system COGS works out about the same ($9m) given the 57% discount from retail on the IBM systems. I’d wager that the Sun system consumes a wee bit less power overall.

The Sun solution costs 6% more initially but performs 28% better. It takes up (looking at storage only here) 90% less floor space.

The Bull solution uses roughly the same disk setup as the IBM. The HP after that uses 2.5″ drives, taking up 44 racks.
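The two percentages combine into a price/performance figure. This sketch takes the 6% and 28% at face value and uses the rack counts for storage only.

```python
cost_ratio = 1.06          # Sun system cost relative to IBM (6% more)
performance_ratio = 1.28   # Sun throughput relative to IBM (28% better)

price_performance = cost_ratio / performance_ratio   # relative cost per tpmC
storage_floor_saving = 1 - 7 / 68                    # 7 racks vs 68 racks

print(f"cost per tpmC ~{1 - price_performance:.0%} lower, "
      f"storage floor space ~{storage_floor_saving:.0%} smaller")
```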

KD Mann Friday, 6 November, 2009 at 4:00 pm

Taylor, re:

“…seems pretty clear to me. Compare the #1 and #2 (or #3 or #4) TPC-C results….”.

The Flash SSD value proposition is cost/performance. In this light, the “PR stunt configurations” (I think Steve Jones above called them “insane”) are not very relevant to real world customers. This is evidenced first and foremost by observing that the cost-per-transaction-per-minute in the stunt-class averages 5x higher than that of the top-10 cost/performance systems in the “real-world” class.

http://www.tpc.org/tpcc/results/tpcc_price_perf_results.asp

Getting back to the real world, we need to look at the top systems from a cost-performance perspective, and see whether Flash can deliver any improvement.

We need to look at systems like these instead:

http://www.tpc.org/results/individual_results/HP/HPML350G6OELTPCC_ES.pdf

In the lowest cost-per-TPMc system (above), the entire storage infrastructure — connectivity included – costs about $430/spindle, and each spindle is good for 1,160 TPMc.

In the Sun F5100 SSD result, the total storage infrastructure costs (including the two other disk tiers and connectivity) are about $2,000 per SSD, and 4,800 SSDs were supporting only 1,583 TPMc each.

Given these numbers, I’m pretty sure you can’t plug $2,000 SSDs into the leading cost-performance system without quadrupling the price. Haven’t done the power-savings calculations, but I’m pretty sure it would take more than 10 years for the SSDs to pay for themselves.
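The per-device figures translate directly into storage cost per tpmC. This sketch covers storage only, not servers or software. It assumes storage cost scales linearly with device count.

```python
def storage_cost_per_tpmc(cost_per_device: float, tpmc_per_device: float) -> float:
    """Storage infrastructure dollars per tpmC, assuming cost scales per device."""
    return cost_per_device / tpmc_per_device


hdd = storage_cost_per_tpmc(430, 1_160)     # HP ML350 G6 result, per spindle
ssd = storage_cost_per_tpmc(2_000, 1_583)   # Sun F5100 result, per SSD

print(f"HDD ${hdd:.2f}/tpmC, SSD ${ssd:.2f}/tpmC, ratio {ssd / hdd:.1f}x")
```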
