The customer disconnect between what they need – I/O – and what they buy – gigabytes – is boiling over in the storage world. Hitachi Data Systems’ always thoughtful Hu Yoshida had a couple of posts whose juxtaposition crystallized the problem. The first post noted, in reference to Web 2.0 companies and the disk drive’s 50th birthday, that
A great debt is also owed to the engineering innovations . . . which drove the cost of random access storage from about $50,000/MB to less than $0.002/MB. Storage is now cheap enough to give away freely and still cover costs and make a profit based on the services they provide from the stored content. [emphasis added]
Then, in an earlier post on industry revenues, he talks about storage consolidation:
. . . the growth rates for external disk storage have increased to about 60% over the last two years, while utilization of storage has been dropping to about 30%. . . . The CIO of a large financial company with over 2PB of storage said that his storage was only 20% utilized, and over 70% of it was expensive tier 1 storage.
Expensive, underutilized asset or cheap service?
Both observations are true. Both are in dire conflict. Yet it is a weak mind that can’t hold two opposing ideas at once. So what does it mean?
Two illusions
Economists refer to the money illusion, a term for an individual’s belief that a larger number of dollars means being richer, even if inflation is eroding the value of those dollars faster than their number grows. Individuals look at the number of dollars they have, not their purchasing power, and they get fooled into thinking they are better off even as their purchasing power declines.
In storage, the capacity illusion reigns supreme. We measure storage utilization by looking at capacity in gigabytes, which, as Hu points out, is the cheapest part of storage. The expensive storage component is I/O. And the expensive management component is people.
Run the numbers
Five years ago, the average disk drive cost about $4/GB, while the average cost per OLTP tpmC was about $20. Today, 3.5″ disks are about $0.30/GB and an OLTP tpmC about $4. So capacity is less than 1/10th the cost of five years ago, while I/O is about 1/5th the cost. The relative cost of I/O has more than doubled in the last five years.
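A quick sanity check on those ratios, using the round numbers above (the figures are the approximations quoted in this post, not precise market data):

```python
# Back-of-the-envelope check of the relative cost shift described above.
capacity_then, capacity_now = 4.00, 0.30   # $/GB, roughly five years apart
io_then, io_now = 20.00, 4.00              # $/tpmC over the same period

capacity_drop = capacity_then / capacity_now   # ~13x cheaper
io_drop = io_then / io_now                     # 5x cheaper

# How much more expensive I/O has become relative to capacity:
relative_shift = capacity_drop / io_drop       # ~2.7x
print("capacity %.0fx cheaper, I/O %.0fx cheaper, I/O relatively %.1fx dearer"
      % (capacity_drop, io_drop, relative_shift))
```

With these round figures the relative shift comes out closer to 2.5x or 3x than to 2x, which only strengthens the point.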
Go 1,000 miles between fill-ups!
The bursty nature of most I/O means that storage systems have to be over-configured to meet peak traffic needs. So the complaints Hu hears from customers have an underlying cause: arrays are configured to meet I/O requirements, but customers buy gigabytes, not I/Os.
It is as if people bought automobiles based on how far they go on a tank of gas. Everyone could tout a bigger gas tank, but the real issue is mileage. Likewise, everyone touts capacity, but I/O is the issue.
It is past time for vendors to change the conversation from gigabytes to IOPS. This would help the big iron vendors sell more big machines, while focusing customers on the I/O demands of their applications. It isn’t the whole solution, but it would be a good first step.
Comments welcome, as always.
Interesting stuff.
What I get from this is that cheaper storage is only really cheaper if it is not accessed a lot. IOW, great for backup and archival purposes. Not so great for the YouTubes of the world!
Nick,
My take is a little different. By measuring and marketing the wrong thing, the industry gets angry and confused customers, like the CIO Hu quoted. Not easy to change the metric – look at what Intel went through dropping MHz – yet getting customers focused on more important issues ultimately pays dividends: happier customers; more accurate competition; higher margins.
Robin
What you say (I/O is more important than GB of storage) is true in most cases, but there are some of us who really *do* care about the GBs and not so much about the I/O. In this case, I’m talking about using disk strictly as an archival medium (as a possible replacement for tape).
So, when will Hu Yoshida and HDS help clients by publishing SPC-1 and SPC-2 benchmarks?
Yeroc, life just keeps getting better for folks doing heavy sequential read/write of large files as capacity gets cheaper. As Jim Gray pointed out a few years ago, disk is the new tape.
Chuck, I sent you a note asking about how valuable those benchmarks really are. I hope you’ll respond as it can be controversial.
Robin
Selling in terms of IO is great but that leads right back to capacity. A disk drive can only attain so many ops, so if you want X IOPs from the storage system you need Y disk drives to provide it. And since capacity is cheap it makes little sense to buy less than 300GB drives, which brings you right back to capacity.
Specifically, if I know I need 500,000 IOPs from a system, and I know (educated guess) that a 10k FC drive (144 or 300) will give me 170 iops, that means I need ~2,941 disk drives to achieve that. Or 882TB of capacity at 300GB drives.
In essence, buying by capacity also gets you IO.
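Jeff’s sizing arithmetic, as a minimal sketch (the 170 IOPS per 10K FC drive is his educated guess, not a vendor spec):

```python
import math

# IOPS-driven sizing using Jeff's figures above.
target_iops = 500000         # required from the storage system
iops_per_drive = 170         # rough estimate for a 10K RPM FC drive
drive_capacity_gb = 300

drives = int(math.ceil(target_iops / float(iops_per_drive)))  # ~2,942 drives
raw_tb = drives * drive_capacity_gb / 1000.0                  # ~883 TB raw

print("%d drives, ~%.0f TB raw capacity" % (drives, raw_tb))
```

Which lands at roughly the 2,900-plus drives and ~880 TB Jeff cites: specify the IOPS and, with today’s drive sizes, an enormous amount of capacity comes along for the ride.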
Jeff, we know that because we’re smart, good-looking guys with stunning storage Mojo. The CIO’s problem is that the CFO is a terrible pinchpenny. So at budget time he’ll use any number he can to show that IT is wasteful and should have their budget cut.
Decreasing utilization is a natural outcome of the fast growth of capacity and the slow growth of I/O. So storage vendors should be figuring out how to make their customer, the CIO, look good despite the awful numbers.
Then there is the problem that most customers don’t know what their I/O usage looks like. They just know that faster is better, so they buy faster.
There is another alternative: go to lower capacity drives, which also means smaller form factors. And smaller boxes. Yet that is just temporary because the capacity/IOPS gap is still there. So at some point somebody like EMC will realize that their customers will be happier buying IOPS instead of capacity they can’t use and will start to move in that direction.
“Then there is the problem that most customers don’t know what their I/O usage looks like”, said Robin above. I’ll agree to that.
Going further, this raises the question of how customers can or will actually determine what their I/O usage looks like (and not only down at the device but also up at the other end of the cable, i.e., the application).
Interesting dialogue. I am commenting on this on my blog. http://blogs.hds.com/hu
I am also responding to Chuck on the SPC benchmarks.
The permalink for Hu’s response is http://blogs.hds.com/hu/2006/10/the_capacity_il.html
I’d be interested in seeing comments from people who read it.
Hu makes the point that customers find it cheaper to buy more storage – driving down utilization – than to manage it. Which is true and a sad commentary on storage management vendors.
Yet the secular trend is IOPS getting relatively more expensive and GB ever cheaper. Will there ever be a crossover point where customers slap their foreheads and say “IOPS are the critical metric!”?
Robin
The idea of dropping GB as a criterion for disk is a strange one; disks are, after all, there to store data. And yes, it does matter how fast you can fetch/retrieve it, but GB is a pretty concrete number whereas IOPS is anything but. Just off the top of my head, some things that will affect the IOPS of a disk in an array being accessed by a single host are the physical characteristics of the disk, the number of disks behind the controller, the number of links from the controller to the host, the protocol used to transfer the data from controller to host, the type of HBA on the host (offload or not) and the application accessing the data. Each of those components can have multiple concurrent accesses. And that’s before we’ve even started to pick at the question of what an I/O is (are we talking one I/O per block? Per NFS operation? What about mirroring? What about cached operations?) So an IOPS measure would be, for me at least, meaningless, as it lacks the context of my data access patterns and volumes.
If you want to talk about this way of measuring disks (or disk arrays) then it would have to be in terms of % capacity against a given application load e.g. your application will load the array by 20%. But of course that requires a lot of work profiling the application and translating from an existing (or yet-to-exist) setup to whatever new array is being looked at. Not an easy task by any means.
So yes, customers will purchase capacity that they don’t need because it gives them the overhead to be sure they will meet their capacity and performance demands. And to Hu’s point, yes, this will often lead to lower utilization. But if the application has the capacity and performance that it needs, and it was not significantly cheaper to purchase smaller disks, then the utilization number is an artificial metric.
Apropos of this, I think that the question of unused storage is the wrong way around. I wonder why the array vendors aren’t finding ways of making more use of this ‘unused’ storage? I’m not talking about traditional virtualization because that’s another technology that requires user input but what about automatically taking the spare disk in an array and using it for additional mirroring, longer-term snapshots, spreading data across more devices, etc?
BTW, on the numbers in the initial posts, you should look at measuring the increase in active storage against IOPS, not total storage. For example, looking at my home server, in the last few years its storage has gone from about 40GB to about 400GB, but the IOPS against it have not risen anywhere near 10x, so capacity is still the primary factor; I suspect that this is similar for very many applications.
Jim.
To help move the focus beyond simply gigabytes/capacity, I believe that empirical metrics are needed, especially those that reflect actual “production” workloads (and not simply benchmarking results, but rather in situ within customer environments). And when applying such metrics to the assessment of storage utilization, I suggest promoting due focus upon a “top-down” approach, which starts off by looking at storage usage and performance from a particular application’s perspective.
As Jim pointed out above, the “capacity” metric (i.e., GB) is fairly straightforward – although thin provisioning and virtualization (as examples) can introduce ambiguity. But as also mentioned above, usage/performance metrics (which reflect exactly how – and how well – the storage is used) entail much more. Along with IOPS, there are throughput (MB/s), response time, data transfer amount, random/sequential access, cache hits/misses, queuing, and contention along with other metrics that might be considered.
What appears difficult is the actual collection of such usage/performance metrics, particularly by customers in a ready manner and in terms relevant to their own particular applications. It seems to me that addressing this difficulty is one of the key steps required in order to look beyond the capacity illusion.
Is this discussion a classic example of the old economics maxim that one should waste what is cheap and conserve what is expensive? To wit, capacity is cheap, I/O and management are expensive.
CFO-level folks should be able to comprehend the wisdom of this vis-à-vis storage budgets. A storage strategy that reduces management cost (labor $/used GB) is superior to one that seeks to increase storage utilization (first cost $/total GB installed).
Best.
As Tom says…
“What appears difficult is the actual collection of such usage/performance metrics, particularly by customers in a ready manner and in terms relevant to their own particular applications”.
This is absolutely true. What is needed are some tools enabling the measurement of the actual I/O-per-second requirement at the application level … and the ability to verify the actual performance of the system at run-time.
Any suggestions as to how this can be achieved … perhaps Hu may comment?
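One possible starting point, at least at the device level, is to sample the kernel’s disk counters and compute a delta. The sketch below assumes a Linux host and reads /proc/diskstats; it says nothing about which application generated the I/O, which is the harder part of what the comments above are asking for.

```python
# Minimal sketch: report per-device IOPS on a Linux host by sampling
# /proc/diskstats twice. Device-level only; application-level attribution
# requires far more instrumentation.
import time

def read_counters():
    counters = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 11:
                continue                        # skip abbreviated partition entries
            device = fields[2]
            reads_completed = int(fields[3])
            writes_completed = int(fields[7])
            counters[device] = reads_completed + writes_completed
    return counters

interval = 10.0                                 # seconds between samples
before = read_counters()
time.sleep(interval)
after = read_counters()

for device in sorted(after):
    delta = after[device] - before.get(device, after[device])
    print("%-10s %10.1f IOPS" % (device, delta / interval))
```

Run over a representative business day rather than a ten-second window, and broken out by application volume, numbers like these are what a customer would need before an IOPS-based purchase could make sense.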
Taking what Brook said above, remember that price does not equal cost. For every dollar buying disk, you will spend between $3 and $6 in ownership costs (electricity, maintenance, software, software maintenance, labor, network connections, backup, data protection, RAID overhead, security, etc.). We can talk about storage economics (price of disk) but cannot exclude data economics. After all, we are storing and protecting data.
Cheers
David, thanks for stopping by.
Your comment is correct, as far as it goes. However, I would argue that many of the TCO items listed are part of the problem.
When RAID arrays were developed, capacity was expensive and I/Os relatively cheap. Which is why the bright but impoverished academics and students at Cal came up with the idea of using small, cheap, unreliable drives to build a big reliable drive.
Now the world is different – or at least the technology is – and capacity is cheap and I/Os expensive and getting more so. Therefore, I submit, if Patterson et al. were designing a fast, very big, very reliable drive today, it would look very different.
How? For one thing, lots of copies on different disks would provide both reliability and performance. Writes would be bunched for sequential write performance. Overwriting would be a background garbage collection function rather than a function of writing. Variable stripe writes might be implemented for performance. Cheap mirrored independent controllers might maintain small write caches if needed, rather than today’s costly dual-port caches.
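Sketched as a toy, the write path above might look something like this. It is only a thought experiment under the assumptions just listed (replication for redundancy and read spread, append-only writes, background garbage collection), not anyone’s product design, and it ignores controllers, caches and stripe geometry entirely.

```python
import random

class ReplicatedLog:
    """Toy log-structured, replicated block store. Illustration only."""

    def __init__(self, num_disks=8, copies=3):
        self.disks = [[] for _ in range(num_disks)]  # append-only per-disk logs
        self.copies = copies
        self.live = {}     # block_id -> (seq, [(disk, offset), ...]) of the live version
        self.seq = 0

    def write(self, block_id, data):
        # Never overwrite in place: append a new version to `copies` distinct disks.
        self.seq += 1
        locations = []
        for d in random.sample(range(len(self.disks)), self.copies):
            self.disks[d].append((self.seq, block_id, data))
            locations.append((d, len(self.disks[d]) - 1))
        self.live[block_id] = (self.seq, locations)   # older copies become garbage

    def read(self, block_id):
        # Any live copy will do; choosing at random spreads the read load.
        _, locations = self.live[block_id]
        disk, offset = random.choice(locations)
        return self.disks[disk][offset][2]

    def garbage_collect(self):
        # Background compaction: keep only live versions and fix up their offsets.
        for d, log in enumerate(self.disks):
            kept = [e for e in log if self.live.get(e[1], (None,))[0] == e[0]]
            self.disks[d] = kept
            for offset, (seq, block_id, _) in enumerate(kept):
                s, locs = self.live[block_id]
                self.live[block_id] = (s, [(d, offset) if dd == d else (dd, off)
                                           for (dd, off) in locs])

store = ReplicatedLog()
store.write("blk1", b"v1")
store.write("blk1", b"v2")        # supersedes v1; old copies await collection
store.garbage_collect()
assert store.read("blk1") == b"v2"
```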
I don’t know what a really clever team of engineers would design. I’m real confident that it would not be the 20 year old architecture we use today. That is why the discontinuities I saw in Hu’s posts are significant: customers are feeling uncomfortable, they don’t know why, and it is pointing to a bigger problem.
The company that designs and successfully markets the next generation (RAID 2.0?) storage array is going to make a hell of a lot of money. Why doesn’t HDS do it?
Robin
WOW! I guess the Ferrari does make the man!
_or_
WOW! I guess the _______? does make the Storage man!
IOPS are kind of hard to see on the same level as Ferraris and manhood. We need the more easily discerned Yottabytes, or above.
Actually it has only to do with marketing. Perhaps by Strategy or perhaps by Serendipity, vendors discovered that revenues were climbing by pushing Gigabytes. Not above, yet. Yottabytes are still too frightening. How would you manage Yottabytes with today’s tools?
On the consensus-building, solution-providing side, let’s ignore having a Strategy for the moment. Let’s concentrate on something we already know – Enabling Technology. That’s all IT is. It uses “product” to “Enable” the delivery of Information to a client or a Profit delivery point. Period. End of story!
Think of your rationale in buying a car. Even women now prefer fast, red sporty cars to anything else. We all know about the “jack-rabbit” start and getting ahead of the “other” guy with our fast car. We all want a car that will go 125 mph “in case we need it!”
Why do we shun IOPS and bask in the glow of Gigabytes, and above? Because it is common parlance and well understood. It makes people like us because we talk their language.
Back to consensus building and solution providing: how about thinking in terms of Information High Availability and Information Integrity? What are those? They are concepts that are easily translated into Strategy by ordinary mortals. No vendors allowed except by express request!
Look at Robin’s magnificent work delivering “The Word” about Internet Data Centers to the blog masses. I’m now hearing that phrase everywhere. StorageMojo has become a household read.
Preferring Gigabytes over IOPS tells me that no one is really using your magnificent IT infrastructure for making money. It is more a playground for the wealthy, or wanna-be wealthy. If the “Enabled” Information was in demand, clients would be complaining about slow or “no” delivery. Perhaps you have sufficient IOPS or bandwidth to more than handle the demand you have.
What growth trends are you tracking? I/Os? IOPS? What Development meetings do you sit in? What is the next big bag of goodies that will be thrown over the wall in the dead of night? What would an I/O map of your IT infrastructure look like? Ever seen one? What is the Speed Limit of the Information Universe in your shop?
Well, have you forgotten about the increased functionality and availability here? I love the OLTP reference because it is quantified and can be related to price (another quantifiable metric), but in order to truly assess storage’s relative costs, you need to consider all of the metrics that may similarly be considered in a real buying situation TODAY (see the list below). Vendors offer varying amounts and capabilities behind each of these, which drives price up and down.
Over the last 20 years, storage arrays have added:
1. interoperability with server and application APIs
2. enterprise management software
3. replication software
4. checksum software (e.g., for Oracle’s HARD certification)
5. multipathing
6. increased security with varying levels of such
7. compliance for federal and international regulations, archiving, WORM
8. snapshotting – pick a flavor
9. RAID6
10. arrays that offer 100% availability (no, I am not from one of these vendors)
So the quick analogy in this blog is “neat”, but incomplete. Do companies talk about how many PB they have on the floor? Sometimes, but the companies spending the most $/GB are generally not the largest, just the most interested in the well-being of their data. (Maybe you could make a “well-being index” :>)
Would love to see more.
Thanks
Rob,
The “well-being” index is a great idea. I have been working on something like this for a while. The need for this is now accelerating as the desire for “invisible” IT infrastructure becomes very important.
Hu Yoshida is talking about the “Invisible Cloak”, and he quotes Steve Duplessie’s Rants blog, where Steve says that “invisible” came to him in a dream as the IT of the future.
I have been working on defining the static baseline for IT health plus the dynamic health for a while. You can get a report card of the current state of “well-being” of any or all of the IT infrastructure.
I borrowed the words “well-being” index from a post by Rob on StorageMojo.
Parameters of interest in the “well-being” index are seamless, transparent and invisible for Units of Information and their “Enabling” Units of Technology.
What are all the factors impinging on this Unit of Information? The easiest and most obvious is the “Enabling” Unit of Technology. This has its own “well-being” index parameters.
Another “Parameter of Interest” is the “Doubt” parameter. “Doubt” means: we want to trust the system; we do not want to have to believe in the system.
Why are any of these visible to the user?
The UX is all that is important to the user.
UX is Peter Morville’s User Experience.
The notion that disk array performance should be measured only in terms of IOPS is not actually right. It also depends on response times, delays and latency, and on whether the I/O rate is delivered for read or for write operations, for sequential or for random I/O, for OLTP- or OLAP-based applications, and so on. There is no easy way of selling IOPS and there is no easy way of guaranteeing performance to a customer.
Benchmarks help to figure out a system’s performance, but they don’t guarantee performance for every single case, and to make it worse there are few benchmarks available in the storage array arena.
I don’t agree that we should focus only on performance; for me everything is important: array features, management, capacity, cost, performance.
If you plot customer needs on a multidimensional graph and you could give weights to all of them, you would find that for every customer different things are important to a different degree, and so they’re trying to get a solution that fits all their requirements.
Selling IOPS seems to me a difficult task, because storage performance depends on the array cache, processors, architecture and capacity, and also on the customer’s requirements and environment. That’s why no vendor gives you a cost per I/O.
That doesn’t mean that a customer only buys capacity but no IO. The customer buys a lot of things, believe me. Cache, processors, front end directors, back end directors, software, features, licenses.
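One way to connect the IOPS number to the response times and queuing mentioned above is Little’s law: I/Os in flight = IOPS × response time. A small illustration with made-up numbers:

```python
# Little's law: concurrency = throughput x response time.
# The same 10,000 IOPS is a very different system at 2 ms than at 20 ms.
iops = 10000.0
for response_time_ms in (2.0, 20.0):
    in_flight = iops * (response_time_ms / 1000.0)
    print("%6.0f IOPS at %4.1f ms -> %5.1f I/Os in flight"
          % (iops, response_time_ms, in_flight))
```

Quoting IOPS without latency, queue depth or access pattern says little about whether an application will actually be happy, which is exactly the point above.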
Great topic! Since my view on such things is very Oracle-centric, I will say that one significant step would be if all intelligent arrays would allow you to create LUNs that consist of partitions only on the outside ~50% of each platter… and if storage admins would do so for their arrays that support this approach. The inside tracks of the drives can be used for whatever non-peak I/O requirements there might be. Really smart Oracle shops have been doing that for years.
As an aside, other Oaktable.net members and I routinely preach IOPS over capacity at conferences and, in fact, we were just doing that at the UK Oracle User Group last week.
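To put a rough model behind Kevin’s outer-track suggestion: per-drive random IOPS is approximately the inverse of average seek plus rotational latency, and short-stroking a LUN to the outer tracks cuts the seek term. The seek and rotational figures below are assumed generic 10K RPM numbers, not any vendor’s spec.

```python
# Rough per-drive random IOPS model: 1000 ms / (avg seek ms + rotational latency ms).
# Transfer time for small blocks is ignored; all figures are assumptions.
rotational_latency_ms = 3.0         # half a revolution at 10,000 RPM
full_stroke_avg_seek_ms = 4.7       # LUN spread across the whole platter
short_stroke_avg_seek_ms = 2.5      # LUN confined to the outer ~50% of tracks

for label, seek_ms in (("full platter", full_stroke_avg_seek_ms),
                       ("outer tracks", short_stroke_avg_seek_ms)):
    iops = 1000.0 / (seek_ms + rotational_latency_ms)
    print("%-12s ~%3.0f random IOPS per drive" % (label, iops))
```

Roughly a 40% per-drive IOPS gain, paid for with the capacity on the inner tracks, which is exactly the capacity-for-I/O trade this whole thread is about.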
This is probably the best comment thread in StorageMojo.com’s history. Thank you everyone for contributing.
I’d like to add some observations:
-I’ve never seen a three-element measure of goodness, such as I/O per $/GB, actually work with customers. Even smart customers who can do the math and understand the concepts and see the merit just tune out. I suspect we are up against some cognitive wrinkle that isn’t going away.
-The same cognitive limit (1, 2, many) applies to feature weighting. I’ve seen customers with long check lists, very complete, and when they finally have weeded out all the people/products they don’t want to deal with, the competition between the two or three remaining vendors gets very subjective. A neatly implemented feature that captures interest suddenly outweighs several other boring features.
-The fact that we’ve got all this investment in stuff that ties storage arrays into our infrastructure raises the question: is this stuff enabling or an encrustation? Sure, it works, but can we afford it? Will it prove flexible enough to enable IT to compete with the non-encrusted infrastructures of on-line providers?
Thanks all for commenting. I clearly tapped into something here, and I’ll try to figure out another approach to it to spur more dialogue.
Robin