Companies and practitioners spend billions of dollars a year on RAID to protect against disk drive failure. Yet all the research I’ve seen shows that the most common causes of data loss are, and always have been, people: accidental file deletion and operator error. Why don’t we spend billions on those problems instead of disk drive failure? We aren’t rational about risk.
Bruce Schneier is the founder and CTO of BT Counterpane security. He is a witty and smart writer about security and security technology and is highly recommended. While reading his recent post Perceived Risk vs. Actual Risk it flashed on me that much of what I find goofy about the storage industry might be explained by Schneier.
Now, I could have done the obvious and called him up and asked him to actually explain it, but what fun is that? Instead, I’m going to apply some of his ideas to storage practice and marketing. Just for the record, many of these are actually the ideas of Daniel Gilbert, a psych prof at Harvard (but I’m not holding that against him), whose book Stumbling on Happiness talks about why we are bad at predicting the future. A short intro to his work is this charming article If only gay sex caused global warming.
Schneier quotes himself from his book Beyond Fear on some of the common misperceptions:
People exaggerate spectacular but rare risks and downplay common risks. They worry more about earthquakes than they do about slipping on the bathroom floor, even though the latter kills far more people than the former. Similarly, terrorism causes far more anxiety than common street crime, even though the latter claims many more lives. Many people believe that their children are at risk of being given poisoned candy by strangers at Halloween, even though there has been no documented case of this ever happening.
File deletion is equivalent to slipping on the bathroom floor. Why not, for example, put deleted files into the trash for 10 days so you’ll have time to reconsider?
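To make the trash idea concrete, here is a minimal sketch of a soft delete with a 10-day retention window. It is not how any particular file system or product does it; the names `soft_delete`, `purge`, and `RETENTION_SECONDS` are made up for illustration, and it assumes the trash directory only ever holds files that `soft_delete` put there.

```python
import shutil
import time
from pathlib import Path

RETENTION_SECONDS = 10 * 24 * 3600  # the 10 days suggested above (illustrative constant)


def soft_delete(path, trash_dir, now=None):
    """Move a file into trash_dir instead of deleting it; return its new path."""
    now = time.time() if now is None else now
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    src = Path(path)
    # Prefix the name with the deletion time so purge() can tell how old it is.
    # Limitation of this sketch: two same-named files deleted in the same
    # second would collide.
    dest = trash / f"{int(now)}_{src.name}"
    shutil.move(str(src), str(dest))
    return dest


def purge(trash_dir, now=None, retention=RETENTION_SECONDS):
    """Permanently remove trashed files older than the retention window."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(trash_dir).iterdir():
        deleted_at = int(entry.name.split("_", 1)[0])
        if now - deleted_at > retention:
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Run `purge` from a daily job and an accidental deletion becomes a 10-day inconvenience instead of a data loss.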
People have trouble estimating risks for anything not exactly like their normal situation. Americans worry more about the risk of mugging in a foreign city, no matter how much safer it might be than where they live back home. . . .
It is difficult to pick out the most likely occurrence from several unlikely choices, or even rank them. Perhaps this explains why so many firms have problems after an incident. They prepared, but not for the incident that actually occurred.
People underestimate risks they willingly take and overestimate risks in situations they can’t control. When people voluntarily take a risk, they tend to underestimate it. When they have no choice but to take the risk, they tend to overestimate it. Terrorists are scary because they attack arbitrarily, and from nowhere. Commercial airplanes are perceived as riskier than automobiles, because the controls are in someone else’s hands — even though they’re much safer per passenger mile. . . .
Back up our precious data to good old tape, where the failure rates range as high as 40%? No problem. Outsource our data archive to Cleversafe or Amazon? A scary thought.
Last, people overestimate risks that are being talked about and remain an object of public scrutiny. News, by definition, is about anomalies. Endless numbers of automobile crashes hardly make news like one airplane crash does. . . . If a lunatic goes back to the office after being fired and kills his boss and two coworkers, it’s national news for days. If the same lunatic shoots his ex-wife and two kids instead, it’s local news…maybe not even the lead story.
Gosh, so what is being talked about these days? Hmm-m. Disk error rates: you need RAID 6! Power density: you need to buy low-power chips! Pick your favorite. It isn’t that these aren’t issues, but we all got along last year without knowing or worrying about them and yet, somehow, now we are. Why?
Comments welcome as usual. Go ahead, take a chance!
I didn’t quite get this –
“Back up our precious data to good old tape, where the failure rates range as high as 40%? No problem. Outsource our data archive to Cleversafe or Amazon? A scary thought. ”
Are you saying that outsourcing data to Cleversafe/Amazon is not as bad as people make it out to be?
I’m saying that people are comfortable with what they control, even if it doesn’t work very well, and risk-averse when it comes to something where they don’t have control.
Obviously, Amazon maintains a 7×24 financial transaction-based business that rivals just about anything out there. They use S3 internally. Realistically, its failure rate must be at least 100x lower than tape.
Yet none of the on-line backup solutions I’m aware of has taken off in a big way. Why? Bandwidth is one problem, of course. Yet it also seems to me that the control/security issue is primary. Not rational, and very human.
Robin
Yep! You are right on!
I remember the first time I heard SPOP, Storage Point(s) of Presence. I thought to myself, “What a great idea whose time has come”.
Are there any xSPs, other than ISPs, still in business? I wonder how SOA will affect this? SOA is a big bear to handle for people who balk at the complexity of ILM.
Offshoring and Outsourcing have become common. OutStoring is still a bad word. It is not a bad idea.
Take your corporate laptop. How often is it backed up? Regularly you say. Great! Where, and in what form, is the backup stored? Tape at Iron Mountain? Local disk at the office?
Regardless of where and in what form, what is your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for the laptop?
It makes a big difference in the Strategy and Solution to use.
Personally, I want my 100GB of “Personal Information” restored in 1 hour, preferred, or 8 hours maximum. Assume the disk is full. That’s either 100GB or 12.5GB per hour. Sounds easy. Anybody benchmarked their laptop to see what they can get in the real world?
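For anyone who wants to turn those restore targets into a number to benchmark against, here is a small sketch of the arithmetic. The function name `required_throughput` is hypothetical, and it assumes decimal units (1 GB = 1000 MB) and a link that sustains its rate for the whole restore, which real links rarely do.

```python
def required_throughput(data_gb, rto_hours):
    """Sustained rate needed to restore data_gb within rto_hours.

    Returns (GB per hour, MB per second, Mbit per second), decimal units.
    """
    gb_per_hour = data_gb / rto_hours
    mb_per_second = gb_per_hour * 1000 / 3600
    mbit_per_second = mb_per_second * 8
    return gb_per_hour, mb_per_second, mbit_per_second


for hours in (1, 8):
    gbh, mbs, mbit = required_throughput(100, hours)
    print(f"{hours} h RTO: {gbh:.1f} GB/h = {mbs:.1f} MB/s = {mbit:.0f} Mbit/s")
```

The 1-hour goal works out to roughly 28 MB/s (about 222 Mbit/s) sustained, and the 8-hour maximum to about 3.5 MB/s (about 28 Mbit/s). That is easy from a local external drive and a lot harder over a typical Internet connection.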
Consider this Strategy. I make regular backups to multiple Online Storage services. At least three. Preferably the three best. Determining that is another post. This is automatic and done while I sleep or any time I am online. CDP like.
If I lose my hard drive, or worse yet my laptop, everything is out there, accessible by me or my authorized agent, 24x7x365 (or forever?).
All I need is a new drive, even an external one if my laptop will boot from it, and I am off to the races.
If I am really good at Solutions then I have a local external drive with my “up-to-date” laptop image on it which is my local failover and short term Information Storage. I just boot up off the external and keep working until the new disk arrives and I have time to restore it.
That same scenario can be used from a personal laptop to the Enterprise Data Center. Only the granularity changes. Think in terms of a local SPOP for the Data Center.
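The multi-service strategy above can be sketched in a few lines. This is only an illustration under stated assumptions: plain directories stand in for the three online storage services (a real one would sit behind its own API), and the helper names `backup`, `restore`, and `sha256` are invented here. The point is that a copy goes to every target and the restore takes the first copy whose checksum still matches.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Checksum used to verify that a stored copy is intact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup(source, targets):
    """Copy source into every target directory; return the targets that succeeded."""
    source = Path(source)
    succeeded = []
    for target in targets:
        try:
            target = Path(target)
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target / source.name)
            succeeded.append(target)
        except OSError:
            continue  # one unreachable target must not stop the others
    return succeeded


def restore(name, targets, expected_sha256, dest):
    """Restore name from the first target holding an intact copy."""
    for target in targets:
        candidate = Path(target) / name
        if candidate.is_file() and sha256(candidate) == expected_sha256:
            shutil.copy2(candidate, dest)
            return Path(target)
    raise FileNotFoundError(f"no intact copy of {name} in any target")
```

With three independent targets, losing one or even two of them still leaves a verifiable copy to restore from.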
Robin,
This is a very good question: why has it not taken off in a big way?
Why not replicate to two places? Is the problem cost? It builds up if you need to feel very secure.
Also, it is only human to do the backup locally, if you already retain an in-house infrastructure in order to operate your business.
Perhaps one needs to offer more value, at higher cost.
For example, given sufficient direct bandwidth: a remote mirror, backed by CDP plus traditional backup, all from one remote Storage Center.
The customer is protected when his primary storage fails (continues to run off the remote mirror)… and is able to restore when the local system is back “on air”.