Reactions to the post on Amazon’s Glacier secret were varied and sometimes enlightening – with one savvy observation that I wish I’d made. The post made Hacker News (h/t to Mark Watson for the alert) and received 40+ comments.
A number of folks suggested that Glacier used some variation on what Amazon already does: disks in commodity servers. With wrinkles:
- S3 + code to ensure waiting.
- Special low-RPM disks designed to stay powered down, but racked and ready to spin up, with low-power controllers.
- Old hard drives that are no longer economical for more intensive service, supported by disk-handling robotics.
Long story short, almost everyone thought it was disks in some power-down configuration. Which makes sense if power is the driving cost for an Internet-scale data center.
But power isn’t the driving cost. It is one cost, but when you buy megawatts, your pricing is very different from what you pay at home.
Why not disks?
An easy reason is that the cost of a disk slot is significant. It has to be powered, racked, controlled and managed. While powering down the disks allows for power system over-provisioning – which lowers the cost per unit of the power system – you still have to cable it up. As StorageMojo noted in a review of a Google paper:
- The capital cost of provisioning a single watt of power is more expensive than 10 years of power consumption.
- Data centers are most economically efficient operating at close to 100% of provisioned power.
- The greatest opportunity for power savings comes from reducing the power consumption of idle kit, not from making busy kit more efficient.
Saving power is a Good Thing, but at Internet scale it is also a Different Thing. While power is important to operating expense, it is the capital expense – the first money in – that drives economic efficiency. In 10 years all the servers, switches and disks get replaced, so you can improve OpEx, but capital dollars sit there forever.
Unless the prices of copper, PDUs and diesel-generators have started following Moore’s Law, this is probably more true today than in 2007.
Bottom line: even if power cost nothing, nada, zip, you’d save at most 50% and probably less. When Glacier came out, its savings over S3 were much greater than that, and even today it’s one-third the price of S3. Power savings alone can’t justify Glacier’s pricing.
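Here is a rough sketch of the arithmetic behind that 50% ceiling. It assumes the power-related bill has only two parts: the capital cost of provisioning a watt and ten years of energy for that watt. The dollar figures are hypothetical placeholders, not numbers from the Google paper; the only thing taken from the paper is that provisioning costs more than the ten years of consumption.

```python
def max_saving_fraction(capex_per_watt: float, energy_per_watt_10yr: float) -> float:
    """Fraction of the power-related bill that disappears if energy were free.

    Only the energy term goes away; the provisioning capital is already spent.
    """
    total = capex_per_watt + energy_per_watt_10yr
    return energy_per_watt_10yr / total


# Hypothetical figures, for illustration only.
capex = 10.0   # dollars to provision one watt (assumed)
energy = 8.0   # dollars of energy for that watt over 10 years (assumed, less than capex)

print(f"Best case saving with free power: {max_saving_fraction(capex, energy):.0%}")
```

As long as provisioning costs at least as much as the energy, the function can never return more than 0.5, however cheap the electricity gets.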
The savvy observation I wish I’d made came from StorageMojo commenter Nikunj Verma:
I can’t help but notice that “lean practitioners” would definitely see a strong case for doing above in the beginning. Why make huge investments upfront in actual datacenters without validating how big the market would be?
That answers the question of why supposedly ex-AWS people might believe Glacier is disk-based.
Glacier’s pricing wrinkle
Some people pointed to Glacier’s pricing as evidence, unless, as some suggested, AWS doesn’t need to make money. Uh-huh. But one point neither they nor I mentioned is Glacier’s price for data deletion within 3 months of upload:
In addition, there is a pro-rated charge of $0.03 per gigabyte for items that are deleted prior to 90 days.
Thus AWS is intent on getting at least 3¢/GB out of all data uploaded to Glacier. Which suggests that they have some fixed costs they want to recover, such as, say, media? No deletion charge on S3.
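To see how the deletion charge produces a 3¢/GB floor, here is a minimal sketch. The $0.03 fee and the 90-day window come from the AWS text quoted above. Two things are my assumptions: that "pro-rated" means linear in the days remaining, and that storage costs $0.01 per GB-month billed on 30-day months.

```python
EARLY_DELETE_FEE_PER_GB = 0.03   # from the quoted AWS pricing text
MIN_RETENTION_DAYS = 90          # from the quoted AWS pricing text
STORAGE_PER_GB_MONTH = 0.01      # assumption: Glacier storage price per GB-month


def early_deletion_fee(gb: float, days_stored: int) -> float:
    """Fee for deleting before 90 days, assuming linear pro-rating by days remaining."""
    if days_stored >= MIN_RETENTION_DAYS:
        return 0.0
    remaining = (MIN_RETENTION_DAYS - days_stored) / MIN_RETENTION_DAYS
    return gb * EARLY_DELETE_FEE_PER_GB * remaining


def total_paid(gb: float, days_stored: int) -> float:
    """Storage charges so far plus any early deletion fee (30-day months assumed)."""
    storage = gb * STORAGE_PER_GB_MONTH * days_stored / 30
    return storage + early_deletion_fee(gb, days_stored)


for days in (0, 30, 60, 90):
    print(f"deleted after {days:2d} days: ${total_paid(1.0, days):.3f} per GB")
```

Under these assumptions the storage paid so far and the deletion fee always add up to about three cents per gigabyte for anything deleted inside the window, which is what you would expect if there is a fixed per-gigabyte cost to recover.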
The BDXL question
But NONE of the Hacker News commenters addressed Sony and Panasonic’s continued investment in high-density optical disc technology. Not only are they making triple-layer BDXL today, but they’ve announced plans to go from 300GB to 1TB over time.
It could be that they’re stupid and/or obstinate – which explains a lot of real-world behavior – but that’s unlikely given the financial stress both companies are under. There has to be a business reason for the continued investment, i.e. customers prepared to buy a lot of product in the future and buying a lot right now.
The intelligence community could buy a lot and probably does. But I’ve seen credible suggestions that Facebook and Amazon each buy petabytes of storage a week. If, as research has found, much of that data is not accessed after a few months, it would make sense for them to go optical, as FB has announced it is testing.
The need for higher data bandwidth also explains why Panasonic has a 12-disc optical RAID. With replication you could even skip the RAID.
BDXL discs on Amazon are at least $45 each. You can buy a 1TB disk for about that. So somebody is buying BDXL discs in bulk or they wouldn’t exist – and it sure isn’t consumers.
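The retail gap is easy to put a number on. The sketch below uses the roughly $45 price for both items from the paragraph above; the 100GB capacity for a triple-layer BDXL disc is my assumption, since the post doesn’t state it.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Retail price divided by capacity."""
    return price_usd / capacity_gb


bdxl = cost_per_gb(45.0, 100.0)    # triple-layer BDXL, capacity assumed to be 100GB
disk = cost_per_gb(45.0, 1000.0)   # 1TB hard disk at about the same price

print(f"BDXL: ${bdxl:.3f}/GB, disk: ${disk:.3f}/GB, ratio: {bdxl / disk:.0f}x")
```

At retail the disc costs about ten times as much per gigabyte as the disk, which is why consumers aren’t the ones keeping the product alive.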
The StorageMojo take
The biggest surprise of the Hacker News comments was how reductionist most views of the issue were. Cheap storage? Powered down disks.
But power isn’t the major driver of cost at Internet scale.
Maybe AWS is making stuff up, or doesn’t need to make a profit on the service. In competitive analysis, you assume you’re dealing with a rational actor, or anything goes. That may over-estimate their smarts – as the British did looking at German radar during WWII – but at least you won’t be caught unawares. Much better than under-estimating smarts, as the Germans did with Enigma decryption at the same time.
The solution space has to take into account these facts:
- Glacier is significantly cheaper than S3
- They charge for deletions in the first 3 months
- Power is not the driving cost for Internet-scale infrastructure
- Sony and Panasonic continue to invest in a product that has no visible commercial uptake
- Facebook believes optical is a reasonable solution to their archive needs
So unless these aren’t facts, the answer points to optical media. But please offer another suggestion.
Courteous comments welcome, of course. Commenters, start your engines!