
Storage Cost and Implementation

The Internet Data Center (IDC) is architected very differently from an Enterprise Data Center (EDC). In an EDC, RAID arrays are used to hide the disk’s physical limitations. In the IDC the infrastructure is designed to work with those limitations to reduce complexity, increase availability, lower cost and optimize performance. It seems likely that at some point the EDC will have to follow suit.

As in Part I, I look at the six-year-old paper Rules of Thumb in Data Engineering by Jim Gray and Prashant Shenoy and relate its conclusions to the trends we see today in the IDC. The value of this exercise is that Rules looks at critical technology trends and draws logical conclusions about the resulting IT model we should be using. The IDCs stand as a test of the paper’s conclusions, enabling us to see how accurate and relevant the authors’ metrics are in the real world of massive-scale IT.

Disk and Data Trends
As Rules notes, disk trends are clear and quantifiable. For example, in 1981 DEC’s RP07 disk drive stored about 500 MB and was capable of about 50 I/Os per second (IOPS), or 1 IOPS for every 10 MB of capacity (it was also the size of a washing machine). The hot new Seagate 750 GB Barracuda 7200.10 is capable of 110 random IOPS, or about 1 IOPS for every 7 GB. So in 25 years, despite all the technology advances, this amazing device offers about 1/700th the I/O performance per unit of capacity.

Looked at another way, in two and a half decades the ratio between disk capacity and disk accesses has been increasing at more than 10x per decade.
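
For the arithmetic-minded, here is a quick Python sketch that reproduces those ratios from the figures quoted above; the drive numbers are the rough ones in the text, not datasheet values.

```python
# Back-of-the-envelope check of the capacity-vs-access trend, using the
# approximate figures quoted above rather than vendor datasheet values.

def iops_per_gb(iops, capacity_gb):
    """Random IOPS available per gigabyte of capacity."""
    return iops / capacity_gb

rp07 = iops_per_gb(50, 0.5)          # DEC RP07, 1981: ~500 MB, ~50 IOPS
barracuda = iops_per_gb(110, 750.0)  # Barracuda 7200.10, 2006: 750 GB, ~110 IOPS

decline = rp07 / barracuda           # roughly 700x less I/O per unit of capacity
per_decade = decline ** (1 / 2.5)    # 25 years = 2.5 decades

print(f"IOPS per GB, 1981: {rp07:.0f}   2006: {barracuda:.2f}")
print(f"Decline: ~{decline:.0f}x overall, ~{per_decade:.0f}x per decade")
```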

Gray and Shenoy conclude these trends imply two things. First, our data has become cooler: there are far fewer accesses per block than in the past. Second, disk accesses are a scarce resource and have grown costlier, so disk I/Os must be spent wisely to optimize system performance.

IDC Adaptations to Disk I/O Rationing
IDC architectures reveal an acute sensitivity to disk I/O scarcity. Since Google has released the most detailed information about their storage, I’ll use them as the example. From the limited information available it appears the other IDCs use similar strategies, where possible, or simply throw conventional hardware at the problem, at great cost (see Killing With Kindness: Death By Big Iron for a detailed example).

Two I/O-intensive techniques are RAID 5 and RAID 6. In RAID 5, writing a block typically requires four disk accesses: two to read the existing data and parity and two more to write the new data and parity (RAID 6 requires even more). Not surprisingly, Google avoids RAID 5 and RAID 6 in favor of mirroring, typically keeping at least three copies of each chunk of data and many more if the chunk is hot. This effectively increases the IOPS available per chunk of data at the expense of capacity, which is much cheaper than additional bandwidth or cache.
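
To make the trade-off concrete, here is a minimal Python sketch that compares what extra copies buy for a single hot chunk against the small-write cost of RAID 5 and RAID 6; the 64 MB chunk size and the per-drive IOPS figure are assumptions for illustration, not Google’s published configuration.

```python
# Illustrative trade-off for one hot chunk of data: extra replicas buy read
# IOPS at the cost of capacity, while RAID 5 pays extra disk I/Os per write.

DRIVE_IOPS = 110   # random IOPS per drive (Barracuda 7200.10, per the text above)
CHUNK_MB = 64      # assumed chunk size, for illustration only

RAID5_WRITE_PENALTY = 4   # read old data + old parity, write new data + new parity
RAID6_WRITE_PENALTY = 6   # two parity blocks to read and rewrite

def replicated_chunk(replicas):
    read_iops = replicas * DRIVE_IOPS   # any copy can serve a read
    write_iops = DRIVE_IOPS             # copies are written in parallel, one I/O each
    capacity_mb = replicas * CHUNK_MB
    return read_iops, write_iops, capacity_mb

for r in (3, 6):
    reads, writes, mb = replicated_chunk(r)
    print(f"{r} copies: ~{reads} read IOPS, ~{writes} write IOPS, {mb} MB on disk")

print(f"RAID 5 small write: {RAID5_WRITE_PENALTY} disk I/Os; "
      f"RAID 6 small write: {RAID6_WRITE_PENALTY} disk I/Os")
```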

I/O rationing favors fast sequential I/O as well. As Gray and Shenoy note:

A random access costs a seek time, half a rotation time, and then the transfer time. If the transfer is sequential, there is no seek time, and if the transfer is an entire track, there is no rotation time. So track-sized sequential transfers maximize disk bandwidth and arm utilization. The move to sequential disk IO is well underway. . . . caching, transaction logging, and log-structured file systems convert random writes into sequential writes. This has already had large benefits for database systems and operating systems. These techniques will continue to yield benefits as disk accesses become even more precious.
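
The arithmetic behind that observation is easy to check. A minimal sketch, assuming nominal 7200 RPM parameters (the seek time and transfer rate here are illustrative, not from any datasheet), compares the effective bandwidth of small random reads with track-sized sequential transfers:

```python
# Effective bandwidth of a disk access = bytes moved / (seek + rotation + transfer).
# Parameters are nominal 7200 RPM figures chosen for illustration, not a datasheet.

SEEK_MS = 8.5                  # average seek time
ROTATION_MS = 60_000 / 7200    # one full rotation at 7200 RPM (~8.3 ms)
TRANSFER_MB_S = 75.0           # sustained media transfer rate

def effective_mb_per_s(transfer_kb, random=True):
    transfer_ms = transfer_kb / 1024 / TRANSFER_MB_S * 1000
    overhead_ms = (SEEK_MS + ROTATION_MS / 2) if random else 0.0
    return (transfer_kb / 1024) / ((overhead_ms + transfer_ms) / 1000)

print(f"8 KB random read:          {effective_mb_per_s(8):.2f} MB/s")
print(f"1 MB random read:          {effective_mb_per_s(1024):.1f} MB/s")
print(f"1 MB sequential transfer:  {effective_mb_per_s(1024, random=False):.1f} MB/s")
```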

Google specifically optimized GFS for large reads and writes. Nor did they stop there. Rather than synchronizing and coordinating overwrites of existing data, GFS appends new data to existing files. This again optimizes the use of disk accesses at the expense of capacity.
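
As a toy illustration of the general technique, not the GFS API, the sketch below treats storage as a log: every update becomes a sequential append, and a small in-memory index remembers where the latest version of each record lives.

```python
# Toy append-only store (not the GFS API): updates become sequential appends
# and an in-memory index tracks the offset of the latest version of each key.

class AppendOnlyStore:
    def __init__(self, path):
        self.path = path
        self.index = {}                        # key -> offset of latest record
        open(path, "ab").close()               # ensure the log file exists

    def put(self, key, value):
        with open(self.path, "ab") as log:     # always a sequential append
            offset = log.tell()
            log.write(f"{key}\t{value}\n".encode())
        self.index[key] = offset               # older versions remain on disk

    def get(self, key):
        with open(self.path, "rb") as log:
            log.seek(self.index[key])
            _, value = log.readline().decode().rstrip("\n").split("\t", 1)
        return value

store = AppendOnlyStore("chunk.log")
store.put("user:42", "v1")
store.put("user:42", "v2")                     # an overwrite is just another append
print(store.get("user:42"))                    # -> v2
```

The capacity cost shows up here too: superseded versions sit on disk until some later cleanup, the same capacity-for-I/O trade the IDC makes everywhere else.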

Conclusion
Gray and Shenoy’s paper is surprisingly successful in predicting key design elements of an I/O-intensive infrastructure as exemplified by Google and others. Yet they didn’t get everything right, although even their misses are instructive. Stay tuned.

Next: The Storage Management Crisis in Architecting the Internet Data Center: Pt. III