In yesterday’s post I ran through a quick (really, it was!) overview of the Google File System’s organization and storage-related features such as RAID and high availability. I want to offer a little more data on GFS performance before giving my conclusion about its marketability as a commercial product.

“The Google File System,” by Ghemawat, Gobioff, and Leung, includes some interesting performance info. These examples can’t be regarded as representative, since we don’t know enough about the population of GFS clusters at Google, so any conclusions drawn from them are necessarily tentative.

They looked at two GFS clusters configured like this:

Cluster                     A        B
Chunkservers                342      227
Available Disk Capacity     72 TB    180 TB
Used Disk Capacity          55 TB    155 TB
Number of Files             735 k    737 k
Number of Dead Files        22 k     232 k
Number of Chunks            992 k    1550 k
Metadata at Chunkservers    13 GB    21 GB
Metadata at Master          48 MB    60 MB

So we have a couple of fair-sized storage systems, one utilizing about 76% of its available space, the other about 86%. Respectable numbers for any data center storage manager. We also see that chunkserver metadata appears to scale linearly with the number of chunks, at roughly 13 KB per chunk on both clusters. Good. The average file size appears to be about 75 MB on A and 210 MB on B, so A’s average file is roughly a third the size of B’s. Either way, much larger than the average data center file size.
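For the curious, here’s the back-of-the-envelope arithmetic behind those figures, sketched in Python using only the numbers from the table above. One assumption on my part: the unit conversions treat capacities as decimal (1 TB = 10^6 MB, 1 GB = 10^6 KB), which the paper doesn’t spell out.

```python
# Derived metrics for the two GFS clusters, straight from the table above.
# Assumes decimal units: 1 TB = 1e6 MB, 1 GB = 1e6 KB (my assumption).

clusters = {
    "A": {"disk_tb": 72,  "used_tb": 55,  "files_k": 735,
          "chunks_k": 992,  "chunk_meta_gb": 13},
    "B": {"disk_tb": 180, "used_tb": 155, "files_k": 737,
          "chunks_k": 1550, "chunk_meta_gb": 21},
}

for name, c in clusters.items():
    utilization = c["used_tb"] / c["disk_tb"] * 100          # % of disk used
    avg_file_mb = c["used_tb"] * 1e6 / (c["files_k"] * 1e3)  # TB -> MB / files
    meta_kb_per_chunk = c["chunk_meta_gb"] * 1e6 / (c["chunks_k"] * 1e3)
    print(f"Cluster {name}: {utilization:.0f}% utilized, "
          f"avg file ~{avg_file_mb:.0f} MB, "
          f"~{meta_kb_per_chunk:.1f} KB chunkserver metadata per chunk")
```

Running it gives about 76% and 86% utilization, average files of roughly 75 MB and 210 MB, and nearly identical per-chunk metadata (~13.1 KB on A, ~13.5 KB on B), which is what the linear-scaling observation rests on.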

Next up: the paper’s performance data for the two clusters.
