In yesterday’s post I ran through a quick (really, it was!) overview of the Google File System’s organization and its storage-related features, such as RAID and high availability. I want to offer a little more data about the performance of GFS before offering my conclusion about the marketability of GFS as a commercial product.
"The Google File System," by Ghemawat, Gobioff, and Leung, includes some interesting performance info. These examples can’t be regarded as representative, since we don’t know enough about the population of GFS clusters at Google, so any conclusions drawn from them are necessarily tentative.
They looked at two GFS clusters configured like this:
| | Cluster A | Cluster B |
|---|---|---|
| Available disk capacity | 72 TB | 180 TB |
| Used disk capacity | 55 TB | 155 TB |
| Number of files | 735 k | 737 k |
| Number of dead files | 22 k | 232 k |
| Number of chunks | 992 k | 1,550 k |
| Metadata at chunkservers | 13 GB | 21 GB |
| Metadata at master | 48 MB | 60 MB |
So we have a couple of fair-sized storage systems, one using about 76% of its available space and the other about 86%. Respectable numbers for any data center storage manager. We also see that chunkserver metadata appears to scale linearly with the number of chunks: both clusters work out to roughly 13 KB per chunk. Good. Dividing used capacity by file count gives average file sizes of about 75 MB for A and 210 MB for B, so the average file on A is roughly 1/3 the size of one on B. Both are much larger than the average data center file size.
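The arithmetic behind these ratios is easy to reproduce. Below is a minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes) and that the table's "k" means thousands; the dictionary keys and the `derived` helper are hypothetical names chosen for illustration. One caveat: the GFS paper notes that used space includes all chunk replicas, with virtually all files replicated three times, so these per-file averages reflect raw capacity rather than logical file size.

```python
# Figures transcribed from the cluster table (assumption: decimal units,
# 1 TB = 10**12 bytes, 1 GB = 10**9, 1 MB = 10**6, and "k" = 1,000).
TB, GB, MB, K = 10**12, 10**9, 10**6, 10**3

# Hypothetical key names; values come straight from the table.
clusters = {
    "A": {"avail": 72 * TB, "used": 55 * TB, "files": 735 * K,
          "chunks": 992 * K, "cs_meta": 13 * GB, "master_meta": 48 * MB},
    "B": {"avail": 180 * TB, "used": 155 * TB, "files": 737 * K,
          "chunks": 1550 * K, "cs_meta": 21 * GB, "master_meta": 60 * MB},
}


def derived(c):
    """Compute utilization, average file size, and metadata-per-chunk ratios."""
    return {
        # Fraction of available disk capacity actually in use, as a percentage.
        "utilization_pct": 100 * c["used"] / c["avail"],
        # Raw used bytes per file (includes replicas), in MB.
        "avg_file_mb": c["used"] / c["files"] / MB,
        # Chunkserver metadata per chunk, in KB.
        "cs_meta_kb_per_chunk": c["cs_meta"] / c["chunks"] / 10**3,
        # Master metadata divided by chunk count, in bytes (a rough ratio only).
        "master_meta_b_per_chunk": c["master_meta"] / c["chunks"],
    }


for name, c in clusters.items():
    d = derived(c)
    print(name, {key: round(val, 1) for key, val in d.items()})
```

The master figure is only a rough ratio, since the master also holds per-file namespace data; unlike the chunkserver metadata, it does not grow in proportion to chunk count between the two clusters.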
Next we get some performance data for the two clusters: