FAST – File and Storage Technologies – is a must-see conference for StorageMojo, and I’ll be reviewing several Best Papers from FAST ’12. While most emerging technology is developed in private company labs, FAST is where much of the first publicly available research is published.

Case in point, a StorageMojo Best Paper of FAST ’12: Optimizing NAND Flash-Based SSDs via Retention Relaxation by Ren-Shuo Liu and Chia-Lin Yang of National Taiwan University, and Wei Wu of Intel. NAND engineers have known for years that writes can be sped up by accepting shorter retention, but this paper quantifies the trade-off.

Data retention was a theme of several papers. Disk drives don’t care if an update needs to last a minute or a year, but flash does.

NAND retention
NAND flash writes are spec’d – by JEDEC – for one year of retention. But relaxing that retention requirement can be beneficial.

  • Speed. Writes can be 1.8 to 5.7x faster, depending on how long the data is to be kept.
  • SSD architecture. Overprovisioning and other design choices are driven directly by incoming data rates and flash write speeds. Faster writes might also allow less aggressive garbage collection.
  • ECC. As feature sizes shrink and NAND cells get flakier, the ECC overhead required to achieve a year’s retention grows. Single-error-correcting codes used to suffice. Now we need codes that correct 24 errors, and the arms race continues.

These advantages are meaningless if most writes need to be retained for more than, say, 2 weeks. The authors looked at a number of workload traces and found that for all but one of them, at least 50% of the writes were retained for 1 week or less. For active enterprise workloads the percentage is likely to be over 75%.
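To make the "retained for 1 week or less" measurement concrete, here is a minimal sketch of how one might compute it from a block trace. It assumes a hypothetical trace format of `(timestamp_seconds, lba)` write events and treats a write as short-lived only if the same logical block is overwritten within the window; writes never overwritten before the trace ends are conservatively counted as long-lived. This is not the authors' methodology, just an illustration of the metric.

```python
WEEK_SECONDS = 7 * 24 * 3600


def short_lived_fraction(trace, window=WEEK_SECONDS):
    """Fraction of writes whose data is overwritten within `window` seconds.

    `trace` is a hypothetical list of (timestamp_seconds, lba) write events.
    Writes that are never overwritten before the trace ends are conservatively
    counted as long-lived.
    """
    if not trace:
        return 0.0
    events = sorted(trace)
    next_write = {}  # lba -> timestamp of the next later write to that lba
    short = 0
    # Scan backwards so each write can see when its data was next replaced.
    for ts, lba in reversed(events):
        nxt = next_write.get(lba)
        if nxt is not None and nxt - ts <= window:
            short += 1
        next_write[lba] = ts
    return short / len(events)
```

For example, a trace where LBA 1 is rewritten after an hour, LBA 2 after ten days, and LBA 3 never, yields 1 short-lived write out of 5 total.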

What happens when the time is up?
The authors propose that the Flash Translation Layer keep track of how long each block remains unchanged. When – and if – it reaches the threshold, a background process rewrites the data for the standard 1 year retention.
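The FTL bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's design: `RetentionTracker`, its methods, and the 7-day threshold are all hypothetical, and the actual data copy during a refresh is not modeled, only the state change.

```python
from dataclasses import dataclass

DAY = 24 * 3600


@dataclass
class BlockState:
    written_at: float
    mode: str  # "relaxed" (fast, short retention) or "standard" (1-year retention)


class RetentionTracker:
    """Hypothetical FTL-side bookkeeping for relaxed-retention blocks."""

    def __init__(self, threshold=7 * DAY):  # threshold is an assumed value
        self.threshold = threshold
        self.blocks = {}

    def host_write(self, block, now):
        # Host writes take the fast, relaxed-retention path; a rewrite resets the clock.
        self.blocks[block] = BlockState(written_at=now, mode="relaxed")

    def refresh_due(self, now):
        """Background pass: rewrite relaxed blocks that have reached the threshold."""
        refreshed = []
        for block, state in self.blocks.items():
            if state.mode == "relaxed" and now - state.written_at >= self.threshold:
                # Stand-in for re-programming the data with standard 1-year retention.
                state.mode = "standard"
                state.written_at = now
                refreshed.append(block)
        return refreshed
```

Data that is overwritten before the threshold never triggers a refresh, which is why the trace statistics above matter: the more short-lived the writes, the less background rewriting the drive does.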

It is feasible to differentiate between host writes and background writes, such as garbage collection, and to write them differently. Long-term writes would get stronger ECC, while host writes would avoid the costly ECC encoding that one-year retention requires.
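A write-path policy along these lines might route requests by source. The sketch below is hypothetical: the enum names, the routing function, and the idea of keeping separate per-mode queues (so relaxed and standard data land in different open blocks) are illustrative assumptions, not details from the paper.

```python
from enum import Enum


class WriteSource(Enum):
    HOST = "host"
    GARBAGE_COLLECTION = "gc"
    RETENTION_REFRESH = "refresh"


class RetentionMode(Enum):
    RELAXED = "relaxed"    # fast path: skips the costly encoding 1-year retention needs
    STANDARD = "standard"  # full 1-year retention with stronger ECC


def choose_mode(source: WriteSource) -> RetentionMode:
    # Host writes are latency-sensitive and mostly short-lived: take the fast path.
    if source is WriteSource.HOST:
        return RetentionMode.RELAXED
    # Background writes are off the critical path, so write them for the long term.
    return RetentionMode.STANDARD


def route_writes(requests):
    """Group (source, lba) requests into per-mode queues (hypothetical open blocks)."""
    queues = {mode: [] for mode in RetentionMode}
    for source, lba in requests:
        queues[choose_mode(source)].append(lba)
    return queues
```

Keeping the two modes in separate queues is one plausible choice; it means a block never mixes data with different retention deadlines, which keeps the refresh bookkeeping per block simple.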

Yes, there is overhead in managing the fast blocks and rewriting long-term data. But the added performance appears to make that a small price to pay.

The StorageMojo take
The paper presents a strong case for relaxing retention requirements to improve performance. As future generations of flash become less reliable and slower, we’ll need this and other techniques to improve, or at least maintain, performance.

Many performance enhancement schemes require unrealistic levels of intelligence about application or system behavior to be effective. But this is within the realm of practical implementation.

The retention issue is a fair example of being handed a lemon and making lemonade. Or offering another degree of freedom to system architects.

In fact, some vendors are already exploring this possibility. If it extends the useful life of flash for a few years, it will be well worth the engineering effort.

Courteous comments welcome, of course. A somewhat analogous process for disks is the concept of shingled writes, an area UCSC has been working in. Will disk vendors pick it up?