WD has started shipping drives that drop the ancient 512 byte disk sector for a 4096 byte – 4k – sector, and the rest of the industry isn’t far behind. For several decades disk sectors have almost always been 512 bytes (NetApp tried 520 bytes – and irritated their customers no end). Why 4k and why now?
Why?
Rising bit density means smaller magnetic areas and more noise. The underlying or raw disk media error rate is approaching 1 error in every thousand bits on average – while tiny media defects can lose hundreds of bytes in a row. The larger sectors enable more powerful ECC to fix those gaps.
Why now?
A 512 byte sector can’t support enough ECC to correct for higher raw error rates. Bigger sectors allow stronger ECC capable of detecting and correcting much larger errors – up to 400 bytes on a 4k sector.
The 4k sector enables disk manufacturers to keep cramming more bits on a disk. Without them the annual 40% capacity increases we’ve come to expect would stop.
Note: the longer ECC doesn’t change the drive level unrecoverable read error rate. It remains at 1 in every 10^14 bits.
4k sectors have been cooking for over a decade. The late adopters are the cloning software vendors. More on that in a moment.
Will 4k sectors use capacity faster?
If you write 500 bytes and the minimum sector is 4k, will that write take up the full 4k, wasting 3.5 KB? No.
The initial WD drives – and I assume other vendors as well – will operate in a 512 byte emulation mode. Eventually new disks will operate in native 4k mode, and then you might have a concern. But many operating systems already do 4k IO. And at a couple of cents per future GB, who cares?
Gotchas?
If you are in either of these 2 groups:
- Windows XP users
- Windows users who clone disks with software like Norton Ghost
there are a couple of gotchas if you want to use a 4k drive. Since most drives aren’t 4k and won’t be for another year or more, this may not affect you either. Vista and W7 users are cool except for cloning.
1) Windows XP does not automatically align writes on 4k boundaries, which hurts performance. WD has software – the Advanced Format Align Utility for their drives. I assume other vendors will too when they start shipping.
XP users need to run this utility once to use a 4k drive with a clean install, cloning software or a do-it-yourself USB drive. It isn’t needed for WD-branded 4k USB drives.
2) Windows clone software vendors have yet to implement 4k support. If you clone an XP, Vista or W7 drive you should run the align utility. The cloning vendors need to get on board Real Soon Now. Vendors are welcome to comment on their plans.
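To make the alignment gotcha concrete, here is a minimal sketch of the arithmetic for a drive that emulates 512 byte sectors on top of 4k physical sectors. The start sectors used below are assumptions for illustration, not figures from WD: 63 is the traditional first-partition start used by XP-era tools, and 2048 is the 1 MiB start used by newer Windows installers. The function names are hypothetical.

```python
LOGICAL = 512    # bytes per logical (emulated) sector
PHYSICAL = 4096  # bytes per physical sector


def is_aligned(start_lba: int) -> bool:
    """True if a partition starting at this logical sector begins on a 4k boundary."""
    return (start_lba * LOGICAL) % PHYSICAL == 0


def physical_sectors_touched(start_lba: int, offset_bytes: int, length: int) -> int:
    """Count the 4k physical sectors covered by a write of `length` bytes
    at `offset_bytes` into a partition that starts at `start_lba`."""
    first = start_lba * LOGICAL + offset_bytes
    last = first + length - 1
    return last // PHYSICAL - first // PHYSICAL + 1


if __name__ == "__main__":
    # Assumed start sectors: 63 (XP-style), 64 and 2048 (4k-aligned).
    for lba in (63, 64, 2048):
        touched = physical_sectors_touched(lba, 0, 4096)
        print(f"start LBA {lba}: aligned={is_aligned(lba)}, "
              f"4k write touches {touched} physical sector(s)")
```

With a start at sector 63, every 4k file system block straddles two physical sectors, so the drive has to read, modify and rewrite both. That extra work is the performance hit the align utility is meant to remove.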
What about Macs?
No worries: Mac OS just works with 4k drives – including cloning.
Summary
There’s been a lot of heavy lifting behind the scenes to make this a smooth transition. With Vista, W7, Mac OS and Linux support well in hand most users won’t notice any change.
Some XP users will get bit by performance issues. The easiest solution for XP users: avoid 4k drives. Factory installed XP will be fine.
The StorageMojo take
My question: why not a better read-error spec? Today’s large SATA drives shouldn’t be used in RAID 5 arrays due to the high likelihood of a read error after a drive failure, which will abort the RAID rebuild. A better error spec would fix this.
Oh, RAID 6 sells more drives? Never mind.
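The RAID 5 concern can be put in rough numbers. The sketch below assumes a spec of 1 unrecoverable error per 10^14 bits read and treats bit errors as independent, which is a simplification of how real drives fail. The capacities are hypothetical examples, and `p_read_error` is an illustrative name.

```python
import math


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `ure_per_bit`."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical amounts of data that must be read to rebuild an array.
    for terabytes in (2, 8, 16):
        for rate in (1e-14, 1e-15):
            p = p_read_error(terabytes * 1e12, rate)
            print(f"{terabytes} TB read at {rate:g} per bit: {p:.1%}")
```

Under these assumptions, reading a few terabytes during a rebuild carries a double-digit percentage chance of hitting a read error, and a spec ten times better cuts that to a few percent.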
Finally, the drive industry doesn’t know how to talk to consumers about technology. It took me an hour of digging to understand how this benefits consumers rather than vendors.
Comments welcome, of course. WD’s dynamic Heather Skinner arranged a briefing for me. No sectors, old or new, changed hands.
I remain skeptical that Linux is there out of the box, at least when it comes to aligning partitions – http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/ . Most Linux partitioning tools will not produce a good alignment by default (they can be forced to though).
If I recall correctly, Clariion also used 520 byte formatted drives, at least in the DG days.
It seems EMC uses 520 bytes, not sure if anyone else does – http://chucksblog.emc.com/chucks_blog/2008/08/your-storage-mi.html
Also I think the file system block size is related? In that if your drive has 512-byte sectors and your file system is using 4096-byte blocks then you’ll probably be more efficient with a 4k physical sector? I don’t recall the last time Linux was able to use a 512-byte file system block size (man page says supported values are 1-4k), I think the default with most larger file system sizes is 4k today already (on Linux at least).
On my big storage array the average I/O size is about 37kB (to disk, over 100kB to cache).
I think several vendors of raid boxes use a 520 byte formatting *internally* to store extra metadata there.
Even good old VMS presented 512 bytes to the user but used the additional bytes for storing the “forced error flag”.
Great in depth article from Anandtech covering the need for 4K sectors, the issues and the workarounds.
http://www.anandtech.com/storage/showdoc.aspx?i=3691
Linux has supported devices with native 4KB sectors for a long time. As far as drives with 512-byte logical and 4KB physical sectors are concerned, the support was included in the 2.6.31 kernel. Our partitioning and LVM tools have been updated to compensate for the alignment reported by the drive (ATA and SCSI). I think Fedora is the only distribution that ships the relevant bits at this point.
It is hard for a general purpose operating system to deal with buffer sizes that are not multiples of 512 in an efficient manner. So 520/528-byte sector drives have mostly been used inside RAID arrays. The extra bytes are used to store information proprietary to the array firmware.
Linux does support 520/4104-byte sector drives, but only when they are formatted using T10 Protection Information (DIF) and hanging off a DIF or DIX-capable HBA.
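For readers who want to see what a drive reports on a recent kernel, here is a minimal sketch that reads the block layer's sysfs attributes. It assumes the kernel exposes `queue/logical_block_size`, `queue/physical_block_size` and `alignment_offset` under `/sys/block/<device>`. The helper names and the `sysfs` parameter are hypothetical and exist only so the code can be pointed at a test directory.

```python
from pathlib import Path


def block_sizes(dev: str, sysfs: str = "/sys/block") -> dict:
    """Return the logical and physical sector sizes and the alignment
    offset (all in bytes) that the kernel reports for `dev`."""
    base = Path(sysfs) / dev

    def read_int(path: Path) -> int:
        return int(path.read_text().strip())

    return {
        "logical": read_int(base / "queue" / "logical_block_size"),
        "physical": read_int(base / "queue" / "physical_block_size"),
        "alignment_offset": read_int(base / "alignment_offset"),
    }


def is_512e(info: dict) -> bool:
    """True for a drive with 512-byte logical and 4KB physical sectors."""
    return info["logical"] == 512 and info["physical"] == 4096
```

Calling `block_sizes("sda")` on a machine with such a drive should show a logical size of 512 and a physical size of 4096. A drive that does not report its physical sector size will show 512 for both.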
Yes… Clariion uses 520 byte sectors. Secondly, UBE of Enterprise FC and SATA is 1 in 10 to the 15th, not 1 in 10 to the 14th. That’s a *real* important distinction… If the 400 bytes of ECC on a 4096 byte sector greatly increases reliability, I’m all for it. The extra 8 bytes on 512 bytes hasn’t impressed me 😉
The presumption of this is that existing data will be reformatted to 4k sector file systems, and that 512 emulation on 4k will offer as good or better performance. Both are huge assumptions not likely to be true at the 90% level for years.
Clariion uses 520-byte sectors, but 8 of those bytes are used internally by the array to store a time stamp and a checksum to ensure data integrity. The host only sees the 512-byte sectors, so it is the same as with internal disks.
The 4k thing is also sort of screwing over Drobo users too… funny that I saw these two posts on the same day!
The link provided by Karl makes a valid point about DOS/BIOS based partition tables for x86, an issue some SAN/VMware environments suffer from as well.
This misalignment can cut peak IOPS in half or more.
Still waiting for EFI ( http://en.wikipedia.org/wiki/Extensible_Firmware_Interface ) motherboards with GUID partition table support (http://en.wikipedia.org/wiki/GUID_Partition_Table ) to take off 🙁
Actually, the move to 4k-sector drives is strongly motivated by two major considerations where the conventional 512-byte (half-k) sector format lags far behind:
1. Format efficiency, worth about a 10% capacity increase. Each sector starts with preamble data that carries no user information and holds only the minimum the disk manufacturers need. One 4096-byte sector replaces eight 512-byte sectors, so the 4k format saves 7 of these gaps for every 4096 bytes of user data. At the same density that yields a good deal of extra recording space for the user. In WD drives, users should not notice any difference.
2. Stronger ECC. The 4k sector format not only increases the correction capability of conventional RS-ECC, it also opens the door to replacing RS-ECC with the newer LDPC code technology, which will significantly improve drive URE performance. For instance, given a URE rate of 1 in every 10^14 bits read with conventional RS-ECC, a drive using LDPC and the 4k sector format can easily get down to 1 in every 10^22. This will translate into a huge benefit to users, OSes and the HDD industry, along with new technologies in heads and media. In the end, parameters such as AFR, SMART and MTBF will be largely improved. RAID-x configurations may be rethought with these new-generation high-performance drives.
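The format efficiency point in item 1 can be checked with a back-of-the-envelope calculation. The per-sector overheads below (15 bytes of gap, sync and address mark, 50 bytes of ECC for a 512-byte sector, 100 bytes of ECC for a 4k sector) are illustrative assumptions, not figures taken from the comment above, and `format_efficiency` is a hypothetical helper.

```python
def format_efficiency(user_bytes: int, gap_bytes: int, ecc_bytes: int) -> float:
    """Fraction of the on-disk sector footprint that holds user data."""
    return user_bytes / (user_bytes + gap_bytes + ecc_bytes)


GAP = 15        # assumed gap + sync + address mark bytes per sector
ECC_512 = 50    # assumed ECC bytes for a 512-byte sector
ECC_4K = 100    # assumed ECC bytes for a 4096-byte sector

old = format_efficiency(512, GAP, ECC_512)
new = format_efficiency(4096, GAP, ECC_4K)
gain = new / old - 1  # relative capacity gain at the same density

if __name__ == "__main__":
    print(f"512-byte sectors: {old:.1%} efficient")
    print(f"4k sectors:       {new:.1%} efficient")
    print(f"capacity gain:    {gain:.1%}")
```

With these assumed overheads the gain lands a little under 10%, in line with the "about 10%" figure above. Different overhead assumptions will move the result.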