StorageMojo




Robin Harris    


Anatomy of an outage

May 14th, 2008 by Robin Harris in Off-Topic, Security & Public Policy

Getting rid of the hacked files and spam links wasn’t the end of it
Dreamhost notified me that the load on my server was excessive and they’d disabled StorageMojo.

Yikes! Had I been hacked again? DDOS attack? What?

Building the correct mental model
In short order I brought up my SFTP client, my tracking site, the Dreamhost webpanel and my son on chat. He had me toss a new index.html file into the site folder to let people know that the problem was getting addressed.

On to problem solving
It took a while to figure it out because I’d never seen it before.

The load was coming from Google referrals for charming search terms that I’m going to misspell on purpose in hopes of not attracting similar traffic:

  • download sh*mail
  • downlode free 1ndian s3x movies
  • pharmasuitical affiliate prom0
  • 0rgish/behe*ding
  • h1nd1 p0rn m0v1es

*Lots* of pee-oh-rn requests for many different ethnic types. Some things are universal - at least among guys.

There were no hacked files still on StorageMojo - I’d gotten them all last week and they were still gone. But the tracking site was referring to them, so for a while I thought they were there but that for some reason I couldn’t see them.

But then my son checked what happened when someone tried to go to the spam links. The site was delivering a “system error” message - not the static 404 page I’d expect - so the site wasn’t delivering the spam content and it really was gone. Presumably processing for the “system error” page created much of the extra overhead Dreamhost was seeing.

For a while StorageMojo was getting thousands of hits an hour from these Google referrals. At some point Google must have crawled the site again, seen the content was no longer there, and stopped referring people.

Not a moment too soon!

So what was this all about?
My son hypothesized:

This looks like a two-step scheme…step one is that they hacked your site and got all those bad SEO files uploaded. Step two is to send lots of fake Google traffic through your site to increase PageRank.

Then I went one step further and checked out one of the spam pages that Google had cached. In big bright colors it told me that my XP system was infected with viruses and I should download their *free* virus scanner.

Whoa, scary. Except I’m on a Mac.

Botnet recruitment? I don’t know.

The StorageMojo take
I’ve made a number of changes to tighten up StorageMojo. As I was researching this I found that there are many security “folk remedies” out there, but very little on what the high priority issues are.

Keeping software up to date seems to be the critical success factor - and sad to say, I’d been lax. In addition to keeping current I’m now checking my site files more often among other changes.

Hopefully these requests will tail off as Google stops referring people. And StorageMojo can go back to being a quiet little site.

Thank you for your patience.

Comments welcome, of course.

Seagate’s head-settling time?

May 13th, 2008 by Robin Harris in Off-Topic

First it was the bogus “national security” argument against a Chinese buy-out of Seagate. Now it’s suing STEC over solid state drives (SSD). Has William Watkins, Seagate’s CEO, jumped the shark?

STEC said, per legal SOP, that the suit was “without merit.” After looking at the patents I agree.

Look at the patents
There are 4 patents named in the Seagate suit.

The links will download pdf versions if your insomnia is acting up.

6336174
The first patent covers an invention called a hardware-assisted memory module (HAMM) that can, when there is a system glitch, isolate

. . . itself from the host computer system before copying digital information from volatile memory to nonvolatile memory.

This reminds me of a common RAM-based SSD strategy - used 20 years ago in DEC SSDs - to copy data held in RAM to a disk drive when power failed.

6404647
The 2nd patent is for a solid-state mass memory storage device. This device

. . . comprises a printed circuit assembly and a plurality of non-volatile, high density storage devices mounted to the printed circuit assembly and electrically connected thereto.

A picture is worth 1,000 words:

More than a passing resemblance to a compact flash card - a product I bought in 1993.

6849480
“Surface mount IC stacking method and device.” This patent covers a technique that solves 3 problems:

  • Routing signals through stacks of similar chips
  • Stacked devices with identical dies that are made into distinct chips - flashed? - later
  • Long, high-capacitance interconnects between stacked devices

Seems like folks have been stacking chips for a while. Is there anything new here?

7042664
“Method and system for host programmable data storage device self-testing.”

. . . providing programmable self-testing of a data storage device comprises selecting one or more host programmable tests stored in memory in the data storage device by setting data in a first log in memory of the data storage device.

This invention’s goal is to enable disk drive testing without removing the drive from the host. It embodies the concept of the host providing test parameters for an attached device - which the patent imagines to be a disk.

Size matters
Part of Seagate CEO William Watkins’ pique with STEC is fueled by a suit from 3.5″ drive inventor Rodime that Seagate paid $45 million to settle. Rodime patented the 3.5″ form factor for disk drives - and got the courts to enforce it.

Watkins knows that patenting disk drive form factors is silly - they have to be standard sizes - but if the USPTO grants them and the courts enforce them, why not?

The StorageMojo take
The IC stacking patent may have some merit - I’m no judge of chip packaging. But the other patents, especially for compact flash, seem dubious at best.

The good news is that the Supreme Court has forced the patent courts to change course. In KSR v Teleflex the Supremes ruled that non-obviousness is a legal question, not a factual one. That bit of hair-splitting means that lower-court rulings can be appealed. Under the old rule once the trial court made a “finding of fact” it could not be re-examined in the appeal.

Rodime would have lost under that rule. While it will take time for KSR to play out, in the short term it almost certainly reduces the value of existing patents - like Seagate’s ludicrous flash drive patent.

While some have portrayed this as Seagate trying to stymie a competitor, it’s more likely that Seagate believes STEC has some useful technology. The promise of a costly legal battle might persuade a smaller company than STEC to settle with a quick cross-licensing deal.

That would help Seagate catch up in the high-end flash SSD market. I hope STEC resists that temptation and continues to focus on the knotty issue of fast random write flash performance.

Comments welcome, of course.

NAND - an engineer’s perspective, pt zwei

May 12th, 2008 by Robin Harris in Architecture, SSD/Flash Disk

Herewith continues NAND - an engineer’s perspective.

And you thought marketing guys were wordy! The quoted bits are from the earlier StorageMojo post Notebook flash SSD market: fantasy or mirage?. Teil eins ist hier.

Begin part zwei

. . . tested application performance hardly changes either . . . .

Actually, this makes sense.  If you are accessing 4k of data, then HDD and SSD are both fast enough and you don’t care.  If you are accessing a 1MB file, then that is 256 x 4k sector accesses, and the sectors will be laid out one after the other, which is where HDDs perform well.  SSDs will shine when you need to do 256 x 4k sector accesses, and the sectors you are accessing are scattered across the disk, but as far as I know this access pattern is not common except on servers.
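A rough way to see the sequential vs. scattered difference is a toy access-time model. This is only a sketch: the positioning costs and streaming rates below are assumed round numbers for illustration, not measurements of any real HDD or SSD, and `total_ms` is a hypothetical helper.

```python
# Toy access-time model: 256 x 4 KB reads, sequential vs scattered.
# All device parameters are assumed round numbers, for illustration only.

SECTOR_KB = 4
ACCESSES = 256  # 256 x 4 KB = 1 MB

def total_ms(per_access_ms, throughput_mb_s, accesses, scattered):
    """Time in ms to read `accesses` 4 KB sectors.

    Sequential: pay the positioning cost once, then stream.
    Scattered: pay the positioning cost on every access.
    """
    transfer_ms = (accesses * SECTOR_KB / 1024) / throughput_mb_s * 1000
    positioning_ms = per_access_ms * (accesses if scattered else 1)
    return positioning_ms + transfer_ms

# Hypothetical devices: (positioning cost in ms, streaming rate in MB/s)
HDD = (12.0, 60.0)
SSD = (0.1, 60.0)

for name, (latency, bandwidth) in (("HDD", HDD), ("SSD", SSD)):
    seq = total_ms(latency, bandwidth, ACCESSES, scattered=False)
    rnd = total_ms(latency, bandwidth, ACCESSES, scattered=True)
    print(f"{name}: sequential {seq:.1f} ms, scattered {rnd:.1f} ms")
```

With these assumed numbers the two devices are close on the sequential case and far apart on the scattered one, which is the shape of the argument above.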

And what about the 4-bit MLC that Toshiba is counting on to drive costs down?

I’m a NAND flash fan, but this is scary stuff for me.  To store 1 bit in a bit cell, you need to distinguish between two voltage levels.  To store 2 bits, you need to distinguish 4 levels.  For 3 bits, 8 levels.  For 4 bits, 16 levels.  I think at the 4 bit/16 level point, we’re down to where 10-20 individual electrons can make the difference in the bits read out.
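The level counts above follow from levels = 2^bits. A minimal sketch; the "window share" line assumes evenly spaced levels, which is a simplification and not something the email states:

```python
# Voltage levels a cell must distinguish grow as 2**bits.
def levels(bits_per_cell: int) -> int:
    return 2 ** bits_per_cell

for bits in range(1, 5):
    n = levels(bits)
    # Assumption for illustration: levels are evenly spaced, so each one
    # gets 1/n of the usable voltage window.
    print(f"{bits} bit(s)/cell -> {n} levels, window share per level = 1/{n}")
```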

This will be less durable than current SLC. How do you explain that to consumers?

The answer is easy, but doing it is hard.  You have to make it so that the issues are completely invisible to consumers.

Note that this has been done successfully with flash for years.  Most of the memory cards (SD, MMC, etc) that people have been buying for years use MLC flash.

Flash has read errors - that’s why vendors implement error detection.

NAND chips are generally organized in write pages, with a spare area for each page - typically 2kB page, with 64B of spare area.  The spare area is used to store ECC parity data, and meta data (more about this shortly).

HDDs have read errors as well.  They also write their data to the platter using ECC and other algorithms that make it easier to recover the bit clock and align the heads when reading the data back.

But flash has a problem disks don’t: flash drives move your data around a lot more often than disks do. Every time a flash drive writes a page, it has to erase the entire block that page is in.

Not quite right.  Generally, a page can only be written once, and has to be erased before it can be written again.  And unfortunately, erases can only be done on an erase block, which is usually 64 write pages.  If you have to erase a page, then you might have to move 63 other pages to free up the erase block - yuck!  It happens sometimes, but the FTL (flash translation layer) software that manages all of this is usually optimized to avoid this situation as much as possible.

The normal scenario is that you write a page, and the FTL just puts the new data in a new page somewhere, and marks the old page as obsolete.  Once the FTL runs low on space, it needs to do garbage collection, but if you put a little extra NAND in your system so that even a full filesystem has some empty pages, you can make that pretty rare.
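The write-elsewhere-then-collect behavior described above can be sketched as a toy model. `ToyFTL` and everything in it is hypothetical and heavily simplified: a real FTL adds wear leveling, ECC, bad-block handling and power-fail safety, and would normally copy survivors to a spare block before erasing.

```python
# Toy flash translation layer: out-of-place writes plus simple garbage collection.
# Hypothetical model for illustration; not any vendor's design.

PAGES_PER_BLOCK = 64  # erase-block size quoted in the text

class ToyFTL:
    def __init__(self, num_blocks):
        # Each physical page is None (erased) or a (logical_page, data) tuple.
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}               # logical page -> (block, page)
        self.erases = [0] * num_blocks  # per-block erase counter

    def _free_slot(self):
        for b, block in enumerate(self.blocks):
            for p, page in enumerate(block):
                if page is None:
                    return b, p
        return None

    def _is_valid(self, b, p):
        # A page is valid only if the map still points at it.
        page = self.blocks[b][p]
        return page is not None and self.mapping.get(page[0]) == (b, p)

    def _garbage_collect(self):
        def valid_count(b):
            return sum(self._is_valid(b, p) for p in range(PAGES_PER_BLOCK))
        # Victim = block with the fewest still-valid pages.
        victim = min(range(len(self.blocks)), key=valid_count)
        survivors = [self.blocks[victim][p] for p in range(PAGES_PER_BLOCK)
                     if self._is_valid(victim, p)]
        if len(survivors) == PAGES_PER_BLOCK:
            raise RuntimeError("flash full: no obsolete pages to reclaim")
        # Erase the whole block, then rewrite the survivors into it.
        self.blocks[victim] = [None] * PAGES_PER_BLOCK
        self.erases[victim] += 1
        for p, (lpn, data) in enumerate(survivors):
            self.blocks[victim][p] = (lpn, data)
            self.mapping[lpn] = (victim, p)

    def write(self, lpn, data):
        slot = self._free_slot()
        if slot is None:
            self._garbage_collect()
            slot = self._free_slot()
        b, p = slot
        self.blocks[b][p] = (lpn, data)
        self.mapping[lpn] = (b, p)  # the old copy, if any, is now obsolete

    def read(self, lpn):
        b, p = self.mapping[lpn]
        return self.blocks[b][p][1]

ftl = ToyFTL(num_blocks=2)
for i in range(200):
    ftl.write(0, i)  # rewrite the same logical page 200 times
print(ftl.read(0), ftl.erases)
```

Rewriting one logical page 200 times on 128 physical pages costs only two block erases here, because each overwrite lands in a fresh page and erases happen only when free pages run out.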

No hard numbers from the vendors - depends on how good their signal processing algorithms are - but it could easily be 5,000 writes - down from 10,000 today.

Actually, some of the NAND vendors are already at 5k erase/write cycles today.  This, and slow write speeds are definitely the weak links for MLC NAND.

I believe that it is possible to do a good enough job with caches in the computer DRAM, and in the FTL to make a system built from 5k endurance work for a very long time.

Note that the 5k number is a statistical thing - this is the number of cycles at which about x% of the blocks will have failed (I think x% = 50%, but I didn’t look it up).  This means that some blocks might fail when the part is new, and some might last a lot longer.  If the software is done right, then the amount of available storage space will gradually shrink as blocks fail, and the entire drive won’t suddenly fail.

The map that keeps track of where your data is rapidly gets very complex - and itself is regularly read and rewritten. How well protected is this critical data structure? If it isn’t bulletproof you can kiss your data good bye.

All true.  But you can also write metadata information in the spare area, to allow you to rebuild the FTL map if something goes horribly wrong.
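One way such a rebuild could work, as a sketch: assume each written page's spare area records the logical page number and a monotonically increasing sequence number. That layout, and the names `Spare` and `rebuild_map`, are assumptions for illustration; the email does not say what metadata is actually stored.

```python
# Rebuild a logical-to-physical map by scanning per-page spare-area metadata.
# Hypothetical layout: the spare area holds (logical page number, sequence number).

from typing import NamedTuple

class Spare(NamedTuple):
    lpn: int  # logical page number this physical page holds
    seq: int  # higher = written more recently

def rebuild_map(spare_areas):
    """spare_areas[block][page] is a Spare, or None for an erased page."""
    newest = {}  # lpn -> (seq, block, page)
    for b, block in enumerate(spare_areas):
        for p, spare in enumerate(block):
            if spare is None:
                continue
            best = newest.get(spare.lpn)
            # Keep only the most recently written copy of each logical page.
            if best is None or spare.seq > best[0]:
                newest[spare.lpn] = (spare.seq, b, p)
    return {lpn: (b, p) for lpn, (_, b, p) in newest.items()}

flash = [
    [Spare(0, 1), Spare(1, 2), Spare(0, 3), None],
    [Spare(2, 4), Spare(1, 5), None, None],
]
print(rebuild_map(flash))  # {0: (0, 2), 1: (1, 1), 2: (1, 0)}
```

The scan touches every page, so it is slow, but it needs nothing except the flash contents themselves.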

Also, HDDs have the same problem with their FAT tables, or the modern equivalent.  This is normally stored on the disk, and in the computer’s RAM, with the disk copy being a little out of date.  Lose power at the wrong moment, and bad things can happen.

The StorageMojo take
Many thanks to the anonymous contributor. Net/net this points again to the suitability of flash drives for servers - and not so much for notebooks - the original subject.

The larger issue is the lack of transparency on the part of NAND SSD vendors. Until their architectures can be independently reviewed, we all have to rely upon marketing assurances - not! - and the useful but skimpy testing provided by sites like Anandtech.

The server-side SSD market can work with those limits. After all, the vendor of the complete system has to stand behind it.

But that is a tiny fraction of the total available market. The big win is on the consumer side: 100+ million units; if the product delivers.

Samsung, Toshiba: your current strategy is doomed. You need to engage at the consumer’s level instead of relying on the usual marketing hype. Your product is too costly, now and 3 years from now, to succeed without delivering real benefits.

You aren’t there yet.

Comments welcome, of course.

StorageMojo in Chicago

May 10th, 2008 by Robin Harris in Off-Topic

I’m spending a couple of days R&R in Chicago. Caught Shemekia Copeland at Buddy Guy’s last night. Cruised the Chicago river this morning. Hope to hit another couple of blues clubs tonight.

Then back to the mountains of northern Arizona.

Moderation has been a bit spotty - but all will be back to normal Monday morning.

NAND - an engineer’s perspective

May 10th, 2008 by Robin Harris in Off-Topic

The post on notebook flash drives [see Notebook flash SSD market: fantasy or mirage?] generated many comments.

Part of what makes it hard to discuss flash is the dearth of information about how it works. My investigation of flash issues has been helped along by hints and tips from insiders and the occasional paper that sheds light on FTL design issues [see Flash chance, based on a paper from Microsoft Research].

Thus I was pleased to get a 2500 word email from a polite and knowledgeable SSD engineer cum marketing guy commenting at length. I asked him if I could publish his comments and he said yes - if I preserved his anonymity and removed the names of the companies he’s worked for.

Seemed reasonable. Since it’s long I’m breaking it up into 2 parts.

In the editing I’ve removed some info, abridged some comments, added the bold face headers and broken some long paragraphs into 2 or 3 shorter ones for online readability. At all times I’ve sought to preserve the author’s meaning.

Begin SSD guest comment
First up, great post. I agree with most of what you said. I haven’t used an SSD drive myself, but the reviews I’ve seen make me wonder if I ever will - way too expensive, for way too little benefit.

The lay of the land
Quick background comment on flash memory. There are two main kinds of flash memory: NOR & NAND. NOR is similar to SDRAM, NAND similar to HDD. NOR can be accessed randomly, is faster (at least for reads) than NAND, but the chips are smaller and cost a lot more per GB.

NAND can only be accessed in blocks like a HDD, the chips are larger, and the cost per GB is less than NOR. NOR is commonly used for firmware (e.g. the BIOS in your PC), NAND is commonly used for bulk storage. In the discussions about SSDs, we’re always talking about NAND, so I’m going to say “NAND” rather than “flash” in the rest of this email.

NAND flash has a ~10x worse $/GB than HDD, but it has a ~10x better $/IOPS than HDD.

Your tour guide
I’ve been in the semiconductor business for ~20 years, first as an engineer, then gradually transitioning into management & marketing. In my last job I developed relationships with all the NAND market players. When I first started looking at NAND chips, 4MB chips were still around, now we’re working with 4GB chips - wow!

The future
I think that the SSD drive makers can do a MUCH better job than they’ve done so far, and that the raw technology is capable of doing much better. I think eventually the SSD products will get better, and we’ll see SSD drives (or their successors) used almost everywhere.

1st, the numbers
A state of the art MLC NAND chip today is 4GB, so a 64GB SSD drive has at least 16 NAND die inside. The peak write speed should be ~5MB/sec/die, so the SSD should be capable of ~80MB/sec sequential write. Peak read speeds should be ~30MB/sec/die, so the SSD should be bottlenecked by the SATA interface.
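The arithmetic behind those figures, as a small sketch. It assumes all die can be driven in parallel, and the SATA payload figure is my assumption (roughly 300 MB/s for a 3 Gb/s link), not a number from the email:

```python
# Back-of-envelope SSD throughput from the per-die figures quoted above.
DRIVE_GB = 64
DIE_GB = 4
WRITE_MB_S_PER_DIE = 5
READ_MB_S_PER_DIE = 30
# Assumption, not from the email: SATA 3 Gb/s carries roughly 300 MB/s of payload.
SATA_MB_S = 300

dies = DRIVE_GB // DIE_GB
peak_write = dies * WRITE_MB_S_PER_DIE  # assumes every die writes in parallel
peak_read = dies * READ_MB_S_PER_DIE    # assumes every die reads in parallel

print(f"{dies} dies, write {peak_write} MB/s, read {peak_read} MB/s")
print("read limited by SATA" if peak_read > SATA_MB_S else "read limited by NAND")
```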

These are MLC numbers. SLC performance will be even higher, about 8x better for write speeds for the datasheets I compared. True, these are best case raw performance numbers, and in the real world there are complications that will keep you well away from these numbers, but it should be possible to do waaayyyy better than we’re seeing now.

Responding to StorageMojo
[He goes on to quote and respond to some points from the StorageMojo post. I've put those in quotes.]

Flash has a place in one notebook niche: below the $40-$50 minimum cost of a disk. As we’re already seeing with the Asus Eee, replacing $50 of disk with $10 of flash makes a big price difference.

I agree 100% with this - if I can build a system using either $10 of NAND, or $50 of HDD, and the $10 of NAND is enough storage, then NAND wins. It doesn’t matter that the HDD has a better $/GB, or that it will have loads of spare GB - it costs $40 more, and it’s out.

$10 of NAND storage will buy a rapidly increasing amount of storage, so the cut-over point where NAND wins based on entry cost alone is rising rapidly. I think that the $/GB number is halving every 12-18 months, so in 2-3 years we’ll get 4x more NAND for the same cost.
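The 4x figure is two halvings of $/GB. A quick sketch of the compounding, where `capacity_multiple` is a hypothetical helper and a steady halving rate is assumed:

```python
def capacity_multiple(years, halving_months):
    """How many times more NAND the same dollars buy after `years`,
    if $/GB halves every `halving_months` months."""
    return 2 ** (years * 12 / halving_months)

for years, months in ((2, 12), (3, 18), (2, 18), (3, 12)):
    print(f"{years} years, halving every {months} months: "
          f"{capacity_multiple(years, months):.1f}x")
```

Two years at a 12-month halving and three years at an 18-month halving both give 4x; the mixed cases bracket it.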

FABulous

Given the multi-billion dollar cost of semiconductor fabs, getting the notebook SSD market wrong would make Toshiba’s $250 million HD-DVD loss look cheap.

Actually, while the $$$ at stake are probably pretty large (inventories, controller chip design efforts, etc), they are not as large as a fab. A modern day, state of the art wafer fab costs several billion dollars, but that investment won’t be completely at the mercy of SSDs succeeding, for two main reasons.

One, these fabs are built to make both SDRAM & NAND. Both markets are very sensitive to the balance of supply/demand, and therefore both markets exhibit wild price swings. By building the fabs to support both types of (very high volume) products, they can switch from one to the other based on the supply/demand balance in both markets.

Two, there are other huge markets for NAND, primarily memory cards (SD, MMC, xD, memory stick, CompactFlash, and variants), & consumer electronics devices (phones, especially SmartPhones, GPSs). One of the biggest customers on the planet for NAND is Apple (iPods, iPhones).

It is true that Toshiba is playing a billion dollar poker game with (mainly) Samsung as to fab capacity (if there is overcapacity, both companies suffer, but if one under-invests and the other over-invests in capacity, then the over-investor wins), but SSDs succeeding or not will happen slowly enough that the capacity differences can be absorbed by speeding up or slowing down the bringing on of new fab capacity.

. . . today’s spot market MLC $2500/TB . . .

That spot market price is about right. This implies that the 64GB SSD in the Macbook Air should be about a $300 upgrade, not a $1,300 upgrade. True, you do get a slightly faster CPU in the deal, but I think that we’re looking at way high early adopter prices right now.
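The raw NAND cost implied by that spot price is easy to check. A sketch, assuming the price is quoted per decimal terabyte; the gap between the raw figure and a ~$300 upgrade would presumably cover the controller, assembly and margin, which is my reading rather than something the email spells out:

```python
SPOT_USD_PER_TB = 2500
GB_PER_TB = 1000  # assumption: the spot price is per decimal terabyte

def raw_nand_cost(gb):
    """Raw MLC NAND cost in dollars for `gb` gigabytes at the quoted spot price."""
    return SPOT_USD_PER_TB * gb / GB_PER_TB

cost = raw_nand_cost(64)
print(f"raw NAND for 64 GB: ${cost:.0f}")
```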

And if the market doesn’t appear, a billion dollar write off.

I’m guessing that they are betting $10M to $20M on a project to design an SSD controller chip. They can’t afford not to have the controller, in case the SSD market results in a significant proportion of their volume, and they can’t assume that they will be able to buy the controller from an outside company (or even more risky, a competitor).

Power: no SSD notebook has gained more than 10 minutes battery life over disks. Since flash is already power-efficient that won’t change. Disks have multiple opportunities to improve power use - and with over a $1 billion a year in R&D behind them - they will.

The primary users of power in a notebook are (in order)

  • The display back light
  • The CPU
  • Everything else

The HDD is lumped in with everything else. Flash should have a significantly better power consumption than HDD, but since both are operating in the power shadow of the display & CPU, it doesn’t make a lot of difference.

Despite what a commenter said, spinning the HDD platters doesn’t take a lot of energy. Spinning them up to speed from idle does take a lot of energy, but only for a few seconds. Keeping them spinning once they are started only consumes enough energy to overcome the bearing friction, and that friction is pretty low. Most of the power spent in accessing a HDD is in moving the read/write heads, and in the read channel electronics.

One other thing you didn’t mention is that after ~30 years of development, Windows (Linux, OS-X) is pretty well optimized to the characteristics of HDDs. Have you ever heard of the Windows XP Prefetcher? Wow!

Now, if we can do something about the power consumption of the display back light and CPU, then SSD vs. HDD might make a difference, but by then we’re talking about cell phone like battery life so it probably won’t matter.

End of part 1
Next up: flash financials; 4 level flash durability; data protection and more in the conclusion to NAND - an engineer’s perspective

Comments welcome, of course. Did you notice that he actually disagreed with much of what I said? But he was nice about it.

StorageMojo: hacked!

May 6th, 2008 by Robin Harris in Off-Topic

Always learning
This week’s learning: a hacked web site. There’s been a lot of that going around. Writing has taken a back seat to fixing the problem.

It took a while to grok how deeply StorageMojo had been hacked.

First I got a note from my hosting company - something about a daemon - and I told them to take it down. Which they did.

Thought I was done.

But I wasn’t
Then Gary at Nexsan noted that StorageMojo was alarming his browser. Went into the StorageMojo files on WordPress and discovered some iframes that I hadn’t put there.

Pulled them out. Upgraded to the latest version of WordPress.

Thought I was done.

Wrong again
Fired up the SFTP client and took a look at my web site files. Saw a bunch with names I didn’t recall, like Emma, Alexander and Jordan. Inside, links to hundreds of sites I’d never heard of either.

Got rid of them.

Checked a couple of other sites I host on the account. One had been completely cleaned out by the spamsters - the site was gone - replaced with more collections of links.

Edited the junk out of those sites. Hoped I was done, but decided to go through every single file and folder on all three sites.

Found the malicious code. Very professional. Replicated in several places. Language = ru, whatever that means.
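Going through every file by hand is tedious, and a small script can narrow the search. This is only a sketch of the idea, not what I ran: `scan_site`, the patterns in `SUSPICIOUS` and the 14-day window are all illustrative assumptions, and a clean result from something this crude proves nothing.

```python
# Flag files that contain suspicious markup or were modified recently.
# Patterns and thresholds are illustrative assumptions, not a real signature set.
import os
import time

SUSPICIOUS = ("<iframe", "eval(base64_decode")

def scan_site(root, recent_days=14):
    """Yield (path, reason) for files under `root` worth a manual look."""
    cutoff = time.time() - recent_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
                with open(path, "r", errors="ignore") as fh:
                    text = fh.read().lower()
            except OSError:
                continue  # unreadable or vanished file: skip it
            for pattern in SUSPICIOUS:
                if pattern in text:
                    yield path, f"contains {pattern!r}"
            if mtime > cutoff:
                yield path, "recently modified"

# Example use against a local copy of the site folder:
# for path, reason in scan_site("./site-backup"):
#     print(path, "-", reason)
```

Run it against a downloaded copy of the site rather than the live server, and expect false positives from legitimate embeds and recent edits.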

Corrective action
New passwords, of course. Noticed that the Dreamhost web management system doesn’t make that easy to do - password management is spread across several different tools - which guarantees that people won’t change them very often.

Read up on security. A couple of good sites are Blog Security and Stop Badware. Google also has a helpful checklist.

Did some other housecleaning and site hardening.

The StorageMojo take
I now know I will never be done. The rest of you with blogs should learn by my misadventure.

The biggest surprise is that there are many things that can be done to make sites harder, but they are not the defaults. You have to do some research and sometimes some configuration.

That is wrong. Other than general exhortations to update software, the hosting companies do almost nothing to make it easy to manage security. Not many consumers are going to dig into log files every couple of days.

I’m more technical than the average blog writer and some of this stuff is a PITA. The Internet Operating System needs some security patches.

Comments welcome, of course. AFAIK nothing bad got sent to readers of StorageMojo.

NAB Shorts: MatrixStore

May 2nd, 2008 by Robin Harris in Off-Topic

Spent some time with Nick Pearce, a co-founder of Object Matrix, a UK-based software startup supporting commodity-based archiving.

Their MatrixStore product clusters off-the-shelf servers and storage to create a secure disk based archive. MatrixStore runs out of the box on Mac OS and will work with most Linux supported tin.

Commodity hardware and software
Archived data should not be tied to a specific storage platform. Proprietary formats or filesystems are an accident waiting to happen.

MatrixStore keeps the data on industry standard filesystems in the same format as on the client disk. The data will be retrievable even if the company has disappeared.

Platform lifecycle
Older gear can play in the same config as newer stuff. Roll old hardware out of production into the archive, and double its useful life. Upgrade in place, a critical consideration for archives.

Application-centric storage
MatrixStore is integrated with the recently released Final Cut Server from Apple. They provide life-cycle management of assets and metadata from ingest through archive.

The MatrixStore software stores the added FCS metadata using metadata operations supported by XFS on Linux. When ZFS is supported on Mac OS they plan to use its native metadata support as well.

MatrixStore also automates some tasks that usually require manual configuration, adding capacity, data redundancy, data authenticity and the like. Like Final Cut Server it’s designed for people who aren’t storage admins.

Cool pricing
They give the first 15TB of software licenses away for free. After the first 15TB it’s $1000 per TB of protected content. There’s a pricing widget to help with configurations on their website.
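As a sketch of how that pricing works out, assuming the free 15TB still applies once you grow past it (my reading of the terms, so check their widget for real quotes); `license_cost` is a hypothetical helper:

```python
FREE_TB = 15
USD_PER_TB = 1000

def license_cost(protected_tb):
    """License cost in dollars, assuming only capacity beyond the free tier is billed."""
    return max(0, protected_tb - FREE_TB) * USD_PER_TB

for tb in (10, 15, 40):
    print(f"{tb} TB protected -> ${license_cost(tb):,}")
```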

The StorageMojo take
Digital archiving is a critical issue for content creators. Nick - who had worked at EMC - made choices that will become de rigueur for deep archiving as people come to understand the issues:

  • Content in its original format
  • Commodity hardware
  • Upgrade in place
  • Pay as you go
  • Automate the small stuff

MatrixStore’s focus on Final Cut Server and their pricing model are both positives. Final Cut Studio has taken out a huge swath in the NLE market - over 1 million licenses sold - so the FC Server business should be a healthy one.

Their pricing transparency and unlimited-time 15 TB trial should also work well. All in all, an up-to-the-minute approach to the market. You might almost think they’re American.

Comments welcome, of course.

The value of guaranteed uptime

May 1st, 2008 by Robin Harris in Architecture, Enterprise, Future Tech

What, if any, is the value of multi-year storage uptime?

Xiotech and Atrato promise 5 and 3 year uninterrupted service on their new arrays. Now it is time to ask, as some commenters have, so what?

After all, enterprise data centers are already well-equipped to deal with disk failures. RAID keeps the data available. 7×24 service replaces the failed drive with a new hot spare. Experienced storage admins paper over the cracks.

It isn’t like you’re going to fire all your storage admins just because arrays stop breaking.

Opex vs capex
The direct cost saving - no maintenance contract for x years - may or may not be reflected in the purchase price. From a buyer’s perspective there are 2 costs: the capital expense - capex - and the operating expense - opex. Opex is fully tax deductible in the year incurred, so it is easier to get.

Atrato and Xiotech need to think creatively about maintenance pricing.

Breaking into the glass house
Breaking into data centers with the promise of cost savings isn’t easy. The provable cost savings have to be 50% or better to get conservative data centers to change vendors. And it helps if there is a recession or the business is tanking. Motivation.

A case can be made that after adding up a standard array’s maintenance costs, random disruption costs and additional management it will be cheaper to go with the new product. The CFO will demand it.

But if you want to change the market, you have to change the way the market thinks.

Re-thinking the issue
Straight cost-displacement arguments aren’t going to have the legs both companies would like. They need a different model.

Enterprise IT is a manufacturing plant - not an engineering testbed. It confuses the engineers because it seems like a techie haven - but it isn’t.

It is all about shipping product, each and every day. Like a real factory.

SPC
Everyone accepts that statistical process control has changed the face of manufacturing. A core idea behind SPC, that reducing variability improves quality, is directly applicable to IT factories.

What Atrato and Xiotech do, ideally, is reduce IT ops variability. There is always a known level of performance. Availability is 100%.

Thus most of the usual dependencies are no longer dependencies. I/O slowdowns and timeouts should disappear. Drive rebuilds won’t impact performance. Admins won’t pull the wrong drive - which happens about 2% of the time - and bring down the array. And so on.

The StorageMojo take
Enterprises over-configure because they never know what is going to hit them - but they do know it will be at the worst possible time. Ideally they want to be ready to handle the biggest shopping day of the year - even after an array failure.

Workload variability isn’t going away. But wouldn’t it be nice if equipment performance and availability variability did?

That’s what Atrato and Xiotech are selling. I wish them luck communicating a value prop that strikes at the heart of what every other array vendor is selling.

Comments welcome, of course.


