A petascale parallel database

by Robin Harris on Monday, 8 February, 2010

MapReduce and its open source version, Hadoop, are parallel data analysis tools. A few lines of code can drive massive data reductions across thousands of nodes.

Cool.

Powerful though it is, Hadoop isn’t a database. Classic structured data analysis of the model/load/process type isn’t what it was designed for.

That’s where the paper HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads (pdf) comes in. Written by Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz and Alexander Rasin (the first four at Yale, the last at Brown), the paper proposes a method for building an open-source, massively scalable, shared-nothing, analytical parallel database on commodity hardware.

What it is
HadoopDB coordinates SQL queries across multiple independent database nodes using Hadoop as the task coordinator and network communication layer. It uses the scheduling and job tracking of Hadoop while it intelligently pushes much of the query processing into the individual database nodes.

There are four components to HadoopDB.

  • Database Connector. Each node has its own independent database. The connector is the interface between the database and Hadoop’s task trackers. A MapReduce job supplies the Connector with an SQL query and other parameters. The Connector executes the SQL query on the database and returns results as key-value pairs. It can be implemented to support a variety of databases.
  • Catalog. The information needed to access the databases and metadata such as cluster data sets, replica locations and data partitions is kept in the catalog.
  • Data loader. The data loader has two jobs. First, it executes a MapReduce job over Hadoop that reads the raw data files and partitions them into as many parts as there are nodes in the cluster. Second, it loads the partitions into the local file system of each node and chunks them according to a system-wide parameter.
  • SQL to MapReduce to SQL planner. The planner provides a parallel database front end to enable SQL queries. The planner transforms the queries into MapReduce jobs and optimizes the query plans for efficiency. This is the secret sauce of HadoopDB.
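
To make the connector idea concrete, here is a minimal sketch, not HadoopDB’s actual code. It assumes in-memory SQLite databases standing in for each node’s local DBMS, and the table name `visits`, its columns and the helper functions are all hypothetical. Each "node" answers the same SQL locally and hands back key-value pairs; a reduce step merges the partial results.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create an in-memory SQLite database standing in for one node's local DBMS."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE visits (url TEXT, revenue REAL)")  # hypothetical schema
    db.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return db


def connector(db, sql):
    """Run the pushed-down SQL on one node and yield (key, value) pairs."""
    for key, value in db.execute(sql):
        yield key, value


def run_query(nodes, sql):
    """Map: every node answers the SQL locally. Reduce: merge partial sums by key."""
    totals = defaultdict(float)
    for db in nodes:
        for key, value in connector(db, sql):
            totals[key] += value
    return dict(totals)


# Two nodes, each holding its own partition of the data.
nodes = [
    make_node([("a.com", 1.0), ("b.com", 2.0)]),
    make_node([("a.com", 3.0), ("c.com", 4.0)]),
]
sql = "SELECT url, SUM(revenue) FROM visits GROUP BY url"
result = run_query(nodes, sql)
```

The point of the sketch is that the GROUP BY runs inside each database, so only small partial aggregates cross the network.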

HadoopDB complements the Hadoop infrastructure and does not replace it. Analysts have both available as needed.

Heterogeneity
A key issue for Internet-scale systems is the ability to run in a heterogeneous environment where multi-year build-outs and rolling node replacement are the norm. That means that some nodes will be faster than others. HadoopDB breaks the work down into small tasks and moves them from slow to fast nodes automagically.
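
Why small tasks help can be shown with a toy simulation. This is a sketch of a simple pull-based work queue, not Hadoop’s actual scheduler, and the node speeds and task counts are made-up numbers. It compares an equal static split of the work against letting each node grab the next small task whenever it is free.

```python
import heapq


def static_split(num_tasks, task_cost, speeds):
    """Every node gets an equal share; the slowest node sets the finish time."""
    share = num_tasks / len(speeds)
    return max(share * task_cost / s for s in speeds)


def dynamic_pull(num_tasks, task_cost, speeds):
    """Each node pulls the next small task as soon as it becomes free."""
    # Heap entries are (time the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    done = [0] * len(speeds)
    finish = 0.0
    for _ in range(num_tasks):
        t, i = heapq.heappop(free_at)
        t += task_cost / speeds[i]  # faster nodes finish a task sooner
        done[i] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish, done


speeds = [1.0, 1.0, 2.0, 4.0]  # hypothetical relative node speeds
static_time = static_split(80, 1.0, speeds)
dynamic_time, tasks_per_node = dynamic_pull(80, 1.0, speeds)
print(static_time, dynamic_time, tasks_per_node)
```

With the pull model the fast nodes end up doing most of the tasks, so the job is no longer held hostage by the slowest box.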

Results
The authors ran some benchmarks on Amazon’s EC2 to test performance. The HadoopDB load times were about 10x that of Hadoop, but the higher performance of HadoopDB usually justified the longer set up time.

The authors found that HadoopDB was able to approach the performance of parallel database systems using much lower-cost hardware and free software. Given the youth of the projects one can expect higher performance as improvements are made.

The killer app for private clouds?
MapReduce and Hadoop are already in wide use among Internet-scale datacenters. As companies begin to understand and correlate social media, web activity and ad response rates, the demand for large-scale parallel database processing will grow. But will they want to ship it out to Amazon?

Depending on the quantity and sensitivity of the data, many organizations may prefer to keep the processing in-house. Private scale-out Hadoop clusters may become the poor company’s data warehouse of choice.

The StorageMojo take
HadoopDB is more science project than commercial tool today. Yet the project demonstrates the feasibility of using scale-out compute/storage clusters for work that today typically requires proprietary high-end scale-up system architectures.

If capital costs are reduced by two thirds with a commodity/FOSS architecture, companies could afford to hire the expertise required to make it work. The free software/paid support model will prove quite successful in this space.

Courteous comments welcome, of course.

Why private clouds are part of the future

by Robin Harris on Friday, 5 February, 2010

James Hamilton, Amazon architect and a very smart guy, recently blogged about private clouds. In Private Clouds Are Not The Future he argues that economies of scale make public clouds much more efficient than private clouds.

I think we agree that several effects make web scale public clouds more efficient:

  • Higher quality services. Large clouds can economically employ experts to design and optimize their services and infrastructure. Security and server/storage design are two areas where deep expertise can provide more reliable and efficient service.
  • Utilization. Power systems and power cost are optimized when data centers are run at 100% utilization. As utilization rises across the board so does the capital efficiency, i.e. work per invested dollar.
  • Cost. Large-scale investments create their own lower-cost dynamic. Public cloud providers save money on infrastructure acquisition through volume buys. In addition, their volume enables them to acquire optimized components, such as high-efficiency power supplies or custom cost-reduced motherboards, that offer little economic advantage to small volume buyers.
  • Portfolio advantages. With a mix of customers and jobs web-scale clouds have a more stable aggregate load. Some customers are growing, some are shrinking, but the net demand becomes more stable with size. This, in turn, enables public cloud managers to drive utilization higher with less risk of pegging the system.
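
The portfolio effect in the last bullet can be illustrated with a small simulation. This is a sketch under a strong assumption: each customer’s demand is independent and uniformly distributed, which real workloads are not. The function name and parameters are hypothetical.

```python
import random
import statistics


def aggregate_cv(num_customers, periods=500, seed=42):
    """Coefficient of variation (std dev / mean) of the total load over time.

    Assumes each customer's demand in each period is an independent draw
    from a uniform distribution between 0 and 2 units.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(0.0, 2.0) for _ in range(num_customers))
        for _ in range(periods)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


small_cloud = aggregate_cv(10)
large_cloud = aggregate_cv(1000)
print(f"10 customers:   relative swing {small_cloud:.3f}")
print(f"1000 customers: relative swing {large_cloud:.3f}")
```

Under these assumptions the relative swing in total load shrinks roughly with the square root of the customer count, which is what lets a big operator run closer to full utilization.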

With all these advantages it is obvious that private clouds are not the future. Or is it?

It isn’t all about the Benjamins
Economics is not the driver many assume. Individuals and companies often select less economic choices. Some people buy cars that cost $200,000 and get 12 miles to the gallon. Some companies buy $6/GB storage and then utilize just 1/3rd of that costly capacity.

Often perceived benefits are not well measured in dollars. Convenience, availability, consistency and control often relate to emotional needs and wants that are rarely quantified or questioned.

But we don’t have to invoke those to understand why private clouds will be part of the computing landscape. Just a quick look at one of the large Internet data centers will tell us what we need to know.

Show me the power
All the advantages of public clouds have analogs in the world of power generation and distribution. Power generation is cheapest when centralized and large-scale distribution systems move power at the lowest cost per watt.

Electrical power generation and distribution is over 125 years old. The technology is well understood, the industry is mature, and a massive infrastructure — including mile-long coal-hauling trains — supports production and distribution.

And yet Google flanks each of its massive data centers in The Dalles, Oregon – built next to a substation a few miles from the nation’s largest hydropower system, one of the world’s most reliable power sources – with generators. I expect Amazon does the same.

Access
Clearly, access to data is at least as important as access to power or why would data centers spend the money on uninterruptible power supplies?

Despite the maturity of the power industry people realize it cannot be relied upon 100%. Therefore they maintain their own power storage, generation and distribution systems.

Is the Internet that different?

We cannot rely 100% on Internet access to our data. If the application is important enough, as judged by often subjective human criteria, we will keep our data as close as Google keeps its generators.

Even if it isn’t the most economic choice.

The StorageMojo take
My thanks to James Hamilton for a lucid justification of an all-cloud IT infrastructure future. His post helped me see why that isn’t going to happen.

I’ve grappled with the question of private clouds for the last couple of years. The advantages of web scale systems became more obvious, but the human desire for reliable data access and control has not receded.

Public and private will not displace each other: they will coexist just as public and private power sources coexist today. No doubt public clouds will claim the majority of the market whether measured in dollars or exabytes, but private clouds will remain significant contributors to our data infrastructure for decades, if not centuries, to come.

Courteous comments welcome, of course.

Oracle+Sun storage: wiser & brighter

by Robin Harris on Wednesday, 27 January, 2010

While everyone else was watching the Apple iPad intro I was watching Oracle’s John Fowler talk about their systems and storage strategy. I like the iPad, but the O+S strategy could reshape the storage industry.

More details will emerge and many decisions still remain but the basic elements are clear:

  • Focus on direct sales. In the mid-1990s, when I joined Sun, the tenacity and aggressiveness of their direct sales force was a welcome change. Direct sales forces are expensive, but losing touch with your customers is even costlier. The combo’s unique value propositions can’t be sold by channels today. In 5 years – maybe.
  • A dedicated storage sales force. Generalist salespeople with millimeter deep storage product and application knowledge can’t compete with EMC and NetApp. Storage specialists aren’t easy to develop, so they’ll hire them – and they promise top commissions.
  • Deep integration of ZFS into storage systems. A software company should like a software solution to many of the biggest storage problems. Putting real muscle behind ZFS will help thousands of enterprise customers to rethink their high-performance data protection strategies.
  • Flash everywhere. Sun has done some creative things with flash already, such as Logzilla, and Oracle sees that much more can be done.

Not mentioned – not that it should have been – is the fate of ZFS on Mac OS X. That would be a boost for all concerned.

The StorageMojo take
Sun’s primary storage business has been a black smoking crater of disaster for over a decade. And it didn’t help StorageTek to have them answer to know-nothings.

Despite that, Sun engineers outside the storage group developed innovative and game-changing technologies that the company couldn’t capitalize on. With Oracle’s investment, now they can.

No database/systems company can be successful without a healthy and very competitive storage team — and the high gross margins don’t hurt. With a hard-nosed focus on application performance, marketing competence and continued innovation, the O+S storage group could be a fun place to work. They are hiring!

It will take Oracle 12 to 18 months to develop the kind of customer traction that will make other storage vendors sit up and take notice. But Larry Ellison isn’t planning to lose and there is no reason he should.

Storage competition in the enterprise is about to get cranked up several notches. And that is a good thing for all customers.

Courteous comments welcome, of course.

Will a 70 TB cartridge save LTO?

by Robin Harris on Tuesday, 26 January, 2010

IBM and Fujifilm have demonstrated a technology that, if productized, could give us a 70 TB LTO tape cartridge. Tape isn’t dead – that will be a long time coming – but its vital signs aren’t good, either.

Vacuum column, 800bpi tape drives
Magnetic tape is the oldest digital storage technology still in use. Once mass storage meant tape because drums – and later, disks – were tiny and absurdly expensive.

IBM and Fujifilm demonstrated a density of 29.5 billion bits per square inch on linear tape. Disks are approaching 1 Tbit per square inch, but in a controlled environment and with much less media area.

Theoretically this supports a single LTO (linear tape open) cartridge with 35 TB of uncompressed data capacity, or 70 TB of compressed data.

Current LTO tapes, even with compression, are at about 2 TB per cartridge – the same as high-end disk drives. In nine months those 2 TB disks will cost about the same as a single LTO cartridge. Why store data on tape when it is so much faster to access on disk?

Defenders point to tape’s energy efficiency – write once and shelve without consuming more energy for decades – but people like the convenience of random-access data. If the disk drive industry woke up and started offering archive quality disks – Seagate sold an automotive hard drive that carried a 10 year warranty – much of the remaining tape market would disappear.

Lifespan is another benefit of tape technology. I recently transferred a 20-year-old VHS tape that hadn’t been looked at in at least 10 years to my computer. There was some drop out but the picture was very watchable. Try that with a 20 year old disk drive.

Technology
Whether it is commercially feasible or not, the IBM/Fuji technology is impressive:

  • Advanced nano particle technology – they limited the size of the barium ferrite particles to 1600 nm³ – approximately 1/3 of current metal particle volume.
  • Advanced nano coating technology — a smooth and thin magnetic layer with very low variability reduced signal fluctuation significantly, enabling more accurate signal processing.
  • Advanced nano dispersion — a new material controlled agglomeration enabling more uniform dispersion of the nano particles.
  • Nano perpendicular orientation – taking advantage of the barium ferrite particles’ crystal magnetic anisotropy, a perpendicular orientation improved high-frequency characteristics.

But the remaining obstacles are daunting: mass production of tiny uniform nanoscale particles; mass production of an extremely smooth and thin magnetic layer; and careful control of the particle dispersion and orientation. Plus heads and transports accurate enough to take advantage of the density.

That added technology raises tape’s entry price – further restricting the market – and it isn’t easy to see what, if anything, can reverse that dynamic.

The StorageMojo take
Regardless of whether you think tape has a long-term future, this is an impressive demonstration. When I introduced DLT at DEC, customers were thrilled to get to 2.6 GB on a tape cartridge.

If they can get the cartridge to market in the next 5 years, they can charge 5x what a disk costs – because the capacity is so much higher than any single disk. If they can’t – well, it was a neat tech demo.

Drive marketers should see that a massive archive disk market is fast approaching. Cheap USB 3 SATA drive docks will enable millions to store their memories on rarely used disks – and to rapidly access all the data.

Nevertheless, tape remains the most proven archival storage medium for digital data. Tape may yet live to see that 70 TB cartridge delivered.

Courteous comments welcome, of course. I had an audio cassette recorder for storage on my first computer. Couldn’t afford $800 for a 144 KB floppy disk. I now have 11 disks – and 2 optical drives – on my Mac Pro. That cassette recorder was my 1st – and last – tape drive.

Verari restart

by Robin Harris on Wednesday, 20 January, 2010

Verari Systems is now Verari Technologies. The company’s assets were purchased by the original founder, Dave Driggers, after an attempt last year to get another round of financing foundered.

They’ve had some success with their containerized compute/storage systems. There haven’t been many buyers amidst the Great Recession and the credit crunch didn’t help.

Here are edited comments from their website:

Original Founder Leads Investment Group in Purchase of Verari Systems’ Assets

Founder aims to re-start company with concentration on data center design and optimization services, modular container-based data centers, blade-based storage and high performance computing solutions.

San Diego, Calif. – January 19, 2010 – David Driggers, the original Founder of Verari Systems, Inc., . . . today announced the successful acquisition of substantially all of Verari Systems’ corporate and intellectual property assets by an Investment Group led by Driggers.

Mr. Driggers is re-starting the Verari engine this week. The new company, Verari Technologies, is offering immediate support to past Verari Systems’ customers.

Verari’s award-winning FOREST containers are one of the industry’s best selling portable data center solutions. The containers, as well as Verari’s BladeRack architecture, utilize Verari’s patented Vertical Cooling Technology to increase energy efficiency while reducing a customer’s energy bills.

“You’re going to see a concerted effort on our part to license and promote these unique technologies,” states Mr. Driggers.

Most of the staff was laid off last year because the company couldn’t meet payroll. The new company retains much of the former senior management.

The StorageMojo take
Verari is wise to take a step back from direct competition with HP, SGI and IBM. HP owns the biggest chunk of the blade market, buys over half the world’s disk drives and, in the 9100, has some very dense storage. But HP can’t be all things to all people – and Verari can help fill the gaps.

While the density benefits of blades are undeniable, some question whether they are cost-effective compared to high-volume commodity boxes. Verari’s pricing seemed more aggressive than most blade vendors – perhaps too aggressive – but price is another competitive tool they may choose to wield to the benefit of buyers everywhere.

Courteous comments welcome, of course.

Storage for version control

by Robin Harris on Tuesday, 19 January, 2010

A reader writes:

I found your blog after searching for storage alternatives. I have to say, it’s really impressive and has helped me a lot so far. I was wondering if you could offer some advice.

We run an online version control service. Currently we are hosted on a VMware environment using FC SAN (SAS and SATA).

We’re growing into the 3 TB+ range and looking for alternatives, since we’re paying $2.50/GB for FC SAN (crazy). We looked at NetApp, but with all the stuff going on these days I have to think there is something less expensive and more creative.

Basically, our needs are:

  • Fast read and write performance (500+ r/w iops – we have over 13,000 commits per day)
  • Shared across many machines. We are currently using NFS.
  • Something that won’t require a team to manage. Although, we already manage our entire Linux environment.

I noticed a post about Gluster, ParaScale, and Nexenta. They look promising, but my fear is that they will require too much maintenance. SAN and NFS are pretty simple and if we get NetApp from our hosting provider they manage it for us. Although, they want to charge us $8,000/mo for it (two shelf, 28 450 GB 15k SAS).

As I dive into storage I think I get more confused :) Any advice is greatly appreciated.

When I asked if I could publish the note – which has been edited for clarity and anonymity – I had my own questions:

Why do you think that Gluster, ParaScale & Nexenta will require too much maintenance? Also, when you say SAN, are you referring to Fibre Channel or simply a dedicated Ethernet storage network?

The reply illustrated a facet of the marketing problem that new technologies face: uncertainty.

Not sure really, I just have not had experience with any of those solutions yet. Nexenta looks pretty impressive. I’ve also heard some great results from DRBD.

We have Fiber Channel with HBA cards. It’s still shared storage, but really fast.

BTW, DRBD is the name of an open-source software product:

DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network based raid-1.

The StorageMojo take
My first thought is that anyone who manages a technical hosted service that costs several $K per month should be able to manage a fairly modest scale-out cluster whose capital cost may be only 2-3 months of rental. And 28 15k drives seems like overkill on both the IOPS and the capacity.
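
Here is the back-of-the-envelope arithmetic behind that "overkill" hunch, as a rough sketch. The dollar figures, drive count and workload numbers come from the reader’s note. The per-drive IOPS figure is my assumption, a common rule of thumb for 15k drives, not a measurement, and the calculation ignores RAID overhead, caching and how many I/Os a commit actually generates.

```python
# Figures quoted in the reader's note.
quote_per_month = 8000    # dollars per month for the managed NetApp
drives = 28
drive_gb = 450
commits_per_day = 13_000
required_iops = 500

# Assumption, not from the note: rule-of-thumb random IOPS for one 15k drive.
ASSUMED_IOPS_PER_15K_DRIVE = 175

raw_gb = drives * drive_gb                            # raw capacity before RAID
cost_per_raw_gb_month = quote_per_month / raw_gb      # dollars per raw GB per month
commits_per_second = commits_per_day / 86_400         # average, not peak
estimated_iops = drives * ASSUMED_IOPS_PER_15K_DRIVE  # crude spindle-count estimate
headroom = estimated_iops / required_iops

print(f"raw capacity:       {raw_gb} GB")
print(f"cost per raw GB:    ${cost_per_raw_gb_month:.2f}/month")
print(f"average commits/s:  {commits_per_second:.2f}")
print(f"estimated IOPS:     {estimated_iops} ({headroom:.1f}x the stated need)")
```

Averages hide peaks, so treat the headroom figure as a starting point for questions rather than a sizing answer.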

But I don’t know much about version control I/O profiles. Maybe the problem is harder than that.

Readers, what say you?

Courteous comments welcome, of course.

StorageMojo back up!

by Robin Harris on Thursday, 14 January, 2010

Nothing malicious going on, AFAIK. The latest version of the Thesis WordPress theme isn’t behaving.

Downgraded Thesis to the working prior version. Will be moving the site to a private virtual server to get more RAM.

Will finish loading the last 2 new price lists – IBM & Sun – after the move.

Sorry about the downtime. Thank you for reading StorageMojo.

Robin

PS: If someone knows HTML, CSS, PHP, WordPress and smart web and UI designers, there is a crying need for a professional version of what the Thesis team is trying to do. It is a multi-million dollar market for someone who can deliver a product. Anyone up for getting moderately rich in the next 18 months?

I’ll be your first customer.

Update: StorageMojo is now running on its very own virtual machine. I’m noticing snappier performance – or maybe it is just a light weekend. End update.

Price Lists update

by Robin Harris on Wednesday, 13 January, 2010

Is, technically, 2010 the beginning of a new decade? Only if you count starting with zero – as I’m sure many StorageMojo readers do.

But even civilians seem to agree. Partly out of a desire to see the disastrous double-0s put behind us. Partly because, after all, who cares that 2,000 years ago somebody said “this is year 1.” The Y2K problem didn’t happen in 2001.

But however you count it, 2010 is the beginning of a new fiscal year. People are feeling a tad optimistic now and budgets are in the air, so it is time for StorageMojo to update its Price Lists.

About half the lists have been updated, including fan favorites EMC and NetApp. The rest should be done by the end of the week. You can tell a list is current if it says Updated January 2010.

Some of the old – historical interest only – lists are being deleted. If you are a Creek Path alum, copy now or forever hold your peace.

The StorageMojo take
Happy New Year!
