Patrick Eaton, the Berkeley PhD who architected EMC’s Atmos cloud storage product, left EMC about 3 months ago. He joined a Boston-area search firm, Endeca, where he is a software architect working on a team to scale Endeca’s core MDEX search engine.
Dr. Eaton co-authored a couple of key papers on the computer science underpinning Atmos.
The StorageMojo take
The Atmos project is having a rocky time of it inside EMC. The sales force, which traditionally has had a lot of autonomy as long as they deliver the numbers, isn’t excited about selling much lower-cost/GB storage. Conservative enterprise IT buyers are as leery of unproven technology from EMC as they are from anyone else. And EMC hasn’t been crowing about Atmos either.
I recall a quote from Patrick where he commented on all the resources Atmos was getting – dozens of people, exec attention – and I thought that might be a mixed blessing. Trying to solve hard problems to meet the CEO’s deadline is no one’s idea of a good time – especially for a newly minted PhD.
Kudos to EMC for the Atmos effort and I wish them success. Yet several former EMC’ers have told me that EMC is not the most congenial place to do software. Given the enterprise sales force’s focus on hardware I find that easy to believe: the temptation for sales to discount “free” software to close a big hardware deal is almost irresistible.
And Atmos is a far step beyond EMC’s traditional management and data protection software. It is a whole new product family whose economics and implications are not well understood.
The bottom line: it is rarely a good sign when a company can’t keep the architect of a new product on board. Yes, there are other architects out there, but it usually means that some decisions were made that people now wish were made differently. Only time will tell what the case is with Atmos.
Courteous comments welcome, of course. EMC doesn’t brief me on Atmos or anything else since I won’t sign the non-disclosure agreements (NDA) they require of all analysts. Why brief people who talk and write for a living and then require them not to talk or write about the briefing?
MapReduce and its open source version, Hadoop, are parallel data analysis tools. A few lines of code can drive massive data reductions across thousands of nodes.
Cool.
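For anyone who hasn't seen the model, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `reduce_phase`, `run_mapreduce`) and the sample records are mine, and a real cluster would spread the map and reduce calls across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single total."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical input: two "log lines".
counts = run_mapreduce(["disk disk flash", "flash disk tape"])
```

The point is the shape: the analyst writes only the map and reduce functions, and the framework handles distribution, grouping and retries.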
Powerful though it is, Hadoop isn’t a database. Classic structured data analysis of the model/load/process type isn’t what it was designed for.
That’s where the paper HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads (pdf) comes in. Written by Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz and Alexander Rasin (the first four at Yale, the last at Brown), the paper proposes a method for building an open-source, commodity hardware-based, massively scalable, shared-nothing, analytical parallel database.
What it is
HadoopDB coordinates SQL queries across multiple independent database nodes using Hadoop as the task coordinator and network communication layer. It uses the scheduling and job tracking of Hadoop while it intelligently pushes much of the query processing into the individual database nodes.
There are four components to HadoopDB.
Database Connector. Each node has its own independent database. The Connector is the interface between the database and Hadoop’s task trackers. A MapReduce job supplies the Connector with an SQL query and other parameters. The Connector executes the SQL query on the database and returns the results as key-value pairs. It can be implemented to support a variety of databases.
Catalog. The information needed to access the databases and metadata such as cluster data sets, replica locations and data partitions is kept in the catalog.
Data loader. The data loader is responsible for two jobs. First, it executes a MapReduce job over Hadoop that reads the raw data files and partitions them into as many parts as there are nodes in the cluster. Second, it loads the partitions into the local file system of each node and chunks them according to a system-wide parameter.
SQL to MapReduce to SQL planner. The planner provides a parallel database front end to enable SQL queries. The planner transforms the queries into MapReduce jobs and optimizes the query plans for efficiency. This is the secret sauce of HadoopDB.
HadoopDB complements the Hadoop infrastructure and does not replace it. Analysts have both available as needed.
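To make the Connector idea concrete, here is a rough sketch in Python. It is not HadoopDB's code. I'm assuming SQLite in-memory databases stand in for the per-node databases, a plain loop stands in for Hadoop's task scheduling, and the `sales` table and all function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one node's independent local database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db


def connector(db, sql):
    """Run the SQL on the local database and return key-value pairs."""
    return [(row[0], row[1]) for row in db.execute(sql)]


def run_query(nodes, sql):
    """'Map' the same SQL onto every node, then 'reduce' the partial sums."""
    totals = defaultdict(int)
    for db in nodes:  # a real system would run these tasks in parallel
        for key, value in connector(db, sql):
            totals[key] += value
    return dict(totals)


# Two nodes, each holding its own partition of the data.
nodes = [
    make_node([("east", 10), ("west", 5)]),
    make_node([("east", 7), ("west", 1), ("north", 2)]),
]
totals = run_query(nodes, "SELECT region, SUM(amount) FROM sales GROUP BY region")
```

Note that the GROUP BY runs inside each node's database, so only small partial results cross the network. That is the "push the work into the database" idea in miniature.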
Heterogeneity
A key issue for Internet-scale systems is the ability to run in a heterogeneous environment where multi-year build-outs and rolling node replacement are the norm. That means that some nodes will be faster than others. HadoopDB breaks the work down into small tasks and moves them from slow to fast nodes automagically.
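Why small tasks help is easy to show with a toy simulation. This Python sketch is my own simplification, not Hadoop's scheduler: it assumes equal-sized tasks, made-up node speeds, and a simple rule where whichever node frees up first pulls the next task.

```python
import heapq


def static_split(num_tasks, speeds, work=1.0):
    """Give every node an equal share up front; the slowest node sets the finish time."""
    per_node = num_tasks / len(speeds)
    return max(per_node * work / speed for speed in speeds)


def dynamic_pull(num_tasks, speeds, work=1.0):
    """Hand out one small task at a time to whichever node is free first."""
    heap = [(0.0, node) for node in range(len(speeds))]  # (time node is free, node id)
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(num_tasks):
        free_at, node = heapq.heappop(heap)
        done = free_at + work / speeds[node]
        finish = max(finish, done)
        heapq.heappush(heap, (done, node))
    return finish


# Hypothetical cluster: one new fast node, two average nodes, one old slow node.
speeds = [2.0, 1.0, 1.0, 0.5]
static_time = static_split(80, speeds)
dynamic_time = dynamic_pull(80, speeds)
```

With an equal split the old node holds everyone up. With pull-based scheduling the fast node simply ends up doing more of the tasks.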
Results
The authors ran some benchmarks on Amazon’s EC2 to test performance. The HadoopDB load times were about 10x that of Hadoop, but the higher performance of HadoopDB usually justified the longer set up time.
The authors found that HadoopDB was able to approach the performance of parallel database systems on much lower cost hardware and free software. Given the youth of the project one can expect higher performance as improvements are made.
The killer app for private clouds?
MapReduce and Hadoop are already in wide use among Internet-scale datacenters. As companies begin to understand and correlate social media, web activity and ad response rates, the demand for large-scale parallel database processing will grow. But will they want to ship it out to Amazon?
Depending on the quantity and sensitivity of the data many organizations may prefer to keep the processing in-house. Private scale-out Hadoop clusters may become the poor company’s data warehouse of choice.
The StorageMojo take
HadoopDB is more science project than commercial tool today. Yet the project demonstrates the feasibility of using scale-out compute/storage clusters for work that today typically requires proprietary high-end scale-up system architectures.
If capital costs are reduced by two thirds with a commodity/FOSS architecture, companies could afford to hire the expertise required to make it work. The free software/paid support model will prove quite successful in this space.
James Hamilton, Amazon architect and a very smart guy, recently blogged about private clouds. In Private Clouds Are Not The Future he argues that economies of scale make public clouds much more efficient than private clouds.
I think we agree that several effects make web scale public clouds more efficient:
Higher quality services. Large clouds can economically employ experts to design and optimize their services and infrastructure. Security and server/storage design are two areas where deep expertise can provide more reliable and efficient service.
Utilization. Power systems and power cost are optimized when data centers are run at 100% utilization. As utilization rises across the board so does the capital efficiency, i.e. work per invested dollar.
Cost. Large-scale investments create their own lower-cost dynamic. Public cloud providers save money on infrastructure acquisition through volume buys. In addition, their volume enables them to acquire optimized components, such as high-efficiency power supplies or custom cost-reduced motherboards, that offer little economic advantage to small volume buyers.
Portfolio advantages. With a mix of customers and jobs web-scale clouds have a more stable aggregate load. Some customers are growing, some are shrinking, but the net demand becomes more stable with size. This, in turn, enables public cloud managers to drive utilization higher with less risk of pegging the system.
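The portfolio effect in the last point has simple math behind it. Here is a sketch in Python under assumptions that are mine, not Hamilton's: customer loads are independent and identically distributed (correlated spikes break this), and capacity is sized at mean plus three standard deviations.

```python
import math


def aggregate_cv(per_customer_cv, customers):
    """Coefficient of variation of the summed load for independent, identical customers."""
    return per_customer_cv / math.sqrt(customers)


def safe_utilization(per_customer_cv, customers, sigmas=3.0):
    """Average utilization if capacity is provisioned at mean + sigmas * std dev."""
    return 1.0 / (1.0 + sigmas * aggregate_cv(per_customer_cv, customers))


# Hypothetical: each customer's load swings by 50% of its mean.
for n in (1, 100, 10_000):
    print(n, round(safe_utilization(0.5, n), 3))
```

Under those assumptions relative variability shrinks with the square root of the customer count, so a bigger pool can run hotter with the same headroom.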
With all these advantages it is obvious that private clouds are not the future. Or is it?
It isn’t all about the Benjamins
Economics is not the driver many assume. Individuals and companies often select less economic choices. Some people buy cars that cost $200,000 and get 12 miles to the gallon. Some companies buy $6/GB storage and then utilize just 1/3rd of that costly capacity.
Often perceived benefits are not well measured in dollars. Convenience, availability, consistency and control often relate to emotional needs and wants that are rarely quantified or questioned.
But we don’t have to invoke those to understand why private clouds will be part of the computing landscape. Just a quick look at one of the large Internet data centers will tell us what we need to know.
Show me the power
All the advantages of public clouds have analogs in the world of power generation and distribution. Power generation is cheapest when centralized and large-scale distribution systems move power at the lowest cost per watt.
Electrical power generation and distribution is over 125 years old. The technology is well understood, the industry is mature, and a massive infrastructure — including mile-long coal-hauling trains — supports production and distribution.
And yet Google’s massive data centers in The Dalles, Oregon, built next to a substation a few miles from the nation’s largest hydropower system – one of the world’s most reliable power sources – are each flanked by generators. I expect Amazon does the same.
Access
Clearly, access to data is at least as important as access to power or why would data centers spend the money on uninterruptible power supplies?
Despite the maturity of the power industry people realize it cannot be relied upon 100%. Therefore they maintain their own power storage, generation and distribution systems.
Is the Internet that different?
We cannot rely 100% on Internet access to our data. If the application is important enough, as judged by often subjective human criteria, we will keep our data as close as Google keeps its generators.
Even if it isn’t the most economic choice.
The StorageMojo take
My thanks to James Hamilton and his post for a lucid justification for an all cloud IT infrastructure future. He helped me see why that isn’t going to happen and for that I thank him.
I’ve grappled with the question of private clouds for the last couple of years. The advantages of web scale systems became more obvious, but the human desire for reliable data access and control has not receded.
Public and private will not displace each other: they will coexist just as public and private power sources coexist today. No doubt public clouds will claim the majority of the market whether measured in dollars or exabytes, but private clouds will remain significant contributors to our data infrastructure for decades, if not centuries, to come.
Verari Systems is now Verari Technologies. The company’s assets were purchased by the original founder, Dave Driggers, after an attempt last year to get another round of financing foundered.
They’ve had some success with their containerized compute/storage systems. There haven’t been many buyers amidst the Great Recession and the credit crunch didn’t help.
Original Founder Leads Investment Group in Purchase of Verari Systems’ Assets
Founder aims to re-start company with concentration on data center design and optimization services, modular container-based data centers, blade-based storage and high performance computing solutions.
San Diego, Calif. – January 19, 2010 – David Driggers, the original Founder of Verari Systems, Inc., . . . today announced the successful acquisition of substantially all of Verari Systems’ corporate and intellectual property assets by an Investment Group led by Driggers.
Mr. Driggers is re-starting the Verari engine this week. The new company, Verari Technologies, is offering immediate support to past Verari Systems’ customers.
Verari’s award-winning FOREST containers are one of the industry’s best selling portable data center solutions. The containers, as well as Verari’s BladeRack architecture, utilize Verari’s patented Vertical Cooling Technology to increase energy efficiency while reducing a customer’s energy bills.
“You’re going to see a concerted effort on our part to license and promote these unique technologies,” states Mr. Driggers.
Most of the staff was laid off last year because the company couldn’t meet payroll. The new company retains much of the former senior management.
The StorageMojo take
Verari is wise to take a step back from direct competition with HP, SGI and IBM. HP owns the biggest chunk of the blade market, buys over half the world’s disk drives and, in the 9100, has some very dense storage. But HP can’t be all things to all people – and Verari can help fill the gaps.
While the density benefits of blades are undeniable, some question whether they are cost-effective compared to high-volume commodity boxes. Verari’s pricing seemed more aggressive than most blade vendors – perhaps too aggressive – but price is another competitive tool they may choose to wield to the benefit of buyers everywhere.
I found your blog after searching for storage alternatives. I have to say, it’s really impressive and has helped me a lot so far. I was wondering if you could offer some advice.
We run an online version control service. Currently we are hosted on a VMware environment using FC SAN (SAS and SATA).
We’re growing into the 3 TB+ range and looking for alternatives, since we’re paying $2.50/GB for FC SAN (crazy). We looked at NetApp, but with all the stuff going on these days I have to think there is something less expensive and more creative.
Basically, our needs are:
Fast read and write performance (500+ r/w iops – we have over 13,000 commits per day)
Shared across many machines. We are currently using NFS.
Something that won’t require a team to manage. Although, we already manage our entire Linux environment.
I noticed a post about Gluster, ParaScale, and Nexenta. They look promising, but my fear is that they will require too much maintenance. SAN and NFS are pretty simple and if we get NetApp from our hosting provider they manage it for us. Although, they want to charge us $8,000/mo for it (two shelf, 28 450 GB 15k SAS).
As I dive into storage I think I get more confused. Any advice is greatly appreciated.
When I asked if I could publish the note – which has been edited for clarity and anonymity – I had my own questions:
Why do you think that Gluster, ParaScale & Nexenta will require too much maintenance? Also, when you say SAN, are you referring to Fibre Channel or simply a dedicated Ethernet storage network?
The reply illustrated a facet of the marketing problem that new technologies face: uncertainty.
Not sure really, I just have not had experience with any of those solutions yet. Nexenta looks pretty impressive. I’ve also heard some great results from DRBD.
We have Fibre Channel with HBA cards. It’s still shared storage, but really fast.
BTW, DRBD is the name of an open-source software product:
DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network based raid-1.
The StorageMojo take
My first thought is that anyone who manages a technical hosted service that costs several $K per month should be able to manage a fairly modest scale-out cluster whose capital cost may be only 2-3 months of rental. And 28 15k drives seems like overkill on both the IOPS and the capacity.
But I don’t know much about version control I/O profiles. Maybe the problem is harder than that.
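Here is the back-of-envelope arithmetic behind "overkill", as a small Python sketch. The drive count, drive size, monthly price and requirements come from the reader's note. The 175 IOPS per 15k drive figure is my rule-of-thumb assumption, and the raw capacity ignores RAID and spare overhead.

```python
# Figures from the reader's note.
DRIVES = 28
DRIVE_GB = 450
MONTHLY_COST_USD = 8000
NEEDED_GB = 3000          # "3 TB+"
NEEDED_IOPS = 500         # "500+ r/w iops"
COMMITS_PER_DAY = 13000

# Assumption: rough random IOPS for one 15k RPM drive.
IOPS_PER_15K_DRIVE = 175

raw_gb = DRIVES * DRIVE_GB                     # raw capacity, before RAID overhead
raw_iops = DRIVES * IOPS_PER_15K_DRIVE         # aggregate spindle IOPS
capacity_headroom = raw_gb / NEEDED_GB         # how many times the stated need
iops_headroom = raw_iops / NEEDED_IOPS
cost_per_needed_gb = MONTHLY_COST_USD / NEEDED_GB   # $/GB/month for data actually stored
avg_commits_per_sec = COMMITS_PER_DAY / 86400       # average only; peaks will be higher

print(raw_gb, raw_iops, round(capacity_headroom, 1), round(iops_headroom, 1))
print(round(cost_per_needed_gb, 2), round(avg_commits_per_sec, 2))
```

Even with generous allowances for RAID, snapshots and peak bursts, that is several times the stated capacity and IOPS requirement, which is why the configuration looks oversized to me.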
I moderated a panel on cloud storage at Tom Coughlin’s Storage Visions 2010 conference. Some good stuff came out of it.
4 companies presented: IBM, Bycast, Cleversafe and Asankya.
IBM, now a services company, talked about the service needs of cloud providers or cloud customers.
Bycast
Bycast, which may have the largest installed base of any cloud software provider, presented on the process that they typically see for private cloud implementation. My interpretation of the process:
Edge sites install a gateway node to the central private cloud repository
The edge site learns what its local data needs are
A local disk cache is added to the gateway node to improve performance
A workable balance between local wants and economics is achieved.
It took 3 years for the enterprise to go from pilot to the start of full deployment. Data storage rose from 36 TB at the end of year 1 to 750 TB at the end of year 6.
Cleversafe
Cleversafe may be the leader in implementing advanced erasure codes in storage software. RAID 5 & 6 are both forms of erasure codes, but the math has been refined in the last 20 years. Much higher levels of data availability with lower overhead are now possible.
As disk capacities climb and disk error rates remain constant, the expected annual data loss rises. By 2020 you can expect that a 1,000 disk storage farm will lose over 200 GB of data annually – even with mirrored RAID 6. (RAID 16? The mind boggles).
Advanced erasure codes combined with physically dispersed storage make all that go away. Cleversafe estimates that a dispersed storage infrastructure requiring 10 of 16 nodes to reconstruct the data is 1,000,000 times more reliable than RAID 16 – reducing expected data loss from 200 GB to 200 KB.
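A quick way to get a feel for the 10-of-16 numbers is a simple binomial model. This Python sketch is not Cleversafe's reliability model: it assumes node failures are independent and uses a made-up 1% chance that any given node is unavailable.

```python
from math import comb

TOTAL_NODES = 16
NEEDED_NODES = 10


def loss_probability(total, needed, p_fail):
    """Chance that more than (total - needed) nodes are down at once, if failures are independent."""
    max_tolerated = total - needed
    return sum(
        comb(total, k) * p_fail**k * (1 - p_fail) ** (total - k)
        for k in range(max_tolerated + 1, total + 1)
    )


tolerated_failures = TOTAL_NODES - NEEDED_NODES   # nodes that can be lost with no data loss
storage_overhead = TOTAL_NODES / NEEDED_NODES     # raw bytes stored per usable byte
p_loss = loss_probability(TOTAL_NODES, NEEDED_NODES, 0.01)  # hypothetical 1% per-node outage

# The arithmetic in the claim above: 200 GB cut by a factor of 1,000,000.
reduced_loss_bytes = 200 * 10**9 // 10**6

print(tolerated_failures, storage_overhead, p_loss, reduced_loss_bytes)
```

The design surrenders only 60% extra raw capacity yet survives six simultaneous node losses, which is where the dramatic reliability multipliers come from.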
Asankya
If Bycast has proven private cloud software and Cleversafe has disaster-proof storage, then we’re done, right? Except for the freakin’ network latency that makes “cloud” storage synonymous with “slow” storage. That’s where Asankya comes in.
Their basic insight is this: TCP/IP was built when a 200 nanosecond CPU and a couple of meg of RAM was a Hot Box. What if we were to change the protocol to take advantage of modern resources – could we do better? Well, duh!
They’ve developed the RAPID protocol and an overlay network called RAPIDnet that they claim dramatically improves network performance. How?
Multipathing. Instead of tying a session to a single network path, RAPID decides on a per-packet basis the fastest route for that packet.
Maximum bandwidth utilization. Multiple paths means more available bandwidth – and RAPID loads each path as full as it can.
Network deduplication. Originating nodes keep track of all packets that pass through, so when a duplicate packet shows up it doesn’t resend it.
Net net: by increasing bandwidth and reducing delays, Asankya cuts latency, making cloud storage much more feasible for interactive apps. Cool!
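I haven't seen the RAPID internals, so treat the following Python sketch as a toy illustration of two of the ideas above (per-packet path choice and sender-side duplicate suppression), not as Asankya's protocol. The class, the latency table and the "send a short reference instead of the payload" behavior are all my assumptions.

```python
import hashlib


class Sender:
    """Toy sender: picks the currently fastest path per packet and skips repeat payloads."""

    def __init__(self, path_latencies_ms):
        self.paths = dict(path_latencies_ms)  # path name -> latest measured latency
        self.seen = set()                     # digests of payloads already sent

    def pick_path(self):
        """Choose the lowest-latency path right now; re-evaluated for every packet."""
        return min(self.paths, key=self.paths.get)

    def send(self, payload):
        """Return (path, message); duplicates go out as a short reference, not the data."""
        digest = hashlib.sha256(payload).hexdigest()
        path = self.pick_path()
        if digest in self.seen:
            return path, ("ref", digest)
        self.seen.add(digest)
        return path, ("data", payload)


sender = Sender({"path_a": 30.0, "path_b": 12.0})
first = sender.send(b"block-1")      # new data, fastest path is path_b
sender.paths["path_a"] = 5.0         # conditions change between packets
second = sender.send(b"block-1")     # same payload again, now path_a is fastest
```

A real protocol also has to measure paths, reorder packets at the receiver and keep both ends' duplicate tables in sync, which is where the hard engineering lives.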
Of course, this all has to work in the Real World. Evidently it does, as they have customers. And the technology came out of Georgia Tech.
The StorageMojo take
The latter 3 companies make an important point about cloud storage and computing: we can do much more to make it economical, safe and fast. That’s a Very Good Thing.
Asankya asks whether network intelligence should be in the core or on the edge. Cisco, of course, prefers a smart core, so Asankya is a clear threat to them. The rest of us might disagree.
Courteous comments welcome, of course. I’m doing some work for Bycast, but, alas, not for the other companies. Thanks to Tom Coughlin for assembling a good group for the panel. I’m hoping I can post links to more info on all of them.
Mobo-mounted SSD. Soligen has announced an SSD that mounts on motherboards. The drive mounts firmly, requires no special cooling and takes little board space.
Tiny USB drive. Verbatim has announced a tiny USB thumb drive that is a fraction the size of most current thumb drives. Call it a thumbnail drive. Perfect for keychains.
Super Talent is showing a 2 TB PCI-e SSD and claiming strong performance. At $6k gamers won’t buy it, but enterprises might.
Raidon is showing a nice collection of 2.5″ drive enclosures, including 8 drive arrays. Not much larger than a 5.25″ drive. Can’t find them all on the web yet, though.
A 32 GB Class 6 Micro SD is close to announcement. Micro.
Supermicro showed a 48 drive JBOD/36 drive server chassis. The server is almost as dense as Sun’s Thumper – and drives are front and rear accessible.
Eye-fi’s Wi-Fi enabled SD cards don’t handle AVCHD video files, but they’re working on it. With so many consumer, prosumer and even pro camcorders using SD cards, this will be a popular market for them.
How about a double-ended flash drive: one end for personal; the other for work? Developed with the help of the social community at Quirky.com. They pay developers and influencers a percentage of the revenues. Cool!
PoketyPoke is a con-call management service that reminds you of your concalls and optionally records them and provides transcripts for $9/hr. I like.
In other news
I moderated a too-short panel on Cloud storage at Storage Visions. Several technologies are out there that will change the current economics and application profiles of online storage. The field is young.
Got an update on USB 3.0 from Symwave, the fabless IC firm that makes USB 3.0 chips. Bottom line: unlike USB 2.0, whose marketing made promises the protocol could not keep, the new version can achieve over 400 MB/sec.
The StorageMojo take
No blockbuster, sector-defining new products. But many stepwise enhancements that move us forward.
USB 3.0 is going to push consumer storage as we can move gigabytes in seconds rather than minutes. But it looks like Apple is poised to miss this one – which could cost them a big chunk of their pro market.
2009 has been an eventful year: the Great Recession has driven big changes in enterprise behavior, opening up the field to many new players. Isilon, for one, is reporting healthy growth and they were on the ropes 2 years ago.
Those changes are reflected in my take on the biggest stories of the year:
(8) Tiny server clusters
Instead of putting many virtual eggs in one power-hungry basket, why not build low-power/low-cost servers that don’t need VM software at all?
Microslice servers achieve availability through cheap redundancy. Of course, no enterprise salesman will sell them, so if their advantages prove out the efficiency gap between cloud and enterprise shops will only grow.
(7) Nightmare on DIMM street
The finding by Bianca Schroeder et al. that DRAM is hundreds to thousands of times more error-prone than chip vendors said means that every device that claims to be “enterprise” had better have at least SECDED – single error correction/double error detection – ECC.
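For the curious, SECDED is easy to demonstrate on a tiny scale. This Python sketch uses an extended Hamming (8,4) code: 4 data bits, 3 Hamming check bits and 1 overall parity bit. Real ECC memory typically protects 64-bit words with 8 check bits, so this is a teaching-sized illustration, and the function names are mine.

```python
def encode(data_bits):
    """Encode 4 data bits into an 8-bit codeword (position 0 is overall parity)."""
    d0, d1, d2, d3 = data_bits
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d0, d1, d2, d3
    code[1] = code[3] ^ code[5] ^ code[7]   # checks positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # checks positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # checks positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # overall parity over the other 7 bits
    return code


def decode(code):
    """Return (data_bits, status); corrects one flipped bit, detects two."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of set positions; 0 for a clean word
    overall = sum(code) % 2                 # 1 means an odd number of bits flipped
    if overall == 1:
        code[syndrome] ^= 1                 # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    elif syndrome != 0:
        return None, "double error detected"
    else:
        status = "ok"
    return [code[3], code[5], code[6], code[7]], status


word = encode([1, 0, 1, 1])
one_flip = list(word)
one_flip[5] ^= 1                            # simulate a single bit error
two_flips = list(one_flip)
two_flips[2] ^= 1                           # simulate a second bit error
```

A single flipped bit is silently repaired, while two flipped bits are at least caught rather than passed along as good data.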
(6) Apple drops ZFS
A golden opportunity to bring a 21st century file system to millions of people sank without a trace. But if the Sun/Oracle deal gets closed it might be revived.
(5) Data Domain bidding war
An EMC blogger was trashing DD 2 weeks before the bid – and singing their praises after it. So what else is new?
EMC legitimized dedup – and the bastards say welcome.
(4) Cluster-based scale-out storage
HP bought IBRIX and Isilon is growing fast – storage clusters have arrived. EMC will continue to pooh-pooh it until they get Atmos functional – or maybe they’ll bite the bullet and buy someone who already has it working.
(3) Flash
From STEC’s 10x stock leap – and crash – to everyone announcing flash drives and cards and appliances: this is not a flash in the pan. Fusion-io’s big OEM deals and announcements by newcomers say the party is just getting started.
(2) Cisco’s bong-sized cloud
Cisco’s UCS may not be a success, but they have forced everyone to rethink their businesses. Is a new round of verticalization about to begin as big companies seek to drive growth by taking away their former “partners’” markets?
It used to be a commonplace that he who owned the customer’s data owned the business, but the horizontal model of the last 25 years changed that. But if the Oracle/Sun deal completes, Cisco will find that Oracle’s grip is tighter, giving HP and Cisco common cause once again.
(1) Cloud infrastructure
Unlike some other hype-driven IT trends, cloud infrastructure is here to stay because Google, Amazon, Yahoo and Microsoft have proven it makes economic sense. Which is more than client-server had going for it for many years.
Smart IT people looking to demonstrate added-value will figure out how to leverage that for real competitive advantage over less-nimble foes. It isn’t a quick fix though and enterprises will need to think long term – a skill rusty from disuse.
The StorageMojo take
Like a termite-riddled barn after a heavy snow, the Great Recession is seeing old models collapse. We can’t afford to keep doing what we’ve been doing.
As the new models emerge, competition will grow in the hot areas, leading to even more innovation in the next 3 years than we’ve seen in the last 5. More on that in a future post.