Comments on: A petascale parallel database

By: Roger

Roger — Mon, 31 Jan 2011 12:52:17 +0000

You are very good at this!
HadoopDB is focused on structured data (rather than unstructured data). HadoopDB can also be applied to other workloads, but the less structured the data gets, the less useful it will be relative to just using Hadoop.
I hope the project HadoopDB lives on…

By: Ryan Garrett

Ryan Garrett — Fri, 12 Feb 2010 00:42:22 +0000

Very interesting article, Sebastian.

Anyone interested in MapReduce, big data management and big data analytics should check out the Big Data Summit Bay Area next week in Burlingame, CA. You can register at http://bit.ly/5KUX01. This is the premier conference on data warehousing and big data analytics. Learn how leading companies are leveraging technologies like Hadoop and MapReduce to turn data into dollars. Hear from Aster Data customers like Intuit and Mobclix and leading analysts on new technologies and trends in big data management and advanced analytics.

By: rdp

rdp — Wed, 10 Feb 2010 13:40:05 +0000

RE: “google does internally such as their own file system, mapreduce, server builds, their own switches and routers, their own http server, their own java servlet server”: all key components of Enterprise Computing and its BIG brother, Cloud Computing.

I have always been a believer in scalability in computing. To this end I have been trying to decide on a Home/SOHO Cloud Computing design that could be implemented in this lifetime. Remember when “SuperComputing” looked so far out of reach of the little people?
My spin on this draws on Robin’s previous post “Why private clouds are part of the future”. So for a Home Cloud you would need:
“your heavily modified file system, Hadoop (mapreduce), your custom server builds, your hand picked switches and routers, your own http server, your own java servlet server”.
You could use COTS (Commercial Off The Shelf) components for the Home Cloud since bandwidth and throughput will not make the difference between your making a profit or not and surviving. This means that a new market for Private Cloud components is developing to supply some of the Google in-house developed components. “New” in the sense that these components have to be highly configurable (easily modified) to your local environment. A set of switches (hardware/software/firmware), and a roadmap for “dummies” to produce the required feature/function set without an in-house staff of programmers. Auto/Self configuring components are a possibility. It is not “rocket science” to do this.
IMHO, YMMV, the rise of Private Clouds is a major shift in the computing paradigm.
I want a good cheeseburger (Information on Demand) with fries and a Pepsi (all “Value Add” Services) every time.
It is interesting to me that Robin has written on every topic required to compete with and even surpass the IDCs on a lesser scale. All the key components are outlined in his posts. Has anyone put them together?

By: juliet

juliet — Wed, 10 Feb 2010 07:37:57 +0000

Forget the problem of an application crash or slow data access or response time for an overloaded SAN switch port with Traverse’s service container that can monitor application response time and correlate that with the underlying storage components which are relevant to that application using its Business Container technology.
http://zyrion.com/solutions/server.php

By: ryan

ryan — Tue, 09 Feb 2010 22:25:11 +0000

Maybe you could elaborate on why one might choose to use this platform over some of the other options, namely HBase, Pig, Sqoop, Hive, etc.

By: nate

nate — Tue, 09 Feb 2010 18:43:52 +0000

I was talking to a developer working on a project that will be running on Hadoop soon and was interested to hear his comments on Hadoop itself. It’s extremely poorly written; apparently Yahoo built it mostly by outsourcing the development overseas to some low quality coders, and the result is some pretty poor code. It can work, it’s just not that good.

I find it pretty interesting how much stuff Google does internally, such as their own file system, mapreduce, server builds, their own switches and routers, their own http server, their own java servlet server.

Meanwhile others struggle to keep up, trying to use as much off the shelf stuff as possible because they don’t have the engineering resources internally to even begin to approach doing it themselves. Even a Microsoft insider admitted as much recently in an interview: http://www.theregister.co.uk/2010/02/03/microsoft_bing_number_two_wannabe/

I suppose the message here is: hope & pray you aren’t in a market that Google is or might become interested in at some point if you’re relying on Hadoop. Because whatever you can do, they can do 1000x faster with their ~billion servers and their ~million PhDs.