What’s Hadoop Good For?
Posted by Bob Warfield on June 24, 2010
Hadoop, for those who haven’t heard of it, is an open source implementation of Google’s MapReduce distributed computing model. After reading that Adobe has agreed to open source their Puppet modules for managing Hadoop, I got curious about what Adobe might be doing with it. It didn’t take long on Google to find a cool Wiki page showing what a whole bunch of companies use Hadoop for.
I went in thinking (actually without too much thinking, LOL) that Hadoop implied some sort of search engine work. I knew it was more versatile, but just hadn’t thought about it. A quick read of the Wiki shows all sorts of companies using it, and one of the most common applications seems to be log analysis. The other quasi-surprising thing is that it often seems to run on far fewer nodes than I would have thought. After all, it is built for massively parallel work. However, it is apparently also pretty handy for 10-15 node problems. Hence much smaller organizations, with much smaller problems, are benefiting too.
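To make the log analysis idea concrete, here is a minimal sketch of the map and reduce steps in plain Python, with no Hadoop involved. It counts HTTP status codes in a few made-up log lines. The log format, the sample data, and the function names (`map_line`, `reduce_counts`, `run_job`) are all my own assumptions for illustration, not anything taken from the Wiki or from Hadoop's API.

```python
from collections import defaultdict

# Made-up sample lines in an assumed format: ip "request" status bytes
SAMPLE_LOG = [
    '10.0.0.1 "GET /index.html" 200 512',
    '10.0.0.2 "GET /missing" 404 0',
    '10.0.0.1 "GET /about.html" 200 1024',
]

def map_line(line):
    """Map step: emit a (status_code, 1) pair for one log line."""
    parts = line.split()
    # Assumption: the status code is the second-to-last field.
    if len(parts) >= 2:
        yield parts[-2], 1

def reduce_counts(key, values):
    """Reduce step: add up all the counts emitted for one key."""
    return key, sum(values)

def run_job(lines):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))  # {'200': 2, '404': 1}
```

On a real cluster the map calls would be spread across the nodes and the grouping would happen over the network, but the shape of the job is the same, which is probably why it works just as well on 10 nodes as on 1,000.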
My conclusion, if any, is that it must be a really handy toolkit for throwing together analyses of all sorts of things that need a little grid computing (a term that is probably no longer popular) in an elastic Cloud world.
Cool beans! I love the idea of scaling up for a quick Hadoop run to crank out a report of some kind and then scaling the servers back down so you don’t have to pay for them. Makes sense.