Why Don’t Search Startups Share Data, Part 2
Posted by Bob Warfield on August 22, 2007
I mentioned in an earlier post that search startups ought to look into a divide and conquer approach when crawling the web. After all, one of the biggest complaints about a lot of interesting search services is that they don’t find as much as Google does. TechCrunch, for example, complains that Microsoft’s new Tafiti produces search results that are “not as relevant as Google or Yahoo”. And yet, they also admit Tafiti is beautiful (as an aside, it is very cool and worth a look to see what Microsoft’s Flex killer, Silverlight, can do for a web site). If the Alt search sites band together to do the basic crawling and crunching using Google’s MapReduce-style algorithms (possibly based on the Open Sourced Hadoop Yahoo is pushing), they could share one of the bigger costs of being in business and ameliorate the huge advantage in reach that the biggest players have over them.
ZDNet bloggers Dan Farber and Larry Dignan ask whether Open Sourced Hadoop can give Yahoo the leverage it needs to close the gap with Google. Their first words are that “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.” I don’t think Hadoop by itself is enough, but if Yahoo were to create a collaborative search service, maybe it would be. In fact, what if search were much more like Facebook, only more open (Hey, if Scoble can do it with a hotel, I can do it with a search engine!)? In a manner similar to my “Web Hosting Plan for World Domination”, Yahoo could undertake a plan for “Search Engine World Domination”. Here’s how it would work:
- Yahoo builds up the Hadoop Open Source infrastructure for Web Crawling. Alt Search engines can tie back into that to get the raw data and avoid doing their own crawling. Even GigaOm says “The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that matter) is the high cost of infrastructure.” Let’s share those costs and further defray them by having a big player like Yahoo help out.
- Yahoo can also offer up the Hadoop scaffolding to do any massively parallel processing these Alt Search Engines need to compute their indices. Think of it as being like Amazon’s EC2 and S3, but purpose-built to simplify search engines. People are already asking Amazon for Search Engine AMIs, so there is clearly interest.
- Now here is the Facebook piece of the puzzle: Yahoo needs to turn this whole infrastructure play into a Social Networking play. That means they offer Search Widgets to any Social Network that wants them, and they let you personalize your own search experience by collecting the widgets you like. Most importantly, Yahoo creates basic widgets that reflect their current search offering, but they allow the Alt Search Engines to make widgets that package their search functionality. Take a look at Tafiti and see how it lets you select different “views”. Those views are widgets!
- Yahoo gets a big new channel for its ads, and it graciously shares the revenues with the Widget builders because that’s what makes the world go round. Perhaps they even have virtual dollars that can be used to pay for the infrastructure using ad revenue, although I personally think they should give away as much infrastructure as possible to attract the Alt Search crowd to their platform.
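To make the "compute their indices" step a little more concrete, here is a minimal single-process sketch of the MapReduce pattern applied to building an inverted index from shared crawl data. It is only an illustration of the idea, not Hadoop's API and not anything Yahoo has published: the `SHARED_CRAWL` sample pages and the `map_page`, `reduce_term`, and `build_index` names are all made up for this example.

```python
from collections import defaultdict

# Hypothetical shared crawl output: URL -> page text (made-up sample data).
SHARED_CRAWL = {
    "http://example.com/a": "open source search",
    "http://example.com/b": "search startups share data",
}


def map_page(url, text):
    """Map step: emit one (term, url) pair per distinct word on a page."""
    for term in set(text.lower().split()):
        yield term, url


def reduce_term(term, urls):
    """Reduce step: collapse the URLs seen for a term into a sorted posting list."""
    return term, sorted(set(urls))


def build_index(pages):
    """Run map, shuffle (group by term), and reduce in one process."""
    grouped = defaultdict(list)
    for url, text in pages.items():
        for term, page_url in map_page(url, text):
            grouped[term].append(page_url)
    return dict(reduce_term(term, urls) for term, urls in grouped.items())


index = build_index(SHARED_CRAWL)
```

On a real cluster the map and reduce steps would run on many machines and the grouping would be handled by the framework, but the division of labor is the same: every Alt Search Engine could reuse the crawl and plug in its own map and reduce logic.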
Don Dodge, meanwhile, is wondering what the exit strategy is for the almost 1,000 startups out there trying to peddle alternative search engines. It sure seems to me that creating this search widget social network world solves a big problem for Yahoo and at the same time creates a lot of new opportunity for the exit strategy of these engines. Suddenly, they have access to large volumes of data they couldn’t afford and a distribution channel in which to build an audience.
Open Source Swarm Competition in the Search Engine Space is Born!