Why Can’t I Search My Enterprise Data As Well As Google Searches the Internet?
Posted by Bob Warfield on September 2, 2007
Emergence is a property of the web that allows it to create higher-order patterns out of seeming chaos and thereby improve everybody’s experience. McAfee introduces the concept by asking why you can’t search data within your own company as easily or as well as Google searches the web. Why indeed. Putting aside the fact that at every company I’ve been at the information is heavily siloed and therefore not searchable by any single tool, McAfee blames a lack of emergence. In this case, it is emergence related to Google’s PageRank algorithm.
Let’s take a brief aside to talk about PageRank, because McAfee has it exactly right when he says:
Google’s founders realized that even though the Internet is extremely decentralized the Web still has a huge amount of structure thanks to links. This structure can be exploited not just for navigation (i.e. hopping from page to page via links) but also for search. Links provide so much structure, in fact, that the Web appears to us to be a very orderly place; we can find what we want on it.
He goes on to liken this emergent web behavior to that of an ant colony, which is highly decentralized but has emergent social behavior. Think of the pathways that get worn through the web as links are created as being like the pheromone trails that govern how ants navigate out in the world away from their colony. By using these trails the ants eventually identify for their brethren exactly where the good stuff (i.e. food) is as well as how to avoid the bad stuff (i.e. obstacles and enemies). Likewise, the link structure of the web helps us to find the good stuff too, and using links to modify what is learned from keywords alone is exactly the insight provided by the PageRank-style link analysis that all major search engines employ today.
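To make the ant-trail idea a little more concrete, here is a minimal Python sketch of the published PageRank idea: a page is important if important pages link to it. This is a toy power-iteration version, not Google’s production system. The page names are hypothetical, the damping factor of 0.85 is simply the value most often quoted, and the handling of pages with no outbound links (spreading their rank evenly) is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page keeps a small baseline rank regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank along its outbound links, split evenly.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical intranet pages and the links between them.
intranet = {
    "wiki/home": ["wiki/expenses", "wiki/benefits"],
    "wiki/expenses": ["wiki/benefits"],
    "wiki/benefits": ["wiki/home"],
    "wiki/orphan": ["wiki/benefits"],
}
ranks = pagerank(intranet)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this made-up example the benefits page comes out on top because three other pages point at it, while the orphan page, which nobody links to, ends up with only the baseline rank. No keyword matching is involved at all; the ordering comes purely from the trails people laid down.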
With this insight about PageRank (and ant doings) in mind, let’s consider the Business Web or the so-called Intranet. In particular, let’s ask what business has to do to take advantage of the emergence phenomenon.
The first thing on the list has to be escaping from silos. Nothing is more destructive to emergence than breaking up the information habitat into small, disconnected fiefdoms. Eliminating silos is much easier said than done, because silos often exist for sound business reasons. Some information should not be broadly available. For example, a company whose shares trade in the public markets cannot legally circulate its financial results far and wide internally before announcing them to the public. Because of this, Business Web 2.0 requires a different Trust Fabric than Social Web 2.0. Until standards-based mechanisms exist to impose those Trust Fabrics, we will be stuck.
The second issue is performing the search indexing over disparate tools. Here organizations have a little more leeway to plan for the future. They can at least start out using the same tools used on the web, rather than opting for various proprietary tools. Where proprietary tools are a requirement, it will be important that those tools give search engines access to catalog their contents. Perhaps a minimal capability would be a facility to deliver HTML views of the data.
Lastly, on the subject of silos, the organization will need to find a search engine flexible enough to navigate the disparate collection of information repositories and to dispense search results according to a Trust Fabric that’s compatible with the requirements of the business.
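What dispensing results according to a Trust Fabric might mean in practice can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Hit` record, the group names, and the rule that a hit is visible only when the user belongs to at least one group the document is shared with. A real deployment would take these permissions from whatever directory or access-control system the business already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result plus the groups allowed to see it (hypothetical shape)."""
    title: str
    url: str
    allowed_groups: frozenset


def visible_hits(hits, user_groups):
    """Return only the hits shared with at least one of the user's groups."""
    groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]


# Hypothetical index entries: one open to everyone, one restricted to finance.
hits = [
    Hit("Travel policy", "wiki/travel-policy", frozenset({"all-staff"})),
    Hit("Q3 results (draft)", "finance/q3-draft", frozenset({"finance"})),
]

engineer_view = visible_hits(hits, {"all-staff", "engineering"})
finance_view = visible_hits(hits, {"all-staff", "finance"})

print([hit.title for hit in engineer_view])
print([hit.title for hit in finance_view])
```

The point of filtering at query time is that the index can span every silo while each person still sees only what they are entitled to see, which is the difference between the business case and the open web.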
Put another way, Enterprise IT is often obsessed with providing users with Single Sign On (SSO), so that logging in once gives people access to all the online resources their permission levels allow. Those interested in fostering better reuse of information need to be just as obsessed with Single Search Engines (SSE). It should be possible to type a query into a single search box and get results from every system that is a repository. Here’s a thought on that: what about mashups for search? Maybe you can’t get to an SSE. No worries, the appearance of an SSE will suffice. There have been metasearch engines around for as long as there has been more than one search engine, so why not here too? If your organization has not reached the SSE ideal, and it is hard to do so, why not make creating an Enterprise-wide metasearch a major initiative? It isn’t all that hard to do. Of course this all assumes thin clients running in web browsers, so make sure you aren’t too dependent on some proprietary fat client for search or a key information repository!
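The metasearch idea can be sketched roughly as follows in Python. The two backend functions are hypothetical stand-ins for whatever search each repository already offers, and the merging rule is reciprocal rank fusion, a simple published technique that I am swapping in for illustration: each backend contributes 1 / (k + position) for every result it returns, so a document found by several systems floats toward the top. The constant k = 60 is the value commonly used with that technique, not something tuned for any particular enterprise.

```python
from collections import defaultdict


def metasearch(query, backends, k=60, limit=10):
    """Send one query to every backend and merge the ranked lists."""
    scores = defaultdict(float)
    for search in backends.values():
        # Each backend returns URLs in its own best-first order.
        for position, url in enumerate(search(query), start=1):
            scores[url] += 1.0 / (k + position)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))[:limit]


# Hypothetical backends; real ones would call each repository's own search.
def wiki_search(query):
    return ["wiki/expenses", "wiki/travel-policy"]


def file_share_search(query):
    return ["files/expense-form.xls", "wiki/expenses"]


results = metasearch(
    "expenses",
    {"wiki": wiki_search, "files": file_share_search},
)
print(results)
```

Here the expenses wiki page is returned by both systems, so it ranks first even though the file share listed it second. The user sees one search box and one list, which is the appearance of an SSE without actually having one.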
The second requirement when enabling emergent behavior is a mechanism whereby the information consumers can provide the feedback needed to create the emergent pathways. This is also not an especially easy requirement to satisfy, because we need some universal mechanism for establishing pathways that anyone can follow and that are later amenable to analysis by the search engine. The two most popular mechanisms at large in the World Wide Web for this are links and tagging. Marc Canter and others have recently bemoaned that tagging hasn’t really kept up, so perhaps links are the preferable mechanism. Tagging without links is possible, but one wonders how useful it would be. Tags don’t contribute to emergence unless we have a powerful search algorithm that harnesses them, so that’s a strong argument not to become too dependent on them anyway.
The upshot is that it has to be possible to link to every bit of information that will be searched, and it must be easy to create new links (new ant trails). Creating the links is not so hard if you have SSO and an SSE (even via metasearch): you can hardly get back search results without getting back a list of links.
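For those links to be amenable to analysis, something has to read them back out of the documents. As a rough sketch, assuming each repository can deliver HTML views of its data, Python’s standard `html.parser` module is enough to collect the links and count how many other pages point at each one. The page names and HTML snippets below are made up, and a raw inbound count is only the crudest possible link signal.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def outbound_links(html):
    """Return the link targets found in one HTML document."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def inbound_counts(pages):
    """Count, for each known page, how many other pages link to it."""
    counts = {url: 0 for url in pages}
    for url, html in pages.items():
        # Count each linking page once, and ignore links a page makes to itself.
        for target in set(outbound_links(html)):
            if target in counts and target != url:
                counts[target] += 1
    return counts


# Hypothetical HTML views of three intranet pages.
pages = {
    "wiki/home": '<a href="wiki/expenses">Expenses</a> <a href="wiki/benefits">Benefits</a>',
    "wiki/expenses": '<p>See <a href="wiki/benefits">benefits</a>.</p>',
    "wiki/benefits": '<a href="wiki/home">Home</a>',
}
counts = inbound_counts(pages)
print(counts)
```

Anything that cannot be addressed by a link simply never shows up in a table like this, which is why being able to link to every bit of information comes first.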
The last requirement has to do with encouraging a culture of emergence. Does your organization foster behaviors that lead to emergence, or does it discourage them? McAfee calls this Freeform Software, but I think it is just as important to apply these principles to your organization’s culture.
To sum it all up, here is my wish list for enabling emergent behavior inside the Enterprise:
1. Single Sign On
2. Single Search Engine, including Metasearch to get there
3. Thin clients throughout
4. Mechanisms to give back structure: I want to embed links in every document type, or as many as I can. Tags are an excellent follow-on, but they’re less important.
5. Cultural factors need to encourage creation of content and structure.
In the next installment, we’ll look at Business Alternatives to PageRank.