Giant Global Graph: Do You Need A Clue?
Posted by Bob Warfield on November 29, 2007
Sir Tim, who more or less invented the World Wide Web, recently did a blog post entitled the Giant Global Graph. It’s a long rambling post that touches on multiple themes. It is a logical reductionist discussion that only geeks are equipped to fully understand and appreciate. Not because it is a superior way to organize an article, but only because our minds are pitifully linear compared to more intuitive thinkers. Opinions vary on how well these themes go together, but the primary insight behind the whole article is that additional structure for the web, beyond mere hyperlinks and in the form of a graph, is valuable and far reaching. Let me say it again, slightly differently: Berners-Lee is on about the idea of additional structure and content for the web beyond hyperlinks. Hyperlinks are navigational. They convey some meaning beyond navigation, but not much. Perhaps the most famous example is Google’s PageRank, which makes the assumption that lots of links to a page indicate the page may be of more value to a searcher than a page with few links into it. There may be other things one can intuit by examining hyperlinks, but it’s hard. Making it easy, and especially making it easy for computers, is what Tim Berners-Lee wants to accomplish with his Semantic Web notions. As long as we’re layering weird but related notions into this mashup, I’d like to add one I haven’t seen the other commentators write about, which is the use of what are essentially web hyperlinks (a bit more, but close) to allow computers to interact directly with one another in a practice that has been called RESTful Architecture.
It’s quite amazing, really, what’s possible with a clean, simple, and well designed architecture like the web. The challenge is that if we extend it as Sir Tim proposes, we need to do so just as elegantly. There’s a lot out there now, and a lot of moving parts interacting with what’s out there. Adding sand to the machinery is not helpful. So what exactly did Sir Tim’s latest missive bring to the table?
There’s a nice historical / layered architectural view of what the Internet and World Wide Web are and how they differ. Put simply, the Internet is the generic plumbing that lets computers talk to each other internationally through standard protocols. The World Wide Web is a notion of documents that users interact with over the Internet. Both are what mathematicians call graphs, which are nothing more than nodes with connections between them. The Internet is a graph of computers. The World Wide Web is a graph of documents. Again, for “graph” substitute “network of nodes with connections between them.” Pretty easy so far, no? Okay, we’re a third of the way through the post, and we’re going to kick things up a notch.
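For the programmers in the audience, here is a minimal sketch of the "graph of documents" idea, along with the one thing plain hyperlinks make easy to compute: how many pages link to a given page. The page addresses and function names are made up for illustration, and counting inbound links is only the intuition behind link-based ranking, not Google's actual algorithm.

```python
from collections import defaultdict

# Hypothetical pages and hyperlinks; the addresses are invented for illustration.
links = [
    ("a.example/home", "b.example/post"),
    ("c.example/blog", "b.example/post"),
    ("b.example/post", "a.example/home"),
]

def build_graph(edges):
    """Return an adjacency mapping: page -> set of pages it links to."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        graph[target]  # touching the key makes link targets show up as nodes too
    return dict(graph)

def inbound_counts(graph):
    """Count how many distinct pages link to each page."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

web = build_graph(links)
counts = inbound_counts(web)
```

Notice that the links themselves say nothing about why one page points at another. That gap is what the rest of the post is about.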
The next concept TBL brings to the table is, “It’s not the documents, it is the things they are about which are important”. He goes on to say this is obvious, but I don’t think it is as obvious as he thinks once you consider the real ramifications. TBL wants to somehow factor out the core ideas in these documents and use those ideas to create another kind of graph, which he calls the Semantic Web or Giant Global Graph. These core ideas become the nodes of the graph, and they link together documents and related ideas in interesting ways.
Why? Because computers are actually pretty lousy at reading plain English (or any other language) and figuring out what those underlying ideas are. For examples, TBL mentions things like:
– Biologists wanting proteins, drugs, or genes. BTW, any profession or interest area will have a big list of jargon that is peculiar to that interest area and that should be factored and graphed for any web document.
– Business People want customers, products, and sales information.
– People in general want Social Relationship information, and that is what people refer to as the Social Graph.
You see where he is coming from? I wasn’t trying to be insulting with my post title. When I say, “Do you need a clue?”, I’m referring to this new graph structure as providing clues to computers about what the heck is actually on a web page, so that you can use web pages in novel ways that are hard today but would be very useful with a fully annotated Semantic Web, er, GGG. This is not the easiest thing in the world to do, as you can imagine. There is a heck of a lot of work involved in doing all that annotation, and a lot of it may have to be done by hand.
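To make the "clues" idea concrete, here is a rough sketch of annotations stored as subject / predicate / object statements, which is the general spirit of Semantic Web data. Everything here is an assumption for illustration: the identifiers, the "mentions" and "type" predicates, and the helper functions are invented, and this is not a real RDF library.

```python
# Each statement is (subject, predicate, object). All identifiers are invented.
triples = [
    ("doc:paper-17", "mentions", "protein:p53"),
    ("doc:paper-17", "mentions", "drug:aspirin"),
    ("doc:catalog-3", "mentions", "product:widget"),
    ("doc:review-9", "mentions", "protein:p53"),
    ("protein:p53", "type", "Protein"),
    ("drug:aspirin", "type", "Drug"),
    ("product:widget", "type", "Product"),
]

def match(statements, subject=None, predicate=None, obj=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in statements
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

def documents_about(statements, kind):
    """Find documents that mention any thing annotated with the given type."""
    things = {s for (s, _, _) in match(statements, predicate="type", obj=kind)}
    mentions = match(statements, predicate="mentions")
    return sorted({s for (s, _, o) in mentions if o in things})

protein_docs = documents_about(triples, "Protein")
```

A biologist asking for "documents about proteins" gets an answer without the computer having to read any English. The hard part, as noted, is getting the annotations written in the first place.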
However, if we are very very clever, some useful pieces may become automatic. Take the Social Graph. If we create our own Social Graph about our relationships with people, it may contain enough information that the web can meaningfully change how it interacts with us as regards those people. Today, we look at it as happening in the context of a Social Network, but it should not be limited. Why can’t I go into my address book, pick a person, and reach out with high certainty into the entire web to see as much as possible about that person? Where are their blogs and home pages? Which Social Networks do they belong to? What articles quote them? What company do they work for? If I visit the company’s web site, wouldn’t it be cool to be able to tell who I know that works there?
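Here is a small sketch of that last question, "who do I know that works there?", assuming a personal Social Graph were available as plain data. The people, companies, field names, and the `contacts_at` helper are all hypothetical.

```python
# Hypothetical personal Social Graph; every name and address is made up.
people = {
    "alice": {"works_for": "acme.example", "blogs": ["alice.example/blog"]},
    "bob": {"works_for": "initech.example", "blogs": []},
    "carol": {"works_for": "acme.example", "blogs": ["carol.example"]},
}

# Who knows whom. Only direct relationships are recorded.
knows = {
    "me": {"alice", "bob"},
    "alice": {"carol"},
}

def contacts_at(person, company):
    """Return the direct contacts of `person` who work for `company`."""
    return sorted(
        contact
        for contact in knows.get(person, set())
        if people.get(contact, {}).get("works_for") == company
    )

my_acme_contacts = contacts_at("me", "acme.example")
```

Carol works at the same company but is not returned for "me", because she is a friend of a friend. Deciding how far such a query should reach is exactly the kind of choice the graph's owner would want to control.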
A couple of things should be coming clearer now. First, I hope you can see why many of us (and now TBL) recognize the term “Social Graph” as being separate from “Social Network.” The Social Graph can be so much more than a particular web site focused on Social Networking. It can literally impact every aspect of your web experience. And it is a collection of data that is at once both very open and very private and personal. There are pieces we want everyone to see, and pieces we want to keep entirely to ourselves. It is a very tangled web we are weaving. It grows and morphs constantly. We would like to start building it once and never start over. This is why I’ve said the Real Social Graph Hasn’t Shown Itself Yet.
Here is another way to think about the GGG or Semantic Web. The web of today is manual and literal. You create a concrete link between documents. You traverse the links. They are largely fixed and relatively inviolate. This is a good thing. You don’t want to lose track of a thing. But that is only one form of navigation. Sometimes you don’t know where the first bread crumb on the trail is. For that, you need search tools of various kinds. The Semantic Web can inform the search process much more fully than keywords and PageRank. Beyond search, we would like a living web that restructures itself as it learns. A change in one place can ripple through this graph structure to have far reaching and beneficial effects. Suddenly the map of the web can be personalized around your interests, knowledge levels, relationships, and needs. That’s pretty cool!
TBL winds up with a cautionary note about control. Each of these layers has involved some loss of control. First we gave up the idea of private networks to get to the Internet. Anyone can be on it, including your worst enemies, competitors, criminals, and other evil doers relative to you. Second, the World Wide Web involved a loss of document control. Everything went to HTML instead of native document formats. HTML involves a lot of loss of control. It has gotten better, but real page layout and typography aficionados cringe. Now we’re talking about sharing that graph data. A graph requires two components, a lock and a key. You hold the key. Your Social Graph is your set of friends and relationships. The lock is the set of pages that the key unlocks. There is cooperation on both sides. And, as TBL points out, this loss of control doesn’t have to mean that someone can access data they have no right to access. It is important that you maintain control. Even though the Internet is not a private network, you can still run HTTPS or some other protocol to encrypt the packets. We routinely trust sensitive information to the Internet these days because there are cultural patterns for how it’s done and real technology to help protect us. These things have yet to evolve for the next graph layer that TBL wants us to construct, but they are necessary infrastructure for this all to work.
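One way to picture the lock and key is as a visibility rule attached to each piece of graph data, with the owner deciding who gets in. This is only my own sketch of the analogy, not a mechanism TBL describes; the `GraphEntry` class, its fields, and the sample facts are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class GraphEntry:
    """One fact in someone's graph, plus who is allowed to see it."""
    owner: str
    fact: str
    visible_to: set = field(default_factory=set)  # empty means private to the owner
    public: bool = False

def can_read(entry, viewer):
    """The owner always has access; others need a public flag or an explicit grant."""
    return entry.public or viewer == entry.owner or viewer in entry.visible_to

def visible_facts(entries, viewer):
    """Return only the facts this viewer is allowed to see."""
    return [entry.fact for entry in entries if can_read(entry, viewer)]

# Hypothetical data: one public fact, one shared with a friend, one private.
entries = [
    GraphEntry("me", "writes a blog", public=True),
    GraphEntry("me", "is friends with alice", visible_to={"alice"}),
    GraphEntry("me", "bought a gift for alice"),
]

alice_view = visible_facts(entries, "alice")
stranger_view = visible_facts(entries, "stranger")
```

The same graph yields a different view for each viewer, which is the point: sharing the structure does not have to mean putting all of it out in the open.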
What has been the reaction of others to this?
Umair, as usual, gets it, and points out an important pitfall to avoid: the social graph is not web 3.0 (and the converse is also important: web 3.0 is not the social graph). I hope from my post above it is easier to see how the Social Graph is a subset of the GGG, and how it is also different from Social Networks. GGG is a lot bigger than just Social.
Stowe Boyd, for example, had been very anti-Social Graph, but now says he “gives”. Boyd was right to insist on more clarity before giving, but he is still suspicious that TBL is somehow trying to hitch a free ride on Social Graphs for his Semantic Web. I think that suspicion is needless. I hope you can see from my notes above what it’s all about, this Semantic Web. The reason Stowe sees so much more fire around Social Networking is that this is an area where users have found sufficient value to create Social Graphs for themselves. So far, we haven’t seen much action elsewhere, but we should.
My guess is that there are other areas of sufficient interest to generate spontaneous volunteer work of the kind we see around Social Networking. There just needs to be the right enablement. Perhaps it will be some form or flavor of the bookmarking trend that surrounds sites like Digg. Perhaps it will be around online retailing. Merchants have uniform means of referring to products in the form of standardized EDI, Bar Code, SKU, and other information. Perhaps if shopping search got dramatically better when such information was available in the GGG for a page, it would drive many to add the information.
Anne Zelenka on GigaOM takes a dim view of all this. She feels that computers are poorly suited to understanding relationships, and that trying to shoehorn the Social Graph into the Semantic Web sells it short. I can’t seem to find a good concrete objection or counterexample in her article though, other than a vague sense of unease about it all. She cites another of her articles that talks about the downsides of a distributed and open social graph. My problem is I can’t see where TBL is advocating this. In fact, I’m not sure I see where anyone is. We all want control over our Social Graph. Remember my analogy of the lock and key. I’m the only one with my key. Also go back to the example of how the Internet involved a loss of control but various standards came into play so that privacy could still be preserved. That has to be done for the GGG as well. There will be lots of kinds of data there that we may not want out roaming freely. Facebook’s tracking of what you purchase with Beacon is another great example of a GGG-like Graph Structure (call it the Global Purchases Graph or GPG) that some folks are upset at losing control over. In other words, with some maturity in the standards, the tradeoffs can be extremely palatable and do not amount to putting all that data right out in the open. That we don’t have this today is another reason why I say nobody has yet seen the Real Social Graph, or the Real GGG either for that matter.
One thing that still seems missing from many of the other commentaries I’ve read is that this GGG notion puts a lot of the value that is currently being delivered by proprietary platforms like Facebook back into the Web. That’s a better and more open model. It’s in the best interests of everyone to pursue that model. In fact, GGG tries to go beyond just being a Social Graph precisely so that it becomes a general-purpose means of capturing almost any kind of structural annotation and linkages around the Web’s document model. That sort of thing can help future proof a big idea so it runs further before we find ourselves having to add a fourth and fifth layer on top of the first two that are already there.
In conclusion, I think TBL did a good job tying his vision back to some immediate realities, thereby making it more concrete and touchable. It still has quite a ways to go, but it’s great to still be in early days and not have to worry about whether Facebook and Google have a permanent and irrevocable stranglehold on all innovation. My own little contribution is that now that we’ve tied Social Graphs into the Semantic Web, I’d like to see REST somehow get tied in. After all, why shouldn’t we annotate REST APIs on web pages too so that they can easily be found and connected to? It’s like putting electrical sockets in a room for future use.
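As a sketch of what those electrical sockets might look like, suppose a page advertised its API with a link element in its head, and a program scanned for it. The `rel="api"` value is purely my assumption for illustration, not an established convention, and the page and endpoint address are invented. The parsing uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

class ApiLinkFinder(HTMLParser):
    """Collect href values from <link> tags carrying a given rel value."""

    def __init__(self, rel="api"):  # "api" is a hypothetical rel value
        super().__init__()
        self.rel = rel
        self.endpoints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if self.rel in rels and href:
            self.endpoints.append(href)

# Hypothetical page that advertises one API endpoint alongside a stylesheet.
page = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="api" href="https://shop.example/api/products">
</head><body>Hello</body></html>
"""

finder = ApiLinkFinder()
finder.feed(page)
endpoints = finder.endpoints
```

The annotation costs the page author one line, and any program that knows the convention can find the socket and plug in.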
I guess it’s not just me seeing a connection between the GGG and REST, see Discipline and Punishment for more.