David Hillis | 06.30.10

The Narrative Algorithm – The Missing Link in the Semantic Web

Semantic content is the future of the Web. Whether you call it Web 3.0, intelligent content, or the semantic web, most experts agree that we are moving towards an internet landscape where information is easily shared beyond a single website and repurposed in relevant ways.

Generally speaking, semantic content is XML. XML is a simple language for tagging content to define what the information means. An element of content tagged <date> would be assumed to be a calendar date, whereas <person> would be the name of a person. Adding descriptive markup to content makes it manageable in highly relevant ways, like creating a list of events from across multiple websites. Of course, this is exactly how Ingeniux CMS works, and it is an area where we have done a lot of innovation. Yet a CMS only manages content within its own repository. When you look at the billions of pages of unstructured content on the web, finding the semantic meaning becomes a much different challenge.
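To make the events-list idea concrete, here is a minimal Python sketch. It assumes two hypothetical sites publish events as <event> elements with <title> and <date> children; the schema, the sample data, and the `collect_events` function are illustrative assumptions, not Ingeniux CMS's actual format.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical XML published by two different sites (assumed schema).
SITE_A = """
<events>
  <event><title>Board Meeting</title><date>2010-07-15</date></event>
  <event><title>Open House</title><date>2010-07-02</date></event>
</events>
"""

SITE_B = """
<news>
  <event><title>Product Launch</title><date>2010-07-09</date></event>
  <article><title>Quarterly Results</title></article>
</news>
"""

def collect_events(*documents):
    """Gather every <event> that has a <title> and a <date>, sorted by date."""
    events = []
    for doc in documents:
        root = ET.fromstring(doc.strip())
        # iter() walks the whole tree, so only the tags matter,
        # not where each site happens to place its events.
        for event in root.iter("event"):
            when = event.findtext("date")
            title = event.findtext("title")
            if when is None or title is None:
                continue  # skip incomplete entries
            events.append((date.fromisoformat(when), title))
    return sorted(events)

for when, title in collect_events(SITE_A, SITE_B):
    print(when.isoformat(), title)
```

Because the aggregation keys on tag names rather than page layout, another site using the same vocabulary could be added without new parsing logic.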

Search has always been seen as the killer app for semantic content. When a search engine understands the semantic meaning of content, it can organize information thematically and make results much more useful. A semantic search for "Abraham Lincoln" could include a short biography, a photo gallery, a timeline with milestone dates, a map with key historical locations, and other types of information. This can be far more helpful than a simple "10-box" list of links to popular web pages.

The idea behind semantic search is that the search engine would "write an article for you." This is exactly the tack Cpedia has taken.

Cpedia (http://www.cpedia.com/) is a new offshoot of the semantic search engine Cuil.com (pronounced "cool"). It is a Wikipedia-style website, but rather than relying on human-edited content, Cpedia is fully automated. The software actually writes articles by taking snippets of content from different sites and assembling them into a sort of "semantic mosaic" bound by an underlying narrative.

This is a groundbreaking idea. It leverages search to tell stories assembled from billions of bits of information from across the web. Many people understand information best when it is presented as a narrative or story. Cpedia has made a bold first step by extending semantic search with a narrative algorithm that transforms information into stories. Creating a narrative algorithm is an important step in advancing the semantic web, but in my opinion the route Cpedia has taken does not go far enough.

Take, for example, a search for "Ford Motor Company" (http://www.cpedia.com/search?q=Ford+motors). Cpedia returns a full article, but the narrative seems off the mark. One would expect the article to focus on Henry Ford, assembly-line manufacturing, the Model T, and similar topics; however, the lead of the Cpedia-generated article focuses on Robert McNamara, the onetime president of Ford better known for his role in Vietnam and Cold War politics.

After looking at a few articles, it seems that the "narrative algorithm" is trying to join snippets of information together based on common terms. So the Ford article slants towards McNamara not because the article is about him, but because the first snippet happens to reference him and the algorithm clusters the following snippets around that topic. It is trying to impose cohesion on bits of content harvested from the web so that they read like a story.
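The behavior described above can be approximated with a greedy chaining heuristic. The sketch below is only a guess at the general idea, not Cpedia's actual algorithm: it assumes snippets are plain sentences, compares them by shared lowercase words, and always appends the remaining snippet that overlaps most with the last one placed. The snippets, the stopword list, and `chain_by_shared_terms` are all hypothetical.

```python
import re

# Hypothetical snippets harvested for the query "Ford Motor Company".
SNIPPETS = [
    "Robert McNamara, president of Ford in 1960, later became Secretary of Defense.",
    "Henry Ford founded the Ford Motor Company in 1903.",
    "McNamara later served as Secretary of Defense during the Vietnam War.",
    "Ford introduced the Model T in 1908.",
    "As Secretary of Defense, McNamara shaped Cold War strategy.",
]

# A tiny, assumed stopword list; a real system would use something larger.
STOPWORDS = {"the", "a", "of", "in", "as", "during", "became", "later", "served"}

def terms(snippet):
    """Lowercase word set, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    return {w for w in words if w not in STOPWORDS}

def chain_by_shared_terms(snippets):
    """Greedy ordering: follow the current snippet with the remaining snippet
    sharing the most terms with it (the earliest one wins ties)."""
    remaining = list(snippets)
    story = [remaining.pop(0)]  # the first harvested snippet becomes the lead
    while remaining:
        current = terms(story[-1])
        best = max(range(len(remaining)),
                   key=lambda i: (len(current & terms(remaining[i])), -i))
        story.append(remaining.pop(best))
    return story

for line in chain_by_shared_terms(SNIPPETS):
    print(line)
```

Running it places both McNamara snippets directly after the lead and pushes the company's founding and the Model T to the end, mirroring the slant of the Ford article: the first snippet, not the topic, decides where the story goes.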

Of course, Cpedia is only in alpha and it will get better. In fact, some articles have amazing narrative and structure. The article found by searching for "Winston Churchill" (http://www.cpedia.com/search?q=winston+churchill) reads almost as if a human had edited it.

Where the system fails, though, is not in the technology or even in the relevance of the narrative, but in the artifice of creating traditional articles from indexed web content. It assumes that the linear narrative of traditional media is the right medium for disseminating information, when the web is clearly non-linear. As I read articles in the system, I am frustrated by the inability to pivot or search from the referenced snippets. One snippet may catch my interest as something I want to read more about, but Cpedia offers no links from the snippets to their sources or to related snippets. In exchange for narrative, Cpedia ignores the true underlying power of the web: hyperlinking and attribution. This is a trade-off we do not need to make.

Rather than imitating an encyclopedia, Cpedia should use the "narrative algorithm" to invent a whole new paradigm. It should aim to be a travel guide, a "Lonely Planet" for the web, that binds links to sources and topics around a narrative that refines itself as we navigate. What I really want is the ability to write the article in real time based on the links I follow and the topics I pursue. The experience should be personal; the narrative should be our own.

Cpedia is undoubtedly an important advance in semantic technology. But I find it funny. First Cuil tried to best Google and failed. Now they are trying to best Wikipedia with Cpedia, which I assume will also fail. Yet they have created something unabashedly original: a "narrative algorithm" that could change the way we find and consume information on the web. Cpedia should not try to imitate but to innovate; it should write a new chapter in the history of the web, one where we discover content through narrative. If one technology has been designed to rewrite the web, it is Cpedia.

