The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you no longer have to worry about. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, she cannot be taught new tricks. If you had a personal agent that could use the whole Web as its data source—instead of only specific parts—there would be no limits to what it could do. However, the Web needs some adjustments to become agent-ready.
The initial Semantic Web vision featured intelligent software agents that do things for us using the Web. Just like Siri, they can answer your questions and meet your demands. The difference is that these agents would not be pre-programmed: they’d somehow be intelligent enough to browse the Web by themselves and employ the information on there, just like we humans do. I see three main conditions that are necessary for such agents to become reality one day.
Machines don’t understand natural language yet. Moravec’s paradox says that computers are exceptionally good at things humans are not, but perform poorly at tasks in which humans excel. The hard is easy and the easy is hard. For now, we therefore have to make information machine-readable if we want our agents to understand it. There are essentially two ways to do this.
On the one hand, we can augment human representations with annotations. For example, text inside HTML documents can be annotated with RDFa or HTML5 microdata. A problem here is that many of the large industry players are each forcing their own vocabulary upon content creators. Google, Bing and Yahoo! want you to use Schema.org, while Facebook promotes Open Graph and Twitter recently launched Twitter Cards. If you want their agents to understand your website, you must express the same content in all three different vocabularies… This defeats the whole idea of semantic technologies.
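To see the duplication in practice, here is how a single article title might have to be marked up for all three ecosystems (the title and markup are illustrative):

```html
<!-- schema.org microdata, read by Google, Bing and Yahoo! -->
<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">Web Agents of the Future</h1>
</article>

<!-- Open Graph, read by Facebook: the same title again -->
<meta property="og:title" content="Web Agents of the Future">

<!-- Twitter Cards: and a third time -->
<meta name="twitter:title" content="Web Agents of the Future">
```

Three vocabularies, one fact—exactly the redundancy that shared semantics were supposed to avoid.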
On the other hand, we can supply separate representations for machines. If a human requests an item, you serve HTML; if a machine requests it, you serve RDF or JSON (or even JSON-LD). The benefit of this separation is that each client receives only what it can consume. After all, it’s either a human or a machine that requests the information, never both at once. The problem then becomes: how do we serve the same thing differently? That’s what the uniform interface is for.
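A server-side sketch of this idea, assuming a simplified `Accept` header (real content negotiation also weighs q-values, which this toy helper ignores):

```python
# Minimal sketch of server-side content negotiation:
# choose a representation format based on the client's Accept header.
# The list of supported media types is an illustrative assumption.

def pick_representation(accept_header):
    """Return the first supported media type in the Accept header."""
    supported = ["text/html", "application/json", "text/turtle"]
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()  # drop q-values
        if media_type in supported:
            return media_type
    return "text/html"  # sensible default for browsers

print(pick_representation("text/html,application/xhtml+xml"))  # text/html
print(pick_representation("text/turtle"))                      # text/turtle
```

A browser and an RDF-consuming agent thus dereference the very same URL, yet each gets a representation it understands.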
Agents cannot pick up clues from Web pages the way we do, so it is important they have a simple interface to access Web content. The uniform interface is not a new idea. In fact, it has always been part of the foundation of the Web: the HyperText Transfer Protocol (HTTP). In his doctoral thesis, HTTP specification co-author Roy Fielding describes four elements that contribute to the uniform interface.
- Identification of resources
- The Web consists of resources (not of services or methods, as in programming languages), and they are identified by a URL. Every URL identifies at most one resource, and that mapping must never change. The resource itself, however, can change: “the current weather” is a valid resource, even though its state varies from hour to hour. Thinking in resources can be very hard at first for developers.
- Manipulation through representations
- Unlike in several other distributed systems, resources are not manipulated directly in HTTP. Instead, clients manipulate representations of those resources. This is crucial, because it enables humans and machines to access the same resources, only through a different representation. Humans read an HTML representation of the news, while software agents will prefer RDF or JSON.
- Self-descriptive messages
- Messages sent over an HTTP connection should be self-descriptive: no other messages should be necessary to understand their content. One aspect is that only a small, standardized set of methods is used, whose semantics should be respected. Another aspect is that communication should be stateless. This enables agents to easily predict the effects of their actions.
- Hypermedia as the engine of application state
- The Web owes much of its success to the power of hyperlinks. We can navigate Web pages we’ve never seen before by just clicking around. However, many machine-oriented representations currently lack an equivalent. This makes it much more difficult for agents to find their way around. Therefore, it is essential that machine-targeted representations also include hyperlinks.
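As an illustration of hypermedia-driven navigation, here is a sketch in which an agent discovers where it can go next purely from links embedded in a JSON representation (the HAL-style `_links` structure and the URLs are hypothetical):

```python
# Sketch: an agent follows hypermedia controls in a JSON representation
# instead of relying on hard-coded URL patterns.

def follow(resource, relation):
    """Return the target URL of a link relation, if the server offers it."""
    links = resource.get("_links", {})
    link = links.get(relation)
    return link["href"] if link else None

news = {
    "headline": "Agents surf the Web",
    "_links": {
        "self":   {"href": "/news/42"},
        "next":   {"href": "/news/43"},
        "author": {"href": "/people/ruben"},
    },
}

print(follow(news, "next"))      # /news/43
print(follow(news, "comments"))  # None: no link, so the agent doesn't guess
```

The agent only goes where the representation says it can go—the machine equivalent of “just clicking around.”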
The above four principles are essential to the REpresentational State Transfer (REST) architectural style, and in my opinion necessary conditions for an agent-enabled Web, because they simplify interactions and make them predictable.
However, the uniform interface can help agents only so far. The effects of resource manipulations are highly application-specific and cannot be predicted by a single set of rules. Siri overcomes this by interfacing only with a limited set of Web APIs whose behavior is pre-programmed. If we want agents to access the full power of the Web, they need to work with APIs they were not programmed for.
While RDF and JSON offer machine-readable ways to describe information, they cannot describe dynamic actions. What happens if resources are changed or new ones are created? Part of my research focuses on RESTdesc, a language that explains to agents what a particular Web API does and how it can be accessed. Combined with machine-readable data and the uniform interface, describing dynamics lets agents surf the Web to places they’ve never been before.
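To give a flavor of what such a description looks like, here is a sketch in the spirit of RESTdesc’s Notation3 rules: the antecedent states a precondition, and the consequent describes an HTTP request together with its effect (the prefixes and the thumbnail scenario are illustrative, not taken from an actual API):

```n3
@prefix http: <http://www.w3.org/2011/http#>.
@prefix ex:   <http://example.org/images#>.

{ ?image ex:fileType "jpeg". }
=>
{
  _:request http:methodName "GET";
            http:requestURI (?image "/thumbnail");
            http:resp [ http:body ?thumbnail ].
  ?image ex:smallThumbnail ?thumbnail.
}.
```

Because such rules are ordinary N3, a reasoner can chain them: given a goal like “I need a thumbnail of this image,” it derives the sequence of HTTP requests that achieves it.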
Unfortunately, we’re not there yet. Most information on the Web is not fully machine-accessible (although various organisations are trying hard), the uniform interface is blatantly violated by nearly all Web services, and I don’t know of any public Web API that describes its functionality in a way agents understand. However, if we look back at the tremendous growth of the Web so far, we can say with confidence that everything is possible.