Ruben Verborgh

The devil is in the details,
but the demons are in the semantics.

The lie of the API

When you ask for an API, you will get one, even though you never needed it.

Really, nobody takes your website serious anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software—a server and a client—that were actually never needed in the first place. The API was there already: it’s your website itself. Shockingly, a majority of developers seems unable to embrace the Web and the important role URLs and hypermedia play on it. The lie called “API” has trapped many publishers, including the Digital Public Library of America and Europeana.

Imagine you’re a Web publisher. Pick any domain you like. You could be a museum opening up its collection to the public. You could be Amazon and make heaps of money selling books. Or you could be a team of meteorologists creating forecasts. Which one doesn’t matter, since you share one mission: making content accessible.

In the beginning, it was easy: you made whatever content you had available as HTML. This markup format lets you shape all of your content so that your visitors can comfortably read and navigate it. But as times change (technology changes fast), you notice that people are no longer the only potential consumers of your content. Increasingly, automated clients want access: another museum wants to list related objects from your collection, a book review site wants their visitors to buy books from you, or a tour operator wants to show tomorrow’s weather forecast on its site. And of course, they don’t want to hire people to update this every day, so they want to a use piece of software that does it automatically for them.

The problem is that software deals badly with unstructured content, so the people-friendly HTML does not work anymore. Machines rather process formats such as JSON or RDF. So this gives rise to a problem: how do we keep serving our content as HTML to people, but as JSON/RDF to machines?

Before we have time to think about a solution, our friends the developers have already found one. All they see is two computer systems that need to communicate, so their handbook (and wallet) screams API! An Application Programming Interface or API enables one piece of software to make use of another. So by adding an API to your website, they allow another piece of software (an automated client) to access your contents. And then they also write this automated client, so other developers can see how their API is used.

But wait… does this mean that the same content now has two different interfaces? Exactly, you’re right! If I’m a human, I use the browser interface to visit your as HTML. If I’m a machine, I use the API to visit your content as JSON. Oh, and if you’re rich, maybe you decide you want to offer additional machine-friendly formats as well. So the IT people can build you a third API. Even a fourth.

Unnecessary complexity is by far the worst mistake we can make in any design. ©Gigi C.

Unfortunately, you’ve been lied to. Or more precisely, you got what you asked for. Because I bet that you were the one who ultimately decided: yes, I need an API!
No, you don’t. It’s the same content, why would you need two interfaces, just because your consumers speak different languages?
Do you also exclusively phone your Spanish customers, e-mail Italian colleagues, and fax your Dutch friends? That would be really crazy and utterly unnecessary.
Yet it’s exactly what you’re doing if you add an API to your website.

Your website can serve people and automated clients in the exact same way.
And it’s not difficult. But let’s first look at how things went so horribly wrong.

Example: the DPLA API

The Digital Public Library of America (DPLA) provides access to millions of data records from all over the USA and makes machine-based access rather difficult.

Access for people

Accessing the website is quite easy: you just go to the URL of an object to visit it. For instance, here is a photo of the Apollo 11 launch. Its URL is:

http://dp.la/item/ecdafcf9b06be6efed042e40b3923e57

This single URL serves two purposes at once. On the one hand, it uniquely identifies a document about this photograph; on the other hand, it locates that document so your browser can find it. That’s the job of each URL. URLs are for identification and location of resources, real or virtual, anywhere in the world.

Access for machines

Now developers come in. It can’t be as easy as reusing this unique identifier, can it? Of course not, we first have to read the documentation.
Here are the steps you need to take:

  1. Request an API key.
  2. Receive an e-mail with this key.
  3. Find the right URL template for the “API call”.
  4. Fill out the details in the template to construct the URL.
  5. Open this URL.

This is the abridged version; I parsed the documentation for you.
So here is the URL to the JSON version of the Apollo photo record:

http://api.dp.la/v2/items/ecdafcf9b06be6efed042e40b3923e57?api_key=xxxxxxxxxxxx

Oh wait, I can’t share the JSON version with you. The URL contains my secret key, which I’m not allowed to disclose. There’s no way to link to a DPLA JSON version.

Example: the Europeana API

Europeana contains metadata from many large European institutions, and is plagued by similar problems.

Access for people

For people, it’s really really simple. Just use the URL:

http://www.europeana.eu/portal/record/90402/collectie_RP_P_1958_434.html

It identifies and locates a document about an item from the Rijksmuseum.

Access for machines

Well again… for machines, it couldn’t be as simple as just using the URL. The steps:

  1. Request an API key.
  2. Receive an e-mail with this key.
  3. Find the right URL template for the “API call”.
  4. Find out the Object ID of your object (hint: it’s somewhere in the URL).
  5. Fill out the details in the template to construct the URL.
  6. Open this URL.

Those 6 steps are the short version. It actually took me 27 steps.
But here is how you link to it:

http://europeana.eu/api/v2/record/90402/collectie_RP_P_1958_434.json?wskey=xxxxxxx

Well, actually not. API keys, you see.

What’s wrong with our APIs?

Now what is so terribly wrong with those APIs?
Okay, it might seem very complicated, but that’s just the way it is?
The point is that none of this complexity is necessary.

URLs that don’t identify are a lie

URLs are designed to identify concepts. They are all about identification: the document about the Apollo 11 launch photo is uniquely identified by its URL. By forcing different parties to use other identifiers, you break information exchange.
It’s like saying that French-speaking people cannot say “Barack Obama”, because English-speaking people already do. That wouldn’t make sense, yet APIs do it.

Furthermore, the above APIs do not allow to identify concepts for JSON clients. Each of them uses a different URL for the same resource because the URL includes information irrelevant to that resource: the API key.

API keys are a lie

Let’s think about this. What are the API keys supposed to shield off? Data access? If so, they do a terrible job, because the content’s HTML version is freely available.

Probably, they’re meant to shield off JSON access. But that’s also highly ineffective. If I really want to have unlimited JSON access, a developer can just write a script to convert the templated HTML into JSON. It’s not hard: the HTML version is open.

Plus, they make URLs completely useless, as they cannot be shared. In the case of Europeana, even the JSON document is unshareable, as it contains your API key.

Solution: use the Web’s API

There is only one correct solution: we have to stop building APIs. The one API we need is already there: your own website runs on top of the Web’s API called HTTP.

URLs identify concepts, and each URL can have multiple representations. This means that a single resource can be identified with one URL for all clients. Each client just indicates to the server whether it wants HTML or JSON or something else, and the server replies with a representation the client understands.

Futuristic? It’s not: it works already, and it’s really simple. Here is a designer chair from the Cooper-Hewitt museum:

http://collection.cooperhewitt.org/objects/35460799/

The cool thing is that machine clients use the same URL to access a JSON version:

curl http://collection.cooperhewitt.org/objects/35460799 -H "Accept: text/html"
curl http://collection.cooperhewitt.org/objects/35460799 -H "Accept: application/json"

Not only does this enable to share URLs between different parties, it also makes access really simple. I don’t have to read the manual. Instead, I just use the same interface I use every day: the URL. Works the same way everywhere.

This technique is called content negotiation and it is a characteristic of REST APIs.

Why did they get it wrong?

The interesting question here is why multi-million dollar projects like DPLA and Europeana got it wrong.

The first part, I think, is lack of knowledge. The DPLA website states:

This is called a RESTful approach to API design and is employed by the DPLA API.

However, an essential characteristic of a REST API is that you only have one interface for any client. So the mere fact that they have an additional API makes their design not follow the REST principles. This lack of knowledge comes from developers being all too familiar with the programming-oriented environment they usually work with, but mostly oblivious about the resource-oriented nature of the Web.
The Web is an information space, not a programming space.

The second part, I believe, is you get what you ask for. I imagine that developers were approached with the question “can you build an API?” And this is what they did.
But the question was wrong. It should have been: “can you add machine access?” That’s what we actually wanted all along, and an API is not the Web way to do that.

The result is that expensive APIs and clients have been built for both DPLA and Europeana, while simply adding support for JSON representations under the same URL would have been easier, cheaper, and long-lived. There’s no excuse.

The cartoon What the customer really needed tragically illustrates how things like this happen. While there is willingness to change, we’re stuck with expensive mistakes now. I mean… just try to use either the DPLA or Europeana API in your Web app. You cannot, because that would mean exposing your API key, which is prohibited.

The one thing it was supposed to do, it doesn’t do. That’s the lie of the API.
Never ask for an API. Ask for new representations on your existing URLs.
Only by embracing the information-oriented nature of the Web, we can provide sustainable access to our information for years to come.

Sustainable access to your data is the topic of the forthcoming book “Linked Data for Libraries, Archives and Museums” that Seth van Hooland and me will publish in May 2014. It tackles these and other actual subjects on metadata publication in great depth. We’ll be touring with it in Europe, North-America and Australia. Happy to speak at your institution!