Ruben Verborgh

The devil is in the details,
but the demons are in the semantics.

Ruben's blog

Towards Web-scale Web querying

published on

Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. Over the past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side, which we can never do reliably, we should build our servers in a way that enables clients to solve queries efficiently.

My PhD on semantic hypermedia

published on

More than three years of research and several hundred pages of text later, I’m finally ready to defend my PhD. Why did I start this whole endeavor again? Well, I was—and still am—fascinated by the possibilities the Web has to offer, and working as a PhD student gives you the opportunity and the freedom to dive into the things you love. I wanted to make the Web more accessible for machines, so they can perform tasks in a more autonomous way. This brought me to the crossroads of Semantic Web and REST APIs: semantic hypermedia.

Apologies for cross-posting

published on

Apologizing is a polite and functional act of communication: it helps people let go of any negative sentiments you may have caused. However, communication is only effective when it is actually meant to help others, not to help yourself. We sometimes send messages out of habit, which, strangely, can give them the opposite effect of what the habit was meant to achieve. Therefore, always think before you communicate to ensure you convey the right message.

Promiscuous promises

published on

Promises allow you to asynchronously return a value from a synchronous function. If the return value is not known when the function exits, you can return a promise that will be fulfilled with that value later on. This comes in handy for JavaScript, which is often used for Web programming and thus has to rely on asynchronous return values: downloads, API calls, read/write operations, … In those cases, promises can make your code simpler and thus better. This post briefly explains promises and then zooms in on the methods I used to create promiscuous, a small Promise implementation for JavaScript.
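
To make the idea concrete, here is a minimal sketch of a function that returns before its value is known. It relies on the Promise constructor built into current JavaScript engines rather than on promiscuous itself, and the helper name `delayedValue` is made up for this illustration.

```javascript
// Hypothetical helper (not taken from the post): the function returns
// immediately with a promise, which is fulfilled with the value later on.
function delayedValue(value, ms) {
  return new Promise((resolve) => {
    // The value only becomes available after the function has returned.
    setTimeout(() => resolve(value), ms);
  });
}

// The caller receives the promise synchronously and attaches a handler
// that runs once the promise is fulfilled.
const promise = delayedValue('done', 10);
promise.then((value) => console.log('fulfilled with', value));
```

The caller never blocks: it holds on to the promise and registers what should happen with the value once it arrives.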

The lie of the API

published on

Really, nobody takes your website seriously anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software (a server and a client) that were actually never needed in the first place. The API was there already: it’s your website itself. Shockingly, a majority of developers seems unable to embrace the Web and the important role URLs and hypermedia play on it. The lie called “API” has trapped many publishers, including the Digital Public Library of America and Europeana.

Research is teamwork

published on

Research is a rewarding job. You get to work on a cool thing, communicate about it, travel around the world to demonstrate it to others… But most of all, you get the opportunity to work together with highly talented people, in ways that are impossible in industry. The International Semantic Web Conference reunited people working on future Web technology for the 12th year in a row, and I was very lucky to be there. Moreover, our MMLab team, together with the Web & Media Group of the VU, set a new record by winning the Best Demo Award two consecutive years. I’ve come to realize how important communicating and collaborating with people are for good research—simply invaluable.

Can I SPARQL your endpoint?

published on

SPARQL, the query language of the Semantic Web, allows clients to retrieve answers to complex questions. It’s a core technology in the Semantic Web Stack, as it enables flexible querying of Linked Data. If the Google search box is the entry to the human Web, a SPARQL query field is the entry to the machine Web. There’s only one slight problem: nobody seems able to keep a SPARQL endpoint up. Maybe the issue is so fundamental that more processing power cannot solve it.

Using OpenRefine: data are diamonds

published on

Data is often dubbed the new gold, but no label could be more wrong. It makes more sense to think about data as diamonds: highly valuable, but before they are of any use, they need intensive polishing. OpenRefine, the latest incarnation of Google Refine, is specifically designed to help you with this job. Until recently, getting started with OpenRefine was rather hard because the amount of functionality can overwhelm you. This prompted Max De Wilde and me to write a book that will turn you into an OpenRefine expert.

One hammer for a thousand nails

published on

“When all you have is a hammer, every problem starts to look like a nail” is but one of the many wordings of the infamous Law of the Instrument. Many of us are blinkered by our tools, instantaneously choosing what we know best to solve a problem—even though it might not be the best solution to that problem. It doesn’t take long to end up with complex solutions for simple things. Fortunately, the more tools you master, the higher the chance you choose the right one. Thus, an extensive toolbox is exactly what I recommend.

Scientific posters are ineffective

published on

Dreaded scientific posters: if you attend conferences, you have definitely seen them. They’re boring and ugly. On purpose. Because that’s what everybody does, right? The adjective scientific seems to imply that we should restrict our creativity. After all, content is king, and too much fanciness won’t get you anywhere? And the term poster is just because “abstract of 84cm × 119cm where you choose the colors” is too long? It’s this kind of reasoning that gets us nowhere.

Towards serendipitous Web applications

published on

Hyperlinks are the door handles of the Web, as they afford going to the next place you want to be. However, in a space as large as the Web, there is an awful lot of possible next places, so the webpage might not offer the door handle you are looking for. Luckily of course, there’s a thing called Google, but wouldn’t it be much more awesome if the links you need were already there on the page? Because right now, the author of the webpage has to make the decision where you can go, as he is the architect of the information. But should he also be the architect of the navigation or should that be you, the person surfing the Web?

Lightning-fast RDF in JavaScript

published on

Node.js has spawned a new, asynchronous generation of tools. Asynchronous thinking is different from traditional stream processing: instead of actively waiting for data in program routines, you write logic that acts when data arrives. JavaScript is an ideal language for that, because callback functions are lightweight. I have written a parser for Turtle, an RDF serialisation format, that uses asynchrony for maximal performance.

Affordances weave the Web

published on

What makes the Web more fascinating to read than any book? It’s not that the information is more reliable or people have become tired of the smell of paper. The exciting thing about consuming information on the Web is that you can keep clicking through for more. Hyperlinks have always been a source of endless curiosity. Few people realize that the hypertext concept actually far predates the Web. The idea that information itself could become an actionable entity has revolutionized our world and how we think.

Programming is an Art

published on

People who have programmed with me or have seen my open-source work on GitHub know that I put a lot of effort into my coding style. I indeed consider programming a creative act, which necessarily involves aesthetics. And then, some people consider aesthetics the enemy of the pragmatic: “don’t spend time writing beautiful code when you can write effective code”. However, I argue that my sense of beauty serves pragmatism much better, because it leads to more concise and maintainable code, and is thereby far more effective.

What Web agents want

published on

The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you don’t have to worry about anymore. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, you can’t teach her new tricks. If you had a personal agent that could use the whole Web as its data source—instead of only specific parts—there would be no limits to what it could do. However, the Web needs some adjustments to make it agent-ready.

Asynchronous error handling in JavaScript

published on

Anything that can go wrong will go wrong, so we had better prepare ourselves. The lessons we’ve been taught as programmers to nicely throw and catch exceptions don’t apply anymore in asynchronous environments. Yet asynchronous programming is on the rise, and things still can and therefore will go wrong. So what are your options to defend against errors and graciously inform the user when things didn’t go as expected? This post compares different asynchronous error handling tactics for JavaScript.
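
As a small illustration of why throw and catch fall short here, the sketch below uses the error-first callback convention common in Node.js. This is only one possible tactic and not necessarily one the post recommends; the function name `parseLater` is an assumption made for this example.

```javascript
// Hypothetical asynchronous operation (name invented for this sketch).
// A try/catch around the *call* to parseLater could never see the parse
// error, because parsing happens after parseLater has already returned.
// Instead, the error is handed to the callback as its first argument.
function parseLater(text, callback) {
  setTimeout(() => {
    let result;
    try {
      result = JSON.parse(text);
    } catch (error) {
      // Report the failure to the caller instead of throwing into the void.
      return callback(error);
    }
    // By convention, a null first argument signals success.
    callback(null, result);
  }, 0);
}

parseLater('{ broken', (error, data) => {
  if (error) {
    console.log('could not parse:', error.message);
    return;
  }
  console.log('parsed', data);
});
```

The callback is invoked outside the `try` block on purpose, so that an exception thrown by the caller's own handler is not mistaken for a parse error.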

Everything is connected in strange ways

published on

What’s the connection between the Eiffel Tower and Big Ben? How are you related to Mickey Mouse? Or Elvis Presley? Today, there’s a fun way to find out: Multimedia Lab’s new Web app Everything is Connected allows you to see how any two topics in this world connect. Choose a start topic (this might be you!) and watch an on-the-fly video that takes you to any destination topic you select. You’ll be amazed to discover how small the world we live in really is. In this post, I’ll take you behind the scenes of this fascinating app.

Social media as spotlight on your research

published on

For researchers, communication is arguably the most important aspect of the job, but unfortunately not always the most visible. Sometimes, our work is so specific that it seems impossible to share it as a story with the outside world. Surprisingly, day-to-day social media such as Facebook and Twitter can be highly effective at giving your work the attention it deserves. To achieve this, researchers must become conscious social media users who engage in every social network with a purpose, and a plan.

The object-resource impedance mismatch

published on

Most programmers are not familiar with resource-oriented architectures, and this unfamiliarity makes them resort to things they know. This is why we often see URLs that have action names inside of them, while they actually shouldn’t. Indeed, URLs are supposed to identify resources, and HTTP defines the verbs we can use to view and manipulate the state of those resources. Evidently, there is quite a mismatch between imperative (object-oriented) languages and HTTP’s resources-and-representations model. What would happen if we thought the other way round and modeled HTTP methods in an imperative programming language?

Perl and the Preikestolen

published on

Two months ago, Kjetil Kjernsmo asked me whether I wanted to join the Oslo Perl Mongers for an RDF hackathon. We had met at the LAPIS workshop in Greece, where he showed me the open-source work he had been doing. “Sure, I’d love to join”, I replied, “but there’s only a minor problem: I don’t know Perl!” Turns out there was nothing to worry about: learning Perl is easy, and the community embraces newcomers. Plus, the hackathon was located near a beautiful mountain landscape in Norway. Needless to say, I had a splendid week.

REST, where's my state?

published on

HTTP, the Hypertext Transfer Protocol, has been designed under the constraints of the REST architectural style. One of the well-known constraints of this REpresentational State Transfer style is that communication must be stateless. Why was this particular constraint introduced? And who is in charge then of maintaining state, since it is clearly necessary for many Web applications? This post explains how statelessness works on today’s Web and clarifies the difference between application state and resource state.

JavaScript module loaders: necessary evil?

published on

Modules: we need them when projects grow large, and to reuse work from others. Every programming language offers a way to partition code into reusable chunks. Some of them, such as C and Ruby, provide explicit mechanisms for this. JavaScript, on the other hand, leaves the modularization to the programmer, and the Asynchronous Module Definition (AMD) API is one way to achieve this. However, does AMD really offer the final solution to JavaScript modularization?

GET doesn't change the world

published on

Recently, I wanted to offer my visitors the option to add any of my publications to their Mendeley paper library. When creating the “add to Mendeley” links, I noticed that papers got added without asking the visitor for a confirmation. Then I wondered: could I exploit this to trick people into adding something to their Mendeley library without their consent? Turns out I could, and here is why: Mendeley did not honor the safeness property of the HTTP GET method.
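
The safeness property can be sketched in a few lines. The code below is a toy model and not Mendeley's actual API: the in-memory `library`, the `handle` function, and the status codes it picks are all assumptions made for this illustration. The point is only that a request with a safe method such as GET reads state, while adding a paper requires an unsafe method such as POST.

```javascript
// Hypothetical in-memory paper library (illustration only).
const library = new Set();

// GET and HEAD are among the methods HTTP defines as safe:
// a client should be able to send them without causing changes.
const SAFE_METHODS = new Set(['GET', 'HEAD']);

// Toy request handler: safe methods only read, POST adds a paper.
function handle(method, paperId) {
  if (SAFE_METHODS.has(method)) {
    // Never modify the library here, no matter what the URL says.
    return { status: 200, inLibrary: library.has(paperId) };
  }
  if (method === 'POST') {
    library.add(paperId);
    return { status: 201, inLibrary: true };
  }
  // Any other method is rejected in this sketch.
  return { status: 405, inLibrary: library.has(paperId) };
}
```

With this split, a link or an embedded image that triggers a GET request can at most look something up. An addition needs an explicit POST, which a plain hyperlink cannot send on its own.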

Selling a story in one minute

published on

In my hometown Ghent, an exciting contest took place: PhD students could send in a one-minute video about their research. Winners get to give a talk at TEDxGhent, a local edition of the famous TED conferences. I badly wanted to participate, so I had to find an original and effective way of selling my message in one minute. My goals: tease the audience, entertain the audience, and, ultimately, activate them to vote.