Ruben Verborgh

Ruben’s Blog

On my blog, I write unreviewed opinion pieces in which I share my views on digital technology and research. The views below are mine and mine only.

For my peer-reviewed work, check out my articles and full publication list instead.

No more raw data

Data without context is meaningless; data without trust is useless. 2017-12-18 is nothing but a string until it becomes a birthdate, a wedding, or the moment a security camera registered you. Handling such highly personal data requires trust. When your personal data is shared with someone, you must be able to trust that they will only use it in the way you agreed to. When someone receives your data, they must be able to trust that it is correct and that they are allowed to use it for the intended purpose. Auditors need to be able to challenge and verify this trust relationship, for example under GDPR. These everyday scenarios highlight that data ecosystems need trust as an integral part of their DNA. Unfortunately, trust is not baked into our data interfaces today: they only provide access to the raw data, disregarding the context that is crucial to its correct treatment. We need to standardize interfaces that carry data in a trust envelope, which encapsulates usage policies and provenance to ensure that data can flow in more responsible ways. In this blog post, I explore how this can work, and why such trust envelopes are a necessary change in the way we exchange personal and other data.

Read more…

Let’s talk about pods

Who decides what your Solid pod looks like? For a long time, the answer has been the first one who writes, decides. That is, the first app to sculpt documents and containers in your pod determines where other apps need to look for data. Unfortunately, this creates an undesired dependency between apps, which now have to agree among themselves on how to store things. Yet Solid promises apps that will seamlessly and independently reuse data in order to provide us with better and safer experiences. At the heart of this contradiction is that the mental model we’re using for Solid pods no longer works. This model restricts our solution space and is a main reason why apps struggle to reuse each other’s data. In this blog post, I argue why we should stop thinking of a pod as a set of documents, and start treating it as the hybrid graph it actually is. By adjusting our perspective, Solid apps can become more independent of variations in data—and thus more powerful for us.

Read more…

Reflections of knowledge

Web services emerged in the late 1990s as a way to access specific pieces of remote functionality, building on the standards-driven stability brought by the universal protocol that HTTP was rapidly becoming. Interestingly, the Web itself has drastically changed since. During an era of unprecedented centralization, almost all of our data relocated to remote systems, which appointed Web APIs as the exclusive gateways to our digital assets. While the legal and socio-economic limitations of such Big Data systems began painfully revealing themselves, the window of opportunity for decentralized data ecosystems opened up wider than ever before. The knowledge graphs of the future are already emerging today, and they’ll be so massively large and elusive that they can never be captured by any single system—and hence impossible to expose through any single API. This raises the question of how servers can provide flexible entry points into this emerging Web-shaped knowledge ecosystem, and how clients can sustainably interact with them. This blog post describes the upcoming shift from API integration to data integration, why we assume the latter is the easier problem, and what it fundamentally means to channel abstract knowledge through concrete Web APIs.

Read more…

A data ecosystem fosters sustainable innovation

We’re living in a data-driven economy, and that won’t change anytime soon. Companies, start-ups, organizations, and governments all require some of our data to provide us with the services we want and need. Unfortunately, decades of Big Data thinking have led many companies to a consequential fallacy: the belief that they need to harvest and maintain that personal data themselves in order to deliver their services and thus survive in the data-driven economy. This prompted a never-ending rat race, dominated by a handful of large players and driven by a deeply flawed notion of “winning”, with the result that most people and companies collectively end up losing much more than they put in. Pointless data greed has distorted competition and stifled innovation from the moment data collection became more important than quality of experience. A way out of this dead end is to put people fully in control of their own data by equipping them with a personal data vault. Vaults enable us to break the standstill, as they re-level the playing field by giving all parties equal chances to access data under people’s control. Halting data harvesting is, paradoxically, how companies can leverage more data towards their services instead of less. Yet they won’t own that data—and in a sustainable ecosystem, there’s no need to. In this post, I dive into the surprising economics of an overdue data revolution.

Read more…

Shaping Linked Data apps

Ever since Ed Sheeran’s 2017 hit, I just can’t stop thinking about shapes. It’s more than the earworm though: 2017 is the year in which I got deeply involved with Solid, and also when the SHACL recommendation for shapes was published. The problem is a very fundamental one: Solid promises the separation of data and apps, so we can choose our apps independently of where we store our data. The apps you choose will likely be different from mine, yet we want to be able to interact with each other’s data. Building such decentralized Linked Data apps necessitates a high level of interoperability, where data written by one app needs to be picked up by another. Rather than relying on the heavier Semantic Web machinery of ontologies, I believe that shapes are the right way forward—without throwing the added value of links and semantics out of the window. In this post, I will expand on the thinking that emerged from working with Tim Berners-Lee on the Design Issue on Linked Data shapes, and sketch the vast potential of shapes for tackling crucial problems in flexible ways.

Read more…

Designing a Linked Data developer experience

While the Semantic Web community was fighting its own internal battles, we failed to gain traction with the people who build apps that are actually used: front-end developers. Ironically, Semantic Web enthusiasts have failed to focus on the Web; whereas our technologies are delivering results in specialized back-end systems, the promised intelligent end-user apps are not being created. Within the Solid ecosystem for decentralized Web applications, Linked Data and Semantic Web technologies play a crucial role. Working intensely on Solid the past year, I realized that designing a fun developer experience will be crucial to its success. Through dialogue with front-end developers, I created a couple of JavaScript libraries for easy interaction with complex Linked Data, without having to know RDF. This post introduces the core React components for Solid along with the LDflex query language, and lessons learned from their design.

Read more…

Paradigm shifts for the decentralized Web

Most Web applications today follow the adage “your data for my services”. They motivate this deal from both a technical perspective (how could we provide services without your data?) and a business perspective (how could we earn money without your data?). Decentralizing the Web means that people gain the ability to store their data wherever they want, while still getting the services they need. This requires major changes in the way we develop applications, as we migrate from a closed back-end database to the open Web as our data source. In this post, I discuss three paradigm shifts a decentralized Web brings, demonstrating that decentralization is about much more than just controlling our own data. It is a fundamental rethinking of the relation between data and applications, which—if done right—will accelerate creativity and innovation for the years to come.

Read more…

Truth takes time

Newspapers everywhere were quick to blame social media for some of 2016’s more surprising political events. However, filter bubbles, echo chambers, and unsubstantiated claims are as old as humanity itself, so Facebook and friends have at most acted as amplifiers. The real mystery is that, given our access to unprecedented technological means to escape those bubbles and chambers, we apparently still prefer convenient truths over a healthy diet of various information sources. Paradoxically, in a world where the Web connects people more closely than ever, its applications are pushing us irreconcilably far apart. We urgently need to re-invest in decentralized technologies to counterbalance the monopolization of many facets of the Web. Inevitably, this means trading some of the omnipresent luxuries we’ve grown accustomed to for forgotten basic features we actually need most. This is a complex story about the relationship between people and knowledge technology, the eye of the beholder, and how we cannot let a handful of companies act as the custodians of our truth.

Read more…

Use the Web instead

Few things annoy me more than a random website asking me: “do you want to use the app instead?” Of course I don’t want to—that’s why I use your website. There are people who like apps and those who don’t, but regardless of personal preferences, there’s a more important matter. The increasing cry of apps begging to invade—literally—our personal space undermines some of the freedoms for which we have long fought. The Web is the first platform in the history of mankind that allows us to share information and order services through a single program: a browser. Apps gladly circumvent this universal interface, replacing it with their own custom environment. Is it really the supposedly better user experience that pushes us towards native apps, or are there other forces at work?

Read more…

Querying history with Linked Data

Data on the World Wide Web changes at the speed of light—today’s facts are tomorrow’s history. This makes the ability to look back important: how do facts grow and change over time? It gets even more interesting when we zoom out beyond individual facts: how do answers to questions evolve when data ages? With Linked Data, we are used to querying the latest version of information, because updating a SPARQL endpoint is easier than maintaining every historical version. With the lightweight Triple Pattern Fragments interface, it becomes very easy for a server to host multiple versions. Using the Memento framework to switch between versions based on a timestamp, your browser can evaluate SPARQL queries over any point in time. We tried this with DBpedia—and so can you!
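
To give a flavor of how version selection works: Memento lets a client ask for a resource as it existed at a given moment through the Accept-Datetime header. A minimal sketch (the fragments URL below is made up, and the code assumes an async context):

    // Ask a Memento-enabled server for the data as it existed on a given date.
    // The endpoint URL is hypothetical; Accept-Datetime comes from the Memento protocol.
    const response = await fetch('http://fragments.example.org/dbpedia', {
      headers: { 'Accept-Datetime': 'Thu, 01 Jan 2015 00:00:00 GMT' },
    });
    console.log(response.headers.get('Memento-Datetime')); // the version actually served
    console.log(await response.text());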

Read more…

Turtles all the way down

How can we ever talk about intelligent clients if we don’t provide them with opportunities to be intelligent? The current generation of RDF APIs is patronizing its clients by only describing its data in RDF. This contrasts with websites for humans, where data would be quite useless if it were not accompanied by context and controls. By omitting these, we withhold basic information from clients, like “what’s in this response?” and “where can I go next?”. This post proposes to extend the power of self-descriptiveness from data to API responses as a whole. Using RDF graphs, we can combine data, context, and controls in one response. RDF APIs need to become like websites, telling clients where they are and what they can do.

Read more…

Federated SPARQL queries in your browser

Querying multiple sources reveals the full potential of Linked Data by combining data from heterogeneous origins into a consistent result. However, I have to admit that I had never executed a federated query before. Executing regular SPARQL queries is relatively easy: if the endpoint is up, you can just post your query there. But where do I post my query if there are multiple endpoints, and will they communicate to evaluate that query? Or do I have to use a command-line tool? We wanted federated queries to be as accessible as anything else on the Web, so our federated Triple Pattern Fragments engine runs in your browser. At last, multiple Linked Data sources can be queried at once, at very low server-side cost.

Read more…

Fostering intelligence by enabling it

In a couple of months, 15 years will have passed since Tim Berners-Lee, Jim Hendler, and Ora Lassila wrote the Scientific American article “The Semantic Web”. It’s hard to imagine that, another 15 years before this, the Web didn’t even exist. The article talks heavily about agents, which would use the Web to do things for people. Somehow, somewhere, something went terribly wrong: the same time needed for the Web to liberate the world has hardly been sufficient for the Semantic Web to reach any adoption. And still, there are no agents, nor are there any signs that we will see them in the near future. Where should we even start?

Read more…

600,000 queryable datasets—and counting

What good is a Web full of Linked Data if we cannot reliably query it? Whether we like to admit it or not, queryable data is currently the Semantic Web’s Achilles’ heel. The Linked Data cloud contains several high-quality datasets with a total of billions of triples, yet most of that data is only available in downloadable form. Frankly, this doesn’t make any sense on the Web. After all, would you first download Wikipedia in its entirety just to read a single article? Probably not! We combined the power of the LOD Laundromat, a large-scale data cleansing apparatus, with the low-cost Triple Pattern Fragments interface so you can once and for all query the Web.

Read more…

Thank you for your attention

Talks at academic conferences seldom feature a high knowledge per minute ratio. Speakers often talk for themselves, unwittingly spawning facts that are not directly useful to their audience. For me, the most symptomatic aspect is the obligatory “thank you for your attention” at the end of a talk. Think about what you’re saying. Was your talk so bad that people had to do you an actual favor by paying attention? We’ve got this whole thing backwards. You are one of the people the audience paid to see. They should be thanking you for doing a great job—provided, of course, that you really do the best you can to help them understand.

Read more…

The Year of the Developers

The Semantic Web is plagued by various issues, a rather prominent one being that few people have actually heard of it. If you ask me, it’s because we have been focusing almost exclusively on research lately, which is quite odd. After all—no matter how good the research is—eventually, code is at the core of all Web systems. Why is it that we have been selectively deaf and blind to those who build what we need most: actual applications that use Linked Data in the real world? Fortunately, the first Semantic Web Developers Workshop found a very passionate audience. We need more of this, and we need it now.

Read more…

Bringing fast triples to Node.js with HDT

Reading a selection from a large dataset of triples is an important use case for the Semantic Web. But files in textual formats such as Turtle become too slow as soon as they contain a few thousand triples, and triple stores are often too demanding, since they need to support write operations. The HDT (Header Dictionary Triples) binary RDF format offers fast, read-only access to triples in large datasets. Until recently, this power was only available in Java and C++, so I decided it was high time to port it to Node.js as well ;-)

Read more…

Writing a SPARQL parser in JavaScript

If we want to make the Semantic Web more webby, we need software in the Web’s main language JavaScript. The support for the SemWeb’s data format RDF has been quite good in JavaScript, with several libraries supporting Turtle and JSON-LD. However, proper support for the SPARQL query language has been lacking so far, especially for the latest version SPARQL 1.1. Since I need SPARQL for several of my projects, such as Linked Data Fragments, I wrote a proper SPARQL parser myself. It is created through the Jison parser generator and converts SPARQL into JSON. Its main design goals are completeness, maintainability, and small size.
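
As a quick taste of the parser in action, here’s a usage sketch, assuming it is installed as the sparqljs package:

    // Usage sketch, assuming the parser is published as the sparqljs package.
    var SparqlParser = require('sparqljs').Parser;
    var parser = new SparqlParser();
    var parsedQuery = parser.parse(
      'PREFIX foaf: <http://xmlns.com/foaf/0.1/> ' +
      'SELECT ?name WHERE { ?person foaf:name ?name. }');
    console.log(JSON.stringify(parsedQuery, null, 2)); // the query as a JSON structure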

Read more…

Reviewers shouldn’t hide their name

Peer review is research’s most powerful instrument. Having your manuscript reviewed by independent researchers in your own field improves the odds that your published work is valid—and valuable. The drawback of this mechanism is that many researchers are often on reviewer duty; I find myself reviewing several papers a month. It’s not hard to imagine that sloppiness can creep in sometimes… And sadly, there are not a lot of ways to prevent this: reviews in the research community remain largely anonymous. This means that, if a reviewer has a bad day or doesn’t want to read a paper with their full attention, they cannot be held accountable for that. If you have written a grounded opinion, why don’t you put your name on it?

Read more…

A hands-on Linked Data book for people

The Linked Data hype is surrounded by questions, and most of those questions are only answered from the technology perspective. Such answers often insufficiently address the needs of people who just want to publish their data. Practitioners from libraries, archives and museums all over the world have very valuable data that they would love to share, but they often don’t find the right practical guidance to do this. Our new handbook Linked Data for Libraries, Archives and Museums changes that. We wrote it for non-technical people, by combining clear explanations with hands-on case studies.

Read more…

The Pragmantic Web

As in any technological or scientific community, optimism was high in the early years of the Semantic Web. Artificial intelligence researchers in the 1960s believed it would be a matter of years before machines would become better at chess than humans, and that machines would seamlessly translate texts from one language into another. Semantic Web researchers strongly believed in the intelligent agents vision, but along the way, things turned out to be more difficult. Yet people still seem to focus on trying to solve the complex problems, instead of tackling simple ones first. Can we be more pragmatic about the Semantic Web? As an example, this post zooms in on the SemWeb’s default answer to querying and explains why starting with simple queries might just be a better idea.

Read more…

WWW2014 and 25 years of Web

The yearly World Wide Web conferences are highlights for my research: time and again, the world’s most fascinating people meet to discuss novel ideas. This year’s edition moved to Seoul, and I happily represented Ghent University for the third time, together with my colleagues. In addition to hosting the WSREST2014 workshop, I presented Linked Data Fragments at LDOW2014. The combination of these workshops represents for me what is important to move the Web forward: flexible data and API access for automated clients.

Read more…

Towards Web-scale Web querying

Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. The past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side—which we can never do reliably—we should build our servers in such a way that clients can solve queries efficiently.

Read more…

My PhD on semantic hypermedia

More than three years of research and several hundred pages of text later, I’m finally ready to defend my PhD. Why did I start this whole endeavor again? Well, I was, and still am, fascinated by the possibilities the Web has to offer, and working as a PhD student gives you the opportunity and the freedom to dive into the things you love. I wanted to make the Web more accessible for machines, so they can perform tasks in a more autonomous way. This brought me to the crossroads of Semantic Web and REST APIs: semantic hypermedia.

Read more…

Apologies for cross-posting

Apologizing is a polite and functional act of communication: it helps people to let go of any negative sentiments you may have caused. However, communication is only effective when it is actually meant to help others, not to help yourself. We sometimes send messages out of habit, which, strangely, can give them the opposite effect from the one intended when we adopted that habit. Therefore, always think before you communicate to ensure you convey the right message.

Read more…

Promiscuous promises

Promises make it possible to return a value asynchronously from a synchronous function. If the return value is not known when the function exits, you can return a promise that will be fulfilled with that value later on. This comes in handy for JavaScript, which is often used for Web programming and thus has to rely on asynchronous return values: downloads, API calls, read/write operations, … In those cases, promises can make your code easier and thus better. This post briefly explains promises and then zooms in on the methods I used to create promiscuous, a small Promise implementation for JavaScript.
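
To illustrate the mechanism with the standard Promise interface (not promiscuous-specific code): the function returns right away, and the promise it hands back is fulfilled or rejected once the asynchronous work completes.

    // The function returns immediately with a promise; the value arrives later.
    var fs = require('fs');
    function readConfig(path) {
      return new Promise(function (resolve, reject) {
        fs.readFile(path, 'utf8', function (error, contents) {
          if (error) reject(error);   // the promise is rejected on failure…
          else resolve(contents);     // …or fulfilled once the data has arrived
        });
      });
    }
    readConfig('./settings.json').then(console.log, console.error);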

Read more…

The lie of the API

Really, nobody takes your website seriously anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software—a server and a client—that were actually never needed in the first place. The API was there already: it’s your website itself. Shockingly, a majority of developers seems unable to embrace the Web and the important role URLs and hypermedia play on it. The lie called “API” has trapped many publishers, including the Digital Public Library of America and Europeana.

Read more…

Research is teamwork

Research is a rewarding job. You get to work on a cool thing, communicate about it, travel around the world to demonstrate it to others… But most of all, you get the opportunity to work together with highly talented people, in ways that are impossible in industry. The International Semantic Web Conference reunited people working on future Web technology for the 12th year in a row, and I was very lucky to be there. Moreover, our MMLab team, together with the Web & Media Group of the VU, set a new record by winning the Best Demo Award two consecutive years. I’ve come to realize how important communicating and collaborating with people are for good research—simply invaluable.

Read more…

Can I SPARQL your endpoint?

SPARQL, the query language of the Semantic Web, allows clients to retrieve answers to complex questions. It’s a core technology in the Semantic Web Stack, as it enables flexible querying of Linked Data. If the Google search box is the entry to the human Web, a SPARQL query field is the entry to the machine Web. There’s only one slight problem: nobody seems able to keep a SPARQL endpoint up. Maybe the issue is so fundamental that more processing power cannot solve it.

Read more…

Using OpenRefine: data are diamonds

Data is often dubbed the new gold, but no label could be more wrong. It makes more sense to think about data as diamonds: highly valuable, but before they are of any use, they need intensive polishing. OpenRefine, the latest incarnation of Google Refine, is specifically designed to help you with this job. Until recently, getting started with OpenRefine was rather hard because the amount of functionality can overwhelm you. This prompted Max De Wilde and myself to write a book that will turn you into an OpenRefine expert.

Read more…

One hammer for a thousand nails

“When all you have is a hammer, every problem starts to look like a nail” is but one of the many wordings of the infamous Law of the Instrument. Many of us are blinkered by our tools, instantly choosing what we know best to solve a problem, even though it might not be the best solution to that problem. It doesn’t take long to end up with complex solutions for simple things. Fortunately, the more tools you master, the higher the chance you choose the right one. Thus, an extensive toolbox is exactly what I recommend.

Read more…

Scientific posters are ineffective

Dreaded scientific posters—if you attend conferences, you have definitely seen them. They’re boring and ugly. On purpose. Because that’s what everybody does, right? The adjective scientific seems to imply that we should restrict our creativity. After all, content is king, and too much fanciness won’t get you anywhere? And the term poster is just because “abstract of 84cm × 119cm where you choose the colors” is too long? It’s this kind of reasoning that gets us nowhere.

Read more…

Towards serendipitous Web applications

Hyperlinks are the door handles of the Web, as they afford going to the next place you want to be. However, in a space as large as the Web, there is an awful lot of possible next places, so the webpage might not offer the door handle you are looking for. Luckily of course, there’s a thing called Google, but wouldn’t it be much more awesome if the links you need were already there on the page? Because right now, the author of the webpage has to make the decision where you can go, as he is the architect of the information. But should he also be the architect of the navigation or should that be you, the person surfing the Web?

Read more…

Lightning-fast RDF in JavaScript

Node.js has spawned a new, asynchronous generation of tools. Asynchronous thinking is different from traditional stream processing: instead of actively waiting for data in program routines, you write logic that acts when data arrives. JavaScript is an ideal language for that, because callback functions are lightweight. I have written a parser for Turtle, an RDF serialisation format, that uses asynchrony for maximal performance.
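
For a taste of that asynchronous style, here’s a usage sketch, assuming the parser is the one installed as the n3 module (the callback fires once per parsed triple, and one final time with the prefixes):

    // Usage sketch, assuming the parser is installed as the n3 module.
    var N3 = require('n3');
    var parser = new N3.Parser();
    var turtle = '@prefix c: <http://example.org/cartoons#>. c:Tom a c:Cat.';
    parser.parse(turtle, function (error, triple, prefixes) {
      if (error) console.error(error);
      else if (triple) console.log(triple); // called once per parsed triple
      else console.log('done', prefixes);   // a final call delivers the prefixes
    });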

Read more…

Affordances weave the Web

What makes the Web more fascinating to read than any book? It’s not that the information is more reliable or people have become tired of the smell of paper. The exciting thing about consuming information on the Web is that you can keep clicking through for more. Hyperlinks have always been a source of endless curiosity. Few people realize that the hypertext concept actually far predates the Web. The idea that information itself could become an actionable entity has revolutionized our world and how we think.

Read more…

Programming is an Art

People who have programmed with me or have seen my open-source work on GitHub know that I put a lot of effort into my coding style. I indeed consider programming a creative act, which necessarily involves aesthetics. And yet, some people consider aesthetics the enemy of the pragmatic: “don’t spend time writing beautiful code when you can write effective code”. However, I argue that my sense of beauty serves pragmatism much better, because it leads to more concise and maintainable code, and is thereby far more effective.

Read more…

What Web agents want

The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you don’t have to worry about anymore. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, you can’t teach her new tricks. If you had a personal agent that could use the whole Web as its data source—instead of only specific parts—there would be no limits to what it could do. However, the Web needs some adjustments to make it agent-ready.

Read more…

Asynchronous error handling in JavaScript

Anything that can go wrong will go wrong, so we had better prepare ourselves. The lessons we’ve been taught as programmers to nicely throw and catch exceptions don’t apply anymore in asynchronous environments. Yet asynchronous programming is on the rise, and things still can and therefore will go wrong. So what are your options to defend against errors and gracefully inform the user when things didn’t go as expected? This post compares different asynchronous error handling tactics for JavaScript.
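
To make the problem concrete: a try/catch block cannot catch an error thrown after the surrounding code has already returned, which is why asynchronous code relies on other conventions such as error-first callbacks and promise rejection handlers.

    // try/catch does not help here: the callback throws long after the try block exited.
    try {
      setTimeout(function () { throw new Error('too late'); }, 1000);
    }
    catch (error) { /* never reached */ }

    // Error-first callbacks (the Node.js convention): the error is the first argument.
    require('fs').readFile('missing.txt', function (error, contents) {
      if (error) return console.error('callback style:', error.message);
      console.log(contents.toString());
    });

    // Promises: failures travel along the chain until a rejection handler catches them.
    Promise.reject(new Error('something went wrong'))
      .catch(function (error) { console.error('promise style:', error.message); });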

Read more…

Everything is connected in strange ways

What’s the connection between the Eiffel Tower and the Big Ben? How are you related to Mickey Mouse? Or Elvis Presley? Today, there’s a fun way to find out: Multimedia Lab’s new Web app Everything is Connected allows you to see how any two topics in this world connect. Choose a start topic (this might be you!) and watch an on-the-fly video that takes you to any destination topic you select. You’ll be amazed to discover how small the world we live in really is. In this post, I’ll take you behind the scenes of this fascinating app.

Read more…

Social media as spotlight on your research

For researchers, communication is arguably the most important aspect of our job, but unfortunately not always the most visible. Sometimes, our work is so specific that it seems impossible to share it as a story with the outside world. Surprisingly, day-to-day social media such as Facebook and Twitter can be highly effective at giving your work the attention it deserves. To achieve this, researchers must become conscious social media users who engage in every social network with a purpose and a plan.

Read more…

The object-resource impedance mismatch

Most programmers are not familiar with resource-oriented architectures, and this unfamiliarity makes them resort to things they know. This is why we often see URLs that have action names inside of them, even though they actually shouldn’t. Indeed, URLs are supposed to identify resources, and HTTP defines the verbs we can use to view and manipulate the state of those resources. Evidently, there is quite a mismatch between imperative (object-oriented) languages and HTTP’s resources-and-representations model. What would happen if we thought the other way round and modeled HTTP methods in an imperative programming language?
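
One way to picture that inversion: keep action names out of URLs, and let an object expose HTTP’s uniform interface as its methods. This is a sketch only; the resource helper and the URL below are made up.

    // Sketch: modeling HTTP's uniform interface in an imperative style.
    // The resource() helper and the URL are made up for illustration.
    const resource = (url) => ({
      get:    ()     => fetch(url, { headers: { Accept: 'application/json' } })
                          .then(response => response.json()),
      put:    (body) => fetch(url, { method: 'PUT', body: JSON.stringify(body),
                                     headers: { 'Content-Type': 'application/json' } }),
      delete: ()     => fetch(url, { method: 'DELETE' }),
    });

    // Instead of POST /articles/42/delete, the verb itself names the action:
    const article = resource('https://api.example.org/articles/42');
    await article.delete();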

Read more…

Perl and the Preikestolen

Whether I wanted to join the Oslo Perl Mongers for an RDF hackathon, Kjetil Kjernsmo asked me two months ago. We had met at the LAPIS workshop in Greece, where he showed me the open source work he had been doing. “Sure, I’d love to join”, I replied, “but there’s only a minor problem—I don’t know Perl!” Turns out there was nothing to worry about: learning Perl is easy, and the community embraces newcomers. Plus, the hackathon was located near a beautiful mountain landscape in Norway. Needless to say, I had a splendid week.

Read more…

REST, where’s my state?

HTTP, the Hypertext Transfer Protocol, has been designed under the constraints of the REST architectural style. One of the well-known constraints of this REpresentational State Transfer style is that communication must be stateless. Why was this particular constraint introduced? And who, then, is in charge of maintaining state, since it is clearly necessary for many Web applications? This post explains how statelessness works on today’s Web, clarifying the difference between application state and resource state.

Read more…

GET doesn’t change the world

Recently, I wanted to offer my visitors the option to add any of my publications to their Mendeley paper library. When creating the “add to Mendeley” links, I noticed that papers got added without asking the visitor for a confirmation. Then I wondered: could I exploit this to trick people into adding something to their Mendeley library without their consent? Turns out I could, and here is why: Mendeley did not honor the safeness property of the HTTP GET method.
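
The underlying rule is that GET must be safe: following a link may never change state on the server. Anything that does change state, like adding a paper to a library, belongs in a request with an unsafe method such as POST. A sketch with made-up endpoints (assuming an async context):

    // Safe: retrieving a representation changes nothing on the server.
    const paper = await fetch('https://api.example.org/papers/42').then(r => r.json());

    // State-changing: adding to a library should never hide behind a plain GET link.
    await fetch('https://api.example.org/library/papers', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ paper: 'https://api.example.org/papers/42' }),
    });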

Read more…

Selling a story in one minute

In my hometown Ghent, an exciting contest took place: PhD students could send in a one-minute video about their research. Winners get to give a talk at TEDxGhent, a local edition of the famous TED conferences. I badly wanted to participate, so I had to find an original and effective way of selling my message in one minute. My goals: tease the audience, entertain the audience, and, ultimately, activate them to vote.

Read more…