We’re living in a data-driven economy, and that won’t change anytime soon. Companies, start-ups, organisations, and governments all require some of our data to provide us with the services we want and need. Unfortunately, decades of Big Data thinking has led many companies to a consequential fallacy: the belief that they need to harvest and maintain that personal data themselves in order to deliver their services and thus survive in the data-driven economy. This prompted a never-ending rat race, dominated by a handful of large players and driven by a deeply flawed notion of “winning”, with as a result that most people and companies collectively end up losing much more than they put in. Pointless data greed has falsified competition and stifled innovation from the moment data collection became more important than quality of experience. A way out of this dead end is to put people fully in control of their own data by equipping them with a personal data vault. Vaults enable us to break the standstill, as they re-level the playing field by giving all parties equal chances to access data under people’s control. Halting data harvesting is, paradoxically, how companies can leverage more data towards their services instead of less. Yet they won’t own that data—
Ever since Ed Sheeran’s 2017 hit, I just can’t stop thinking about shapes. It’s more than the earworm though: 2017 is the year in which I got deeply involved with Solid, and also when the SHACL recommendation for shapes was published. The problem is a very fundamental one: Solid promises the separation of data and apps, so we can choose our apps independently of where we store our data. The apps you choose will likely be different from mine, yet we want to be able to interact with each other’s data. Building such decentralized Linked Data apps necessitates a high level of interoperability, where data written by one app needs to be picked up by another. Rather than relying on the more heavy Semantic Web machinery of ontologies, I believe that shapes are the right way forward—
Most Web applications today follow the adage “your data for my services”. They motivate this deal from both a technical perspective (how could we provide services without your data?) and a business perspective (how could we earn money without your data?). Decentralizing the Web means that people gain the ability to store their data wherever they want, while still getting the services they need. This requires major changes in the way we develop applications, as we migrate from a closed back-end database to the open Web as our data source. In this post, I discuss three paradigm shifts a decentralized Web brings, demonstrating that decentralization is about much more than just controlling our own data. It is a fundamental rethinking of the relation between data and applications, which—
Newspapers everywhere were quick to blame social media for some of 2016’s more surprising political events. However, filter bubbles, echo chambers, and unsubstiantiated claims are as old as humanity itself, so Facebook and friends have at most acted as amplifiers. The real mystery is that, given our access to unprecedented technological means to escape those bubbles and chambers, we apparently still prefer convenient truths over a healthy diet of various information sources. Paradoxically, in a world where the Web connects people more closely than ever, its applications are pushing us irreconcilably far apart. We urgently need to re-invest in decentralized technologies to counterbalance the monopolization of many facets of the Web. Inevitably, this means trading some of the omnipresent luxuries we’ve grown accustomed to for forgotten basic features we actually need most. This is a complex story about the relationship between people and knowledge technology, the eye of the beholder, and how we cannot let a handful of companies act as the custodians of our truth.
Few things annoy me more than a random website asking me: “do you want to use the app instead?” Of course I don’t want to—
Data on the World Wide Web changes at the speed of light—
How can we ever talk about intelligent clients if we don’t provide them with opportunities to be intelligent? The current generation of RDF APIs is patronizing its clients by only describing its data in RDF. This contrasts to websites for humans, where data would be quite useless if it were not accompanied by context and controls. By omitting these, we withhold basic information from clients, like “what’s in this response?” and “where can I go next?”. This post proposes to extend the power of self-descriptiveness from data to API responses as a whole. Using RDF graphs, we can combine data, context, and controls in one response. RDF APIs need to become like websites, explaining clients where they are and what they can do.
Querying multiple sources reveals the full potential of Linked Data by combining data from heterogeneous origins into a consistent result. However, I have to admit that I had never executed a federated query before. Executing regular SPARQL queries is relatively easy: if the endpoint is up, you can just post your query there. But where do I post my query if there are multiple endpoints, and will they communicate to evaluate that query? Or do I have to use a command-line tool? We wanted federated queries to be as accessible as anything else on the Web, so our federated Triple Pattern Fragments engine runs in your browser. At last, multiple Linked Data sources can be queried at once, at very low server-side cost.
In a couple of months, 15 years will have passed since Tim Berners-Lee, Jim Hendler, and Ora Lassila wrote the Scientific American article “The Semantic Web”. It’s hard to imagine that, another 15 years before this, the Web didn’t even exist. The article talks heavily about agents, which would use the Web to do things for people. Somehow, somewhere, something went terribly wrong: the same time needed for the Web to liberate the world has hardly been sufficient for the Semantic Web to reach any adoption. And still, there are no agents, nor are there any signs that we will see them in the near future. Where should we even start?
What good is a Web full of Linked Data if we cannot reliably query it? Whether we like to admit it or not, queryable data is currently the Semantic Web’s Achilles’ heel. The Linked Data cloud contains several high-quality datasets with a total of billions of triples, yet most of that data is only available in downloadable form. Frankly, this doesn’t make any sense on the Web. After all, would you first download Wikipedia in its entirety just to read a single article? Probably not! We combined the power of the LOD Laundromat, a large-scale data cleansing apparatus, with the low-cost Triple Pattern Fragments interface so you can once and for all query the Web.
Talks at academic conferences seldom feature a high knowledge per minute ratio. Speakers often talk for themselves, unwittingly spawning facts that are not directly useful to their audience. For me, the most symptomatic aspect is the obligatory “thank you for your attention” at the end of a talk. Think about what you’re saying. Was your talk so bad that people had to do you an actual favor by paying attention? We’ve got this whole thing backwards. You are one of the people the audience paid for to see. They should be thanking you for doing a great job—
Ever looked up a person in an encyclopedia without knowing whether it was a man or a woman? And if you did, was it explicitly mentioned in the article? I’m guessing the answer two both questions is “no”. Gender is of course not that important; we’re interested in people for what they do. Yet at the same time, this particular piece of information is so trivial and obvious that we often just don’t mention it. This means that machines, which require explicit instruction, have no way to determine this elementary fact. Therefore, it’s hard to study even simple statistics in an automated way. This is why the Dutch DBpedia chapter had asked me to experiment with gender extraction for people, based on their Wikipedia pages.
The Semantic Web is plagued by various issues, one rather prominent fact being that few people actually heard about it. If you ask me, it’s because we have been focusing almost exclusively on research lately, which is quite odd. After all—
Reading a selection from a large dataset of triples is an important use case for the Semantic Web. But files in textual formats such as Turtle become too slow as soon as they contain a few thousand triples, and triple stores are often too demanding, since they need to support write informations. The HDT (Header Dictionary Triples) binary RDF format offers fast, read-only access to triples in large datasets. Until recently, this power was only available in Java and C++, so I decided it was high time to port it to Node.js as well ;-)
Peer review is research’ most powerful instrument. Having your manuscript reviewed by independent researchers in your own field improves the odds that your published work is valid—
The Linked Data hype is surrounded by questions, and most of those questions are only answered from the technology perspective. Such answers often insufficiently address the needs of people who just want to publish their data. Practitioners from libraries, archives and museums all over the world have very valuable data that they would love to share, but they often don’t find the right practical guidance to do this. Our new handbook Linked Data for Libraries, Archives and Museums changes that. We wrote it for non-technical people, by combining clear explanations with hands-on case studies.
Like any technological or scientific community, optimism in the beginning years of the Semantic Web was high. Artificial intelligence researchers in the 1960s believed it would be a matter of years before machines would become better at chess than humans, and that machines would seamlessly translate texts from one language into another. Semantic Web researchers strongly believed in the intelligent agents vision, but along the way, things turned out more difficult. Yet people still seem to focus on trying to solve the complex problems, instead of tackling simple ones first. Can we be more pragmatic about the Semantic Web? As an example, this post zooms in on the SemWeb’s default answer to querying and explains why starting with simple queries might just be a better idea.
The yearly World Wide Web conferences are highlights for my research: every time again, the world’s most fascinating people meet to discuss novel ideas. This year’s edition moved to Seoul, and I happily represented Ghent University for the third time, together with my colleagues. In addition to hosting the WS‑REST2014 workshop, I presented Linked Data Fragments at LDOW2014. The combination of these workshops represents for me what is important to move the Web forward: flexible data and API access for automated clients.
Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. The past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side—
More than three years of research and several hundred pages of text later, I’m finally ready to defend my PhD. Why did I start this whole endeavor again? Well, I was—
Apologizing is a polite and functional act of communication: it helps people to let go any negative sentiments you may have caused. However, communication is only effective when it is actually meant to help others, not to help yourself. We sometimes send messages out of habit, which strangely can give them the opposite effect than was intended by adopting that habit. Therefore, always think before you communicate to ensure you convey the right message.
Really, nobody takes your website serious anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software—
Research is a rewarding job. You get to work on a cool thing, communicate about it, travel around the world to demonstrate it to others… But most of all, you get the opportunity to work together with highly talented people, in ways that are impossible in industry. The International Semantic Web Conference reunited people working on future Web technology for the 12th year in a row, and I was very lucky to be there. Moreover, our MMLab team, together with the Web & Media Group of the VU, set a new record by winning the Best Demo Award two consecutive years. I’ve come to realize how important communicating and collaborating with people are for good research—simply invaluable.
SPARQL, the query language of the Semantic Web, allows clients to retrieve answers to complex questions. It’s a core technology in the Semantic Web Stack, as it enables flexible querying of Linked Data. If the Google search box is the entry to the human Web, a SPARQL query field is the entry to the machine Web. There’s only one slight problem: nobody seems able to keep a SPARQL endpoint up. Maybe the issue is so fundamental that more processing power cannot solve it.
Data is often dubbed the new gold, but no label can be more wrong. It makes more sense to think about data as diamonds: highly valuable, but before they are of any use, they need intensive polishing. OpenRefine, the latest incarnation of Google Refine, is specifically designed to help you with this job. Until recently, getting started with OpenRefine was rather hard because the amount of functionality can overwhelm you. This prompted Max De Wilde and myself to write a book that will turn you into an OpenRefine expert.
“When all you have is a hammer, every problem starts to look like a nail” is but one of the many wordings of the infamous Law of the Instrument. Many of us are blinkered by our tools, instantaneously choosing what we know best to solve a problem—
Dreaded scientific posters—
Hyperlinks are the door handles of the Web, as they afford going to the next place you want to be. However, in a space as large as the Web, there is an awful lot of possible next places, so the webpage might not offer the door handle you are looking for. Luckily of course, there’s a thing called Google, but wouldn’t it be much more awesome if the links you need were already there on the page? Because right now, the author of the webpage has to make the decision where you can go, as he is the architect of the information. But should he also be the architect of the navigation or should that be you, the person surfing the Web?
What makes the Web more fascinating to read than any book? It’s not that the information is more reliable or people have become tired of the smell of paper. The exciting thing about consuming information on the Web is that you can keep clicking through for more. Hyperlinks have always been a source of endless curiosity. Few people realize that the hypertext concept actually far predates the Web. The idea that information itself could become an actionable entity has revolutionized our world and how we think.
People who have programmed with me or have seen my open-source work on GitHub know that I put a lot of effort in my coding style. I indeed consider programming a creative act, which necessarily involves aesthetics. And then, some people consider aesthetics the enemy of the pragmatic: “don’t spend time writing beautiful code when you can write effective code”. However, I argue that my sense of beauty serves pragmatism much better, because it leads to more concise and maintainable code, and is thereby far more effective.
The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you don’t have to worry about anymore. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, you can’t teach her new tricks. If you had a personal agent that could use the whole Web as its data source—
What’s the connection between the Eiffel Tower and the Big Ben? How are you related to Mickey Mouse? Or Elvis Presley? Today, there’s a fun way to find out: Multimedia Lab’s new Web app Everything is Connected allows you to see how any two topics in this world connect. Choose a start topic (this might be you!) and watch an on-the-fly video that takes you to any destination topic you select. You’ll be amazed to discover how small the world we live in really is. In this post, I’ll take you behind the scenes of this fascinating app.
As researchers, communication is arguably the most important aspect of our job, but unfortunately not always the most visible. Sometimes, our work is so specific that it seems impossible to share it as a story with the outside world. Surprisingly, day-to-day social media such as Facebook and Twitter can be highly effective to give your work the attention it deserves. To achieve this, researchers must become conscious social media users who engage in every social network with a purpose—
Most programmers are not familiar with resource-oriented architectures, and this unfamiliarity makes them resort to things they know. This is why we often see URLs that have action names inside of them, while they actually shouldn’t. Indeed, URLs are supposed to identify resources, and HTTP defines the verbs we can use to view and manipulate the state of those resources. Evidently, there is quite a mismatch between imperative (object-oriented) languages and HTTP’s resources-and-representations model. What would happen if we think the other way round and model HTTP methods in an imperative programming language?
If I wanted to join the Oslo Perl Mongers for an RDF hackaton, Kjetil Kjernsmo asked me two months ago. We had met at the LAPIS workshop in Greece, where he showed me the open source work he had been doing. “Sure, I’d love to join”, I replied, “but there’s only a minor problem—
HTTP, the Hypertext Transfer Protocol, has been designed under the constraints of the REST architectural style. One of the well-known constraints of this REpresentational State Transfer style is that communication must be stateless. Why was this particular constraint introduced? And who is in charge then of maintaining state, since it is clearly necessary for many Web applications? This post explains how statelessness works on today’s Web, explaining the difference between application state and resource state.
Recently, I wanted to offer my visitors the option to add any of my publications to their Mendeley paper library. When creating the “add to Mendeley” links, I noticed that papers got added without asking the visitor for a confirmation. Then I wondered: could I exploit this to trick people into adding something to their Mendeley library without their consent? Turns out I could, and here is why: Mendeley did not honor the safeness property of the HTTP GET method.
In my hometown Ghent, an exciting contest took place: PhD students could to send in a one-minute video about their research. Winners get to give a talk at TEDxGhent, a local edition of the famous TED conferences. I badly wanted to participate, so I had to find an original and effective way of selling my message in one minute. My goals: tease the audience, entertain the audience, and, ultimately, activate them to vote.