Data on the World Wide Web changes at the speed of light—today’s facts are tomorrow’s history. This makes the ability to look back important: how do facts grow and change over time? It gets even more interesting when we zoom out beyond individual facts: how do answers to questions evolve when data ages? With Linked Data, we are used to query the latest version of information, because updating a SPARQL endpoint is easier than maintaining every historical version. With the lightweight Triple Pattern Fragments interface, it becomes very easy for a server to host multiple versions. Using the Memento framework to switch between versions based on a timestamp, your browser can evaluate SPARQL queries over any point in time. We tried this with DBpedia—and so can you!
How can we ever talk about intelligent clients if we don’t provide them with opportunities to be intelligent? The current generation of RDF APIs is patronizing its clients by only describing its data in RDF. This contrasts to websites for humans, where data would be quite useless if it were not accompanied by context and controls. By omitting these, we withhold basic information from clients, like “what’s in this response?” and “where can I go next?”. This post proposes to extend the power of self-descriptiveness from data to API responses as a whole. Using RDF graphs, we can combine data, context, and controls in one response. RDF APIs need to become like websites, explaining clients where they are and what they can do.
Querying multiple sources reveals the full potential of Linked Data by combining data from heterogeneous origins into a consistent result. However, I have to admit that I had never executed a federated query before. Executing regular SPARQL queries is relatively easy: if the endpoint is up, you can just post your query there. But where do I post my query if there are multiple endpoints, and will they communicate to evaluate that query? Or do I have to use a command-line tool? We wanted federated queries to be as accessible as anything else on the Web, so our federated Triple Pattern Fragments engine runs in your browser. At last, multiple Linked Data sources can be queried at once, at very low server-side cost.
In a couple of months, 15 years will have passed since Tim Berners-Lee, Jim Hendler, and Ora Lassila wrote the Scientific American article “The Semantic Web”. It’s hard to imagine that, another 15 years before this, the Web didn’t even exist. The article talks heavily about agents, which would use the Web to do things for people. Somehow, somewhere, something went terribly wrong: the same time needed for the Web to liberate the world has hardly been sufficient for the Semantic Web to reach any adoption. And still, there are no agents, nor are there any signs that we will see them in the near future. Where should we even start?
What good is a Web full of Linked Data if we cannot reliably query it? Whether we like to admit it or not, queryable data is currently the Semantic Web’s Achilles’ heel. The Linked Data cloud contains several high-quality datasets with a total of billions of triples, yet most of that data is only available in downloadable form. Frankly, this doesn’t make any sense on the Web. After all, would you first download Wikipedia in its entirety just to read a single article? Probably not! We combined the power of the LOD Laundromat, a large-scale data cleansing apparatus, with the low-cost Triple Pattern Fragments interface so you can once and for all query the Web.
Talks at academic conferences seldom feature a high knowledge per minute ratio. Speakers often talk for themselves, unwittingly spawning facts that are not directly useful to their audience. For me, the most symptomatic aspect is the obligatory “thank you for your attention” at the end of a talk. Think about what you’re saying. Was your talk so bad that people had to do you an actual favor by paying attention? We’ve got this whole thing backwards. You are one of the people the audience paid for to see. They should be thanking you for doing a great job—provided of course, that you really do the best you can to help them understand.
Ever looked up a person in an encyclopedia without knowing whether it was a man or a woman? And if you did, was it explicitly mentioned in the article? I’m guessing the answer two both questions is “no”. Gender is of course not that important; we’re interested in people for what they do. Yet at the same time, this particular piece of information is so trivial and obvious that we often just don’t mention it. This means that machines, which require explicit instruction, have no way to determine this elementary fact. Therefore, it’s hard to study even simple statistics in an automated way. This is why the Dutch DBpedia chapter had asked me to experiment with gender extraction for people, based on their Wikipedia pages.
The Semantic Web is plagued by various issues, one rather prominent fact being that few people actually heard about it. If you ask me, it’s because we have been focusing almost exclusively on research lately, which is quite odd. After all—no matter how good the research is—eventually, code is at the core of all Web systems. Why is it that we have been selectively deaf and blind for those who build what we need most: actual applications that use Linked Data in the real world? Fortunately, the first Semantic Web Developers Workshop found a very passionate audience. We need more of this, and we need it now.
Reading a selection from a large dataset of triples is an important use case for the Semantic Web. But files in textual formats such as Turtle become too slow as soon as they contain a few thousand triples, and triple stores are often too demanding, since they need to support write informations. The HDT (Header Dictionary Triples) binary RDF format offers fast, read-only access to triples in large datasets. Until recently, this power was only available in Java and C++, so I decided it was high time to port it to Node.js as well ;-)
Peer review is research’ most powerful instrument. Having your manuscript reviewed by independent researchers in your own field improves the odds that your published work is valid—and valuable. The drawback of this mechanism is that many researchers are often on reviewer duty; I find myself reviewing several papers a month. It’s not hard to imagine that sloppiness can creep in sometimes… And sadly, there are not a lot of ways to prevent this: reviews in the research community remain largely anonymous. This means that, if a reviewer has a bad day or doesn’t want to read a paper with their full attention, they cannot be held accountable for that. If you have written a grounded opinion, why don’t you put your name on it?
The Linked Data hype is surrounded by questions, and most of those questions are only answered from the technology perspective. Such answers often insufficiently address the needs of people who just want to publish their data. Practitioners from libraries, archives and museums all over the world have very valuable data that they would love to share, but they often don’t find the right practical guidance to do this. Our new handbook Linked Data for Libraries, Archives and Museums changes that. We wrote it for non-technical people, by combining clear explanations with hands-on case studies.
Like any technological or scientific community, optimism in the beginning years of the Semantic Web was high. Artificial intelligence researchers in the 1960s believed it would be a matter of years before machines would become better at chess than humans, and that machines would seamlessly translate texts from one language into another. Semantic Web researchers strongly believed in the intelligent agents vision, but along the way, things turned out more difficult. Yet people still seem to focus on trying to solve the complex problems, instead of tackling simple ones first. Can we be more pragmatic about the Semantic Web? As an example, this post zooms in on the SemWeb’s default answer to querying and explains why starting with simple queries might just be a better idea.
The yearly World Wide Web conferences are highlights for my research: every time again, the world’s most fascinating people meet to discuss novel ideas. This year’s edition moved to Seoul, and I happily represented Ghent University for the third time, together with my colleagues. In addition to hosting the WS‑REST2014 workshop, I presented Linked Data Fragments at LDOW2014. The combination of these workshops represents for me what is important to move the Web forward: flexible data and API access for automated clients.
Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. The past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side—which we can never do reliably—we should build our servers in such a way that enables clients to solve queries efficiently.
More than three years of research and several hundred pages of text later, I’m finally ready to defend my PhD. Why did I start this whole endeavor again? Well, I was—and still am—fascinated by the possibilities the Web has to offer, and working as a PhD student gives you the opportunity and the freedom to dive into the things you love. I wanted to make the Web more accessible for machines, so they can perform tasks in a more autonomous way. This brought me to the crossroads of Semantic Web and REST APIs: semantic hypermedia.
Apologizing is a polite and functional act of communication: it helps people to let go any negative sentiments you may have caused. However, communication is only effective when it is actually meant to help others, not to help yourself. We sometimes send messages out of habit, which strangely can give them the opposite effect than was intended by adopting that habit. Therefore, always think before you communicate to ensure you convey the right message.
Really, nobody takes your website serious anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software—a server and a client—that were actually never needed in the first place. The API was there already: it’s your website itself. Shockingly, a majority of developers seems unable to embrace the Web and the important role URLs and hypermedia play on it. The lie called “API” has trapped many publishers, including the Digital Public Library of America and Europeana.
Research is a rewarding job. You get to work on a cool thing, communicate about it, travel around the world to demonstrate it to others… But most of all, you get the opportunity to work together with highly talented people, in ways that are impossible in industry. The International Semantic Web Conference reunited people working on future Web technology for the 12th year in a row, and I was very lucky to be there. Moreover, our MMLab team, together with the Web & Media Group of the VU, set a new record by winning the Best Demo Award two consecutive years. I’ve come to realize how important communicating and collaborating with people are for good research—simply invaluable.
SPARQL, the query language of the Semantic Web, allows clients to retrieve answers to complex questions. It’s a core technology in the Semantic Web Stack, as it enables flexible querying of Linked Data. If the Google search box is the entry to the human Web, a SPARQL query field is the entry to the machine Web. There’s only one slight problem: nobody seems able to keep a SPARQL endpoint up. Maybe the issue is so fundamental that more processing power cannot solve it.
Data is often dubbed the new gold, but no label can be more wrong. It makes more sense to think about data as diamonds: highly valuable, but before they are of any use, they need intensive polishing. OpenRefine, the latest incarnation of Google Refine, is specifically designed to help you with this job. Until recently, getting started with OpenRefine was rather hard because the amount of functionality can overwhelm you. This prompted Max De Wilde and myself to write a book that will turn you into an OpenRefine expert.
“When all you have is a hammer, every problem starts to look like a nail” is but one of the many wordings of the infamous Law of the Instrument. Many of us are blinkered by our tools, instantaneously choosing what we know best to solve a problem—even though it might not be the best solution to that problem. It doesn’t take long to end up with complex solutions for simple things. Fortunately, the more tools you master, the higher the chance you choose the right one. Thus, an extensive toolbox is exactly what I recommend.
Dreaded scientific posters—if you attend conferences, you definitely saw them. They’re boring and ugly. On purpose. Because that’s what everybody does, right? The adjective scientific seems to imply that we should restrict our creativity. After all, content is king, and too much fanciness won’t get you anywhere? And the term poster is just because “abstract of 84cm × 119cm where you choose the colors” is too long? It’s this kind of reasoning that gets us nowhere.
Hyperlinks are the door handles of the Web, as they afford going to the next place you want to be. However, in a space as large as the Web, there is an awful lot of possible next places, so the webpage might not offer the door handle you are looking for. Luckily of course, there’s a thing called Google, but wouldn’t it be much more awesome if the links you need were already there on the page? Because right now, the author of the webpage has to make the decision where you can go, as he is the architect of the information. But should he also be the architect of the navigation or should that be you, the person surfing the Web?
What makes the Web more fascinating to read than any book? It’s not that the information is more reliable or people have become tired of the smell of paper. The exciting thing about consuming information on the Web is that you can keep clicking through for more. Hyperlinks have always been a source of endless curiosity. Few people realize that the hypertext concept actually far predates the Web. The idea that information itself could become an actionable entity has revolutionized our world and how we think.
People who have programmed with me or have seen my open-source work on GitHub know that I put a lot of effort in my coding style. I indeed consider programming a creative act, which necessarily involves aesthetics. And then, some people consider aesthetics the enemy of the pragmatic: “don’t spend time writing beautiful code when you can write effective code”. However, I argue that my sense of beauty serves pragmatism much better, because it leads to more concise and maintainable code, and is thereby far more effective.
The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you don’t have to worry about anymore. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, you can’t teach her new tricks. If you had a personal agent that could use the whole Web as its data source—instead of only specific parts—there would be no limits to what it could do. However, the Web needs some adjustments to make it agent-ready.
What’s the connection between the Eiffel Tower and the Big Ben? How are you related to Mickey Mouse? Or Elvis Presley? Today, there’s a fun way to find out: Multimedia Lab’s new Web app Everything is Connected allows you to see how any two topics in this world connect. Choose a start topic (this might be you!) and watch an on-the-fly video that takes you to any destination topic you select. You’ll be amazed to discover how small the world we live in really is. In this post, I’ll take you behind the scenes of this fascinating app.
As researchers, communication is arguably the most important aspect of our job, but unfortunately not always the most visible. Sometimes, our work is so specific that it seems impossible to share it as a story with the outside world. Surprisingly, day-to-day social media such as Facebook and Twitter can be highly effective to give your work the attention it deserves. To achieve this, researchers must become conscious social media users who engage in every social network with a purpose—and a plan.
Most programmers are not familiar with resource-oriented architectures, and this unfamiliarity makes them resort to things they know. This is why we often see URLs that have action names inside of them, while they actually shouldn’t. Indeed, URLs are supposed to identify resources, and HTTP defines the verbs we can use to view and manipulate the state of those resources. Evidently, there is quite a mismatch between imperative (object-oriented) languages and HTTP’s resources-and-representations model. What would happen if we think the other way round and model HTTP methods in an imperative programming language?
If I wanted to join the Oslo Perl Mongers for an RDF hackaton, Kjetil Kjernsmo asked me two months ago. We had met at the LAPIS workshop in Greece, where he showed me the open source work he had been doing. “Sure, I’d love to join”, I replied, “but there’s only a minor problem—I don’t know Perl!” Turns out there was nothing to worry about: learning Perl is easy, and the community embraces newcomers. Plus, the hackaton was located near a beautiful mountain landscape in Norway. Needless to say, I had a splendid week.
HTTP, the Hypertext Transfer Protocol, has been designed under the constraints of the REST architectural style. One of the well-known constraints of this REpresentational State Transfer style is that communication must be stateless. Why was this particular constraint introduced? And who is in charge then of maintaining state, since it is clearly necessary for many Web applications? This post explains how statelessness works on today’s Web, explaining the difference between application state and resource state.
Recently, I wanted to offer my visitors the option to add any of my publications to their Mendeley paper library. When creating the “add to Mendeley” links, I noticed that papers got added without asking the visitor for a confirmation. Then I wondered: could I exploit this to trick people into adding something to their Mendeley library without their consent? Turns out I could, and here is why: Mendeley did not honor the safeness property of the HTTP GET method.
In my hometown Ghent, an exciting contest took place: PhD students could to send in a one-minute video about their research. Winners get to give a talk at TEDxGhent, a local edition of the famous TED conferences. I badly wanted to participate, so I had to find an original and effective way of selling my message in one minute. My goals: tease the audience, entertain the audience, and, ultimately, activate them to vote.