Ever looked up a person in an encyclopedia without knowing whether it was a man or a woman? And if you did, was it explicitly mentioned in the article? I’m guessing the answer to both questions is “no”. Gender is of course not that important; we’re interested in people for what they do. Yet at the same time, this particular piece of information is so trivial and obvious that we often just don’t mention it. This means that machines, which require explicit instruction, have no way to determine this elementary fact. Therefore, it’s hard to study even simple statistics in an automated way. This is why the Dutch DBpedia chapter asked me to experiment with gender extraction for people, based on their Wikipedia pages.
Read more… published on 30 November 2014
The Semantic Web is plagued by various issues, a rather prominent one being that few people have actually heard about it. If you ask me, it’s because we have been focusing almost exclusively on research lately, which is quite odd. After all—no matter how good the research is—eventually, code is at the core of all Web systems. Why is it that we have been selectively deaf and blind to those who build what we need most: actual applications that use Linked Data in the real world? Fortunately, the first Semantic Web Developers Workshop found a very passionate audience. We need more of this, and we need it now.
Read more… published on 30 October 2014
Reading a selection from a large dataset of triples is an important use case for the Semantic Web. But files in textual formats such as Turtle become too slow as soon as they contain a few thousand triples, and triple stores are often too demanding, since they need to support write operations. The HDT (Header Dictionary Triples) binary RDF format offers fast, read-only access to triples in large datasets. Until recently, this power was only available in Java and C++, so I decided it was high time to port it to Node.js as well ;-)
Read more… published on 30 September 2014
Read more… published on 22 August 2014
Peer review is research’s most powerful instrument. Having your manuscript reviewed by independent researchers in your own field improves the odds that your published work is valid—and valuable. The drawback of this mechanism is that many researchers are often on reviewer duty; I find myself reviewing several papers a month. It’s not hard to imagine that sloppiness can creep in sometimes… And sadly, there are not a lot of ways to prevent this: reviews in the research community remain largely anonymous. This means that, if a reviewer has a bad day or doesn’t want to read a paper with their full attention, they cannot be held accountable for that. If you have written a grounded opinion, why don’t you put your name on it?
Read more… published on 31 July 2014
The Linked Data hype is surrounded by questions, and most of those questions are only answered from the technology perspective. Such answers often insufficiently address the needs of people who just want to publish their data. Practitioners from libraries, archives and museums all over the world have very valuable data that they would love to share, but they often don’t find the right practical guidance to do this. Our new handbook Linked Data for Libraries, Archives and Museums changes that. We wrote it for non-technical people, by combining clear explanations with hands-on case studies.
Read more… published on 30 June 2014
As in any technological or scientific community, optimism in the early years of the Semantic Web was high. Artificial intelligence researchers in the 1960s believed it would be a matter of years before machines would become better at chess than humans, and that machines would seamlessly translate texts from one language into another. Semantic Web researchers strongly believed in the intelligent agents vision, but along the way, things turned out more difficult. Yet people still seem to focus on trying to solve the complex problems, instead of tackling simple ones first. Can we be more pragmatic about the Semantic Web? As an example, this post zooms in on the SemWeb’s default answer to querying and explains why starting with simple queries might just be a better idea.
Read more… published on 29 May 2014
The yearly World Wide Web conferences are highlights for my research: every time, the world’s most fascinating people meet to discuss novel ideas. This year’s edition moved to Seoul, and I happily represented Ghent University for the third time, together with my colleagues. In addition to hosting the WS‑REST2014 workshop, I presented Linked Data Fragments at LDOW2014. For me, the combination of these workshops represents what is important to move the Web forward: flexible data and API access for automated clients.
Read more… published on 19 April 2014
Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. The past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side—which we can never do reliably—we should build our servers in such a way that enables clients to solve queries efficiently.
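To make the idea of client-side query solving a little more concrete, here is a minimal sketch. It assumes the server only answers lookups for a single triple pattern, while the client combines those answers into a full query result. The dataset and the names `server_match` and `client_solve` are hypothetical and only serve as illustration; they are not the actual interface described in the post.

```python
# Hypothetical in-memory dataset standing in for a server's triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "Ghent"),
    ("carol", "livesIn", "Ghent"),
]


def server_match(s=None, p=None, o=None):
    """Server side: answer one triple pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def client_solve(patterns, binding=None):
    """Client side: join several patterns; terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    # Fill in variables that are already bound; unbound ones become wildcards.
    lookup = [binding.get(term) if term.startswith("?") else term for term in first]
    results = []
    for triple in server_match(*lookup):
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?") and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            results.extend(client_solve(rest, extended))
    return results


# Who knows somebody that lives in Ghent?
answers = client_solve([("?x", "knows", "?y"), ("?y", "livesIn", "Ghent")])
```

Each server call inspects a single pattern, so its cost stays bounded; the open-ended join work moves to the client.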
Read more… published on 11 March 2014
More than three years of research and several hundred pages of text later, I’m finally ready to defend my PhD. Why did I start this whole endeavor again? Well, I was—and still am—fascinated by the possibilities the Web has to offer, and working as a PhD student gives you the opportunity and the freedom to dive into the things you love. I wanted to make the Web more accessible for machines, so they can perform tasks in a more autonomous way. This brought me to the crossroads of Semantic Web and REST APIs: semantic hypermedia.
Read more… published on 28 February 2014
Apologizing is a polite and functional act of communication: it helps people let go of any negative sentiments you may have caused. However, communication is only effective when it is actually meant to help others, not to help yourself. We sometimes send messages out of habit, which strangely can give them the opposite effect of what was intended when adopting that habit. Therefore, always think before you communicate to ensure you convey the right message.
Read more… published on 31 January 2014
Read more… published on 31 December 2013
Really, nobody takes your website seriously anymore if you don’t offer an API. And that’s what everybody did: they got themselves a nice API. An enormous amount of money and energy is wasted on developing APIs that are hard to create and even harder to use. This is wonderful news for developers, who get paid to build two pieces of software—a server and a client—that were actually never needed in the first place. The API was there already: it’s your website itself. Shockingly, a majority of developers seems unable to embrace the Web and the important role URLs and hypermedia play on it. The lie called “API” has trapped many publishers, including the Digital Public Library of America and Europeana.
Read more… published on 29 November 2013
Research is a rewarding job. You get to work on a cool thing, communicate about it, travel around the world to demonstrate it to others… But most of all, you get the opportunity to work together with highly talented people, in ways that are impossible in industry. The International Semantic Web Conference reunited people working on future Web technology for the 12th year in a row, and I was very lucky to be there. Moreover, our MMLab team, together with the Web & Media Group of the VU, set a new record by winning the Best Demo Award two consecutive years. I’ve come to realize how important communicating and collaborating with people are for good research—simply invaluable.
Read more… published on 31 October 2013
SPARQL, the query language of the Semantic Web, allows clients to retrieve answers to complex questions. It’s a core technology in the Semantic Web Stack, as it enables flexible querying of Linked Data. If the Google search box is the entry to the human Web, a SPARQL query field is the entry to the machine Web. There’s only one slight problem: nobody seems able to keep a SPARQL endpoint up. Maybe the issue is so fundamental that more processing power cannot solve it.
Read more… published on 30 September 2013
Data is often dubbed the new gold, but no label could be more wrong. It makes more sense to think about data as diamonds: highly valuable, but before they are of any use, they need intensive polishing. OpenRefine, the latest incarnation of Google Refine, is specifically designed to help you with this job. Until recently, getting started with OpenRefine was rather hard because the amount of functionality can overwhelm you. This prompted Max De Wilde and me to write a book that will turn you into an OpenRefine expert.
Read more… published on 30 August 2013
“When all you have is a hammer, every problem starts to look like a nail” is but one of the many wordings of the infamous Law of the Instrument. Many of us are blinkered by our tools, instantaneously choosing what we know best to solve a problem—even though it might not be the best solution to that problem. It doesn’t take long to end up with complex solutions for simple things. Fortunately, the more tools you master, the higher the chance you choose the right one. Thus, an extensive toolbox is exactly what I recommend.
Read more… published on 30 July 2013
Dreaded scientific posters—if you attend conferences, you have definitely seen them. They’re boring and ugly. On purpose. Because that’s what everybody does, right? The adjective scientific seems to imply that we should restrict our creativity. After all, content is king, and too much fanciness won’t get you anywhere? And the term poster is just because “abstract of 84cm × 119cm where you choose the colors” is too long? It’s this kind of reasoning that gets us nowhere.
Read more… published on 27 June 2013
Hyperlinks are the door handles of the Web, as they afford going to the next place you want to be. However, in a space as large as the Web, there are an awful lot of possible next places, so the webpage might not offer the door handle you are looking for. Luckily of course, there’s a thing called Google, but wouldn’t it be much more awesome if the links you need were already there on the page? Because right now, the author of the webpage has to make the decision where you can go, as he is the architect of the information. But should he also be the architect of the navigation or should that be you, the person surfing the Web?
Read more… published on 31 May 2013
Read more… published on 30 April 2013
What makes the Web more fascinating to read than any book? It’s not that the information is more reliable or people have become tired of the smell of paper. The exciting thing about consuming information on the Web is that you can keep clicking through for more. Hyperlinks have always been a source of endless curiosity. Few people realize that the hypertext concept actually far predates the Web. The idea that information itself could become an actionable entity has revolutionized our world and how we think.
Read more… published on 29 March 2013
People who have programmed with me or have seen my open-source work on GitHub know that I put a lot of effort into my coding style. I indeed consider programming a creative act, which necessarily involves aesthetics. And then, some people consider aesthetics the enemy of the pragmatic: “don’t spend time writing beautiful code when you can write effective code”. However, I argue that my sense of beauty serves pragmatism much better, because it leads to more concise and maintainable code, and is thereby far more effective.
Read more… published on 21 February 2013
The iPhone’s Siri has given the world a glimpse of the digital personal assistant of the future. “Siri, when is my wife’s birthday?” or “Siri, remind me to pick up flowers when I leave here” are just two examples of things you don’t have to worry about anymore. However cool that is, Siri’s capabilities are not unlimited: unlike a real personal assistant, you can’t teach her new tricks. If you had a personal agent that could use the whole Web as its data source—instead of only specific parts—there would be no limits to what it could do. However, the Web needs some adjustments to make it agent-ready.
Read more… published on 31 January 2013
Read more… published on 31 December 2012
What’s the connection between the Eiffel Tower and Big Ben? How are you related to Mickey Mouse? Or Elvis Presley? Today, there’s a fun way to find out: Multimedia Lab’s new Web app Everything is Connected allows you to see how any two topics in this world connect. Choose a start topic (this might be you!) and watch an on-the-fly video that takes you to any destination topic you select. You’ll be amazed to discover how small the world we live in really is. In this post, I’ll take you behind the scenes of this fascinating app.
Read more… published on 27 November 2012
For researchers, communication is arguably the most important aspect of the job, but unfortunately not always the most visible. Sometimes, our work is so specific that it seems impossible to share it as a story with the outside world. Surprisingly, day-to-day social media such as Facebook and Twitter can be highly effective at giving your work the attention it deserves. To achieve this, researchers must become conscious social media users who engage in every social network with a purpose—and a plan.
Read more… published on 25 October 2012
Most programmers are not familiar with resource-oriented architectures, and this unfamiliarity makes them resort to things they know. This is why we often see URLs that have action names inside of them, while they actually shouldn’t. Indeed, URLs are supposed to identify resources, and HTTP defines the verbs we can use to view and manipulate the state of those resources. Evidently, there is quite a mismatch between imperative (object-oriented) languages and HTTP’s resources-and-representations model. What would happen if we think the other way round and model HTTP methods in an imperative programming language?
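As a rough illustration of that question, the sketch below models a resource as an object whose methods mirror HTTP's uniform interface, so the URL only names the thing and never the action. The `Resource` class, the `store` dictionary, and the example URL are hypothetical; this is one possible reading of the idea, not the design worked out in the post.

```python
class Resource:
    """A resource identified by a URL; its methods mirror HTTP's verbs."""

    def __init__(self, url, store):
        self.url = url
        self._store = store  # hypothetical dict standing in for server-side state

    def get(self):
        """Like GET: read the current representation without changing anything."""
        return self._store.get(self.url)

    def put(self, representation):
        """Like PUT: replace the state; repeating the call has the same effect."""
        self._store[self.url] = representation

    def delete(self):
        """Like DELETE: remove the resource; repeating the call is harmless."""
        self._store.pop(self.url, None)


store = {}
user = Resource("/users/5", store)  # not "/deleteUser?id=5": no verb in the URL
user.put({"name": "Ann"})
current = user.get()
user.delete()
```

The verbs stay the same for every resource; only the URL changes, which is what keeps action names out of it.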
Read more… published on 27 September 2012
Two months ago, Kjetil Kjernsmo asked me whether I wanted to join the Oslo Perl Mongers for an RDF hackathon. We had met at the LAPIS workshop in Greece, where he showed me the open source work he had been doing. “Sure, I’d love to join”, I replied, “but there’s only a minor problem—I don’t know Perl!” Turns out there was nothing to worry about: learning Perl is easy, and the community embraces newcomers. Plus, the hackathon was located near a beautiful mountain landscape in Norway. Needless to say, I had a splendid week.
Read more… published on 30 August 2012
HTTP, the Hypertext Transfer Protocol, has been designed under the constraints of the REST architectural style. One of the well-known constraints of this REpresentational State Transfer style is that communication must be stateless. Why was this particular constraint introduced? And who is in charge then of maintaining state, since it is clearly necessary for many Web applications? This post explains how statelessness works on today’s Web and clarifies the difference between application state and resource state.
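The distinction between the two kinds of state can be sketched in a few lines. In this hypothetical example, the server only holds resource state (a list of items) and every request carries all the information needed to answer it, while the client keeps the application state (which page it is currently on). The names `ITEMS`, `get_page`, and `Client` are assumptions made for illustration.

```python
# Resource state: lives on the server and is the same for every client.
ITEMS = [f"item-{i}" for i in range(1, 8)]


def get_page(page, size=3):
    """Stateless handler: the request itself says which page is wanted."""
    start = (page - 1) * size
    has_more = start + size < len(ITEMS)
    return {"items": ITEMS[start:start + size], "next": page + 1 if has_more else None}


class Client:
    """Application state: the client remembers where it is in the collection."""

    def __init__(self):
        self.page = 1

    def fetch_all(self):
        collected = []
        while self.page is not None:
            response = get_page(self.page)  # the server remembers nothing in between
            collected.extend(response["items"])
            self.page = response["next"]  # follow the link the server handed out
        return collected


everything = Client().fetch_all()
```

Because the server never tracks who is on which page, any request can be answered in isolation, and two clients can browse the same collection at different positions without interfering.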
Read more… published on 24 August 2012
Read more… published on 13 August 2012
Recently, I wanted to offer my visitors the option to add any of my publications to their Mendeley paper library. When creating the “add to Mendeley” links, I noticed that papers got added without asking the visitor for a confirmation. Then I wondered: could I exploit this to trick people into adding something to their Mendeley library without their consent? Turns out I could, and here is why: Mendeley did not honor the safeness property of the HTTP GET method.
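The safeness property says that methods such as GET should only retrieve information and never change state on the server. The toy handler below sketches what honoring it looks like: reads go through GET, while adding a paper requires POST. The `/library` path, the `handle` function, and the in-memory set are hypothetical and do not represent Mendeley's actual API.

```python
# Methods that HTTP defines as safe: they must not modify resource state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def handle(method, path, library, body=None):
    """Toy request handler for a hypothetical paper library; returns (status, payload)."""
    if path != "/library":
        return 404, "not found"
    if method in SAFE_METHODS:
        # Safe request: only read, so following a link can never add a paper.
        return 200, sorted(library)
    if method == "POST":
        # State change: must be requested explicitly, for example by a form submission.
        library.add(body)
        return 201, body
    return 405, "method not allowed"


library = set()
read_result = handle("GET", "/library", library)
write_result = handle("POST", "/library", library, "paper-1")
```

If adding a paper were reachable through a plain GET link instead, any page could trigger it simply by embedding that URL, which is the kind of problem described above.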
Read more… published on 19 July 2012
In my hometown Ghent, an exciting contest took place: PhD students could send in a one-minute video about their research. Winners get to give a talk at TEDxGhent, a local edition of the famous TED conferences. I badly wanted to participate, so I had to find an original and effective way of selling my message in one minute. My goals: tease the audience, entertain the audience, and, ultimately, activate them to vote.
Read more… published on 4 May 2012