Ruben Verborgh

Re-decentralizing the Web, for good this time

Originally designed as a decentralized network, the Web has undergone a significant centralization in recent years. In order to regain freedom and control over the digital aspects of our lives, we should understand how we arrived at this point and how we can get back on track. This chapter explains the history of decentralization in a Web context, and details Tim Berners-Lee’s role in the continued battle for a free and open Web. The challenges and solutions are not purely technical in nature, but rather fit into a larger socio-economic puzzle, to which all of us are invited to contribute. Let us take back the Web for good, and leverage its full potential as envisioned by its creator.

Power to the people

As an inventor, you might envision a purpose and destiny for your creation—yet ultimately, people decide how they put it to use. John Pemberton aimed to cure morphine addicts when he started brewing the potion now known as Coca-Cola, Noah McVicker’s Play-Doh served as a wall-cleaner before it became a children’s toy, and Alfred Nobel established yearly prizes so he would not be remembered for dynamite’s military purposes. Admirably, Tim Berners-Lee never even intended to control his own invention: his former employer CERN released the World Wide Web software openly, and the Web itself is designed in a decentralized way so that no one can decide who can say what. This unprecedented openness has led to large-scale permissionless innovation and unbounded creativity, provides a voice to more than half of the world’s population, and has revolutionized communication, education, and business. However, a consequence of this freedom is also that anyone can create things that go against the spirit of the Web, such as illegal materials and—ironically—platforms whose primary goal is centralization.

The concept of centralization does not pose a problem in and of itself: there are good reasons for bringing people and things together. The situation becomes problematic when we are robbed of our choice, deceived into thinking there is only one access gate to a space that, in reality, we collectively own. Some time ago, it seemed unimaginable that a fundamentally open platform like the Web would become the foundation for closed spaces, where we pay with our personal data for a fraction of the freedoms that are actually already ours. Yet a majority of Web users today find themselves confined to the boundaries of a handful of influential social networks for their daily interactions. Such networks gather opinions from all over the world, only to condense that richness into one space, where they simultaneously act as the director and judge of the resulting stream they present to us.

Because this change happened so suddenly, perhaps we need a reminder that the Web landscape looked quite different not even that long ago. In 2008, Iranian blogger Hossein Derakhshan was arrested and later sentenced to nearly 20 years in prison, primarily because of blog posts he had written. He and many others were able to state their critical opinions because they had the Web as an open platform, so they did not depend on anyone’s permission to publish their words. Crucially, the Web’s hyperlinking mechanism lets blogs point to each other, again without requiring any form of permission. This allows for a decentralized value network between equals, where readers remain in active and conscious control of their next move. When Derakhshan was eventually released in 2014, he came back to an entirely different Web [1]: critical readers had transformed into passive viewers, as if watching television. While Web technology had of course evolved, its core foundations had not—it was the way people were using the Web that had become unrecognizable in a mere 6 years.

Of course, social media are not our enemies here: they should be credited with lowering the barrier for the online publication of short texts and photos by anyone. Yet they operate under a winner-takes-all strategy, each striving to become the dominant portal instead of interoperating like the rest of the Web. In contrast to blogs, we typically cannot interact with posts in one network from within another: we would need to move either the people or the data. This well-known walled-garden problem of social media [2] has significantly worsened since 2008, because some gardens have grown huge while their walls remain in place. A major problem is that access to the dominant networks invariably means giving up control over our personal data: we can enter through the door in the wall only if we pay with our digital belongings. That personal data can then be leveraged to influence us, without our knowledge, through excessively personalized advertising for brands, products, and even political agendas. Furthermore, once there, people tend to form small conversational circles within each garden—an effect that is further amplified by the inward focus of social media platforms and their algorithms that favor maximizing engagement over diversity. The resulting filter bubble [3] isolates us into our own echo chambers, whereas the Web’s purpose—and social media’s claim—has always been to connect.

Unsurprisingly, these problems are reflected in three challenges for the Web [4] that Tim Berners-Lee put forward in 2017:

  • taking back control of our personal data;
  • preventing the spread of misinformation;
  • realizing transparency for political advertising.

Clearly, it is undesirable to tackle these challenges through centralized solutions, for instance by appointing an authority for personal data, news, and advertising. This would create yet another single point of failure, which—even assuming the best of intentions—would always be more vulnerable to abuse. The core issue in this situation is ultimately not one with the individual social networks, but with their hyper-centralization of data and people, and therefore power. We want control, but we want to put that control in the hands of every person, as a right they can choose to exercise over the data they create.

From the above, it is clear that our primary obstacles are not technological [5]; hence Tim Berners-Lee’s call [6] to assemble the brightest minds from business, technology, government, civil society, the arts, and academia to tackle the threats to the Web’s future. Yet at the same time, computer scientists and engineers need to meet the technological burden of proof: showing that decentralized personal data networks can scale globally and that they can provide people with an experience similar to that of centralized platforms.

In this chapter, we will therefore start with a technological perspective on decentralization, highlighting Tim Berners-Lee’s role in the continuing fight to keep the Web open and decentralized. After a historical overview of power struggles on the Web, we will zoom in on the changes that decentralization requires, and examine what a healthier ecosystem would look like. As a concrete implementation of these principles, we will study the Solid project. We will end with a discussion of open challenges and an outlook on the future.

A short history of (de-)centralization and the Web

The arrows of the decentralization movement have not always been aimed at social media—and they likely will not be at some point in the future. The forces causing centralization have instead been a moving target: every time a threat was addressed, an even bigger one took its place. Understanding these threats will provide us with insights into the different facets of decentralization and their importance.

Decentralization as the unspoken assumption

Decentralized systems, which do not require a central mediator to function, were already around at the time the Web was invented. Most notably, the Internet was increasingly gaining popularity as a large-scale decentralized network. Email was even more decentralized than the traditional postal mail service it mimicked, since different mail servers would directly exchange messages with each other. Long forgotten protocols such as the Network News Transfer Protocol (NNTP) allowed for the decentralized exchange of news articles. In short, decentralization was not some crazy new idea, but rather the spirit of the time.

Therefore, when Tim Berners-Lee set out to design a new hypertext system in 1989, it was presumed to be decentralized, in contrast to documentation systems of the time, but in line with many other systems. The main selling point of the Web was its universality [7], its independence from, among other things, hardware and software; decentralization was simply the unspoken assumption. This is reflected in the original article introducing the Web [8], which emphasizes universal readability across operating systems, but does not mention the term decentralization at all.

The only component with centralized roots in the Web’s architectural design is the Domain Name System (DNS), which resolves the domain name part of a Web address (such as example.org) to a physical machine on the Internet. This was not as much of an issue back when the number of domains was relatively small and domain ownership rarely changed. Nowadays, millions of domain names frequently change hands, thereby breaking existing links in possibly malicious ways. By manipulating DNS, governments can block or alter access to existing websites. Tim Berners-Lee has indicated that, in hindsight, a more decentralized naming system might have been preferable. Apart from that, the Web contained all ingredients to thrive in a decentralized way.

The race for our desktop

A first wave of centralization resulted as collateral damage from the browser war of the nineties, in which companies competed to become the sole vendor of the software through which we access the Web. The Web’s design principle of universality demanded readability on any platform, so the emergence of multiple browsers was a blessing, except that their vendors strove for market domination rather than mutually beneficial co-existence. The Netscape browser and Microsoft’s Internet Explorer tried to convert each other’s users through new features, with Internet Explorer reaching over 90% of connected desktops at its peak.

While competition through innovation is fine, these features came at the cost of incompatibility across browsers and therefore directly endangered the Web’s universality. Websites would carry badges such as “best viewed in Internet Explorer”, since a consistent experience across platforms could not be guaranteed. Those who did not want to use a particular browser—or who could not install it because no compatible version for their system existed—would be unable to access such websites fully or at all. The resulting de facto browser monopoly infringed on people’s preference for browser or operating system, centralizing the Web’s decision process in one company that thereby determined the rate of innovation.

The World Wide Web Consortium (W3C) was founded by Tim Berners-Lee with a mission of compatibility, enabling cross-browser consistency through recommendations that specify the correct workings of Web technologies. While W3C standardization is administratively centralized, it incorporates feedback from a decentralized network of members through a consensus-driven process. A problem by the early 2000s was that Internet Explorer deviated from W3C recommendations at crucial points, forcing developers to choose between following the actual standards and following their incorrect implementation in the most popular browser.

Fortunately, pressure from Firefox and Safari during a second browser war eventually forced Microsoft onto a more standards-oriented course [9]. Since 2010, no single browser has gained more than two thirds of global market share anymore, meaning that standards compatibility is now in the interest of browser vendors and Web developers alike. The balkanization of the Web through centralized browser development has thereby largely been averted.

The race for our searches

Microsoft’s short-lived victory after the first browser war quickly turned out to be insignificant, since the centralization battle had gradually shifted to other fields. While browser vendors were quarreling to become the default application, search engines were racing to become the main entry point. Soon, it did not matter anymore what software you used for browsing; what mattered was who gave you the directions of where to browse next. After all, no immediate income could be generated from free browser development, whereas companies would gladly pay for a prime spot in one of the major search engines’ rankings.

The early search engine landscape featured several competitors, such as AltaVista and Lycos, but it took Google only a couple of years to become by far the most popular. The centralization of search meant that one company gained an overly strong influence on what content people would access, based on the ordering of search results for given terms. Even assuming the best of intentions and ignoring paid advertising, the fact that one algorithm makes decisions for a large number of people leads to an information bias, as there clearly exists no single objective way to rank the best webpages on any topic. External attempts to manipulate these algorithms started to occur, first through relatively simple interventions such as misleading keywords, later through advanced Search Engine Optimization (SEO) techniques that aimed to improve website rankings in various (and sometimes dubious) ways.

The advent of search engines also brought the first online monetization of user-generated data. Our search terms contribute to a detailed profile of what we need in our private and professional lives. Search engines might know more about some aspects of our lives than our close friends. This profile determines the ads we receive and the personalization of our search results, encouraging us to visit websites and buy things we otherwise might not have. While personalization has helpful effects for many people, the problem is that we are left without choice or control. We are directed to the large search engines, which, due to their large accumulation of data, provide us with the best search experience. Yet these search engines do not provide us with options for how we want to pay for their services, as most of them only accept our personal data. Furthermore, we are not informed about—let alone given control over—how exactly our data influences our search results. The increasing personalization gave rise to the first filter bubbles [3], wherein we are more likely to see results similar to those we previously clicked on.

The race for our personal data and identity

While the hegemony of Google still continues, social media have found an even more powerful way of collecting and marketing our personal data. The social Web revolution of the 2000s encouraged people to be present online, which drove many of us to various platforms to share blog posts, bookmarks, photos, videos, and more. Some years later, social media companies created centralized platforms to take over many of these features, which until then were spread out across multiple providers. These platforms store our personal data and request far-reaching usage rights in exchange for their services, all of which operate within their own walled garden.

Like search engines, the main service of social networks consists of a linear list of content, ranked by factors and algorithms we can only minimally influence. In contrast to search, a social feed is generated without any input terms from our side, like a television that no longer requires a remote. The ensuing show is meticulously personalized based on data we consciously left on social network platforms, combined with traces from our browsing history picked up—without our conscious consent [10]—by social trackers on third-party websites. In his 2018 Dertouzos distinguished lecture, Tim Berners-Lee mentioned that political advertising has been banned from television in the UK [11] because of concerns about the impact of such a direct medium. Yet by that logic, he continued, we should be much more concerned about the heavily personalized political advertising that current social media platforms enable and allow. Even if we refrain from explicitly sharing certain sensitive traits, seemingly insignificant pieces of other data can be combined into reliable predictors of highly personal information [12] such as sexual orientation, ethnicity, and religious or political views, which are subsequently used to target us.

As in the previous two centralization races, a subtle force is exerted upon us: we feel pressured to be part of the large networks, because not joining means missing out on the volatile virtual traces of our friends’ and family members’ lives. Often the easiest way for grandparents to see their grandchildren’s latest pictures is to create a Facebook or Instagram account. This is how the digital memory of a large part of today’s generation ends up in one space, often beyond the control of those who are part of the memories. The centralization of our online activities has become so extreme that some Facebook users have become unaware of their ability to access the Internet [13]. This paradox has sadly become a reality in many countries, where Facebook’s Internet.org initiative provides a severely constrained version of the Web that further reduces people’s options, in blatant violation of Net Neutrality.

Meanwhile, another race is happening in the background, namely the battle to become our identity provider. An increasing number of websites are gradually replacing their own login systems with authentication tied to large platforms such as Google or Facebook. For people with an existing account, the “Log in with Facebook” buttons are a convenience. For those without, they form additional pressure to join. And in both cases, such buttons are yet another way of tracking our online activities. This centralization of identity takes away our freedom to assume the persona we want—be it anonymous, pseudonymous, or just ourselves—without needing to expose data we consider our own.

Data ownership by decoupling storage from service

A recurring theme in the above centralization races is the lack of choice: a choice of browser and operating system, of entry point to the Web, of storage for our personal data. Decentralization is fundamentally about enabling choice, by breaking up artificially coupled decisions into individual options that can be combined at will. Just as we are free to choose any combination of device, operating system, and browser to access the Web, we should be able to interact with websites and other people without commitment to a single social or other platform.

Taking back control of our personal data, as envisioned by Tim Berners-Lee, is realized by decoupling data storage from services. This means people can store their data wherever they choose, while still enjoying the services they want. We can pick any provider to store our texts, photos, and videos, or simply store them on our own Web server, and rely on any third-party service to interact with them, regardless of storage location. The crucial service of identity can, but does not need to, be provided by the data storage.

This mindset gives rise to the concept of a personal data pod, in which we can store every single piece of information we produce. As shown in the figure below, this statement can be taken quite literally: even a seemingly trivial piece of data, such as a simple like we gave a certain webpage or thing, can be stored in our own pod. While such a degree of decentralization might seem extreme, recall that even supposedly trivial likes can reveal much deeper personal information [12], so it makes sense to give people control over them. Furthermore, since we do not depend on anyone’s permission to publish data in our own pod, we can place likes, annotations, and comments on anything we want, without fear of them being censored or deleted.

On a decentralized Web, every piece of data can be stored in a place chosen by its author.
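To make this concrete, a like stored in one’s own pod can be a tiny Linked Data document that merely points at the liked page. The Python sketch below is illustrative only: all URLs are hypothetical, and the choice of the ActivityStreams 2.0 vocabulary (as:Like, as:actor, as:object, as:published) is an assumption, not necessarily what any particular Solid application uses.

```python
# A minimal sketch: serialize a "like" as a Turtle document that would live
# in the author's own pod. All URLs used with it are hypothetical examples.

AS_PREFIX = "https://www.w3.org/ns/activitystreams#"
XSD_PREFIX = "http://www.w3.org/2001/XMLSchema#"


def like_as_turtle(like_url: str, actor_webid: str, liked_url: str, published: str) -> str:
    """Return a Turtle description of a single like.

    like_url:    where the like itself is stored (inside the author's pod)
    actor_webid: the WebID of the person who placed the like
    liked_url:   the webpage or thing being liked (anywhere on the Web)
    published:   an xsd:dateTime string such as "2019-01-01T12:00:00Z"
    """
    return (
        f"@prefix as: <{AS_PREFIX}> .\n"
        f"@prefix xsd: <{XSD_PREFIX}> .\n"
        "\n"
        f"<{like_url}> a as:Like ;\n"
        f"    as:actor <{actor_webid}> ;\n"
        f"    as:object <{liked_url}> ;\n"
        f'    as:published "{published}"^^xsd:dateTime .\n'
    )


if __name__ == "__main__":
    print(like_as_turtle(
        like_url="https://alice.pod.example/likes/1",
        actor_webid="https://alice.pod.example/profile/card#me",
        liked_url="https://news.example/articles/42",
        published="2019-01-01T12:00:00Z",
    ))
```

Because the like only links to the page, creating it requires no permission from the page’s owner, and the page itself is never modified.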

This total data ownership enables highly granular access control: people can selectively give permission to friends or applications to read or write specific parts of their data pod. For instance, they can decide whether or not they make their profile picture and full name public, who can see which of their likes and comments, and what applications can edit their pictures or posts on their behalf. These permissions can be changed or revoked at any time. People can have multiple data pods for different purposes, for instance, a pod for personal and family pictures at home, a pod governed by retention policies for professional data at the workplace, and a university pod with study materials and grades. Whenever they create a piece of data, they can decide in which pod to store it.
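Such per-resource, revocable permissions can be modelled in a few lines. The Python sketch below is a deliberately simplified, hypothetical model: it only knows Read and Write, treats the string "public" as a wildcard agent, and does not implement Solid’s Web Access Control specification (which defines further modes such as Append and Control, and rules for containers). Agent identifiers may denote people or applications; all of them are made up for illustration.

```python
from enum import Enum


class Mode(Enum):
    READ = "Read"
    WRITE = "Write"


class DataPod:
    """A toy pod: resources plus explicit grants of (agent, mode) per resource."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.resources: dict[str, str] = {}
        self.grants: dict[str, set[tuple[str, Mode]]] = {}

    def grant(self, path: str, agent: str, mode: Mode) -> None:
        self.grants.setdefault(path, set()).add((agent, mode))

    def revoke(self, path: str, agent: str, mode: Mode) -> None:
        # Revoking takes effect immediately for all later checks.
        self.grants.get(path, set()).discard((agent, mode))

    def allowed(self, agent: str, path: str, mode: Mode) -> bool:
        # The owner always has full access; everyone else needs an explicit
        # grant, either personally or through the "public" wildcard.
        if agent == self.owner:
            return True
        granted = self.grants.get(path, set())
        return (agent, mode) in granted or ("public", mode) in granted

    def read(self, agent: str, path: str) -> str:
        if not self.allowed(agent, path, Mode.READ):
            raise PermissionError(f"{agent} may not read {path}")
        return self.resources[path]


if __name__ == "__main__":
    pod = DataPod(owner="https://alice.example/profile#me")
    pod.resources["/profile/picture.jpg"] = "<image data>"
    pod.resources["/likes/1"] = "a like"
    pod.grant("/profile/picture.jpg", "public", Mode.READ)
    pod.grant("/likes/1", "https://bob.example/profile#me", Mode.READ)
    print(pod.read("https://bob.example/profile#me", "/likes/1"))
    pod.revoke("/likes/1", "https://bob.example/profile#me", Mode.READ)
```

Keeping the owner check separate from the grant table mirrors the idea that the pod owner, not the application, is the one who hands out and withdraws permissions.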

By choosing the storage location of our own data, we prevent unauthorized access and exploitation. We are no longer obliged to pay with our data in order to access a certain service. Moreover, we can protect the most sensitive parts of our data by keeping them to ourselves, and limit sharing to people and services that really require it, but only for as long as they need it.

Independent innovation on separate data and service markets

When people store their own data, privacy-unfriendly business models centered around data ownership will not be viable anymore. Such an economic change can be accelerated through legislation, like the EU’s General Data Protection Regulation (GDPR), as well as growing awareness among the general population about the dangers of centralization, given recent data scandals at companies such as Equifax and Facebook. Consequently, new business models for applications become necessary.

Decentralization requires the nature of applications to evolve from silos to shared views. As shown in the figure below, current Web apps combine data and service. Because of this coupling, our LinkedIn contacts cannot comment on our Facebook pictures, and an RSVP on a Facebook event will not be reflected in our Doodle calendar’s availability. Decentralized applications, on the other hand, act as views on top of our data pod and those of others. When granted specific access rights, a social feed app can access photos uploaded into our data pod by a photo gallery application. Events in our personal calendar that have public visibility can show up in the same feed. Our friends can view the parts of our data to which we grant them access through whatever application they wish to use.

Centralized Web applications act as silos that do not share data with each other. Decentralized Web applications act as shared views on top of personal data pods.
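The difference between a silo and a view can be sketched in code: one application writes into a person’s pod, while a completely different application only reads from several pods and combines what they expose. In the Python sketch below, both applications and the data model are hypothetical, and access control is reduced to a single public flag for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    kind: str          # e.g. "photo" or "event"
    created: str       # ISO 8601 timestamp; sorts correctly as a string
    title: str
    public: bool = False


@dataclass
class Pod:
    owner: str
    items: list[Item] = field(default_factory=list)


def gallery_upload(pod: Pod, title: str, created: str) -> None:
    """Hypothetical photo gallery app: writes photos into the user's own pod."""
    pod.items.append(Item("photo", created, title, public=True))


def social_feed(pods: list[Pod]) -> list[str]:
    """Hypothetical feed app: a read-only view over public items in many pods.

    It stores nothing itself; it only combines what the pods expose,
    newest first.
    """
    visible = [(item.created, pod.owner, item)
               for pod in pods for item in pod.items if item.public]
    visible.sort(key=lambda entry: entry[0], reverse=True)
    return [f"{owner}: {item.kind} '{item.title}'" for _, owner, item in visible]


if __name__ == "__main__":
    alice, bob = Pod("alice"), Pod("bob")
    gallery_upload(alice, "Beach", "2019-06-01T10:00:00Z")
    bob.items.append(Item("event", "2019-06-02T09:00:00Z", "Team lunch", public=True))
    bob.items.append(Item("event", "2019-06-03T09:00:00Z", "Dentist"))  # private
    for line in social_feed([alice, bob]):
        print(line)
```

Since the feed keeps no data of its own, replacing it with a competing feed application loses nothing: the photos and events stay in the pods.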

Because the choice of data and service provider becomes decoupled, separate markets for data and services emerge. The figure below shows that centralized applications compete in a single market based on data ownership, because usage of a service is coupled with usage of its storage. As such, people cannot easily switch to a better application experience, as migrating their data, if possible at all, is technically challenging. Furthermore, new applications that could offer a better experience have trouble joining the market, since they do not own sufficient data yet. With decentralized Web applications, people select their storage and service providers separately, which allows an independent competition on the level of storage and on the level of services. On both levels, the competition is solely based on service quality and features versus cost.

This independence means we can freely switch data and service providers, without requiring our friends to choose the same ones. This brings down the walls in between the gardens, because we gain the ability to reuse and move our data, and can interact with anyone in the entire landscape. Data and service providers can evolve without dependency on each other, which enables a faster and more creative innovation cycle. Anyone can enter either market and attract customers by providing a better experience than others, without asking for control of our data.

Centralized applications compete in a single market, based on data ownership. On a decentralized Web, data and service providers compete in different markets.

The Solid project

In order to realize this vision of data ownership and data/service independence, Tim Berners-Lee started the Solid project [14]. Solid consists of specifications for interoperability, implementations of servers, clients, and applications, and a community of people who build new things. In the next sections, we will discuss some of Solid’s unique aspects.

Personal data linking and integration

The goal of Solid is empowering people through personal data management, as a counterpart to enterprise data management. We can consider a Solid server or data pod as the equivalent of a hard disk for the Web, on which we can store arbitrary documents. Solid apps are then like our desktop applications, except that they open documents from Solid servers on the Web. In contrast to actual hard disks, Solid servers are typically reachable by the entire world, so detailed access control settings allow us to specify who can view or edit which of our documents. Tim Berners-Lee has been leading by example, by managing his personal and professional life with Solid for several years already.

In order for such data management to work at Web scale, data in different pods need to link to each other, similar to how hypertext documents allow us to jump from one website to another. Solid uses Linked Data [15] to achieve this: every piece of data can link to any other. This is how, for example, a comment in your data pod can be attached to a photo in someone else’s pod, while both of you can remain owners of your data. At runtime, Solid applications integrate data from multiple sources and blend them together into a single experience.
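The comment-on-a-photo example can be sketched by modelling each pod as a set of (subject, predicate, object) triples. The comment lives in Bob’s pod but links to the photo’s address in Alice’s pod; an application merges both at runtime. In this Python sketch the URLs are hypothetical, and the short predicate names ("title", "about", "text") are placeholders; real Linked Data would use full IRIs from shared vocabularies.

```python
# Each pod is modelled as a set of (subject, predicate, object) triples.
Triple = tuple[str, str, str]

alice_pod: set[Triple] = {
    ("https://alice.example/photos/1", "title", "Sunset"),
}

# Bob's comment stays in Bob's pod but links to Alice's photo by its URL.
bob_pod: set[Triple] = {
    ("https://bob.example/comments/7", "about", "https://alice.example/photos/1"),
    ("https://bob.example/comments/7", "text", "Beautiful colours!"),
}


def comments_on(resource: str, *pods: set[Triple]) -> list[str]:
    """Merge the given pods and return the texts of comments about `resource`."""
    merged: set[Triple] = set().union(*pods)
    comment_ids = {s for (s, p, o) in merged if p == "about" and o == resource}
    return sorted(o for (s, p, o) in merged if s in comment_ids and p == "text")


if __name__ == "__main__":
    print(comments_on("https://alice.example/photos/1", alice_pod, bob_pod))
```

Each person keeps ownership of their own triples; the link between them exists only because one URL refers to the other, just as with hypertext.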

Solid pods can offer people a decentralized means of identification. People can pick a so-called WebID, which is a Web address that identifies them. This Web address leads to their public profile, and people can log on to any pod with their own WebID, instead of requiring a new login on every website or resorting to a centralized identity provider.
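A WebID is itself just a Web address, so relating it to the profile that describes the person takes nothing more than URL handling. A common convention, assumed here rather than required, is a WebID ending in a fragment such as `#me`: the full address identifies the person, while the address without the fragment is the profile document an application fetches. The Python sketch below uses only the standard library; the example URLs are hypothetical.

```python
from urllib.parse import urldefrag, urlparse


def is_plausible_webid(webid: str) -> bool:
    """A WebID is an HTTP(S) address; this only checks the scheme and host."""
    parsed = urlparse(webid)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def profile_document(webid: str) -> str:
    """Return the URL of the profile document that describes a WebID.

    With a fragment such as "#me", the person and the document describing
    them get distinct identifiers; the document is what you fetch over HTTP.
    """
    url, _fragment = urldefrag(webid)
    return url


if __name__ == "__main__":
    webid = "https://alice.example/profile/card#me"
    print(is_plausible_webid(webid), profile_document(webid))
```

Because any server can host such a profile, no central identity provider is needed to resolve who someone is.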

Read–Write Web

One of the crucial aspects of Solid is that it provides a Read–Write platform, as was Tim Berners-Lee’s original intention for the Web [16]. While writing has always been possible, in the sense that anyone could start their own website, the Web 2.0 and social media revolutions should be credited with making writing considerably easier. This explains part of the success of these platforms, as anyone can now be a content producer at any time, especially through their mobile devices.

Solid should make authoring content similarly easy, the difference being of course that we would always write to our own data pods instead of to the application through which we create the content. In doing so, we guarantee that everyone can express themselves without risking censorship. To maximize interoperability, our Linked Data should be stored using Semantic Web technologies [17], which interweave a piece of data with its meaning. That way, applications can make sense of (parts of) each other’s data, without having to agree upfront exactly what our data should look like. When storing data in our own pods, we need a mechanism to inform others when things have been created or modified—especially if these are comments on their data. This is enabled through Linked Data Notifications [18], small automated messages similar to email, which different data pods can send to each other. By combining these technologies, Solid aims to realize the Read–Write Linked Data vision [16], in order to ensure that everyone can participate in the Web of Data.
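In Linked Data Notifications, a receiver advertises an inbox (through the `ldp:inbox` relation), and a sender delivers a notification by POSTing a JSON-LD document to that inbox. The Python sketch below only prepares such a request with the standard library; it does not perform inbox discovery or send anything. The inbox URL is hypothetical, and using an ActivityStreams "Create" activity as the payload is an assumption chosen for illustration.

```python
import json
from urllib.request import Request


def build_notification(inbox_url: str, actor: str, comment: str, target: str) -> Request:
    """Prepare (but do not send) a POST of a JSON-LD notification to an inbox.

    actor:   WebID of the person who wrote the comment
    comment: URL of the comment, stored in the author's own pod
    target:  URL of the resource being commented on, in someone else's pod
    """
    body = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "object": comment,
        "target": target,
    }
    return Request(
        inbox_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_notification(
        inbox_url="https://alice.example/inbox/",
        actor="https://bob.example/profile/card#me",
        comment="https://bob.example/comments/7",
        target="https://alice.example/photos/1",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would deliver it to a real inbox; the notification only announces the comment, which itself remains in the author’s pod.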

Potential for disruption

By transforming data ownership and the role of applications in a decentralized ecosystem, Solid is able to disrupt many interactions that happen on the Web. Many processes that currently depend on centralization can be revolutionized in a decentralized way, by cutting out the middlemen that control these processes. This can stimulate innovation in areas that are embracing the current status quo and resisting change.

A first obvious target is social interaction between people. Sharing multimedia with friends, colleagues, and family members without privacy concerns becomes possible through Solid. Other examples include collaborating on various kinds of documents under transparent access control, and organizing meetings and events—again with full data ownership, choice of application and storage, and synchronization between different apps.

Moreover, Solid has the technological potential to disrupt entire industries, such as scholarly publishing. The current scholarly publication process assumes that an author uploads a scientific manuscript to a centralized platform, where a closed group of reviewers evaluates it. After acceptance, the manuscript is published as an article and then becomes accessible to the public, possibly at a fee. This process is rather slow, as the wider scientific community can only read the article at the end—if accepted. It is also non-transparent because valuable artefacts of the process, such as reviews and revisions, remain hidden. Further participation is typically only possible through a reply that has to undergo a similar slow process. A decentralized authoring application such as dokieli [19] instead allows researchers to self-publish their manuscripts online in their own Solid pod. Their peers can annotate these manuscripts with comments and reviews, which are stored in their own pods, guaranteeing freedom of expression to any researcher who wants to participate. All outcomes of this process are online, and the scientific community can continuously provide feedback, even after publication on the Web.

A decentralized Web for all

Re-decentralizing the Web along the lines of the Solid vision can help us tackle Tim Berners-Lee’s three challenges [4]. We can take back control of our personal data by storing data in our own data pods. The spread of misinformation can be halted, because a free choice of applications allows us to influence our news feed—and all information in there can be traced back to its source. Political advertising becomes more transparent, as we can decide which parts of our data we expose to whom; moreover, the separate data and services markets allow us to consider other options that are not based on advertising in the first place. While this does not fully address all aspects of the challenges, data ownership and choice are major factors.

Freedom of course always comes at a cost: what constitutes a victory for personal rights and freedom of speech also facilitates the spread of illegal messages, since decentralized networks make it harder to control what information is exchanged. Legality is of course a tricky matter, as some countries enact laws that prevent their citizens from voicing opinions that would be legal elsewhere. An intriguing case is the increased popularity of the decentralized social network Mastodon in Japan [20]: as Twitter started removing images that were deemed questionable under US norms, Japanese users began publishing them on platforms with less censorship. We will have to accept this trade-off between freedom and control—and in the absence of a globally accepted set of norms, centralized filtering of questionable or illegal content can never yield an adequate solution.

This brings us to another aspect of decentralization, which is the tension between freedom and universality. The Paradox of Freedom states that we can only be free if we subject ourselves to certain rules. Simply said, we can take our bike and ride anywhere, as long as we stay on the right side of the road (which in several countries is actually the left). Without such rules, we would not be able to get anywhere without causing accidents. Given that universality has always been a main goal of the Web [7], decentralized communities can only flourish if they agree on some basic framework on how to decentralize. As with the universality of browsers, there is a major role for the W3C in creating the standards that will allow decentralized data pods and apps to interoperate. Fortunately, we do not have to agree on everything. Linked Data enables layered agreements, in which a few rules are adopted by many, and sets of additional rules are agreed upon by smaller groups as needed.

Importantly, the arrows of decentralization and Solid are not aimed at specific companies such as Google, Facebook, or Twitter. Instead, they point at centralization in general, since many of the problems and challenges faced by these companies are inherent to centralization and the business model of data ownership. We have come to the point where companies possess so much data that they themselves are unable to predict the long-term effects that such a centralization might have [10]. Therefore, it is unreasonable to use informed consent as an excuse, since no individual can reasonably understand what giving up control over small or large pieces of their data will eventually lead to. Storing our data in a trusted place of our choice, combined with a granular permission model, is therefore the only safe bet.

Note that none of us are dreaming of a Web without large players. Quite the contrary: Tim Berners-Lee insists that the Web should always remain scale-free [21], with room for the very large and the very small. The problem is that the very large are currently trying to make the rest obsolete, which endangers the online freedoms we have enjoyed for so many years. As argued above, decentralization is foremost about choice, so people should be free to join large or small communities. And while there are several technical issues ahead of us for decentralized applications, notably guaranteeing a user experience similar to that of centralized platforms in terms of usability and speed, the first technological proof has been delivered with Solid. Now, it is up to all of us to anchor this technological progress in today’s and tomorrow’s socio-economic reality in order to re-decentralize the Web for good. Only when we succeed in taking back control and choice over our most precious digital assets will we be able to truly say: this is for everyone.


