Re-Decentralizing the Web, For Good This Time
Originally designed as a decentralized ecosystem, the Web has undergone a significant centralization in recent years. In order to regain control over our digital self, over the digital aspects of our lives, we need to understand how we arrived at this point and how we can get back on track. This chapter explains the history of decentralization in a Web context, and details Tim Berners-Lee’s role in the continued battle for a free and open Web. The challenges and solutions are not purely technical in nature, but rather fit into a larger socio-economic puzzle, to which all of us are invited to contribute. Let us take back the Web for good, and leverage its full potential as envisioned by its creator.
Power to the people
As an inventor, you might envision a purpose and destiny for your creation—
Admirably, Tim Berners-Lee never even intended to control his own invention: his former employer CERN released the World Wide Web software openly in 1993, and he gave the Web a decentralized design so that no one can limit what others can say. This unprecedented openness has inspired large-scale permissionless innovation and unbounded creativity, provided a voice to more than half of the world’s population, and revolutionized communication, education, and business. However, a consequence of this unrestricted ability is that anyone can even create things that go against the spirit of the Web, such as illegal materials and—
In and of itself, the concept of centralization does not pose a problem: there are good reasons for bringing people and things together. The situation becomes problematic when we are robbed of our choices, deceived into thinking there would be just one access gate to a space that we in fact collectively own. Some time ago, it seemed unimaginable that a fundamentally open platform like the Web would become the foundation for closed spaces, where we trade our personal data for a fraction of the freedoms that are actually already ours. A majority of Web users today find themselves confined to the boundaries of a handful of influential social networks for their daily interactions. Such networks gather opinions from all over the world, only to condense that richness into one space, where they simultaneously act as the director and the judge of the resulting stream that scrolls across our screens.
Because this change happened so suddenly, we might need a reminder that the Web landscape looked quite different not even that long ago. In 2008, Iranian blogger Hossein Derakhshan was sentenced to 20 years in jail, primarily because of blog posts he had written. He and many others were able to state their critical opinions because they had the Web as an open platform, so they did not depend on anyone’s permission to publish their words. Crucially, the Web’s hyperlinking mechanism lets blogs point to each other, again without requiring any form of permission. This allows for a decentralized value network between authors, where readers remain in active and conscious control of their next steps. Yet when Derakhshan was eventually released in 2014, he came back to an entirely different Web [1]: critical readers had transformed into passive viewers, as if watching television. The Web for which he had sacrificed his personal freedom seemed to have lost an integral part of its own. While the core technological foundations of the Web had not changed, the way people were using it had become unrecognizable after only six years.
Of course, social media are not our enemies here: they should be credited with lowering the barrier for the online publication of short texts and photos by anyone. Unfortunately, they operate under a winner-takes-all strategy, each striving to become the dominant portal instead of interoperating with the rest of the Web. In contrast to blogs, we typically cannot interact with posts in one network from within another: we need to either move the people or the data. This famous “walled gardens” problem of social media [2] has significantly worsened since 2008 because some gardens have grown huge, and so have their walls. A major problem is that access to the dominant networks invariably means giving up control over our personal data, as the gate to the garden only opens in exchange for our digital belongings. That personal data can then be leveraged to influence us, without our being aware of it, through absurdly personalized advertising for brands, products, and even political agendas. Furthermore, once there, people tend to form small conversational circles within each garden—
Unsurprisingly, these problems were reflected in three challenges for the Web [4] that Tim Berners-Lee put forward in 2017:
- taking back control of our personal data;
- preventing the spread of misinformation;
- realizing transparency for political advertising.
Clearly, it is undesirable to tackle these challenges through centralized solutions, for instance by appointing an authority for personal data, news, and advertising. This would create yet another single point of failure, which—
From the above, it is clear that our primary obstacles are not technological [5]; hence Tim Berners-Lee’s call [6] to “assemble the brightest minds from business, technology, government, civil society, the arts, and academia to tackle the threats to the Web’s future”. At the same time, computer scientists and engineers need to deliver the technological proof that decentralized personal data networks can scale globally, and that they can provide people with a better experience than centralized platforms.
We will therefore start this chapter with a technological perspective on decentralization, highlighting Tim Berners-Lee’s role in the continuing fight to keep the Web open and decentralized. After a historical overview of power struggles on the Web, we will zoom in on the changes that decentralization requires, and examine what a healthier ecosystem could look like. As a concrete implementation of these principles, we will study the Solid project. At the end, a discussion of open challenges will lead us to an outlook on the future.
A short history of (de-)centralization and the Web
Decentralization has not always been a question of personal data, as the forces causing centralization have been a moving target. Every time a threat was addressed, an even bigger one superseded it. Understanding these threats brings insights into the different faces of decentralization.
Decentralization as the unspoken assumption
Decentralized systems, which do not require a central mediator to function, were already around at the time the Web was invented. Most notably, the Internet was increasingly gaining traction as a large-scale decentralized network. Email was even more decentralized than the traditional postal mail service it mimicked, since different mail servers would directly exchange messages with each other. Long-forgotten protocols such as the Network News Transfer Protocol (NNTP) allowed for the decentralized exchange of news articles. In short, decentralization was not some crazy new idea, but rather the spirit of the time.
Therefore, when Tim Berners-Lee set out to design a new hypertext system in 1989, it was presumed to be decentralized, in contrast to documentation systems of the time, but in alignment with many others. The main novelty of the Web was its universality [7], its independence of, among others, hardware and software; decentralization was simply the unspoken assumption. This is reflected in the original article introducing the Web [8], which emphasizes universal readability across operating systems, but does not mention the term “decentralization” at all.
The only component with centralized roots in the Web’s architectural design is the Domain Name System (DNS), which resolves the domain name part of a Web address (such as example.org) to a physical machine on the Internet. This was not as much of an issue back in the days when the number of domains was relatively small and domain ownership would be stationary. Nowadays, millions of domain names frequently change hands, thereby breaking existing links in possibly malicious ways. By manipulating DNS, governments can block or alter access to existing websites. Tim Berners-Lee has indicated that, in hindsight, a more decentralized naming system might have been preferred. Apart from that, the Web contained all ingredients to thrive in a decentralized way.
The race for our desktop
A first wave of centralization resulted as collateral damage from the browser war of the late nineties, in which companies competed to become the sole vendor of the software through which we access the Web. The Web’s design principle of universality demanded readability on any platform, so the emergence of multiple browsers was a blessing—
While competition by itself can be positive, these features came at the cost of incompatibility across browsers and therefore directly endangered the Web’s universality. Websites would carry badges such as “best viewed in Internet Explorer”, since a consistent experience across platforms could not be guaranteed. This also meant that developers were limited by the functionality and quirks of a single browser that, after establishing market dominance, became sloppy with its updates. People who did not want to use a particular browser—
The World Wide Web Consortium (W3C) was founded by Tim Berners-Lee with a mission of compatibility, enabling cross-browser consistency through recommendations that specify the correct workings of Web technologies. While W3C standardization is administratively centralized, it incorporates feedback from a decentralized network of members through a consensus-driven process. A problem by the early 2000s was that Internet Explorer deviated from W3C recommendations at crucial points, forcing developers to follow either the actual standards or the most popular browser’s incorrect implementation thereof.
Fortunately, pressure from Firefox and Safari during a second browser war eventually forced Microsoft onto a more standards-oriented course [9]. Since 2010, no single browser has gained more than two thirds of global market share anymore, meaning that standards compatibility is now in the interest of browser vendors and Web developers alike. The balkanization of the Web through centralized browser development has thereby largely been averted.
The race for our searches
Microsoft’s short-lived victory after the first browser war quickly turned out to be insignificant, since the centralization battle had gradually shifted to other fields. While each browser was quarreling to become the default application, search engines were racing to become the main entry point. Soon after, it did not matter anymore what software you were using to browse; what mattered was who gave you the directions of where to browse next. After all, no immediate income could be generated from free browser development, whereas companies would gladly pay for a prime spot in one of the major search engines’ rankings.
The early search engine landscape featured several competitors, such as AltaVista and Lycos, but it took Google only a couple of years to become the most popular by far. The centralization of search meant that one company gained an overly strong influence on what content people would access, based on the ranking of search results for given terms. Even assuming the best of intentions and ignoring paid advertising, the fact that one algorithm makes decisions for a large number of people leads to an information bias, as there clearly exists no single objective way to rank the “best” webpages on any topic. External attempts to manipulate these algorithms started to occur, first through relatively simple interventions such as misleading keywords, later through advanced Search Engine Optimization (SEO) techniques that aimed to improve website rankings in various (and sometimes dubious) ways.
The advent of search engines also brought the first online monetization of user-generated data. Our search terms contribute to a detailed profile of what we need in our private and professional lives. Search engines might know more about some aspects of our lives than our close friends. This profile determines the personalization of our search results and the ads we are shown, encouraging us to visit websites and buy things we otherwise might not have. While personalization has helpful effects for many people, the problem is that we are left without choice or control. We are directed to the big search engines, which, due to their large-scale accumulation of data, deliver us a great search experience. Yet these search engines do not provide us with options for how we want to pay for their services, as most of them only accept our personal data. Furthermore, we are not informed about—
The race for our personal data and identity
While the reign of Google still continues, social media have discovered an even more powerful way of collecting and marketing our personal data. The social Web revolution of the 2000s encouraged people to be present online, which drove many of us to various places to share blog posts, bookmarks, photos, videos, and more. Some years later, social media companies created centralized platforms to take over many of these features, which until then were spread out across multiple providers. These platforms store our personal data and request far-reaching usage rights in exchange for their services, all of which operate within their own walled garden.
Like search engines, the main service of social networks consists of a linear list of content, ranked by factors and algorithms we can only minimally influence. In contrast to search, a social feed is generated without any input terms from our side, like a television that no longer requires a remote. The ensuing show is meticulously personalized based on data we leave on social network platforms, combined with traces from our browsing history picked up—without our conscious consent [10]—by social trackers on third-party websites. In his 2018 Dertouzos distinguished lecture, Tim Berners-Lee remarked that political advertising has been banned from television in the UK [11] because of concerns about the impact of such a direct medium. By that logic, he continued, we should be much more concerned about the heavily personalized political advertising that current social media platforms enable and allow. Even if we refrain from explicitly sharing certain sensitive traits, seemingly insignificant pieces of other data can be combined into reliable predictors of highly personal information [12] such as sexual orientation, ethnicity, and religious or political views, which are subsequently used to target us.
As in the previous two centralization races, a subtle force is exerted upon us: we feel pressured to be part of the large networks, because not joining means missing out on the digital traces of our friends’ and family members’ lives. Often the easiest way for grandparents to see their grandchildren’s latest pictures is to create a Facebook or Instagram account. This is how the digital memory of a large part of today’s generation ends up in one space, often beyond control of those that are part of the memories. The centralization of our online activities has turned so extreme that some Facebook users have become unaware of their ability to access the rest of the Internet [13]. This paradox has sadly become a reality in many countries, where Facebook’s Internet.org initiative provides a severely constrained version of the Web that further reduces people’s options, in blatant violation of Net Neutrality.
Meanwhile, another race is happening in the background, namely the battle to become our identity provider. An increasing number of websites are gradually replacing their own login systems with authentication tied to large platforms such as Google or Facebook. For people with an existing account, the “Log in with Facebook” buttons are a convenience. For those without, they create additional pressure to join. And in both cases, such buttons are yet another way of tracking our online activities. This centralization of identity takes away our freedom to assume the persona we want—
Taking back control of our data
A recurring theme in the above centralization races is the lack of choice: a choice of browser and operating system, of entry point to the Web, of storage for our personal data. Decentralization is fundamentally about enabling choice, by breaking up artificially coupled decisions into individual options that can be combined at will. Just as we are free to choose any combination of device, operating system, and browser to access the Web, we should be able to interact with websites and other people without commitment to a single social or other platform. The resulting universality is crucial for permissionless innovation, because it establishes inventors’ independence of these platforms and of other centralized forces.
Taking back control of our personal data, as envisioned by Tim Berners-Lee, is realized by decoupling data storage from services. This means people can store their data wherever they choose, while still enjoying the services they want. We could pick any provider to store our texts, photos, and videos—
This mindset gives rise to the concept of a personal data pod, in which we can store every single piece of information we or others produce about us. As shown in the figure below, this statement can be taken quite literally: even a seemingly trivial piece of data, such as a simple “like” we gave a certain webpage or thing, can be stored in our own pod. While such a degree of decentralization might seem extreme, recall that even supposedly trivial likes can reveal much deeper personal information [12], so it makes sense to give people control over them. Furthermore, since we do not depend on anyone’s permission to publish data in our own pod, we can place public or private likes, dislikes, and comments on anything we want, without fear of them being censored or deleted.
Storing data ourselves enables highly granular access control: people can selectively give permission to friends or applications to read or write specific parts of their data pod. For instance, they can decide whether or not they make their profile picture and full name public, who can see which of their likes and comments, and what applications can edit their pictures or posts on their behalf. These permissions can be changed or revoked at any time. People can have multiple data pods for different purposes, for instance, a pod for personal and family pictures at home, a pod governed by retention policies for professional data at the workplace, and a university pod with study materials and grades. Upon creation, they can decide which data is stored in which pod.
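To make this permission model more tangible, the following minimal sketch models a pod as an in-memory object with per-resource grants that can be given and revoked. It is an illustration only and not Solid's actual access control mechanism: the `Pod` class, the `PUBLIC` marker, the mode names, and the resource paths are all hypothetical.

```python
from dataclasses import dataclass, field

PUBLIC = "*"  # hypothetical marker meaning "anyone on the Web"


@dataclass
class Pod:
    """Hypothetical in-memory stand-in for a personal data pod."""
    owner: str
    resources: dict = field(default_factory=dict)  # path -> content
    grants: dict = field(default_factory=dict)     # (path, agent) -> set of modes

    def grant(self, path, agent, *modes):
        """Allow `agent` to use the given modes ("read", "write") on one resource."""
        self.grants.setdefault((path, agent), set()).update(modes)

    def revoke(self, path, agent, *modes):
        """Withdraw modes again; permissions can change at any time."""
        remaining = self.grants.get((path, agent), set()) - set(modes)
        if remaining:
            self.grants[(path, agent)] = remaining
        else:
            self.grants.pop((path, agent), None)

    def allowed(self, path, agent, mode):
        """The owner may do anything; others need a personal or public grant."""
        if agent == self.owner:
            return True
        return any(mode in self.grants.get((path, who), set())
                   for who in (agent, PUBLIC))

    def read(self, path, agent):
        if not self.allowed(path, agent, "read"):
            raise PermissionError(f"{agent} may not read {path}")
        return self.resources[path]

    def write(self, path, agent, content):
        if not self.allowed(path, agent, "write"):
            raise PermissionError(f"{agent} may not write {path}")
        self.resources[path] = content


if __name__ == "__main__":
    pod = Pod(owner="alice")
    pod.write("/profile/name", "alice", "Alice")
    pod.write("/photos/holiday", "alice", "holiday.jpg")
    pod.grant("/profile/name", PUBLIC, "read")    # full name is public
    pod.grant("/photos/holiday", "bob", "read")   # photo for one friend only
    print(pod.read("/profile/name", "carol"))
    pod.revoke("/photos/holiday", "bob", "read")  # changed our mind later
    print(pod.allowed("/photos/holiday", "bob", "read"))
```

Keying grants on the pair of resource and agent is one simple way to obtain the granularity described above; a real pod would additionally need authentication, which this sketch leaves out.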
By choosing the storage location of our own data, we prevent unauthorized access and exploitation. We are no longer obliged to pay with our data in order to access a certain service. Moreover, we can protect the most sensitive parts of our data by keeping them to ourselves, and limit sharing to people and services that really require it—
Independent innovation in separate data and service spaces
A key point, however, is that this separation of data and apps benefits not only people, but also the companies providing apps or services. Nowadays, the first task for any app is collecting the data it needs for its operation—
Moreover, user-managed data removes both the burden and the benefits of data collection, which is expensive and legally complex for most companies. Privacy-unfriendly business models centered around data collection will become less attractive. Such an economic change can be accelerated through legislation, like the EU’s General Data Protection Regulation (GDPR), as well as growing awareness among the general population about the dangers of centralization, given data scandals at companies such as Equifax and Facebook. Consequently, new business models for applications become necessary.
Decentralization requires the nature of applications to evolve from silos to shared views. As shown in the figure below, current Web apps combine data and service. Because of this coupling, our LinkedIn contacts cannot comment on our Facebook pictures, and an RSVP on a Facebook event will not be reflected in our Doodle calendar. Decentralized applications, on the other hand, act as views on top of our data pod and those of others. When granted specific access rights, photos uploaded into our data pod by a photo gallery application can be accessed by a social feed app. Events in my personal calendar that have public visibility can show up in that same feed. Our friends can view the parts of our data to which we grant them access through whatever application they wish to use.
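The idea of applications as views can be sketched in a few lines. In the hypothetical example that follows, a gallery app only writes into the user's pod and a feed app only reads whatever the viewer is allowed to see; neither keeps a copy of the data. The names `Item`, `Pod`, `gallery_upload`, and `social_feed` are invented for illustration and do not correspond to any existing Solid API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    kind: str         # e.g. "photo" or "event"
    title: str
    visible_to: set   # agents who may read; "public" stands for everyone


@dataclass
class Pod:
    """Hypothetical pod: the single place where a person's items live."""
    owner: str
    items: list = field(default_factory=list)

    def readable_by(self, agent):
        """Return only the items this agent has been granted access to."""
        return [item for item in self.items
                if agent == self.owner
                or "public" in item.visible_to
                or agent in item.visible_to]


def gallery_upload(pod, title, visible_to):
    """Photo gallery app: stores the photo in the user's pod, not in the app."""
    pod.items.append(Item("photo", title, set(visible_to)))


def social_feed(pods, viewer):
    """Feed app: a view computed from everything the viewer may read."""
    return [f"{pod.owner}: {item.title}"
            for pod in pods
            for item in pod.readable_by(viewer)
            if item.kind in {"photo", "event"}]


if __name__ == "__main__":
    alice = Pod("alice")
    gallery_upload(alice, "Sunset", {"bob"})
    alice.items.append(Item("event", "Birthday party", {"public"}))
    alice.items.append(Item("event", "Dentist", set()))
    print(social_feed([alice], "bob"))
    print(social_feed([alice], "carol"))
```

Because both apps operate on the same pod, replacing the feed app with a different one requires no data migration, which is the property the paragraph above relies on.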
Because the choices of data and service become untangled, separate competitions for storage and apps emerge. The figure below shows that centralized applications compete in a single market based on data collection, because usage of a service is coupled with usage of its storage. As such, people cannot easily switch to a better application experience, as migrating their data—
This independence means we can freely switch data and service providers, without requiring our friends to choose the same ones. It tears down the walls in between the gardens, because we gain the ability to reuse and move our data, and can interact with anyone in the entire landscape. Data and service providers can evolve without dependency on each other, which enables a faster and more creative innovation cycle. Through the implementation of standards and specifications, anyone can enter either market and attract customers by providing a better experience than others, without demanding control of our data.
The Solid project
In order to realize this vision of independent data storage and services, Tim Berners-Lee started the Solid project [15]. Solid consists of specifications for interoperability, implementations of servers, clients, and applications, and a community of people who build new things. In the next sections, we will discuss some of Solid’s unique aspects.
Personal data linking and integration
The goal of Solid is empowering people through personal data management, as a counterpart to enterprise data management. We can consider a Solid server or data pod as the equivalent of a hard disk for the Web, on which we can store arbitrary documents and pieces of data. Then Solid apps are like our desktop applications, except that they open documents from Solid servers on the Web. In contrast to actual hard disks, Solid servers are typically public to the entire world, so detailed access control settings allow us to specify who can view or edit which of our documents. Tim Berners-Lee has been leading by example, by managing his personal and professional life with Solid for several years already.
In order for such data management to work at Web scale, data in different pods need to link to each other, similar to how hypertext documents allow us to jump from one website to another. Solid uses Linked Data [16] to achieve this: every piece of data can link to any other. This is how, for example, a comment in your data pod can be attached to a photo in someone else’s pod, while both of you can remain in control of your own data. At runtime, Solid applications integrate data from multiple sources and blend them together into a single experience.
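As a rough illustration of such linking, Linked Data is commonly expressed as subject, predicate, object triples in which each part can be a Web address. The sketch below assumes two made-up pods at `alice.example` and `bob.example`, a made-up vocabulary at `vocab.example`, and a `fetch` function that stands in for an HTTP request; none of these names come from the Solid specifications.

```python
# Hypothetical vocabulary terms; real applications would reuse shared vocabularies.
COMMENTS_ON = "https://vocab.example/commentsOn"
TEXT = "https://vocab.example/text"
TITLE = "https://vocab.example/title"

PHOTO = "https://alice.example/photos/sunset"
COMMENT = "https://bob.example/comments/1"

# Stand-in for the Web: each document lives in its owner's pod.
WEB = {
    PHOTO: [
        (PHOTO, TITLE, "Sunset"),
    ],
    COMMENT: [
        (COMMENT, COMMENTS_ON, PHOTO),  # link from Bob's pod into Alice's pod
        (COMMENT, TEXT, "Beautiful!"),
    ],
}


def fetch(url):
    """Placeholder for an HTTP GET that returns the triples in a document."""
    return WEB[url]


def integrate(urls):
    """Blend triples from several pods into one graph at runtime."""
    graph = set()
    for url in urls:
        graph.update(fetch(url))
    return graph


def comments_on(graph, target):
    """Find the text of every comment that links to the target resource."""
    comment_ids = {s for (s, p, o) in graph if p == COMMENTS_ON and o == target}
    return sorted(o for (s, p, o) in graph if s in comment_ids and p == TEXT)


if __name__ == "__main__":
    graph = integrate([PHOTO, COMMENT])
    print(comments_on(graph, PHOTO))
```

The comment never leaves Bob's pod and the photo never leaves Alice's; only the application's temporary graph combines them.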
Solid pods can offer people a decentralized means of identification. People can pick a so-called WebID, which is a Web address that identifies them. This Web address leads to their public profile, and people can log on to any pod with their own WebID, instead of requiring a new login on every website or resorting to a centralized identity provider.
Read–Write Web
One of the crucial aspects of Solid is that it provides a Read–Write platform, as was Tim Berners-Lee’s original intention for the Web [17]. While writing has always been possible, in the sense that anyone could start their own website, the Web 2.0 and social media revolutions should be credited with making it considerably easier. This explains part of the success of these platforms, as anyone can now be a content producer at any time, especially through their mobile devices.
Solid should make authoring content similarly easy, the difference being of course that we would always write to our own data pods instead of to the application through which we create. To maximize interoperability, our Linked Data should be stored using Semantic Web technologies [18], which interweave a piece of data with its meaning. That way, applications can make sense of (parts of) each other’s data, without having to agree upfront exactly what our data should look like. When storing data in our own pods, we need a mechanism to inform others when things have been created or modified—
Potential for disruption
By transforming the role of applications in a decentralized ecosystem, Solid is able to disrupt many interactions that happen on the Web. Many processes that currently depend on centralization can be revolutionized in a decentralized way, by cutting out the middlemen currently executing these processes. This can stimulate innovation in areas that are embracing the current status quo and resisting change.
A first obvious target is social interaction between people. Sharing multimedia with friends, colleagues, and family members without privacy concerns becomes possible through Solid. Other examples include collaborating on various kinds of documents under transparent access control, and organizing meetings and events—
Moreover, Solid has the technological potential to disrupt entire industries, such as scholarly publishing. The current scholarly publication process assumes that an author uploads a scientific manuscript to a centralized platform, where a closed group of reviewers evaluates it. After acceptance, the manuscript is published as an article and then becomes accessible to the public, possibly at a fee. This process is rather slow, as the wider scientific community can only read the article at the end—
A decentralized Web for all
Re-decentralizing the Web along the lines of the Solid vision can help us tackle Tim Berners-Lee’s three challenges [4]. We can take back control of our personal data by storing data in our own data pods. The spread of misinformation can be halted, because a free choice of applications allows us to influence our news feed—
Freedom of course always comes at a cost: what constitutes a victory for personal rights and freedom of speech also facilitates the spread of illegal messages, since decentralized networks make it harder to control what information is exchanged. Legality is itself a tricky matter, as some countries institute laws that prevent their citizens from voicing opinions that would be legal elsewhere.
This brings us to another aspect of decentralization, which is the tension between freedom and universality. The Paradox of Freedom states that we can only be free if we subject ourselves to certain rules. Simply said, we can take our bike and ride anywhere—
Importantly, the arrows of decentralization and Solid are not aimed at specific companies. Instead, they point at centralization in general, since many of the problems and challenges faced by these companies are inherent to centralization and the business model of data collection. We have come to the point where companies possess so much data that they themselves are unable to predict the long-term effects that such a centralization might have [10]. Therefore, it is an unreasonable excuse for them to claim that people could have given “informed consent” to let their data be processed the way it is. No individual can reasonably understand what happens when they give up control over small parts of their data, since that leads to very different effects in the Big Data picture. Storing our data in a trusted place of our choice, combined with a granular permission model, is therefore a much safer bet.
Tim Berners-Lee insists that the Web should always remain scale-free [21], with room for the very large and the very small, and everything in between. The goal is thus not to have a Web without large players. However, the problem is that the very large are currently trying to make the rest obsolete, which endangers the online freedom and permissionless innovation we have been enjoying for so many years. As argued above, decentralization is foremost about choice, so people should be free to join large or small communities. While several technical issues concerning decentralized applications lie ahead of us, notably providing a better user experience than centralized platforms, the first technological proof has been delivered with Solid. Now, it is up to all of us to anchor this technological progress in today’s and tomorrow’s socio-economic reality in order to re-decentralize the Web for good. Only when we succeed in taking back control over our most precious digital assets can we truly say: this is for everyone.
References
- [1] Derakhshan, H. (2015), “The Web We Have to Save”, 14 July, available at: https://medium.com/matter/the-web-we-have-to-save-2eb1fe15a426
- [2] “Break down these walls” (2008), The Economist, available at: https://www.economist.com/node/10880516
- [3] Pariser, E. (2011), The Filter Bubble, Penguin Books.
- [4] Berners-Lee, T. (2017), “Three challenges for the Web, according to its inventor”, Web Foundation, 12 March, available at: https://webfoundation.org/2017/03/web-turns-28-letter/
- [5] Rosenthal, D. (2018), “It Isn’t About The Technology”, 11 January, available at: https://blog.dshr.org/2018/01/it-isnt-about-technology.html
- [6] Berners-Lee, T. (2018), “The Web is under threat. Join us and fight for it”, Web Foundation, 12 March, available at: https://webfoundation.org/2018/03/web-birthday-29/
- [7] Berners-Lee, T. (2005), “Universality of the Web”, 23 March, available at: https://www.w3.org/2005/Talks/0323-yorkshire-tbl/slide5-2.html
- [8] Berners-Lee, T., Cailliau, R., Groff, J.F. and Pollermann, B. (1992), “World-wide web: the information universe”, Electronic Networking, Vol. 2 No. 1.
- [9] Gustafson, A. (2008), “Beyond DOCTYPE: Web Standards, Forward Compatibility, and IE8”, 21 January, available at: https://alistapart.com/article/beyonddoctype
- [10] Berjon, R. (2018), “Advertising’s War on Consent”, 19 March, available at: https://berjon.com/advertising-war-on-consent/
- [11] Berners-Lee, T. (2018), “From Utopia to Dystopia in 29 Short Years”, 18 May, available at: https://www.csail.mit.edu/news/utopia-dystopia-29-short-years
- [12] Kosinski, M., Stillwell, D. and Graepel, T. (2013), “Private traits and attributes are predictable from digital records of human behavior”, Proceedings of the National Academy of Sciences, National Academy of Sciences, Vol. 110 No. 15, pp. 5802–5805.
- [13] Samarajiva, R. (2014), “More Facebook users than Internet users in South East Asia?”, 30 August, available at: http://lirneasia.net/2014/08/more-facebook-users-than-internet-users-in-south-east-asia/
- [14] Verborgh, R. (2017), “Paradigm shifts for the decentralized Web”, 20 December, available at: https://ruben.verborgh.org/blog/2017/12/20/paradigm-shifts-for-the-decentralized-web/
- [15] “Solid” (n.d.), available at: https://solid.mit.edu/
- [16] Berners-Lee, T. (2006), “Linked Data”, 27 July, available at: https://www.w3.org/DesignIssues/LinkedData.html
- [17] Berners-Lee, T. and O’Hara, K. (2013), “The read–write Linked Data Web”, Philosophical Transactions of the Royal Society A, Vol. 371 No. 1987.
- [18] Berners-Lee, T., Hendler, J. and Lassila, O. (2001), “The Semantic Web”, Scientific American, Vol. 284 No. 5, pp. 34–43.
- [19] Capadisli, S. and Guy, A. (Eds.). (2017), Linked Data Notifications, Recommendation, World Wide Web Consortium, available at: https://www.w3.org/TR/ldn/
- [20] Capadisli, S., Guy, A., Verborgh, R., Lange, C., Auer, S. and Berners-Lee, T. (2017), “Decentralised Authoring, Annotations and Notifications for a Read–Write Web with dokieli”, in Proceedings of the 17th International Conference on Web Engineering, pp. 469–481, available at: https://csarven.ca/dokieli-rww
- [21] Barabási, A.-L. and Albert, R. (1999), “Emergence of Scaling in Random Networks”, Science, Vol. 286, pp. 509–512.