Most Web applications today follow the adage “your data for my services”. They motivate this deal from both a technical perspective (how could we provide services without your data?) and a business perspective (how could we earn money without your data?). Decentralizing the Web means that people gain the ability to store their data wherever they want, while still getting the services they need. This requires major changes in the way we develop applications, as we migrate from a closed back-end database to the open Web as our data source. In this post, I discuss three paradigm shifts a decentralized Web brings, demonstrating that decentralization is about much more than just controlling our own data. It is a fundamental rethinking of the relation between data and applications, which—
The movement to (re-)decentralize the Web is sometimes dismissively regarded as a modern-day hippie reaction to the ever increasing power of technology giants such as Facebook and Google. And while the David versus Goliath way of thinking is definitely present among decentralists, there are many more advantages to a world in which people and organizations regain the ability to store their data wherever they want—
Ultimately, decentralization is about choice: we will choose where we store our data, who we give access to which parts of that data, which services we want on top of it, and how we pay for those. Nowadays, we are instead forced to accept package deals we cannot customize. For example, Facebook shows us their social feed featuring our friends, paid for by advertising—
End users become data controllers. This is the most well-known decentralization aspect: we store our data in places of our choice, which improves privacy and control.
Apps become views. As apps become decoupled from data, they start acting as interchangeable views rather than the single gateway to that data.
Interfaces become queries. Data will be distributed across highly diverse interfaces, so sustainable apps need declarative contracts instead of custom data requests.
The basis of decentralization is that people choose where they store their data. Instead of having to pick between a handful of providers such as Google or Facebook, in a decentralized world, there will be many options to pick from—
To a certain extent, we already have that choice: since its inception, the Web’s decentralized architecture has allowed anyone to have their own space. However, we want the convenience of the single stream without the central control that currently comes with that. We want to continue enjoying the same types of services that nowadays are only available on centralized platforms. So the important question is: can applications on top of decentralized data behave the same way as centralized apps? For example, can we still generate a friend list and news feed like Facebook does—
On one end of the spectrum, centralized solutions themselves store all the personal data they use: Twitter and Facebook are single data hubs for millions and billions of users, respectively. In contrast, the decentralized microblogging network Mastodon lets anyone set up their own Twitter clone, counting around 1.5 million users spread across 2,400 servers. A couple of thousand people share a server, and the application can read content from people on other servers as well. The Solid platform takes this further, introducing the concept of one data pod per person. Such a data pod is a simple data storage location on a server, equipped with highly granular access control, so anyone can decide exactly which people and apps can access what parts of their data. Applications become clients of these servers, sourcing data from multiple data pods. Solid eventually envisages a world of multiple data pods per person: one at home for personal data, one at the office for sensitive work files, one at school to track study material, etc. In this post, I assume such a high degree of decentralization. Note that the names in the above axis are envisaged uses: it is theoretically possible to use Mastodon or Solid in different ways, and other platforms exist.
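To make the idea of granular access control concrete, here is a minimal sketch of a data pod as an in-memory store with a per-resource access list. It is purely illustrative: the `DataPod` class, its methods, the paths, and the agent names are all hypothetical and do not reflect Solid's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataPod:
    """Toy in-memory pod: resources plus a per-resource access list (hypothetical)."""
    owner: str
    resources: dict = field(default_factory=dict)  # path -> stored data
    acl: dict = field(default_factory=dict)        # path -> {agent: set of modes}

    def grant(self, path, agent, *modes):
        """Allow an agent (a person or an app) the given modes on one resource."""
        self.acl.setdefault(path, {}).setdefault(agent, set()).update(modes)

    def _allowed(self, path, agent, mode):
        # The owner can always access their own data.
        if agent == self.owner:
            return True
        return mode in self.acl.get(path, {}).get(agent, set())

    def read(self, path, agent):
        if not self._allowed(path, agent, "read"):
            raise PermissionError(f"{agent} may not read {path}")
        return self.resources[path]

    def write(self, path, agent, data):
        if not self._allowed(path, agent, "write"):
            raise PermissionError(f"{agent} may not write {path}")
        self.resources[path] = data


pod = DataPod(owner="alice")
pod.write("/contacts", "alice", ["bob", "carol"])
pod.write("/calendar", "alice", ["Mon 10:00 dentist"])

# The feed app may read the contact list, and nothing else.
pod.grant("/contacts", "feed-app", "read")
contacts_seen_by_app = pod.read("/contacts", "feed-app")
```

In this sketch, permissions are attached to individual resources rather than to the pod as a whole, which is what lets the owner share the contact list with an app while keeping the calendar private.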
In a fully decentralized social network, every single part of an interaction—which would now be stored in its entirety on Facebook—could reside in different data pods. Consider this social media post, where an author states his professional opinion on an online news article. Literally each single piece of data can be in another data pod:
In the fully decentralized way of thinking, everything you post is stored on your own website or server. An app collects all posts from people I am following from different servers, and displays them in a feed for me. When I like your post, this “like” is stored on my server. This action triggers my server to send a notification to yours, so you can decide whether to display this like or not—
This paradigm of storing everything in a place we control is fundamentally different from the centralized one, and has several beneficial consequences for users. It improves privacy, since you can say whatever you want about anything, without having to disclose this to Facebook or anyone else. This positively impacts freedom of speech and goes against censorship (with all of the associated consequences and debates). The flexible access control can be used in any way imaginable: even individual likes or comments could only be visible to certain people, groups, or applications—
Unlike with centralized platforms, trust is not derived from a single party. For instance, if I claim my post has 124 likes, then we believe this because Facebook says so (and frankly, we have no objective reason to doubt that). In a decentralized scenario, I could prove that by linking to the individual likes that are stored on other servers, which form a provenance trail. And if those links break for any reason (for instance, if people retract their like), I can still prove they once liked it, if my app made a copy of their digitally signed like on my post. This mechanism can replace networks that are largely based on authority, such as LinkedIn, where people build a reputation from the people they are connected to. We can essentially replace LinkedIn with an address book, where somebody is a connection if they also have you in their contact list.
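As a rough sketch of such a provenance trail, the snippet below counts only those likes that still resolve on the liker's own pod and actually point at my post. The data layout, the URL, and the `verified_likes` function are assumptions made for illustration; pods are modeled as plain dictionaries, and the digitally signed copies mentioned above are left out.

```python
POST_URL = "https://alice.example/posts/1"  # hypothetical post location


def verified_likes(post_url, like_links, pods):
    """Return the like links that can be confirmed on the liker's own pod.

    like_links: (pod name, like id) pairs that the post's author claims.
    pods: pod name -> {like id: like record}, standing in for remote servers.
    """
    verified = []
    for pod_name, like_id in like_links:
        like = pods.get(pod_name, {}).get(like_id)
        # A like counts only if it still exists and targets this very post.
        if like is not None and like.get("object") == post_url:
            verified.append((pod_name, like_id))
    return verified


pods = {
    "bob": {"like-1": {"object": POST_URL}},
    "carol": {},  # Carol retracted her like, so the link is broken
    "dave": {"like-3": {"object": "https://eve.example/posts/9"}},
}
claimed = [("bob", "like-1"), ("carol", "like-7"), ("dave", "like-3")]

confirmed = verified_likes(POST_URL, claimed, pods)
```

Here only Bob's like can be confirmed: Carol's link no longer resolves, and Dave's like refers to a different post. Proving that Carol once liked the post would require the signed copy described above.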
The main challenge with full decentralization of data is scalability. In the Mastodon scenario, there are still relatively few servers for many users. In the Solid scenario, there might even be more data pods than users. In the end, decentralization will go hand in hand with dynamic data replication, which will need to be balanced carefully with fine-grained access control possibilities in order to guarantee data privacy.
By breaking the tight coupling between data and applications, decentralization questions and alters the very nature of an application. While the second paradigm shift comes as a direct consequence of the first paradigm shift we discussed above, it is equally crucial in its own way.
Basically, the competitive advantage of many of today’s popular centralized platforms is their data silo, and the fact that their service depends entirely on access to that data. Conceptually speaking, the service offered by Facebook, Twitter, and LinkedIn is fairly simple and could be replicated easily by others. Yet a major reason why people appreciate the services of these platforms is their data: Facebook is engaging because our friends’ data is there, Twitter has all of the world’s tweets and direct messages, and LinkedIn showcases our broad networks. In fact, these platforms have become inseparable from their data: we use “Facebook” to refer to both the application and the data that drives that application. The result is that nearly every Web app today tries to ask you for more and more data again and again, leading to dangling data on duplicate and inconsistent profiles we can no longer manage. And of course, this comes with significant privacy concerns.
In contrast, decentralized Web applications decouple data and applications: you enter data only once, in your own data pod. Instead of maintaining credentials with each app, you log in through your data pod and give apps permission to read or write specific parts of your data. The Web’s ecosystem thereby evolves from bundled data+service packages into applications as interchangeable views, wherein each Web app provides consistent visualizations, interactions, and processing over your personal data pod. Furthermore, those apps let you interact with any other data pods you have access to, such as those of your friends. Applications ask rather than store, and they are able to reuse data created by other apps, avoiding vendor lock-in.
In this ecosystem, Facebook’s friend feed becomes a view over your contact list in your data pod, combined with the latest messages your contacts have posted in their data pods. Decentralized LinkedIn and Doodle could be granted access to your address book, so your list of colleagues would always be in sync for meeting requests (because there would actually only be one list instead of multiple). Decentralized Doodle and Facebook could both be granted access to your calendar, where Doodle can only see when you are available, and Facebook can only add events. Any change in one view is directly reflected in another because they share the same storage.
Importantly, this disentanglement of data and services creates separate markets for data and applications. Each of those two markets comes with its own competitive forces that stimulate creativity and innovation at a higher rate, since the ability to provide a service no longer depends on collecting data.
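The friend feed as a view can be sketched in a few lines. The example below assumes, purely for illustration, that each pod is a dictionary with `contacts` and `posts` entries and that posts carry ISO-formatted dates; the `friend_feed` function and all names are hypothetical.

```python
def friend_feed(my_pod, pods, limit=10):
    """Build a feed from my contact list and my contacts' own pods.

    The app stores nothing itself: it reads my contacts from my pod,
    then collects posts from each contact's pod, newest first.
    """
    posts = []
    for contact in my_pod["contacts"]:
        contact_pod = pods.get(contact, {})
        for post in contact_pod.get("posts", []):
            posts.append({"author": contact, **post})
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    posts.sort(key=lambda post: post["date"], reverse=True)
    return posts[:limit]


pods = {
    "bob": {"posts": [{"date": "2017-12-01", "text": "Hello from Bob"}]},
    "carol": {"posts": [{"date": "2017-12-03", "text": "Carol's update"}]},
    "dave": {"posts": [{"date": "2017-12-05", "text": "Dave joins"}]},
}
my_pod = {"contacts": ["bob", "carol"]}

feed = friend_feed(my_pod, pods)

# Another app (say, an address book) adds a contact to the same pod...
my_pod["contacts"].append("dave")
# ...and the feed view reflects it on the next read, with no syncing step.
updated_feed = friend_feed(my_pod, pods)
```

Because the feed is computed from the pod rather than from a private copy, a contact added through any other app shows up here without any synchronization between the apps.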
On the application market, whoever can make a more user-friendly social feed than Facebook, or show a better network overview than LinkedIn, is able to attract people solely based on the quality of its service. Moreover, people can choose the application that serves them best, and can switch between applications at any time, since all apps are views over your personal data pod. Instead of entering your name and e-mail address over and over again, you log in with your data pod to give access to these pieces of data—
On the data market, different options emerge as well. Depending on your requirements, you might prefer different storage providers. The most technologically advanced of us could decide to host their own server, possibly based on existing software packages. For personal purposes, people might select providers similar to Dropbox—
The key to a healthy ecosystem is the independence of these two markets, realized through a noncommittal relationship between apps and data. Since there currently exists no such separation, new innovative application platforms have trouble emerging because they don’t have the data—
The third and final paradigm shift I will discuss pertains to the communication between apps and data pods. It represents my own conclusion, which I think is inevitable for sustainable apps following the first and second paradigm shifts.
The current generation of Web applications communicates with servers through a highly specific sequence of steps that are hard-coded into the application logic. These steps contain specific requests to a Web API, a (typically custom) interface exposed by the server. This approach results in a highly specific contract between a client and a server—
It is unrealistic to hope that all data pods will have the same Web API (be it Linked Data Platform, SPARQL, or GraphQL). Not only would this require a standardization effort without precedent, such a standard could never cover all cases. Given that we aim for competition on the data market as well, different kinds of data pods are expected to provide different kinds of interfaces with varying expressivity. On top of this, on a decentralized Web, the data needed by applications will be scattered across multiple data pods. So even if all pods had the same interface, apps would still need to route requests to the right pods and combine their data.
This indicates that decentralized apps shouldn’t bind directly to concrete Web APIs, because this would limit them to specific data pods at a specific point in time. If their interfaces evolve, or if we want to access different data pods, apps would need to be reprogrammed. Clearly, such a fragile contract between the app and data markets would form a major bottleneck to sustainable growth and scalability. Instead of hard-coding a specific sequence of requests, the application logic should formulate in a higher-level language what operation it wants to perform with data.
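A small sketch may help show the difference. Below, the application only states what it wants as a `Query` value, and a client-side `execute` function decides how to obtain it from pods that expose different interfaces. Everything here is an assumption for illustration: the two pod classes, their methods, and the query shape are hypothetical and far simpler than a real query language such as SPARQL.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Declarative description of the data an app needs (hypothetical shape)."""
    kind: str    # type of item, e.g. "post"
    where: dict  # field -> required value


class ListingPod:
    """Pod whose interface can only return everything it stores."""
    def __init__(self, items):
        self._items = items

    def list_all(self):
        return list(self._items)


class FilteringPod:
    """Pod whose more expressive interface can filter by kind on the server."""
    def __init__(self, items):
        self._items = items

    def find(self, kind):
        return [item for item in self._items if item["kind"] == kind]


def execute(query, pods):
    """Answer one query over many pods, adapting to each pod's interface."""
    results = []
    for pod in pods:
        if hasattr(pod, "find"):
            candidates = pod.find(query.kind)  # let the server do the work
        else:
            # Fall back to downloading everything and filtering on the client.
            candidates = [i for i in pod.list_all() if i["kind"] == query.kind]
        results.extend(
            item for item in candidates
            if all(item.get(key) == value for key, value in query.where.items())
        )
    return results


pods = [
    ListingPod([{"kind": "post", "author": "bob", "text": "Hi"},
                {"kind": "like", "author": "bob"}]),
    FilteringPod([{"kind": "post", "author": "carol", "text": "Hello"},
                  {"kind": "post", "author": "bob", "text": "Again"}]),
]

bobs_posts = execute(Query(kind="post", where={"author": "bob"}), pods)
```

The application code never mentions `list_all` or `find`. If a pod changes its interface, only the engine needs to adapt, while the query stays the same.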
Therefore, I believe that decentralized Web applications should exclusively use declarative queries to view and update data on our pods, so their expression of the intended data operation remains constant—
By abstracting all of an application’s operations as declarative queries, we enable an independent evolution of apps and server-side interfaces. At design time, apps only bind to slowly changing high-level queries instead of rapidly moving and changing low-level interfaces, so they don’t need to commit to a specific data pod. At runtime, the client-side query engine library—
While reducing the dependency of applications to a set of queries facilitates their development and improves their sustainability, it implies a complex, cross-API query engine. I envision that multiple implementations of such a query library would compete, and eventually replace the API-specific client-side libraries that are symptomatic of tight coupling between clients, services, and their underlying data. A possible direction to realize this in a scalable way is to split monolithic Web APIs into API features, which can be reused across data pods. These pods could then opt to provide different kinds of capabilities—
The combination of decentralization and query execution also confronts us with a temporally different way of interacting with data. In traditional Web applications, the procedure is typically “send query—
Each of the above paradigm shifts shows that decentralizing the Web is about reorganizing power. First, people gain the power to control their own data and privacy. Second, new applications and data solutions gain competitive power through the resulting decoupling of apps and data. Third, the expressive power of applications improves by depending on transferable queries instead of low-level interfaces.
What I describe in this blog post is slowly but steadily happening, and was in fact inspired by prototypes that currently exist. The decentralized editor dokieli and its annotation functionalities convincingly demonstrate that every atomic piece of data can be stored in a different place. Spending the summer at MIT’s Decentralized Information Group revealed the possibilities of simple server-side data stores such as Solid for advanced client-side applications. It’s there that I saw for the first time how data can drive everything seamlessly—
A question many people have is whether decentralization is realistic in real-world scenarios. On the one hand, I’m inclined to answer that, in any case, the Web cannot possibly become more centralized than it already is today. Facebook has already become a main gateway for such an immense number of people that the only logical direction forward is less centralized. My conviction is based on more than just gut feeling, since I see several parties come to similar conclusions. On the other hand, I have experienced the enormous potential of many aspects of the decentralized vision. The idea doesn’t need to start solely with enthusiastic technophiles, but can grow from concrete industry needs. The notion of a private, on-premise data pod appeals to sectors such as finance, law, and healthcare, which have a promising market for high-security data pods and transparent information access through apps as views. From a digital society perspective, personal data pods address the problems we are facing with an increasing number of parties asking for consistent access to specific parts of our personal data. I also liked Jim Hendler quoting Marvin Minsky that we’ll know computers are truly becoming intelligent when they won’t ask for the same info twice ever again. Approaching decentralization the right way will enable exactly that.
The final question is who will pay for all of this. The good news is that we’ll have a choice there too. Bundled package deals such as Facebook and Twitter only offer the ad-based payment option with its infamous consequences. In a decentralized world, we can choose our data and app providers independently, and decide for each how we are willing to pay. The bad news is that this means not everything is going to be “free”, as it seemingly appears now. However, increased competition—
Thanks to Tim Berners-Lee, Sarven Capadisli, Dmitri Zagidulin, and Sandro Hawke for inspiring discussions on (re-)decentralizing the Web—