A data ecosystem fosters sustainable innovation

By equipping people with personal data vaults, we can leverage more data for better services.

We’re living in a data-driven economy, and that won’t change anytime soon. Companies, start-ups, organizations, and governments all require some of our data to provide us with the services we want and need. Unfortunately, decades of Big Data thinking have led many companies to a consequential fallacy: the belief that they need to harvest and maintain that personal data themselves in order to deliver their services and thus survive in the data-driven economy. This prompted a never-ending rat race, dominated by a handful of large players and driven by a deeply flawed notion of “winning”, in which most people and companies collectively end up losing much more than they put in. Pointless data greed has distorted competition and stifled innovation from the moment data collection became more important than quality of experience. A way out of this dead end is to put people fully in control of their own data by equipping them with a personal data vault. Vaults enable us to break the standstill, as they re-level the playing field by giving all parties equal chances to access data under people’s control. Halting data harvesting is, paradoxically, how companies can leverage more data towards their services instead of less. Yet they won’t own that data, and in a sustainable ecosystem, there’s no need to. In this post, I dive into the surprising economics of an overdue data revolution.

7 December 2020

An innovation problem

To the moon and back

In December 2019, Google and Facebook proudly announced a major milestone, which was echoed in news media all around the world: it is now possible to copy a picture from Facebook to Google Photos. This news came mere months after we celebrated the 50th anniversary of another technological feat: the moon landing of 20 July 1969, when millions of households witnessed Neil Armstrong take a giant leap for mankind.

So let me get this straight: two of the largest tech companies in history make headlines because in 2019, they move a single photo over the whopping distance of 11 km it takes from the Facebook headquarters in Menlo Park to the Googleplex in Mountain View, whereas in 1969, we sent live video signals from 380,000 km away on the actual moon?

If those two companies, both widely hailed as pinnacles of technology, genuinely consider this to be innovation they are proud of, the only logical conclusion is that data-driven innovation today is fundamentally broken.

The problem is widespread and not limited to technology or social media. Any sector that requires personal data to deliver services, from retail to insurance to health, suffers from the damaging effects of siloization. Companies increasingly need more access to data, but they won’t get there if they keep on collecting that data themselves.

[a collection of oil containers] — If data is the new oil, why do companies insist on stocking it in barrels where it cannot flow?
Unlike oil, data is an infinite and duplicable good. We need it to run the economy’s engines, but that does not mean companies need to act as offshore drilling rigs.
©2016 Fogarty Avenue

Unsustainable by design

Our data collection frenzy seems to stem from too many years of Big Data thinking, during which companies were encouraged to centralize and store as much data as they possibly could. After all, there are no technological limitations to the amount of data our systems can store. However, we have since come to the painful realization that there are severe non-technological limitations that make large-scale data harvesting an unsustainable strategy. Data collection is plagued by legal, economic, societal, and ethical barriers that only the largest of companies can arrogantly afford to ignore (and hopefully not for much longer). The rest are stuck with expensive issues, such as:

There is no such thing as enough data; you can only strive to have more than your competitors for as long as you can, and hope that you stay on top. Meanwhile, substantial investments are replicated among all companies within a single sector for obtaining the same pieces of data, with money that could instead be spent on distinctive competitive advantages and innovation. And the more data you have, the more it costs to keep it up to date and at a sufficient quality level.
At any point in time, you can lose all of those investments when consumers start exercising their GDPR, CCPA, or LGPD rights. Within the European Union, consumers can demand that companies give them all of their personal data, and this includes derived data. Furthermore, they can oblige companies to then delete that data. While people are generally still unaware of the practicalities, their access to various user-friendly tools is steadily increasing.
These and similar comprehensive legal frameworks imply that data collection has become a liability, especially when taking the Big Data principles to heart and storing heaps of data that might or might not be relevant at some point in the future. When artificial intelligence based on machine learning starts taking many different pieces of data into account, an individual’s right to erasure becomes hard to satisfy.

These issues are so severe and fundamental that smart companies know they have already lost their data: they are merely squeezing the lemon to the very last drop. If even Facebook, arguably the planet’s largest and most successful data harvester, is still desperately trying to obtain even more data in blatantly unscrupulous ways, it should be clear that nobody, not even Facebook itself, can ever have enough data and hence win this unfathomable rat race. Data collection is unsustainable, by design. Business models that hinge on harvesting personal data are no longer viable.

People in control means more data

A personal data vault for every person

We need to shed new light on the interplay of businesses, people, and their data. An alternative under active exploration is an ecosystem of personal data, in which every person has their own personal data vault, which we refer to as a data pod.
I will first explain what a data pod is and what it does, before discussing how it enables innovators to play in a fundamentally different league.

Think of a data pod as a virtual drive to which you as an individual hold the keys. Your data pod can then store and safeguard:

every single piece of data that you produce
every single piece of data that companies and organizations produce about you

As the controller of your data pod, you can decide for every piece of data which parties you want to share it with. This way, companies can get access to specific data without writing or collecting it themselves, while respecting every individual’s preferences.
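The per-item sharing described above can be sketched in a few lines. This is only a toy, in-memory illustration under my own assumptions: the `DataPod` class, its method names, and the example keys and parties are hypothetical and are not part of any real data pod API.

```python
class DataPod:
    """Toy in-memory stand-in for a personal data pod (hypothetical, not a real API)."""

    def __init__(self, owner):
        self.owner = owner
        self._items = {}    # key -> value
        self._readers = {}  # key -> set of parties allowed to read that key

    def write(self, key, value):
        """Store or update one piece of data; existing grants are kept."""
        self._items[key] = value
        self._readers.setdefault(key, set())

    def grant(self, key, party):
        """Let `party` read the item stored under `key`."""
        self._readers[key].add(party)

    def revoke(self, key, party):
        """Withdraw a previously given grant."""
        self._readers[key].discard(party)

    def read(self, key, party):
        """Return the item if `party` is the owner or holds a grant for it."""
        if party != self.owner and party not in self._readers.get(key, set()):
            raise PermissionError(f"{party} may not read {key}")
        return self._items[key]


pod = DataPod(owner="alice")
pod.write("shoe_size", 42)
pod.write("home_address", "1 Example Street")
pod.grant("shoe_size", "shoe-shop")  # share exactly one item with one party

print(pod.read("shoe_size", "shoe-shop"))
```

The shop can read the shoe size it was granted, while a request for the home address raises `PermissionError`: the decision is made per piece of data and per party, by the person holding the pod.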

Acquiring data through harvesting is expensive, and leads to suboptimal access to data.
When people are in control, relevant and up-to-date data can be readily available.

Moving control of personal data from companies to people changes the economics of personal data in a way that—perhaps contrary to what our instincts would tell us—is beneficial to small and large companies alike:

Crucially, there will be even more data available, not less, despite the fears of some. This is because companies will be able to access data created by others, and people can share that data with any party they believe will provide them with a better service as a result. Without data pods, today’s companies only see data they manage to collect themselves, which is becoming increasingly difficult to achieve.
It eliminates the cold-start problem, where the first interaction with many apps or services is a sign-up form that people have already filled out over and over again. When the data is in their pod, it only has to be shared, not (manually) duplicated, realizing a better and faster experience for both people and companies.
The collection effort only needs to happen once for every piece of data, so companies no longer spend excessive resources on gathering and maintaining data that their competitors also have readily available.
Data remains synchronized and up-to-date, since companies read it from the source rather than storing it themselves. Whenever any party updates the data, the changes are visible to anyone granted read permissions. This reduces maintenance costs.
The legal rights to access and erasure are trivially satisfied, because people can always see their own data. They can achieve erasure either by no longer sharing certain data, or by deleting it from their own pod.
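To make the difference between storing a copy and reading from the source concrete, here is a minimal sketch. All names in it (`Pod`, `CopyingService`, `ReadingService`, the `address` key) are hypothetical and chosen purely for illustration.

```python
class Pod:
    """Hypothetical pod: values plus, per key, the parties allowed to read them."""

    def __init__(self):
        self.values = {}
        self.allowed = {}

    def fetch(self, key, party):
        """Return the current value, or None if it is deleted or not shared."""
        if party in self.allowed.get(key, set()):
            return self.values.get(key)
        return None


class CopyingService:
    """Harvesting style: takes a snapshot once and keeps it."""

    def __init__(self, name, pod):
        self.address = pod.fetch("address", name)

    def shipping_address(self):
        return self.address


class ReadingService:
    """Reuse style: keeps only a reference and reads at the moment of use."""

    def __init__(self, name, pod):
        self.name, self.pod = name, pod

    def shipping_address(self):
        return self.pod.fetch("address", self.name)


pod = Pod()
pod.values["address"] = "Old Street 1"
pod.allowed["address"] = {"copier", "reader"}

copier = CopyingService("copier", pod)
reader = ReadingService("reader", pod)

pod.values["address"] = "New Street 2"  # the person moves: one update, in one place
stale, fresh = copier.shipping_address(), reader.shipping_address()

pod.allowed["address"].discard("reader")  # the person stops sharing
after_revocation = reader.shipping_address()
```

In this sketch the copying service keeps working with the old address, whereas the reading service sees the new one immediately and sees nothing once sharing stops. The copy held by the harvesting-style service is exactly the kind of data that makes updates and erasure costly.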

Before labeling this vision as utopian, recall that—legally speaking—this situation is already achievable today in many jurisdictions. As a European citizen, I can exercise my GDPR rights to move my data from any company that holds it to a place that I control. What we now need is the technology and the processes to facilitate this in practice.

Smart reusers win through innovation

Let us now examine how data vaults circumvent the data rat race as they move from a wasteful harvesting competition to the more productive league of innovation. Today’s worry of many companies is how they can collect more data than the largest player in their field. Not only is this prohibitively expensive, there’s a limit to how much data anyone can reasonably gather because of legal restrictions, which are still tightening. Even if a smaller company were somehow able to collect more data than LinkedIn, Amazon, or whichever player is the biggest in their industry, they would soon hit a legal ceiling they cannot break through.

Beating them at their own game is thus impossible, because today’s data race is finite—and many have hit some of its limits already. Personal data pods offer the much needed opportunity to move to a meaningful, innovation-based competition.

We cannot beat large players by collecting more data; and even if we could, the growth potential is capped by a legal ceiling. Finding out who can collect the most data is a boring competition, and the winners are already known. By learning how to leverage existing data, companies can instead lead in a diverse competition where innovation and quality of service dominate.

Companies can stay ahead of the competition by stopping exclusive collection of data, which we know is a losing strategy. Because personal data vaults make more data available, we need to start thinking about how to leverage existing data. The data pod ecosystem is an open field with much higher potential. After all, the data-driven economy is fueled by data, and installing an ecosystem of data pods brings plenty of fuel for all parties—more than any of them could ever hope to collect individually. Moving on from data harvesting to data reuse is the course correction the world needs.

A key advantage companies gain is that people can decide to grant them access to their competitors’ data. The price they pay is that the reverse also holds. However, this is not a novelty of the personal data pod vision but rather a consequence of existing legal frameworks such as GDPR, under which consumers are entitled to retrieve their own original and derived data, which they can already send to other companies today. Personal data vaults essentially just cut out the unnecessary detour.

Therefore, it makes much more sense for companies to prepare for a world where competitors can see each other’s customer data, as opposed to still trying to fight what is essentially a legal reality already.

I believe that data-driven companies of the future will derive their value from the intelligence they provide on top of existing data by treating it as a commodity, rather than clinging to the flawed fantasy of data as the irreplicable resource it clearly isn’t. Therein, to me, lies the answer to the question of how small and large companies can compete with multinational data harvesters, which have disrupted markets globally with their cannibalistic and unsustainable business models. The main reason companies have trouble competing with those harvesters is that the latter sit on unprecedented heaps of data; open up that data stream, and the unfair advantage evaporates.

More data means new opportunities

A glimpse into the future

To spark your imagination, I will sketch some novel avenues to which personal data vaults give rise. A first example is useful bits of data we don’t have today. For instance, when people shop for clothes online, the majority of items on any given website will not fit them. It’s 2020: why can we ask our digital assistant about the weather or to send tweets to the entire world, whereas we need to repeatedly inform various websites on every visit about our shoe size? Clearly, having this information readily available would be super helpful, yet such details are not among the millions of facts that Facebook and Amazon store. As such, they are understandably too much of a hassle for simple online shops to ask, store, and maintain.

With a personal data vault, in contrast, even tiny pieces of data can easily be shared in an automated way. While sharing your shoe size might sound trivial, think of the hundreds of items that are shown to you on such websites that you’ll never buy anyway. What’s the point of showing clothes that do not fit? Reducing people’s cognitive overload means easier and faster purchases—it’s a definite win for both sides.
Too impractical for data harvesters today, but easy for data reusers in the future; and there exists a lot of small data for which the same holds. Think about food preferences or allergies, location (not just your current one but also home or office), not to mention basic details we do get asked to fill out on a weekly basis such as given name, email address, phone number, and many more.

Another example is retailers such as my local supermarket, which is part of a larger chain that is well-versed in the dark art of Big Data. They know what I bought there, what I am buying, and, better than I do myself, what I will buy. They know the path I am walking in their store, because I scan my groceries with a hand-held barcode reader. And perhaps, one day, they will measure my excitement or confusion about their new store layout by counting how many times I am breathing in and out at the corner of every aisle. This is classic Big Data thinking.

But let’s be honest here: what they really want to know is what I’m buying from their competitors. Thanks to recent data regulations, they can! I can request my data from every supermarket I visit, and then ask my local store whether they want to see that data and, if so, what offers they have for me. Again, two winners here: my store has more opportunities to sell me things, and I get access to better deals. If they fail to notice the advantages, I can exercise my right to erasure and prevent them from seeing even the data they collected themselves. Personal data pods allow us to implement all kinds of mutually beneficial data exchanges in a much more efficient way.

Finally, there is an easy way to deal with companies that are asking for too much data. Let’s say a music streaming service wants access to my name, birthdate, and all of my personal “likes”, which include songs and movies but also more sensitive topics such as politics and buying habits. Since the access is provided by my data pod, I am fully in control. This means I can provide automatically generated fake data to this particular app, since I consider some of the requested information irrelevant to the service I want them to provide me. The sheer fact that people can now easily lie about these attributes whenever they want to means that they likely won’t have to: it makes the value of irrelevant data plunge, and hence it will no longer be worth the effort or liability of asking for it.
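As a rough sketch of how a pod could answer such an over-asking app, the snippet below returns real values only for the fields the owner considers relevant and generated placeholders for everything else. The `Pod` class, the `allow` and `answer` methods, and the placeholder scheme are all assumptions made for illustration.

```python
import hashlib


class Pod:
    """Hypothetical pod that decides, per app, which fields are answered truthfully."""

    def __init__(self, data):
        self._data = dict(data)
        self._truthful = {}  # app -> set of fields the owner answers truthfully

    def allow(self, app, *fields):
        """Mark fields as relevant for `app`, so it receives the real values."""
        self._truthful.setdefault(app, set()).update(fields)

    def answer(self, app, requested):
        """Real values for allowed fields, generated placeholders for the rest."""
        allowed = self._truthful.get(app, set())
        return {
            field: self._data[field] if field in allowed else self._placeholder(app, field)
            for field in requested
        }

    @staticmethod
    def _placeholder(app, field):
        # Derived from the app and field name only, never from the real value.
        digest = hashlib.sha256(f"{app}:{field}".encode()).hexdigest()[:8]
        return f"generated-{digest}"


pod = Pod({"name": "Alice", "birthdate": "1990-01-01", "likes": ["jazz", "politics"]})
pod.allow("music-app", "name")

response = pod.answer("music-app", ["name", "birthdate", "likes"])
```

Here `response["name"]` is the real name, while the birthdate and likes are stand-ins that carry no information about the person. Because the placeholder depends only on the app and the field, the app sees a stable answer across requests, which is one possible design choice among many.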

Don’t get me wrong: I am not arguing that people should trade their personal data or be dishonest in interactions. Rather, I am saying that their ability to do so is a natural consequence of them having control over their data; and true control also means their freedom to operate in ways some might disagree with. Legislative frameworks might at some point place boundaries on what we as a society consider acceptable.

Furthermore, these ideas cover only commercial applications. Imagine how personal data pods could make the difference in life-defining moments. Moving to a new city becomes easier when you just have to update your address in your own pod, and all parties that need to be aware will read it from there. Finding a job is less of a hassle when your resume’s work history needs to be entered only once and advertised from your pod to future employers. Vital health records and medical imagery can follow you along in your personal data pod, skipping digital queues between various hospital systems that could otherwise incur critical delays. And you control who sees what.

The post-Big Data era

The above scenarios show that we cannot look at the future with today’s eyes. Today, it would be unthinkable for companies to not try asking for certain kinds of data, or to consider sharing their customers’ data with others. But because all of these options are possible, the economics around personal data are bound to change completely. This will have major consequences for future business models. In a post-Big Data world, data is a commodity under control of the people, such that attempting to collect it becomes completely pointless. There is no reason to fuel an engine with more gas than it uses, when storing excess fuel becomes a responsibility with significant liability.

Note how we’ve made it this far in a story with personal data as the main protagonist, without talking about privacy. In fact, I am only mentioning privacy to explain why it—while important—isn’t the core issue. Our lack of privacy today is merely a consequence of unsustainable business models. Hence, we do not fix privacy by fixing privacy; we address it by fixing business models. Just like the serious privacy deficiencies today are merely collateral damage of business gone wrong, our future gain of privacy will be a side effect of more sustainable business models involving personal data.

The reason we so often hear about privacy in the context of these topics is that large companies purposely inject it into discussions as a red herring. This distracts from the conversation we ought to be having, which is about control over data. Privacy is a one-sided story that solely focuses on people, who have intentionally been primed with the term so often that they stopped caring about it a long time ago. Since there’s also no benefit to companies, neither side has an incentive to implement it properly. In contrast, control via personal data pods creates opportunities for both people and companies, as argued and exemplified above. Privacy is one of the many choices that people should be able to make, and they gain that ability when they can take control of their personal data. Today, we already get some privacy, at the mercy of what companies are willing to provide, but clearly no control, which is what would enable us to take matters into our own hands.

The two-sided benefits delivered by personal data vault ecosystems are why I believe that they are the future-oriented way of thinking about the data-driven economy. They demonstrate once and for all that giving people control can increase the amount of data available to deliver high-quality services. Personal data, innovation, and privacy need not be at odds with each other in an ecosystem designed for sustainability.

Moving forward with a concerted strategy

Data pods as a utility

A crucial question is how to make an ecosystem of personal data vaults happen, because a vision without a plan ultimately remains a dream. Without careful planning, we risk remaining in a chicken-and-egg deadlock, where companies do not leverage personal data because there are no pods, and pods do not go mainstream because companies deliver no services on top of them. We need to ensure that all stakeholders move in sync, such that the ecosystem under development remains balanced at all times. Disproportional progress in one area risks raising disappointment in others.

Let us look at the different actors in the ecosystem. By relocating data from companies and organizations into people’s personal data vaults, we establish separate markets for data and service providers. Existing companies keep their role as service providers, and dedicated new companies can act as providers for data pods. Their task is to ensure that people have a trusted space to store and access their data. Trust comes from a choice of provider, and technological standards will ensure that people can freely choose and move their data whenever they want. People’s choice for a pod provider is independent of the providers they pick for services, similar to how a payment card can be used with any business, regardless of the bank that issued it.

But how can we realize a level playing field for equal access to data pods? One option is to consider data vaults as a utility, just like a connection to water, gas, or electricity. Those services are typically provided by commercial companies that are certified by an official body such as a government. Applied to data pods, this gives people a free choice of provider: multiple companies compete in different market segments and tailor their offering to various groups of individuals, while certification guarantees the quality of service, so everyone is certain to have a secure and interoperable data pod.

[photograph of waterpipes] — Thinking of data pod providers as utility companies is an effective way of ensuring that everyone has equal access to standardized services. ©2009 theilr

Just as governments do not produce electricity themselves, the market can take up the data pod provider offering. And just as official certification prevents untrustworthy companies from producing gas or sanitizing water, it can ensure that data pod providers store data in safe and compliant ways. Potential candidates for the role of data utility companies include banks, which already store some of people’s valuable assets, and telecom providers, which already have the necessary broadband infrastructure in place to reach large parts of the population.

Setting the example in Flanders

This vision is currently being implemented in Flanders, a region of Belgium with 6.5 million inhabitants. The Flemish Government is founding a data utility company to become one of the private parties that will provide each of Flanders’ millions of citizens with their own data pod. This paves the road for Belgian and European companies to leverage the vast potential offered by the ecosystem. When companies can assume that every citizen has their own data pod, opportunities are numerous:

Companies will have data available when they need it, such that they are no longer running behind multinationals that harvest data at large scale. This makes them competitive again in the data-driven economy, boosting activity in multiple sectors.
Government agencies will be run more efficiently as they do not waste time on collecting data and keeping it up to date. Data vaults eliminate the need to replicate essential citizen information, such as official address, across different levels of government. This results in lower costs and a better quality of service.
Citizens receive a better experience because they can easily grant access to relevant data, so that services can be better adapted to their needs. Their data is stored in a readily accessible place, avoiding the need to supply the same fields repeatedly.

Moreover, in Belgium, citizens receive a yearly free allotment of basic utilities such as water, gas, and electricity with a provider of their choice such that society can guarantee every individual’s basic needs are met. This is complemented by the pay-per-use offering of specific providers. In a similar way, the government could provide a free allotment of data, again with a provider of an individual’s choice, such that all digital services can count on the availability of that data pod.

While other regions and countries world-wide will have different approaches, this particular strategy fast-tracks the roll-out of a sustainable data economy.

Solid as the standard

The key to sustainability is that every data pod provider implements a universally accepted standard, such that people can freely choose where they host their pod, and can change providers at any time. The same standard is adopted by service providers, who thereby can stream data from any data pod provider, with the person’s permission. This indicates that the right choice of standard is a core decision for the entire landscape, and therefore one of the aspects of certification by an official body, similar to how electricity voltages and drinking water properties are standardized as well.
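The role of a shared standard can be illustrated with a small sketch in which two providers store data differently but expose the same interface. The `PodProvider` interface, the two provider classes, and the `migrate` and `greeting` helpers are hypothetical stand-ins, not the Solid specification, and permissions are left out to keep the example short.

```python
from abc import ABC, abstractmethod


class PodProvider(ABC):
    """Hypothetical stand-in for the standard every pod provider implements."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def keys(self): ...


class BankPods(PodProvider):
    """One provider, storing items in a dictionary."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value

    def keys(self):
        return list(self._store)


class TelecomPods(PodProvider):
    """Another provider with a different internal layout (a list of pairs)."""

    def __init__(self):
        self._rows = []

    def get(self, key):
        for stored_key, value in self._rows:
            if stored_key == key:
                return value
        raise KeyError(key)

    def put(self, key, value):
        self._rows = [(k, v) for k, v in self._rows if k != key] + [(key, value)]

    def keys(self):
        return [k for k, _ in self._rows]


def migrate(source, target):
    """Move to another provider using only the shared interface."""
    for key in source.keys():
        target.put(key, source.get(key))


def greeting(pod):
    """A service written against the interface, not against any one provider."""
    return f"Hello, {pod.get('given_name')}!"


old_pod, new_pod = BankPods(), TelecomPods()
old_pod.put("given_name", "Alice")
migrate(old_pod, new_pod)
print(greeting(new_pod))
```

Because `greeting` and `migrate` only rely on the common interface, the service keeps working after the person switches providers, which is the property the standard is meant to guarantee.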

In my opinion, the Solid standard is the one that has the highest potential to support a sustainable data ecosystem in the long term, as it is both technically and strategically well positioned to take up this challenge:

Solid builds on 30 years of existing Web standards that have realized unprecedented interoperability. Webpages that were written 30 years ago can be viewed on today’s devices that did not even exist back then. Continuing that tradition, with the same mindset and some of the same people and organizations, Solid similarly aims to make data available for services that don’t yet exist today.
Solid is domain-agnostic, such that data pods can contain any kind of data. This contrasts with data vaults made for specific purposes, which can only support a limited set of use cases, severely limiting the range of innovation.
Major companies and organizations are running pilots on Solid today, including household names such as the BBC, the NatWest bank, and the United Kingdom’s National Health Service (NHS), as well as the entire region of Flanders, where every citizen will have access to their own Solid data pod.

Therefore, I consider Solid the reliable choice for moving forward with such an ambitious ecosystem, given its technical properties and promise of wide adoption.

A brave new world of opportunity

The personal data vault ecosystem is a new one, and important technical challenges lie ahead of us, some of which I’m actively working on. Whereas Big Data has only prepared us for a world where large volumes of data will be in few sources, it appears that the future will instead consist of a very large number of personal data sources. So we will need to adjust technology significantly, and shift our budgets accordingly.

Fortunately, restoring innovation through a sustainable data-driven economy is such a positive game changer for society and industry that I believe overcoming those technical hurdles is a justified effort that will impact our lives for the better.

Ruben Verborgh

Thanks to Tim Berners-Lee, Raf Buyle, Katrien Mostaert, John Bruce, Dries Buytaert, Lieven De Marez, and all other valuable sparring partners from the past couple of years.