No more raw data

Trust envelopes enable responsible data reuse.

Data without context is meaningless; data without trust is useless. 2017-12-18 is nothing but a string, until it becomes a birthdate, a wedding, or the moment a security camera registered you. Handling such highly personal data requires trust. When your personal data is shared with someone, you must be able to trust that they will only use it in the way you agreed to. When someone receives your data, they must be able to trust that it is correct and that they are allowed to use it for the intended purpose. Auditors need to be able to challenge and verify this trust relationship, for example under GDPR. These everyday scenarios highlight that data ecosystems need trust as an integral part of their DNA. Unfortunately, trust is not baked into our data interfaces today: they only provide access to the raw data, disregarding the context that is crucial to its correct treatment. We need to standardize interfaces that carry data in a trust envelope, which encapsulates usage policies and provenance to ensure that data can flow in more responsible ways. In this blog post, I explore how this can work, and why trust envelopes are a necessary change in the way we exchange personal and other data.

10 November 2023

Imagine spotting a colorful shop on the high street. Right as you walk through the door, the shop owner jumps in front of you, blocking your way with a giant tray of home-baked goods. You just wanted to have a quick look around—at this point not sure yet what this store is even selling or what you want to buy. Alas, no peace or quiet for you! The owner insists with a broad smile that cookies improve your experience, and therefore you must decide now which ones you want.

It’s super weird and awkward because you don’t even know this person. You cannot not choose—you must make up your mind because there’s no getting through otherwise. The tray is big and you’re in a rush. Perhaps it’s easier to simply… accept all?

[photograph of an elderly couple offering cookies at a store entrance] — Do you want a chocolate chip cookie as well? Obviously the strawberry ones are mandatory. ©2017 iStock.com / Ljupco

Surely this story reminds you of an ordeal we all have to undergo every single day. Visiting almost any website means being force-fed cookie pop-ups you never asked for.

It’s hardly an improvement to anyone’s experience when websites open with a big banner that visitors can’t ignore. Before fully seeing the main page, we’re asked to enter into a legal contract. With just one click, we sometimes accept hundreds of legal policies, most of which are written such that the average person cannot understand them. It’s almost as if websites want to make their own experience more pleasant, rather than ours…

Why are we confronted with so much more resistance online than in the real world? Shops wouldn’t have any visitors if they treated people like the average website does. Clearly, trust on the Web is at an all-time low and it’s high time we change that.

Fixing this requires paying attention to both sides of the equation. It’s not just people who lose today: companies also fight a losing battle and the end is not in sight. Trust, by definition, is a two-sided construct: a bridge needs two anchoring points. Therefore, solving the question of how people can trust companies more is intrinsically tied to the question of how we can create more trust for companies as well.

Control is a right, not a duty

Personal data vaults as a win–win strategy

Our overall approach to making personal data work better for everyone leverages personal data vaults, or data pods as we call them. The core idea is that every piece of data someone produces about you can be stored in your personal data vault. That’s our starting point for trust: your own private data location, with a provider of your choice.

[a personal data pod makes data flow under a person’s control] — If you’re unfamiliar with data pods, I suggest my blog posts on data vault ecosystems and the underlying Solid technology based on open standards.

Pods are a great idea because they unlock innovation with data. Essentially, they’re putting data closer to people such that people can decide how their data will work for them. Personal data vaults are fundamentally about empowerment.

And if we do this well, there will be more data available for innovation instead of less, because people can choose to reuse data from other places to their own advantage. If a company offers them a tangible benefit, a customer can decide to share some piece of data obtained elsewhere. The company can then temporarily employ that data during the specific interaction to provide the customer with an enhanced quality of service. This happens without the company ever taking control of the customer’s data, and thus without either party having to deal with the additional complexities of doing so.

That’s a forward-looking example of a win–win strategy: people regain agency, and more data becomes available for innovation by companies (still fully under people’s control). We need to apply a similar win–win strategy when it comes to trust.

Let’s not do cookies all over again

When I tell people they’ll soon be able to control their own data, they sometimes voice worries such as:

I already have to click away dozens of cookie pop-ups each day…
Now you’re telling me I’ll have to swipe on 37 permission dialogs as well?

Just imagine: every piece of data anyone asks from you, be it your shoe size or your date of birth, coming with its very own 500 pages of legalese. That wouldn’t be a cookie tray—that’s a whole pallet of cookie boxes! So no, that’s not at all the direction I’m proposing. Control over your data must be a basic right, not a personal duty.

Often, I don’t even want to be asked for permission—just use the data already! I don’t want to give the store clerk permission to check I’m over 18—they already guessed that. I don’t want to give the shop assistant permission to know I am tall and have large feet—they already do. In those cases, the request for permission would inconvenience me more than the actual usage of my data. Sometimes, you just don’t care–and that’s okay.

In many other cases, we do care: don’t use that data without my explicit permission. I definitely want to be asked for permission before anyone starts using my banking data. And accessing my medical data? That’s so sensitive I don’t even want to decide myself! Please don’t ask me—ask my doctor, she knows best what parts of my data you should and shouldn’t be using.

Control is a choice—my choice. It is the right to choose myself what I want to control, to delegate what I want someone else to control, and to be left alone when I don’t want to exercise any control right now. So I will only see the dialog windows I want to see. Don’t force me to make any decision I am not equipped to make or don’t want to make. Yet provide me with the trust that the right call is made every time.

automatically grant or deny access
involve a trusted person to help you decide
explicitly give or withhold permission yourself

[dialogs can be personalized with different levels of detail] — Your pod lets you choose exactly how you want to be asked about your data.
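As a rough sketch of what such preferences could look like inside a pod, the snippet below maps categories of data to one of the handling modes listed above. The category names, the `Handling` enum, and the fallback to asking me for unknown categories are my own assumptions for illustration, not part of any Solid specification.

```python
from enum import Enum


class Handling(Enum):
    AUTO_GRANT = "grant automatically"
    AUTO_DENY = "deny automatically"
    DELEGATE = "involve a trusted person"
    ASK_ME = "ask me explicitly"


# Hypothetical preferences, keyed by (assumed) data category names.
PREFERENCES = {
    "shoe_size": Handling.AUTO_GRANT,
    "banking": Handling.ASK_ME,
    "medical": Handling.DELEGATE,  # for example: my doctor decides
}


def handling_for(category: str) -> Handling:
    """Return how the pod treats a request for this category.

    Assumption: anything without an explicit preference falls back
    to asking me, the most cautious interactive option.
    """
    return PREFERENCES.get(category, Handling.ASK_ME)
```

With these settings, a request for my shoe size never produces a dialog, a request for my medical data goes to the person I delegated to, and anything I have not thought about yet still reaches me.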

When the shoe fits: a three-part story of trust

Let’s take a deliberately simple case: I’m happy to share my shoe size with any shoe or clothing store. And it just so happens I’m planning to buy new sneakers, which I want the store to send to my home address. So they need data to deliver this service to me, for the duration of our transaction. What we want is data sharing to our mutual benefit. It’s that simple—or is it?

Try any website, I challenge you. Adidas, Reebok, Nike, Puma. All of them want me to decide on cookies first. I’m a customer here! All I need is for them to show me a nice pair of shoes that fit, and I will buy them. Why does that straightforward transaction require a complicated upfront legal contract? My data experience is terrible:

❌ I need to make decisions about cookies.
❌ I need to manually copy over my data.
❌ I have no idea what happens to my data.

So despite living in the age of ever-advancing AI, we’re forced by today’s Web to manually copy over the same data again and again, every time agreeing to a new contract. Just think about how many times you have to type out your name or email address on a weekly basis. It’s 2023 and we still manually carry our data around.

That’s all because of a mutual lack of trust. To buy shoes, I have to enter my address and payment details. Companies have to trust that those are correct; I have to trust that they will handle my data correctly. Us carrying our data across their doorstep is an attempt to compensate for being unable to make data and trust flow together. Fortunately, our pods can correctly share correct data on our behalf.

So let’s solve these problems once and for all. Not because I care so much about buying shoes, but because this deceptively complex problem is a direct gateway to solving more advanced cases involving much more private data. If I can’t buy shoes without cookies, I can’t safely exchange financial transactions or medical records either. So let’s get to it.

Part 1: My browser puts on its lawyer shoes

I arrive on Didasbok, the shoe store website of the future. I can see a bunch of shoes: nice ones, some that fit, some that don’t. There are no cookie pop-ups or consent dialogs to click or accept before I can see their products. Just shoes, and I like it.

At first, Didasbok doesn’t know anything about me. That’s great—I’m not sharing any data yet. Unfortunately, it also doesn’t know what shoes fit me. I could just start using the search filters, but hey—it’s the age of AI, and my machine can do this for me.

Now this is where the magic starts to happen. I’m not clicking any button on their website: they’re not storing my data—I am, through my pod, which is securely connected to my browser. Instead of logging in on the website, I’m clicking a button on the frame of my browser window to indicate I’m ready to share data via my pod.

We can do this because Solid adds a layer to the existing Web with standard authentication and authorization. Just like how HTML and CSS standards form a contract between websites and browsers to decide what webpages should look like, the Solid specification will enable contracts for standardized identity and data sharing.

My browser’s address bar has a built-in menu that lets me pick which identity I want to use for interacting with each specific website. (Per-website identity is different from browser-wide profiles, which are limited to consistent login on the browser vendor’s own sites.)

I exist in different contexts, so there’s no single version of me and my data. I can be:

entirely anonymous
a student
a private citizen
an employee of a specific company
a pseudonymous commenter
…

That’s why I first tell my browser what persona to pick for each website. It’s basically the equivalent of signing up for a website with my personal email address, work email address, or a burner email account. When I share any data, it will be from that context. So now my browser knows how I want to share data with the website. But of course, I’ll only share what’s convenient to me.

Next, the website informs my pod that it can use my shoe size to filter results, and my country to calculate shipping fees. My pod then generates a dialog that allows me to choose whether I want to share data from my selected identity:

I’m not interested in promotions right now, but I agree to share my shoe size. Lo and behold, my screen fills up with only shoes that fit me. At this point, Didasbok still does not know who I am. All it knows is that this visitor is a Belgian with shoe size 48. And I didn’t have to type this data into the website: my pod selected the right data and sent it over, after making sure that’s what I want. Take a moment to reflect how much better this future experience is compared to what we have today:

🚫 I received no questions about cookies.
👍 I did not have to type anything.
✅ I was consulted in detail before my data was shared.

All that’s left for me to do is to find the right pair of shoes.
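The exchange in this first part can be sketched in a few lines. The website states which fields it would like and why, I approve some of them, and my pod answers with only the approved fields from the persona I selected. The field names, the request format, and the `email` field for promotions are assumptions I made up for this illustration; no standard defines them this way.

```python
def answer_request(requested: dict, approved: set, persona_data: dict) -> dict:
    """Return only the requested fields that I approved and that exist
    in the persona I selected for this website."""
    return {
        field: persona_data[field]
        for field in requested
        if field in approved and field in persona_data
    }


# What the website asks for, each field with its stated purpose (hypothetical).
request = {
    "shoe_size": "filter search results",
    "country": "calculate shipping fees",
    "email": "send promotions",
}

# Data available under the persona I picked for this website.
persona = {"shoe_size": 48, "country": "BE", "email": "me@example.org"}

# My choices in the dialog: no promotions for now.
shared = answer_request(request, {"shoe_size", "country"}, persona)
```

Here `shared` contains my shoe size and country and nothing else, which is exactly what the store needs to show me shoes that fit.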

Part 2: Shoes for everyday people

After some browsing, I found the pair of sneakers I want to buy. Let’s go through the data sharing process again.

The previous dialog still contained some legalese, which is not what I want to see when I’m buying shoes (your mileage may vary). Therefore I changed my pod’s settings to show me more simplified dialogs.

I click the giant “Buy now” button. In order to finalize the transaction, Didasbok needs my complete address for shipping, and my credit card details for the payment. Today, we would enter all of this data manually in text fields on a form. But this is the future! Not only can my pod automatically fill out such forms; the forms don’t need to be there in the first place, because my pod can just share the needed data, machine to machine.

With the new simplified dialog activated, my experience looks like this:

Again, what a great experience:

🚫 I received no questions about cookies.
👍 I did not have to type anything.
✅ I could share my data via a simple dialog.

Can it get any better? Hardly! Except there’s this one thing…

Part 3: Shoepreme trust and convenience

In user experience design, the ultimate way to improve a dialog is to make it disappear. Just like pod-based data sharing can make forms obsolete, it can also make consent dialogs a thing of the past.

I’m fine with shoe stores knowing my shoe size. They don’t need to ask me. In fact, by asking me, you’re bothering me more than by just using that data already. So what if we didn’t need a dialog at all?

Don’t get me wrong—this is a personal choice. Maybe you want to be asked every time before your shoe size is shared with anyone. I definitely want to be asked before sharing my credit card details with anyone. This is still a choice we have to make: for which pieces of data and which recipients should we be asked permission before sharing?

But data pods allow us to make that choice beforehand; or make it once and then tick a box that we don’t want to make that choice again. That’s because, unlike a website you visit for the first time, your pod knows you and your preferences across any online and offline case where you might want to use your data. And since your pod works for you, you can trust it to consistently apply those preferences.

So here’s what I do: in my pod, I set an internal data usage policy saying:

My shoe size can be shared with any shoe and clothing websites that I browse.

In essence, I am giving my prior agreement to any future requests coming in, such that I don’t have to be confronted with pop-ups or make decisions in the moment:

[the shoe store website with only shoes that fit; no dialogs are necessary to achieve this result] — Formless and dialogless sharing of specific data becomes possible through prior agreement.

It’s just a website with shoes that fit me, and I didn’t have to click any dialogs:

🚫 I received no questions about anything whatsoever.
👍 I still did not have to type anything.
✅ I pre-approved automated sharing of specific data.

How it works under the hood is that, when those requests arrive, my pod needs to:

Verify the requester is indeed a shoe store or clothing store, for instance through their NACE or ISIC codes.
Instantiate my internal policy for that single store, packaging my data together with this highly specific policy.

So when Didasbok asks to see my shoe size, my pod verifies their NACE code is 47.72, and puts the data in a virtual envelope that indicates an instantiated data usage policy:

Ruben Verborgh gives permission to Didasbok to use the enclosed data for the purposes of filtering search results, for the duration of the next 3 months.

That’s quite some more legalese than I would have written, but that’s why I’m letting my pod automatically generate it! The recipient doesn’t know that I have an internal rule set that gives the same permission to all shoe stores; all they know is that they can now legally use the data I’m sending them (only) for the purpose I agreed to.
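To make those two steps a bit more tangible, here is a minimal sketch of how a pod might verify the requester and instantiate my general rule for one specific store. All class and function names are hypothetical, the 90-day period is my approximation of 3 months, and NACE class 47.71 (clothing retail) is added next to 47.72 purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# NACE classes my internal rule applies to (assumption: 47.71 clothing, 47.72 footwear).
ALLOWED_NACE = {"47.71", "47.72"}


@dataclass(frozen=True)
class Requester:
    name: str
    nace_code: str


@dataclass(frozen=True)
class InstantiatedPolicy:
    grantor: str
    recipient: str
    purpose: str
    valid_until: datetime


@dataclass(frozen=True)
class Envelope:
    data: dict
    policy: InstantiatedPolicy


def instantiate(requester: Requester, data: dict,
                now: Optional[datetime] = None) -> Optional[Envelope]:
    """Step 1: verify the requester; step 2: package data with a specific policy."""
    if requester.nace_code not in ALLOWED_NACE:
        return None  # my general rule does not cover this requester
    now = now or datetime.now(timezone.utc)
    policy = InstantiatedPolicy(
        grantor="Ruben Verborgh",
        recipient=requester.name,
        purpose="filtering search results",
        valid_until=now + timedelta(days=90),  # roughly 3 months
    )
    return Envelope(data=data, policy=policy)
```

Note that the resulting envelope only ever mentions the one store that asked: the general rule stays inside my pod.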

Invisible magic is the best kind of magic.

Trust flows with the data

These previous examples work because they make trust flow along with the data. We’ve known for a long time that we need to send the semantics along with the data, such that people and machines from different organizations can interpret the data in the same way. If we don’t, confusion ensues or, in the worst case, disaster.

With Solid pods, we use RDF to integrate semantics with the data. My proposal is to encapsulate this RDF data into a trust envelope, which describes additional context, also in RDF. This envelope describes context to establish mutual trust between the sender and the recipient by explicitly detailing the history and destiny of the data:

[the trust envelope contains digitally signed data; written on the envelope is the source of the data and the permitted usage; the envelope is digitally signed as well] — By indicating the data source and the instantiated usage policy on the trust envelope, personal data can flow together with explicit trust.

How a trust envelope supports the sender

Today, without a trust envelope, I have control over what I share, but my pod is just sending raw data. So once the gate opens—even for a legitimate purpose—my data has gone out and I don’t have control anymore. Of course, my pod would never open the gate for a random website asking for my date of birth. But when a website has a legitimate reason, such as verifying my age for the purchase of a bottle of wine, I could tell my pod to authorize access. Currently, my date of birth would leave my pod as a raw piece of data, without any context of how it’s supposed to be used.

When my pod instead generates a trust envelope detailing the intended destiny of my data, the recipient can track for what purposes they can use it. For example: I can share my date of birth for the purpose of verifying whether I am of legal age to buy wine at Online Store Inc., valid for the duration of 1 hour. The whole envelope and its contents will be digitally signed with my private key, such that I can prove at any point in time which requests I have and haven’t authorized. Therefore, the envelope establishes trust that your data will be used correctly.
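On the receiving end, tracking permitted purposes could be as simple as the check below. This is only a sketch under my own assumptions: the policy is represented as a plain dictionary with made-up keys, whereas a real envelope would express this in RDF.

```python
from datetime import datetime, timedelta, timezone


def may_use(policy: dict, recipient: str, purpose: str, at: datetime) -> bool:
    """Check recipient, purpose, and validity window against the policy."""
    return (
        policy["recipient"] == recipient
        and policy["purpose"] == purpose
        and policy["issued_at"] <= at <= policy["valid_until"]
    )


issued = datetime(2023, 11, 10, 12, 0, tzinfo=timezone.utc)
policy = {
    "recipient": "Online Store Inc.",
    "purpose": "verify legal age to buy wine",
    "issued_at": issued,
    "valid_until": issued + timedelta(hours=1),  # valid for 1 hour
}
```

Half an hour after I shared my date of birth, the age check is allowed. Using the same data for marketing, or using it two hours later, is not.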

Now of course, sending the semantics and trust along with the data doesn’t prevent the recipient from separating them. They can discard the semantics and make a wrong interpretation. They can throw the trust envelope in the bin and just work with the raw data as if there are no limitations. This is why I refer to this concept as a “trust envelope” rather than more established terms such as sticky policies, because policies are never really sticky. But while the envelope metaphor emphasizes how someone could separate it from its data, a digital signature on the sealed envelope means we would notice whenever they tried to reseal the data in a different envelope.
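The following sketch illustrates why resealing gets noticed. One loud caveat: Python's standard library has no public-key signatures, so I use an HMAC with a shared secret as a stand-in here. A real pod would sign with my private key so that anyone can verify with my public key; the function names and envelope layout are likewise my own invention.

```python
import hashlib
import hmac
import json


def _payload(data: dict, policy: dict) -> bytes:
    # Serialize data and policy together, with stable key order.
    return json.dumps({"data": data, "policy": policy}, sort_keys=True).encode()


def seal(data: dict, policy: dict, key: bytes) -> dict:
    """Seal data and policy together (HMAC stand-in for a digital signature)."""
    signature = hmac.new(key, _payload(data, policy), hashlib.sha256).hexdigest()
    return {"data": data, "policy": policy, "signature": signature}


def is_intact(envelope: dict, key: bytes) -> bool:
    """Recompute the seal and compare it with the one on the envelope."""
    expected = hmac.new(
        key, _payload(envelope["data"], envelope["policy"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])


key = b"demo-key-for-illustration-only"
# The date of birth is a placeholder value.
envelope = seal({"date_of_birth": "2000-01-01"}, {"purpose": "age check"}, key)

# Someone puts the same data in an envelope with a more generous policy.
resealed = {**envelope, "policy": {"purpose": "marketing"}}
```

The original envelope verifies, whereas the resealed one does not, because the signature covers the data and the policy together.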

But what stops a recipient from discarding the envelope? We’ll answer that by looking at how trust envelopes benefit recipients.

How a trust envelope supports the recipient

Today, without a trust envelope, companies have to hope that we send them the correct data. Is this really the customer’s address, or did they make a typo? Am I really 18+, or did I just make up a date of birth so I could buy a bottle of wine for my friend? Or worse, does the data require a custom verification process, which is expensive to install and maintain, and actually mostly bothers people who already provided correct data?

If the trust envelope explains the history of the data, digitally signed, companies can automatically verify whether they consider data trustworthy. Such provenance can originally have been supplied by other parties. For instance, my date of birth in my pod might be accompanied by digitally signed provenance from a government’s citizens’ registry. That way, the hard part of the verification performed by another party can be effortlessly reused by both my pod and the recipient of the message. On top of the trust envelope, my pod simply provides the existing provenance trail. Thereby, the envelope establishes trust that my data is correct.
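A recipient's automated check could look roughly like this: walk through the provenance trail and accept the data if an issuer they trust has signed off on it. The trail format and issuer names are assumptions for the sake of the example, and the actual signature verification is passed in as a function because it depends on the signature scheme in use.

```python
from typing import Callable, Dict, List, Optional, Set


def trusted_source(
    trail: List[Dict],
    trusted_issuers: Set[str],
    verify: Callable[[Dict], bool],
) -> Optional[str]:
    """Return the first issuer in the trail that is trusted and whose
    signature verifies; return None if there is no such issuer."""
    for entry in trail:
        if entry["issuer"] in trusted_issuers and verify(entry):
            return entry["issuer"]
    return None


# Hypothetical trail for my date of birth.
trail = [
    {"issuer": "my-pod.example", "signature": "valid"},
    {"issuer": "citizens-registry.example", "signature": "valid"},
]


def stub_verify(entry: Dict) -> bool:
    # Stub for illustration only; a real check would verify a digital signature.
    return entry.get("signature") == "valid"


source = trusted_source(trail, {"citizens-registry.example"}, stub_verify)
```

The store does not have to trust me or my pod: it is enough that the registry's signed statement is part of the trail.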

Trust is a bilateral construct. It is to my benefit that the recipient has trust in the correctness of my data, as well as to their benefit that the data is correct. Similarly, it is to my benefit that the recipient knows what I consider correct usage of my data, and it is to the recipient’s benefit that they only use my data in correct ways.

And that’s the reason why recipients will want to keep the trust envelope instead of throwing it in the bin. Because it’s their ticket to proving to an auditor that, indeed, they have only used our data for purposes to which we agreed. That’s where our part of the trust comes in: when our pod sends data in a trust envelope, we’re not going to be naive and merely assume that these companies will now do the right thing. No, it’s because, even though they technically can do anything with that data, they will only ever be able to legally prove what the trust envelope allows them to do.

The way laws are written assumes that the majority will do the right thing. Today, being GDPR-compliant is hard even for companies that want to do the right thing. How can they prove that they were indeed allowed to use my date of birth for an age check? All they have is their own claim that I ticked a certain checkbox on their website. That’s not exactly the strong proof that they and I want.

The trust envelope supports those companies that aim to do the right thing, and ensures that companies that do the wrong thing will not have proof that would hold up in an audit. They can still do the wrong thing, as they would today, but it becomes harder for them to get away with it, and hence less attractive. Besides, why go through all the effort of illegally harvesting data when there’s a perfectly legal alternative? The costs of distrust start outweighing the benefits of trust.

Raw data belongs in a trust envelope

Let’s design more responsible data flows

Raw data is like raw meat. An ingredient of delicious dishes to some, but not exactly a thing you’d touch without thoroughly washing your hands. And just like raw chicken can infect an entire kitchen or restaurant, raw data can be infectious to an entire organization. Organizations must be able to trust that data is correct, and that they are using data correctly. We deserve the same guarantees from them.

Many seem to think that our data is flowing too easily today: we’ve lost control and our data is everywhere. I’m actually arguing the opposite point: our data does not flow well enough. Because our data is currently shared without trust, we need to enter that data ourselves on every website we visit, time and time again. Data isn’t flowing easily at all: we have to manually sustain data flows, serving as a human replacement for the broken trust link in the data chain.

Companies deal with this lack of trust today by rolling out their own storage and trust systems, but that doesn’t scale. Not for them, and not for us. First, many companies aren’t doing it right, because it’s not what they specialize in, and legal data compliance across the globe is inherently difficult. Second, such compliance is expensive and needs constant maintenance. Third, the burden of compliance is pushed upon us, with legalese-ridden pop-ups that even most lawyers can no longer understand.

GDPR has exposed the lack of trusted data flows, but companies have doubled down by merely acknowledging this lack through cookie pop-ups, rather than restoring trust. The cure has proven worse than the disease: their legal compliance has become our responsibility, rather than a healthy basis for a mutually beneficial relationship.

Data pods can make data flow responsibly, and provide the right environment to make trust flow along with the data. Pods could, on our behalf and with our agreement, share useful data automatically. They can ensure that we only need to explicitly approve in cases where we want to, and opt for prior approval for cases where we want to be left alone. Pods can provide control as a right, not as a duty, by helping us exercise this control rather than making it more complicated.

The shoe store example shows that, rather than claiming to give people a better experience with cookies, companies can actually provide us with a better experience when they allow for pod-driven data sharing. And as much as sharing my shoe size was a toy example, even this simple case isn’t possible today—yet. We’re working on creating the standards and technology to make this happen, and when ready, this will open the door to many more complex mutually beneficial scenarios.

Think about all cases where data sharing is a burden today. Think about job searching and application, where you have to send structured and unstructured data around, and then companies have to spend time verifying your input. Trusted data flows could make such scenarios so much more efficient, saving us and companies valuable time and resources. In the medical domain, trusted data flows can literally save lives. The fact that my Fitbit tracker data cannot flow to my doctor is a missed opportunity of significant proportions. Trust envelopes are a game changer for preventive healthcare.

Adding trust to our interfaces

To make data and trust flow together, the design of our data interfaces will need to change fundamentally. In today’s Solid ecosystem, data, history, and destiny get mixed up implicitly, so we need to separate them and recombine them explicitly. Rather than implicitly assuming that certain data comes with a certain provenance and intended usage, we need to make trust explicit. Pod servers should no longer send generic raw documents, but encapsulate the data from each response in a trust envelope instantiated for the specific interaction.

Once we master sending original documents, we want trust to flow with derived data such that, instead of my full date of birth, I can share a verifiable claim that I am older than 18. To make this happen, I send a trust envelope with my birthdate to an intermediary we both approve. This intermediary replies using a new trust envelope with an intact provenance trail, containing my claim of adulthood. I can then share that second envelope with the store, which they will trust because of its provenance trail, without them having the burden of handling personally identifiable information.
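The intermediary's job in this scenario can be sketched as follows. It receives an envelope with my birthdate, computes the claim, and returns a new envelope that no longer contains the birthdate but extends the provenance trail. The envelope layout and the names are hypothetical, and signing is left out to keep the example short.

```python
from datetime import date


def is_adult(birthdate: date, today: date, threshold: int = 18) -> bool:
    """Compute age in full years and compare it to the threshold."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    return age >= threshold


def derive_claim(envelope: dict, today: date, intermediary: str) -> dict:
    """Return a new envelope with only the derived claim and an extended trail."""
    birthdate = date.fromisoformat(envelope["data"]["date_of_birth"])
    return {
        "data": {"older_than_18": is_adult(birthdate, today)},
        # Build a new list so the original trail stays untouched.
        "provenance": envelope["provenance"]
        + [{"issuer": intermediary, "derived_from": "date_of_birth"}],
    }


# Placeholder birthdate with provenance from a hypothetical registry.
source = {
    "data": {"date_of_birth": "2000-06-15"},
    "provenance": [{"issuer": "citizens-registry.example"}],
}
claim = derive_claim(source, date(2023, 11, 10), "intermediary.example")
```

The store receives `claim`, which tells them I am an adult and where that knowledge comes from, without ever seeing the date itself.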

In addition to solving these technological challenges, we also need progress elsewhere. From a usability perspective, if our goal is for people to not be bothered as much as they are today, the proposed automated instantiation can help. However, such automated instantiation paradoxically requires us to answer beforehand questions we actually didn’t want to answer ourselves. Presenting these to people as literal questions, assuming this is legally possible, could defeat our purpose of preventing consent fatigue. An interesting direction could be reusable profiles, where for instance a government releases a list of profiles that people can combine into a data sharing strategy that suits them. For example, one profile might be “relaxed shopper”, which I can mix into my pod’s usage policy rules to give only shoe or clothing stores prior permission to use my body measurements for result filtering. And if I’m not okay with this, that’s fine: I can instead select the default “strict shopper” experience where no such data is shared without asking. To facilitate such interactions, we need to explore the usability of different kinds of personalized dialogs for different people.
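To illustrate the idea of mixing profiles, here is a small sketch in which a profile is a set of prior permissions, and my pod only shows a dialog when no rule covers the request. The profile contents, category names, and recipient types are invented for this example; how such profiles would really be published is an open question.

```python
# Hypothetical published profiles: category -> prior permission.
PROFILES = {
    "strict shopper": {},  # no prior permissions: always ask
    "relaxed shopper": {
        "body_measurements": {
            "recipients": {"shoe store", "clothing store"},
            "purpose": "result filtering",
        },
    },
}


def combine(*names: str) -> dict:
    """Mix the rules of the selected profiles into one rule set."""
    rules = {}
    for name in names:
        rules.update(PROFILES[name])
    return rules


def needs_dialog(rules: dict, category: str, recipient_type: str, purpose: str) -> bool:
    """A dialog is needed unless a rule covers this recipient type and purpose."""
    rule = rules.get(category)
    covered = (
        rule is not None
        and recipient_type in rule["recipients"]
        and purpose == rule["purpose"]
    )
    return not covered


relaxed = combine("relaxed shopper")
strict = combine("strict shopper")
```

With the relaxed profile, a shoe store filtering results gets my measurements without a dialog, while any other recipient or purpose still triggers one. With the strict profile, every request does.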

In all of this, it’s abundantly clear that technology cannot, and should not, solve all of our problems. The trust envelope is a technological basis for trust, but that trust must be anchored in societal, economic, and legal reality. For instance, we cannot enforce with technology that raw data should never be separated from its trust envelope. But what we can do is design that trust envelope in such a way that it supports legal processes. And those legal processes in turn should make it societally and economically undesirable to treat data in ways we haven’t agreed to via our trust envelopes. Once a valid request arrives that results in data correctly leaving your pod, privacy is no longer in the technical but in the legal realm. Therefore, a pod must do everything in its power to support processes that provide our legal protection.

And that’s what trust envelopes can do for us. They ensure that the majority of those who want to do the right thing, can do so without encountering the absurd hurdles we face today. Those with less sincere intentions will not be able to leverage the benefits of trusted data flows. The key to all of this is entangling data with trust, ensuring that no data leaves our pods raw, by always sharing it inside of a trust envelope.

Ruben Verborgh

Thanks to Sabrina Kirrane for first introducing me to sticky policies many years ago, and for openly sharing her insights ever since. My appreciation goes out to Harsh Pandit and Beatriz Esteves for their interdisciplinary approach to GDPR.