Your data, in your hands: why I built Habeas

The right you have but cannot use

I recently wanted to download my purchase receipts from Carrefour. It’s not my usual supermarket, but I’ve shopped there enough to have an account with my email and loyalty card. It’s my data. Generated by my money and my decision to shop there. I wanted to feed it into my finance app to get a real breakdown of what I spend.

Some supermarkets send you an email with a summary of each purchase. For those cases I already have Tiquetera, which processes them automatically. But Carrefour sends nothing. The only way to access your receipts is to open them on their website, one by one. Click, wait, click, wait. And if you try to automate it with a script, the anti-bot wall blocks you before you reach the third one.

The reasonable thing would be an “Export history” button, or an API, or at least a CSV. There’s nothing.

And it’s not an isolated case. It’s a pattern that repeats with your electricity company, your phone carrier, your investment platform, your insurance company. They all have your data. They all show it to you. Almost none of them let you take it out.

The GDPR, in Articles 15 and 20, is quite clear: you have the right to access your personal data and to receive it “in a structured, commonly used and machine-readable format.” On paper, everyone complies. Technically, your data is there, visible to you.

But the difference between “you can see your data” and “you can take your data” is enormous. It’s like your bank telling you that you can check your balance, but not withdraw your money.

And the irony: the very mechanisms designed to protect you — anti-bot systems like Cloudflare, Akamai, or DataDome — are the ones preventing you from exercising your right. They don’t distinguish between an attacker trying to steal someone else’s data and you trying to download your own.

If the only practical way to export your data in bulk is to automate the process, and automation is blocked by design, your right to data portability is theoretical.

The market’s “solution” and why it doesn’t work

There are companies that have built a business on this problem. Plaid, Tink, TrueLayer and others offer financial data aggregation. You give them your credentials, they log in on your behalf from their servers and extract your data.

PSD2 gives them legal cover to access payment accounts — current accounts — through the APIs that banks are required to provide. So far, so good. The problem is that PSD2 only covers payment accounts. Your credit card, your investments, your pension plan, your supermarket receipts, your electricity bills — all of that falls outside the scope of the directive.

So what do these aggregators do to access what PSD2 doesn’t cover? Exactly what PSD2 was meant to eliminate: custodying your credentials and scraping from their servers. The same practice that motivated the regulation, but applied to everything the regulation doesn’t reach. With their certifications, their audits, and their regulatory compliance on the regulated side — and with screen scraping for everything else.

The constant fight against anti-bot systems from cloud servers makes reliability low and cost high. And meanwhile, you’re giving your credentials to someone.

I thought about it the other way around. If you’re already logged in to your browser, if your IP is already trusted, if you’ve already passed the MFA and the captcha, the anti-bot wall doesn’t exist. There’s nothing to fight. Your browser is already inside.

That’s the premise of Habeas: a browser extension that runs inside your own authenticated session. You log in. You pass the 2FA. The extension, already inside, reads the data you can already see and exports it to the format and destination of your choice — a local folder, Google Drive, or an HTTP endpoint to feed another application.
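To make the premise concrete, here is a minimal sketch in TypeScript of what "reading from inside the session" could look like. It is not the actual Habeas code: the names sessionRequest and readFromSession, the FetchLike type, and the example path are all assumptions made for illustration. The only real mechanism it relies on is the Fetch API option credentials: "include", which tells the browser to attach the cookies of the session you already opened.

```typescript
// Minimal stand-ins for the parts of the Fetch API this sketch uses,
// declared locally so the example does not depend on DOM typings.
type FetchInit = {
  method: "GET";
  credentials: "include"; // reuse the cookies of the logged-in session
  headers: Record<string, string>;
};
type FetchResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string, init: FetchInit) => Promise<FetchResponse>;
type SessionRequest = { url: string; init: FetchInit };

// Build a read-only request against the site the user is already logged in to.
// No username, password, or token is handled anywhere in this code.
function sessionRequest(origin: string, path: string): SessionRequest {
  // Accept only same-origin absolute paths: "/api/x" is fine, "//other.host/x" is not.
  if (path.charAt(0) !== "/" || path.charAt(1) === "/") {
    throw new Error("path must be a same-origin absolute path: " + path);
  }
  return {
    url: origin + path,
    init: {
      method: "GET",
      credentials: "include",
      headers: { Accept: "application/json" },
    },
  };
}

// Perform the request with whatever fetch implementation is passed in
// (in a content script, that would be the browser's own fetch).
async function readFromSession(
  fetchImpl: FetchLike,
  origin: string,
  path: string
): Promise<unknown> {
  const { url, init } = sessionRequest(origin, path);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error("HTTP " + response.status + " for " + url);
  }
  return response.json();
}
```

In a content script, a call might look like readFromSession((url, init) => fetch(url, init), location.origin, "/api/receipts"), where the path is a made-up example. Passing fetch in as a parameter is just a choice that keeps the sketch testable outside a browser.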

No intermediary servers. No stored credentials. No fighting Cloudflare.

Data, not code

Habeas is not a scraper. A scraper accesses other people’s services, from other people’s IPs, with stolen or surrendered credentials, to extract data at scale. Habeas accesses your account, from your browser, with your session, to extract your data. It’s the difference between a burglar forcing your lock and you using your own key to take your own things.

Could a provider consider this a violation of their terms of service? Probably. Many prohibit any “automated access,” even when it’s your account and your session.

But there’s a question nobody seems to ask: how enforceable is that clause when the provider offers no adequate means to exercise your rights? GDPR rights cannot be waived by contract. And Directive 93/13/EEC on unfair contract terms treats non-negotiated clauses that create a significant imbalance to the detriment of the consumer as unfair and not binding on the consumer. “You cannot automate access to your data, but we also give you no way to export it” is a pretty clear imbalance.

GDPR Article 20 requires data controllers to deliver data “in a structured, commonly used and machine-readable format.” If they don’t, it’s the controller who is in breach — not you for finding a way to exercise your right.

The “correct” path, of course, is a formal portability request and, if denied or made impractical, a complaint to the data protection authority. But that process takes months or years. Meanwhile, your data remains trapped. There is, as far as I know, no European court ruling that explicitly states “an anti-automation clause is void when the provider offers no adequate means of portability.” The legal argument is solid, but it hasn’t been tested in court.

Habeas deliberately positions itself in that gap. It doesn’t bypass authentication, it doesn’t impersonate identities, it doesn’t access other people’s data. And the clause that prohibits what it does sits uneasily with the GDPR.

Each service that Habeas can extract data from is defined as a declarative YAML adapter. No arbitrary JavaScript — just the definition of which endpoints to call, how to map the fields, and which output schema to use.

This isn’t a technical whim: it’s a security decision. Manifest V3 prohibits extensions from executing remotely hosted code, and the declarative format means anyone can audit exactly what data an adapter collects and from where.

If a service requires logic that the format cannot express, the format is extended for everyone — no ad hoc code is allowed.
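As an illustration of "data, not code", the sketch below shows one way a declarative adapter could be interpreted. It is written under loud assumptions: the Adapter shape, its field names, the example endpoint, and the response layout are all invented here and are not the real Habeas adapter format. The object stands in for what a parsed YAML file might contain. The adapter carries no logic of its own. A single generic function reads it and maps the fields.

```typescript
// Hypothetical adapter shape: roughly what a parsed YAML adapter might contain.
// Every field name below is an assumption made for this example.
interface Adapter {
  service: string;
  origin: string;
  endpoint: string;               // path called inside the user's session
  itemsPath: string;              // dotted path to the list of records
  fields: Record<string, string>; // output field -> dotted path inside each record
}

const exampleAdapter: Adapter = {
  service: "example-supermarket",
  origin: "https://shop.example.com",
  endpoint: "/api/receipts",
  itemsPath: "data.receipts",
  fields: { date: "purchasedAt", store: "store.name", total: "amount.value" },
};

// Walk a dotted path such as "store.name"; returns undefined if any step is missing.
function getPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Generic interpreter: the adapter is only data, all the logic lives here.
function applyAdapter(adapter: Adapter, response: unknown): Record<string, unknown>[] {
  const items = getPath(response, adapter.itemsPath);
  if (!Array.isArray(items)) return [];
  return items.map((item) => {
    const row: Record<string, unknown> = {};
    for (const outputField of Object.keys(adapter.fields)) {
      row[outputField] = getPath(item, adapter.fields[outputField]);
    }
    return row;
  });
}
```

Because the interpreter is the only executable part, auditing an adapter in this sketch comes down to reading its endpoint and its field list.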

The uncomfortable part

I won’t pretend that Habeas is a clean solution to a clean problem. The underlying issue is that the digital ecosystem has been built on an asymmetry: companies can collect, process, and monetize your data with industrial tools, but you can’t even download it without fighting an anti-bot wall.

The GDPR tried to fix that. In theory. In practice, the industry has found a thousand ways to comply with the letter while violating the spirit.

And we have accepted that “your data is in your account” means the same as “your data is yours.” It doesn’t. Not as long as you can’t take it out.

Habeas is a small, imperfect tool in a very early stage. Today it only has one adapter (Carrefour Spain). But the architecture is designed so that adding new services is a matter of writing a YAML file, not maintaining a fragile scraper.

The name comes from habeas data, the constitutional right, recognized in several Latin American legal systems, to access your own personal data held by third parties. Habeas doesn’t grant you a new right: it makes an existing one executable.

If you’re interested in the problem, or if you simply want to download your own shopping receipts without five hundred clicks, the code is on GitHub and the project site is at habeas.dev.
