As the May 25th deadline looms, we've had lots and lots of questions about how GDPR (the EU's new General Data Protection Regulation legislation) applies to Matrix and to folks running Matrix servers - and so we've written this blog post to try to spell out what we're doing as part of maintaining the Matrix.org server (and bridges and hosted integrations etc), in case it helps folks running their own servers.
The main controversial point is how to handle Article 17 of the GDPR: ‘Right to Erasure' (aka Right to be Forgotten). The question is particularly interesting for Matrix, because as a relatively new protocol with somewhat distinctive semantics it's not always clear how the rules apply - and there's no case law to seek inspiration from.
The key question boils down to whether Matrix should be considered more like email (where people would be horrified if senders could erase their messages from your mail spool), or should it be considered more like Facebook (where people would be horrified if their posts were visible anywhere after they avail themselves of their right to erasure).
Solving this requires making a judgement call, which we've approached from two directions: firstly, considering what the spirit of the GDPR is actually trying to achieve (in terms of empowering users to control their data and have the right to be forgotten if they regret saying something in a public setting) - and secondly, considering the concrete legal obligations which exist.
The conclusion we've ended up with is to (obviously) prioritise that Matrix can support all the core concrete legal obligations that GDPR imposes on it - whilst also having a detailed plan for the full ‘spirit of the GDPR' where the legal obligations are ambiguous. The idea is to get as much of the longer term plan into place as soon as possible, but ensure that the core stuff is in place for May 25th.
Please note that we are still talking to GDPR lawyers, and we'd also very much appreciate feedback from the wider Matrix community - i.e. this plan is very much subject to change. We're sharing it now to ensure everyone sees where our understanding stands today.
The current todo list breaks down into the following categories. Most of these issues have matching github IDs, which we'll track in a progress dashboard.
This means that if a user invokes their right to erasure, we will need to ensure that their events will only ever be visible to users who already have a copy - and mustnever be served to new users or the general public. Meanwhile, data which is no longer accessible by any user must of course be deleted entirely.
In the email analogy: this is like saying that you cannot erase emails that you have sent other people; you cannot try to rewrite history as witnessed by others... but you can erase your emails from a public mail archive or search engine and stop them from being visible to anyone else.
It is important to note that GDPR Erasure is completely separate from the existing Matrix functionality of “redactions” which let users remove events from the room. A “redaction” today represents a request for the human-facing details of an event (message, join/leave, avatar change etc) to be removed. Technically, there is no way to enforce a redaction over federation, but there is a “gentlemen's agreement” that this request will be honoured.The alternative to the ‘email-analogue' approach would have been to facilitate users' automatically applying the existing redact function toall of the events they have ever submitted to a public room. The problem here is that defining a ‘public room' is subtle, especially to uninformed users: for instance, if a message was sent in a private room (and so didn't get erased), what happens if that room is later made public? Conversely, if right-to-erasure removed messages fromall rooms, it will end up destroying the history integrity of 1:1 conversations, which pretty much everyone agrees is abhorrent. Hence our conclusion to protect erased users from being visible to the general public (or anyone who comes snooping around after the fact) - but preserving their history from the perspective of the people they were talking to at the time.
In practice, our core to-do list for Right to Erasure is:
In practice, this could be quite serious: imagine that you join a public chatroom for some sensitive subject (e.g. #hiv:example.com) and then later on decide that you want to erase yourself from the room. It would be very undesirable if any new homeserver joining that room received a copy of the DAG showing that your MXID had sent thousands of events into the room - especially if your MXID was clearly identifying (i.e. your real name).
Mitigating this is a hard problem, as MXIDs are baked into the DAG for a room in many places - not least to identify which servers are participating in a room. The problem is made even worse by the fact that in Matrix, server hostnames themselves are often personally identifying (for one-person homeservers sitting on a personal domain).
We've spent quite a lot time reasoning through how to fix this situation, and a full technical spec proposal for removing MXIDs from events can be found at https://docs.google.com/document/d/1ni4LnC_vafX4h4K4sYNpmccS7QeHEFpAcYcbLS-J21Q. The high level proposal is to switch to giving each user a different ID in the form of a cryptographic public key for every room it participates in, and maintaining a mapping of today's MXIDs to these per-user-per-room keys. In the event of a GDPR erasure, these mappings can be discarded, pseudonymising the user and avoiding correlation across different rooms. We'd also switch to using cryptographic public keys as the identifiers for Rooms, Events and Users (for cross-room APIs like presence).
This is obviously a significant protocol change, and we're not going to do it lightly - we're still waiting for legal confirmation on whether we need it for May 25th (it may be covered as an intrinsic technical limitation of the system). However, the good news is that it paves the way towards many other desirable features: the ability to migrate accounts between homeservers; the ability to solve the problem of how to handle domain names being reused (or hijacked); the ability to decouple homeservers from DNS so that they can run clientside (for p2p matrix); etc. The chances are high that this proposal will land in the relatively near future (especially if mandated by GDPR), so input is very appreciated at this point!
In order to gather consent in a way that doesn't break all of the assorted Matrix clients connecting to matrix.org today, we have identified both an immediate- and a long-term approach.
The (immediate-term) todo list for gathering consent is:
Todo list for account deactivation:
In the medium term we would like to develop this as a core feature of Matrix (i.e. an API for exporting your logs and other data, or for that matter account portability between Matrix servers), but in the immediate term we'll be meeting our obligations by providing a manual service.
The immediate todo list for data portability is:
select * from events where user=?.
It's worth noting that we feel that GDPR is an excellent piece of legislation from the perspective of forcing us to think more seriously about our privacy - it has forced us to re-prioritise all sorts of long-term deficiencies in Matrix (e.g. dependence on DNS; improving User Interactive authentication; improving logout semantics etc). There's obviously a lot of work to be done here, but hopefully it should all be worth it!