GDPR Compliance in Matrix
- As a first cut, provide Article 17 right-to-erasure at a per-account granularity. The simplest UX for this will be an option when calling the account deactivation API to request erasure as well as deactivation. There will be a 30 day grace period, and (ideally) a 2FA confirmation (if available) to avoid the feature being abused.
- Homeservers must delete events that nobody has access to any more (i.e. if all the users in a room have GDPR-erased themselves). If users have deactivated their accounts without GDPR-erasure, then the data will persist in case they reactivate in future.
- Homeservers must delete media that nobody has access to any more. This is hard, as media is referenced by mxc:// URLs which may be shared across multiple events (e.g. stickers or forwarded events, including E2E encrypted events), and moreover mxc:// URLs aren't currently authorized. As a first cut, we track which user uploaded the mxc:// content, and if they erase themselves then the content will also be erased.
- Homeservers must not serve up unredacted events over federation to users who were not in the room at the time. This poses some interesting problems in terms of the privacy implications of sharing MXIDs of erased users over federation - see “GDPR erasure of MXIDs” below.
- Matrix must specify a way of informing both servers and clients (especially bots and bridges) of GDPR erasures (as distinct from redactions), so that they can apply the appropriate erasure semantics.
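The first-cut erasure rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any homeserver implements it: the `Account` class, its field names and the helper functions are hypothetical, and the only values taken from the text are the 30 day grace period and the rule that media follows its uploader.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

GRACE_PERIOD = timedelta(days=30)  # the 30 day grace period from the list above


@dataclass
class Account:
    """Hypothetical account record; field names are illustrative only."""
    user_id: str
    deactivated: bool = False
    # None means "deactivated without erasure": data persists for reactivation.
    erase_requested_at: Optional[datetime] = None

    def is_erased(self, now: datetime) -> bool:
        """Erasure only takes effect once the grace period has elapsed."""
        return (
            self.deactivated
            and self.erase_requested_at is not None
            and now - self.erase_requested_at >= GRACE_PERIOD
        )


def room_events_deletable(members: Sequence[Account], now: datetime) -> bool:
    """Events may be deleted only if every user in the room has been erased."""
    return bool(members) and all(m.is_erased(now) for m in members)


def media_deletable(uploader: Account, now: datetime) -> bool:
    """First-cut rule: media is erased when the account that uploaded it is."""
    return uploader.is_erased(now)
```

Keeping the grace period inside `is_erased` means both the event rule and the media rule automatically respect it, so a user who cancels within 30 days loses nothing.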
GDPR erasure of Matrix IDs
One interesting edge case that comes out of GDPR erasure is that we need a way to stop GDPR-erased events from leaking out over federation - when in practice they are cryptographically signed into the event Directed Acyclic Graph (DAG) of a given room. Today, we can remove the message contents (and preserve the integrity of the room's DAG) via redaction - but this still leaves personally identifying information in the form of the Matrix IDs (MXIDs) of the senders of these events.
In practice, this could be quite serious: imagine that you join a public chatroom for some sensitive subject (e.g. #hiv:example.com) and then later on decide that you want to erase yourself from the room. It would be very undesirable if any new homeserver joining that room received a copy of the DAG showing that your MXID had sent thousands of events into the room - especially if your MXID was clearly identifying (e.g. your real name).
Mitigating this is a hard problem, as MXIDs are baked into the DAG for a room in many places - not least to identify which servers are participating in a room. The problem is made even worse by the fact that in Matrix, server hostnames themselves are often personally identifying (for one-person homeservers sitting on a personal domain).
We've spent quite a lot of time reasoning through how to fix this situation, and a full technical spec proposal for removing MXIDs from events can be found at https://docs.google.com/document/d/1ni4LnC_vafX4h4K4sYNpmccS7QeHEFpAcYcbLS-J21Q. The high-level proposal is to give each user a different ID, in the form of a cryptographic public key, for every room they participate in, and to maintain a mapping from today's MXIDs to these per-user-per-room keys. In the event of a GDPR erasure, these mappings can be discarded, pseudonymising the user and avoiding correlation across different rooms. We'd also switch to using cryptographic public keys as the identifiers for Rooms, Events and Users (for cross-room APIs like presence).
This is obviously a significant protocol change, and we're not going to do it lightly - we're still waiting for legal confirmation on whether we need it for May 25th (it may be covered as an intrinsic technical limitation of the system). However, the good news is that it paves the way towards many other desirable features: the ability to migrate accounts between homeservers; the ability to solve the problem of how to handle domain names being reused (or hijacked); the ability to decouple homeservers from DNS so that they can run clientside (for p2p matrix); etc. The chances are high that this proposal will land in the relatively near future (especially if mandated by GDPR), so input is very much appreciated at this point!
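To make the mapping idea concrete, here is a minimal sketch of a store that hands out a different opaque identifier per user per room and forgets the link on erasure. It is not the proposal's actual design: the `PseudonymStore` class and its method names are hypothetical, and random tokens stand in for the per-room public keys, which a real implementation would generate as signing key pairs.

```python
import secrets
from typing import Dict, Optional, Tuple


class PseudonymStore:
    """Maps (MXID, room ID) to an opaque per-room identifier.

    Assumption: random tokens stand in for the per-user-per-room public
    keys described in the text; no real key material is generated here.
    """

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], str] = {}

    def id_for(self, mxid: str, room_id: str) -> str:
        """Return the user's identifier in this room, creating it if needed."""
        key = (mxid, room_id)
        if key not in self._ids:
            self._ids[key] = secrets.token_urlsafe(32)
        return self._ids[key]

    def resolve(self, pseudonym: str) -> Optional[str]:
        """Look up the MXID behind an identifier, if the mapping still exists."""
        for (mxid, _room_id), value in self._ids.items():
            if value == pseudonym:
                return mxid
        return None

    def erase(self, mxid: str) -> None:
        """Discard every mapping for the user.

        Events in the DAG keep their per-room identifiers, but those can no
        longer be tied back to the MXID or correlated across rooms.
        """
        self._ids = {k: v for k, v in self._ids.items() if k[0] != mxid}
```

Because each room sees a different identifier, discarding the mappings is enough to break the link; nothing in the signed events themselves has to change.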
Consent
GDPR describes six lawful bases for processing personal data. For those running Matrix servers, it seems the best route to compliance is the most explicit and active one: consent.
Consent requires that our users are fully informed as to exactly how their data will be used, where it will be stored, and (in our case) the specific caveats associated with a decentralised, federated communication system. They are then asked to provide their explicit approval before using (or continuing to use) the service.
In order to gather consent in a way that doesn't break all of the assorted Matrix clients connecting to matrix.org today, we have identified both an immediate- and a long-term approach.
The (immediate-term) todo list for gathering consent is:
- Modify Synapse to serve up a simple ‘consent tool’ static webapp to display the privacy notice/terms and conditions and gather consent to this API.
- Send emails and push notifications to advise users of the upcoming change (and link through to the consent tool)
- Modify Synapse to reject message send requests from all users who have not yet provided consent, and return a useful error message which contains a link to the consent tool
- Make our anonymised user analytics for Riot.im ‘opt in’ rather than ‘opt out’ - this isn't a requirement of GDPR (since our analytics are fully anonymised) but reflects our commitment to user data sovereignty
- Add a User Interactive Auth flow for the /register API to gather consent at register
- As an alternative to the bot:
- Fix user authentication in general to distinguish between ‘need to reauthorize without destroying user data' and ‘destroy user data and login again', so we can use the re-authorize API to gather consent via /login without destroying user data on the client.
- Port the /login API to use User Interactive Auth and also use it to gather consent for existing users when logging in
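The immediate-term step of rejecting message sends until consent is given, with an error that links to the consent tool, might look roughly like this. It is a sketch under stated assumptions rather than Synapse's implementation: the function name, the consent URL, the `M_CONSENT_NOT_GIVEN` error code and the `consent_uri` field are all illustrative choices here.

```python
from typing import Dict, Set, Tuple

# Hypothetical location of the consent tool webapp.
CONSENT_TOOL_URL = "https://example.com/consent"


def check_send_allowed(user_id: str, consented: Set[str]) -> Tuple[int, Dict[str, str]]:
    """Return an (HTTP status, JSON body) pair for a message send request.

    Users who have consented pass through; everyone else gets an error
    whose body carries a link to the consent tool.
    """
    if user_id in consented:
        return 200, {}
    return 403, {
        "errcode": "M_CONSENT_NOT_GIVEN",  # assumed error code
        "error": f"Please review and agree to the terms at {CONSENT_TOOL_URL}",
        "consent_uri": CONSENT_TOOL_URL,  # assumed field name
    }
```

Putting the link in both the human-readable message and a separate field lets older clients simply display the text, while newer clients can render a proper button.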
Deactivation
Account deactivation (the ability to terminate your account on your homeserver) intersects with GDPR in a number of places.
Todo list for account deactivation:
- Remove deactivated users from all rooms - this finally solves the problem where deactivated users leave zombie users around on bridged networks.
- Remove deactivated users from the homeserver's user directory
- Remove all 3PID bindings associated with a deactivated user from the identity servers
- Improve the account deactivation UX to make sure users understand the full consequences of account deactivation
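The server-side items on this list amount to a small cleanup routine. The sketch below models them against in-memory state; the `Homeserver` class and its fields are hypothetical stand-ins for homeserver and identity server storage, not real Synapse structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Homeserver:
    """Hypothetical in-memory stand-in for homeserver and identity server state."""
    room_members: Dict[str, Set[str]] = field(default_factory=dict)  # room -> users
    user_directory: Set[str] = field(default_factory=set)
    threepid_bindings: Dict[str, str] = field(default_factory=dict)  # 3PID -> user
    deactivated: Set[str] = field(default_factory=set)

    def deactivate(self, user_id: str) -> None:
        """Apply the deactivation steps from the todo list above."""
        self.deactivated.add(user_id)
        # Remove the user from all rooms so no zombie user lingers on bridges.
        for members in self.room_members.values():
            members.discard(user_id)
        # Remove the user from the user directory.
        self.user_directory.discard(user_id)
        # Drop every third-party identifier binding owned by the user.
        self.threepid_bindings = {
            threepid: owner
            for threepid, owner in self.threepid_bindings.items()
            if owner != user_id
        }
```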
Portability
GDPR states that users have a right to extract their data in a structured, commonly used and machine-readable format.
In the medium term we would like to develop this as a core feature of Matrix (i.e. an API for exporting your logs and other data, or for that matter account portability between Matrix servers), but in the immediate term we'll be meeting our obligations by providing a manual service.
The immediate todo list for data portability is:
- Expose a simple interface for people to request their data
- Implement the necessary tooling to provide full message logs (as a CSV) upon request. As a first cut this would be the result of manually running something like
select * from events where user=?.
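As a rough illustration of that manual export, the query above can be wrapped in a few lines of Python that write the result as CSV. The table layout here is an assumption made for the example (a throwaway SQLite database with a `user` column, mirroring the query in the text); a real homeserver schema will differ.

```python
import csv
import io
import sqlite3


def export_events_csv(conn: sqlite3.Connection, user_id: str) -> str:
    """Run the per-user query and return the rows as CSV text with a header."""
    cursor = conn.execute('SELECT * FROM events WHERE "user" = ?', (user_id,))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)
    return out.getvalue()


# Usage with a throwaway in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE events (event_id TEXT, room_id TEXT, "user" TEXT, body TEXT)'
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("$1", "!r:example.com", "@alice:example.com", "hello"),
        ("$2", "!r:example.com", "@bob:example.com", "hi"),
    ],
)
print(export_events_csv(conn, "@alice:example.com"))
```

Passing the user ID as a bound parameter, as the `?` in the original query suggests, avoids building SQL from user-supplied strings.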
Other
GDPR mandates rules for all the personal data stored by a business, so there are some broader areas to bear in mind which aren't really Matrix specific, including:
- Making a clear statement as to how data is processed if you apply for a job
- Ensuring you are seeking appropriate consent for cookies
- Making sure all the appropriate documentation, processes and training materials are in place to meet GDPR obligations.