Introducing Matrix Specification Changes

Hi all,

We’ve been able to start investing more time in advancing the Matrix Specification itself over the last month or so thanks to Ben joining the core team (and should be able to accelerate even more with uhoreg joining in a few weeks!)  The first step in the new wave of work has been to provide much better infrastructure for the process of actually evolving the spec – whether that’s from changes proposed by the core team or the wider Matrix Community.

So, without further ado, we’d like to introduce https://matrix.org/docs/spec/proposals – a dashboard for all the spec change proposals we’ve accumulated so far (ignoring most of the ones which have already been merged), as well as a clearer workflow for how everyone can help improve the Matrix spec itself.  Part of this is introducing a formal numbering system – e.g. MSC1228 stands for Matrix Spec Change 1228 (where 1228 is the ID of the Github issue on the github.com/matrix-org/matrix-doc/issues repository that tracks the proposal).

Please note that these are *NOT* like XEPs or RFCs – i.e. optional proposals or add-ons to the protocol; instead they are literally proposals for changes to the Matrix Spec itself.  Once merged into the spec, they are only of historical interest.

We’ve also created a new room: #matrix-spec:matrix.org to discuss specific spec proposal changes – please join if you want to track how proposals are evolving! (Conversation is likely to fork off into per-proposal rooms or overflow into #matrix-dev:matrix.org or #matrix-architecture:matrix.org depending on traffic levels, however).

Feedback would be much appreciated on this – so please head over to #matrix-spec:matrix.org and let us know how it feels and how it could do better.

This is also a major step towards properly formalising Matrix.org’s governance model – hopefully the changes above are sufficient to improve the health of the evolution of the Spec as we work towards an initial stable release later this year, and then you should expect to see a spec proposal for formal governance once we’ve (at last!) exited beta :)

Huge thanks to Ben for putting this together, and thanks to everyone who’s contributed so far to the spec – we’re looking forward to working through the backlog of proposals and turning them all into merged spec PRs!!

Matthew

GDPR Compliance in Matrix

Hi all,

As the May 25th deadline looms, we’ve had lots and lots of questions about how GDPR (the EU’s new General Data Protection Regulation legislation) applies to Matrix and to folks running Matrix servers – and so we’ve written this blog post to try to spell out what we’re doing as part of maintaining the Matrix.org server (and bridges and hosted integrations etc), in case it helps folks running their own servers.

The main controversial point is how to handle Article 17 of the GDPR: ‘Right to Erasure’ (aka Right to be Forgotten).  The question is particularly interesting for Matrix, because as a relatively new protocol with somewhat distinctive semantics it’s not always clear how the rules apply – and there’s no case law to seek inspiration from.

The key question boils down to whether Matrix should be considered more like email (where people would be horrified if senders could erase their messages from your mail spool), or should it be considered more like Facebook (where people would be horrified if their posts were visible anywhere after they avail themselves of their right to erasure).

Solving this requires making a judgement call, which we’ve approached from two directions: firstly, considering what the spirit of the GDPR is actually trying to achieve (in terms of empowering users to control their data and have the right to be forgotten if they regret saying something in a public setting) – and secondly, considering the concrete legal obligations which exist.  

The conclusion we’ve ended up with is to (obviously) prioritise that Matrix can support all the core concrete legal obligations that GDPR imposes on it – whilst also having a detailed plan for the full ‘spirit of the GDPR’ where the legal obligations are ambiguous.  The idea is to get as much of the longer term plan into place as soon as possible, but ensure that the core stuff is in place for May 25th.

Please note that we are still talking to GDPR lawyers, and we’d also very much appreciate feedback from the wider Matrix community – i.e. this plan is very much subject to change.  We’re sharing it now to ensure everyone sees where our understanding stands today.

The current todo list breaks down into the following categories. Most of these issues have matching github IDs, which we’ll track in a progress dashboard.

Right to Erasure

We’re opting to follow the email model, where the act of sending an event (i.e. message) into a room shares a copy of that message to everyone who is currently in that room.  This means that in the privacy policy (see Consent below) users will have to consent to agreeing that a copy of their messages will be transferred to whoever they are addressing.  This is also the model followed by IM systems such as WhatsApp, Twitter DMs or (almost) Facebook Messenger.

This means that if a user invokes their right to erasure, we will need to ensure that their events will only ever be visible to users who already have a copy – and must never be served to new users or the general public.  Meanwhile, data which is no longer accessible by any user must of course be deleted entirely.

In the email analogy: this is like saying that you cannot erase emails that you have sent other people; you cannot try to rewrite history as witnessed by others… but you can erase your emails from a public mail archive or search engine and stop them from being visible to anyone else.

It is important to note that GDPR Erasure is completely separate from the existing Matrix functionality of “redactions” which let users remove events from the room. A “redaction” today represents a request for the human-facing details of an event (message, join/leave, avatar change etc) to be removed.  Technically, there is no way to enforce a redaction over federation, but there is a “gentlemen’s agreement” that this request will be honoured.

The alternative to the ‘email-analogue’ approach would have been to facilitate users’ automatically applying the existing redact function to all of the events they have ever submitted to a public room. The problem here is that defining a ‘public room’ is subtle, especially to uninformed users: for instance, if a message was sent in a private room (and so didn’t get erased), what happens if that room is later made public? Conversely, if right-to-erasure removed messages from all rooms, it will end up destroying the history integrity of 1:1 conversations, which pretty much everyone agrees is abhorrent.  Hence our conclusion to protect erased users from being visible to the general public (or anyone who comes snooping around after the fact) – but preserving their history from the perspective of the people they were talking to at the time.

In practice, our core to-do list for Right to Erasure is:

  • As a first cut,  provide Article 17 right-to-erasure at a per-account granularity.  The simplest UX for this will be an option when calling the account deactivation API to request erasure as well as deactivation.  There will be a 30 day grace period, and (ideally) a 2FA confirmation (if available) to avoid the feature being abused.
  • Homeservers must delete events that nobody has access to any more (i.e. if all the users in a room have GDPR-erased themselves).  If users have deactivated their accounts without GDPR-erasure, then the data will persist in case they reactivate in future.
  • Homeservers must delete media that nobody has access to any more.  This is hard, as media is referenced by mxc:// URLs which may be shared across multiple events (e.g. stickers or forwarded events, including E2E encrypted events), and moreover mxc:// URLs aren’t currently authorized.  As a first cut, we track which user uploaded the mxc:// content, and if they erase themselves then the content will also be erased.
  • Homeservers must not serve up unredacted events over federation to users who were not in the room at the time.  This poses some interesting problems in terms of the privacy implications of sharing MXIDs of erased users over federation – see “GDPR erasure of MXIDs” below.
  • Matrix must specify a way of informing both servers and clients (especially bots and bridges) of GDPR erasures (as distinct from redactions), so that they can apply the appropriate erasure semantics.

GDPR erasure of Matrix IDs

One interesting edge case that comes out of GDPR erasure is that we need a way to stop GDPR-erased events from leaking out over federation – when in practice they are cryptographically signed into the event Directed Acyclic Graph (DAG) of a given room.  Today, we can remove the message contents (and preserve the integrity of the room’s DAG) via redaction – but this still leaves personally identifying information in the form of the Matrix IDs (MXIDs) of the sender of these events.

In practice, this could be quite serious: imagine that you join a public chatroom for some sensitive subject (e.g. #hiv:example.com) and then later on decide that you want to erase yourself from the room.  It would be very undesirable if any new homeserver joining that room received a copy of the DAG showing that your MXID had sent thousands of events into the room – especially if your MXID was clearly identifying (i.e. your real name).

Mitigating this is a hard problem, as MXIDs are baked into the DAG for a room in many places – not least to identify which servers are participating in a room.  The problem is made even worse by the fact that in Matrix, server hostnames themselves are often personally identifying (for one-person homeservers sitting on a personal domain).

We’ve spent quite a lot time reasoning through how to fix this situation, and a full technical spec proposal for removing MXIDs from events can be found at https://docs.google.com/document/d/1ni4LnC_vafX4h4K4sYNpmccS7QeHEFpAcYcbLS-J21Q.  The high level proposal is to switch to giving each user a different ID in the form of a cryptographic public key for every room it participates in, and maintaining a mapping of today’s MXIDs to these per-user-per-room keys.  In the event of a GDPR erasure, these mappings can be discarded, pseudonymising the user and avoiding correlation across different rooms. We’d also switch to using cryptographic public keys as the identifiers for Rooms, Events and Users (for cross-room APIs like presence).

This is obviously a significant protocol change, and we’re not going to do it lightly – we’re still waiting for legal confirmation on whether we need it for May 25th (it may be covered as an intrinsic technical limitation of the system).  However, the good news is that it paves the way towards many other desirable features: the ability to migrate accounts between homeservers; the ability to solve the problem of how to handle domain names being reused (or hijacked); the ability to decouple homeservers from DNS so that they can run clientside (for p2p matrix); etc.  The chances are high that this proposal will land in the relatively near future (especially if mandated by GDPR), so input is very appreciated at this point!

Consent

GDPR describes six lawful bases for processing personal data.  For those running Matrix servers, it seems the best route to compliance is the most explicit and active one: consent.

Consent requires that our users are fully informed as to exactly how their data will be used, where it will be stored, and (in our case) the specific caveats associated with a decentralised, federated communication system. They are then asked to provide their explicit approval before using (or continuing to use) the service.

In order to gather consent in a way that doesn’t break all of the assorted Matrix clients connecting to matrix.org today, we have identified both an immediate- and a long-term approach.

The (immediate-term) todo list for gathering consent is:

  • Modify Synapse to serve up a simple ‘consent tool’ static webapp to display the privacy notice/terms and conditions and gather consent to this API.
    • Add a ‘consent API’ to the CS API which lets a server track whether a given user has consented to the server’s privacy policy or not.
  • Send emails and push notifications to advise users of the upcoming change (and link through to the consent tool)
  • Develop a bot that automatically connects to all users (new and existing), posting a link to the consent tool.  This bot can also be used in the future as a general ‘server notice channel’ for letting server admins inform users of privacy policy changes; planned downtime; security notices etc.
  • Modify synapse to reject message send requests for all users who have not yet provided consent
    • return a useful error message which contains a link to the consent tool
  • Making our anonymised user analytics for Riot.im ‘opt in’ rather than ‘opt out’ – this isn’t a requirement of GDPR (since our analytics are fully anonymised) but reflects our commitment to user data sovereignty

Long-term:

  • Add a User Interactive Auth flow for the /register API to gather consent at register
  • As an alternative to the bot:
    • Fix user authentication in general to distinguish between ‘need to reauthorize without destroying user data’ and ‘destroy user data and login again’, so we can use the re-authorize API to gather consent via /login without destroying user data on the client.
    • port the /login API to use User Interactive Auth and also use it to gather consent for existing users when logging in

Deactivation

Account deactivation (the ability to terminate your account on your homeserver) intersects with GDPR in a number of places.

Todo list for account deactivation:

  • Remove deactivated users from all rooms – this finally solves the problem where deactivated users leave zombie users around on bridged networks.
  • Remove deactivated users from the homeserver’s user directory
  • Remove all 3PID bindings associated with a deactivated user from the identity servers
  • Improve the account deactivation UX to make sure users understand the full consequences of account deactivation

Portability

GDPR states that users have a right to extract their data in a structured, commonly used and machine-readable format.

In the medium term we would like to develop this as a core feature of Matrix (i.e. an API for exporting your logs and other data, or for that matter account portability between Matrix servers), but in the immediate term we’ll be meeting our obligations by providing a manual service.

The immediate todo list for data portability is:

  • Expose a simple interface for people to request their data
  • Implement the necessary tooling to provide full message logs (as a csv) upon request.  As a first cut this would be the result of manually running something like select * from events where user=?.

Other

GDPR mandates rules for all the personal data stored by a business, so there are some broader areas to bear in mind which aren’t really Matrix specific, including:

  • Making a clear statement as to how data is processed if you apply for a job
  • Ensuring you are seeking appropriate consent for cookies
  • Making sure all the appropriate documentation, processes and training materials are in place to meet GDPR obligations.

Conclusion

So, there you have it.  We’ll be tracking progress in github issues and an associated dashboard over the coming weeks; for now https://github.com/matrix-org/synapse/issues/1941 (for Right to Erasure) or https://github.com/vector-im/riot-meta/issues/149 (GDPR in general) is as good as place as any to gather feedback.  Alternatively, feel free to comment on the original text of this blog post: https://docs.google.com/document/d/1JTEI6RENnOlnCwcU2hwpg3P6LmTWuNS9S-ZYDdjqgzA.

It’s worth noting that we feel that GDPR is an excellent piece of legislation from the perspective of forcing us to think more seriously about our privacy – it has forced us to re-prioritise all sorts of long-term deficiencies in Matrix (e.g. dependence on DNS; improving User Interactive authentication; improving logout semantics etc).  There’s obviously a lot of work to be done here, but hopefully it should all be worth it!

 

SECURITY UPDATE: Synapse 0.28.1

Hi all,

Many people will have noticed disruption in #matrix:matrix.org and #matrix-dev:matrix.org on Sunday, when a validation bug in Synapse was exploited which allowed a malicious event to be inserted into the room with ‘depth’ value that made the rooms temporarily unusable. Whilst a transient workaround was found at the time (thanks to /dev/ponies, kythyria and Po Shamil for the workaround and to Half-Shot for working on a proposed fix), we’re doing an urgent release of Synapse 0.28.1 to provide a temporary solution which will mitigate the attack across all rooms in upgraded servers and un-break affected ones.  Meanwhile we have a full long-term fix on the horizon (hopefully) later this week.

This vulnerability has already been exploited in the wild; please upgrade as soon as possible.

Synapse 0.28.1 is available from https://github.com/matrix-org/synapse/releases/tag/v0.28.1 as normal.

The ‘depth’ parameter is used primarily as a way for servers to signal the intended cosmetic order of their events within a room (particularly when the room’s message graph has gaps in it due to the server being offline, or due to users backfilling old disconnected chunks of conversation). This means that affected rooms may experience message ordering problems until a full long-term fix is provided, which we’re working on currently (and tentatively involves no longer trusting ‘depth’ information from servers).  For full details you can see the proposal documents for the temporary fix in 0.28.1 and the options for the imminent long-term fix.

We’d like to acknowledge jzk for identifying the vulnerability, and Max Dor for providing feedback on the fixes.

As a general reminder, Synapse is still beta (as is the Matrix spec) and the federation API particularly is still being debugged and refined and is pre-r0.0.0. For the benefit of the whole community, please disclose vulnerabilities and exploits responsibly by emailing [email protected] or DMing someone from +matrix:matrix.org. Thanks.

Changes in synapse v0.28.1 (2018-05-01)

SECURITY UPDATE

  • Clamp the allowed values of event depth received over federation to be [0, 2^63 – 1]. This mitigates an attack where malicious events injected with depth = 2^63 – 1 render rooms unusable. Depth is used to determine the cosmetic ordering of events within a room, and so the ordering of events in such a room will default to using stream_ordering rather than depth (topological_ordering). This is a temporary solution to mitigate abuse in the wild, whilst a long solution is being implemented to improve how the depth parameter is used.Full details at https://docs.google.com/document/d/1I3fi2S-XnpO45qrpCsowZv8P8dHcNZ4fsBsbOW7KABI
  • Pin Twisted to <18.4 until we stop using the private _OpenSSLECCurve API.

Matrix and Riot confirmed as the basis for France’s Secure Instant Messenger app

Hi folks,

We’re incredibly excited that the Government of France has confirmed it is in the process of deploying a huge private federation of Matrix homeservers spanning the whole government, and developing a fork of Riot.im for use as their official secure communications client! The goal is to replace usage of WhatsApp or Telegram for official purposes.

It’s a unbelievably wonderful situation that we’re living in a world where governments genuinely care about openness, open source and open-standard based communications – and Matrix’s decentralisation and end-to-end encryption is a perfect fit for intra- and inter-governmental communication.  Congratulations to France for going decentralised and supporting FOSS! We understand the whole project is going to be released entirely open source (other than the operational bits) – development is well under way and an early proof of concept is already circulating within various government entities.

I’m sure there will be more details from their side as the project progresses, but meanwhile here’s the official press release, and an English translation too. We expect this will drive a lot of effort into maturing Synapse/Dendrite, E2E encryption and matrix-{react,ios,android}-sdk, which is great news for the whole Matrix ecosystem! The deployment is going to be speaking pure Matrix and should be fully compatible with other Matrix clients and projects in addition to the official client.

So: exciting times for Matrix.  Needless to say, if you work on Open Government projects in other countries, please get in touch – we’re seeing that Matrix really is a sweet spot for these sort of use cases and we’d love to help get other deployments up and running.  We’re also hoping it’s going to help iron out many of the UX kinks we have in Riot.im today as we merge stuff back. We’d like to thank DINSIC (the Department responsible for the project) for choosing Matrix, and can’t wait to see how the project progresses!

English Translation:

The French State creates its own secure instant messenger

By the summer of 2018, the French State will have its own instant messenger, an alternative to WhatsApp and Telegram.

It will guarantee secure, end-to-end encrypted conversations without degradation of the user experience. It will be compatible with any mobile device or desktop, state or personal. In fact until now the installation of applications like WhatsApp or Telegram was not possible on professional mobile phones, which hindered easy sharing of information and documents.

Led by the Interministerial Department of State Digital, Information and Communication Systems (DINSIC), the project is receiving contributions from the National Agency for Information System Security (ANSSI), the IT Directorship (DSI) of the Armed Forces and the Ministry of Europe and Foreign Affairs.

The tool developed is based on open source software (Riot) that implements an open standard (Matrix). Powered by a Franco-British startup (New Vector), and benefiting from many contributions, this communication standard has already caught the attention of other states such as the Netherlands and Canada, with whom DINSIC collaborates closely.

The Matrix standard and its open source software are also used by private companies such as Thales, which has driven the teams to come together to ensure the interoperability of their tools and cooperate in the development of free and open source software.

After 3 months of development for a very limited cost, this tool is currently being tested in the State Secretary for Digital, DINSIC and in the IT departments of different ministries. It should be rolled out during the summer in administrations and cabinets.

“With this new French solution, the state is demonstrating its ability to work in an agile manner to meet concrete needs by using open source tools and very low development costs. Sharing information in a secure way is essential not only for companies but also for a more fluid dialogue within administrations.” – Mounir Mahjoubi, Secretary of State to the Prime Minister, in charge of Digital.

Urgent Synapse 0.26.1 hotfix out

Hi all,

We just rushed out an urgent hotfix release for Synapse 0.26.1, addressing a nasty bug in the ujson library which causes it to misbehave badly in the presence of JSON containing very large >64-bit integers.  Anyone whose synapses are currently filling up with “Value too big!” errors will want to upgrade immediately from https://github.com/matrix-org/synapse/releases/tag/v0.26.1.

Sorry for the inconvenience.

Changes in synapse v0.26.1 (2018-03-15)

Bug fixes:

  • Fix bug where an invalid event caused server to stop functioning correctly, due to parsing and serializing bugs in ujson library.