Matthew Hodgson

157 posts tagged with "Matthew Hodgson" (See all Author)

Upgrade now to address E2EE vulnerabilities in matrix-js-sdk, matrix-ios-sdk and matrix-android-sdk2

28.09.2022 17:41 — Security Matthew Hodgson

TL;DR:

  • Two critical severity vulnerabilities in end-to-end encryption were found in the SDKs which power Element, Beeper, Cinny, SchildiChat, Circuli, Synod.im and any other clients based on matrix-js-sdk, matrix-ios-sdk or matrix-android-sdk2.
  • These have now been fixed, and we have not seen evidence of them being exploited in the wild. All of the critical vulnerabilities require cooperation from a malicious homeserver to be exploited.
  • Please upgrade immediately in order to be protected against these vulnerabilities.
  • Clients with other encryption implementations (including Hydrogen, ElementX, Nheko, FluffyChat, Syphon, Timmy, Gomuks and Pantalaimon) are not affected; this is not a protocol bug.
  • We take the security of our end-to-end encryption extremely seriously, and we have an ongoing series of public independent audits booked to help guard against future vulnerabilities. We will also be making some protocol changes in the future to provide additional layers of protection.
  • This resolves the pre-disclosure issued on September 23rd.

Continue reading…

Announcing Third Room Tech Preview 1

27.09.2022 17:53 — Releases Matthew Hodgson

We're excited to announce the first tech preview of Third Room, an open, standards-based, decentralised vision of the metaverse for the open Web, built entirely on Matrix… without cryptocurrencies, NFTs or walled gardens.

To see what it's all about, head over to https://thirdroom.io/preview - or come chat in #thirdroom-dev:matrix.org to learn more!

The Matrix Summer Special 2022

15.08.2022 00:00 — General Matthew Hodgson

Hi all,

At the end of each year it’s been traditional to do a big review of everything that the Matrix core team got up to that year, and announcing our predictions for the next. You can see the last edition in 2021 here - and if you’re feeling nostalgic you can head down memory lane with the 2020, 2019, 2018 ones etc too.

This year is turning out to be slightly different, however. Our plans for 2022 are particularly ambitious: to force a step change in improving Matrix’s performance and usability so that we firmly transition from our historical “make it work” and “make it work right” phases into “making it fast”. Specifically: to succeed, Matrix has to succeed in powering apps which punch their weight in terms of performance and usability against the proprietary centralised alternatives of WhatsApp, Discord, Slack and friends.

We’ve seen an absolute tonne of work happening on this so far this year… and somehow the end results all seem to be taking concrete shape at roughly the same time, despite summer traditionally being the quietest point of the year. The progress is super exciting and we don’t want to wait until things are ready to enthuse about them, and so we thought it’d be fun to do a spontaneous Summer Special gala blog post so that everyone can follow along and see how things are going!

Making it fast

We have always focused on first making Matrix “work right” before we make it “work fast” - sometimes to a fault. After all: the longer you build on a given architecture the harder it becomes to swap it out down the line, and the core architecture of Matrix has remained essentially the same since we began in 2014 - frankly it’s amazing that the initial design has lasted for as long as it has.

Over the years we’ve done a lot of optimisation work on the core team implementations of that original architecture - whether that’s Synapse or matrix-{js,react,ios,android}-sdk and friends: for instance Synapse uses 5-10x less RAM than it used to (my personal federated server is only using 145MB of RAM atm! 🤯) and it continues to speed up in pretty much every new release (this PR looks to give a 1000x speedup on calculating push notification actions, for instance!). However, there are some places where Matrix’s architecture itself ends up being an embarrassingly slow bottleneck: most notably when rapidly syncing data to clients, and when joining rooms for the first time over federation. We’re addressing these as follows…

Sliding Sync (aka Sync v3)

Historically, /sync always assumed that the client would typically want to know about all the conversations its user is in - much as an IRC client or XMPP client is aware of all your current conversations. This provided some nice properties - such as automatically enabling perfect offline support, simplifying client and server development, and making features like “jump to room” and “tab complete” work instantly given the data is all client-side. In the early days of Matrix, when nobody was yet a power user, this wasn’t really a problem - but as users join more conversations and join bigger rooms, it’s become one of Matrix’s biggest performance bottlenecks. In practice, logging into a large account (~4000 rooms) can take ~10 minutes and hundreds of megabytes of network traffic, which is clearly ridiculous. Worse: if you go offline for a day or so, the incremental sync to catch back up can take minutes to calculate (and can even end up being worse than an initial sync).

To fix this, we started work on Sliding Sync (MSC3575) in 2021: a complete reimagining of the /sync API used by Matrix clients to receive data from their homeserver. In Sliding Sync, we only send the client the data it needs to render its UI. Most importantly, we only tell it about the subset of rooms which it is visible in the scroll window of its room list (or that it needs to display notifications about). As the user scrolls around the room list, they slide the window up and down - hence the name “sliding sync”. Sliding Sync was originally called Sync v3, given it’s our 3rd iteration of the sync API - it got renamed Sliding Sync given the current sync API confusingly ended up with a prefix of /v3.

Back in December our work on Sliding Sync was still pretty early: we had the initial MSC, an experimental proxy that converted the existing sync v2 API into Sliding Sync, and a simple proof-of-concept web client to exercise it. Since then, however, there has been spectacular progress:

  • MSC3575 has undergone some big iterations as we converge on the optimal API shape.
  • The sliding-sync proxy has matured to be something which we’re now running in stealth against matrix.org for those dogfooding the API
  • We added the concept of extensions to split out how to sync particular classes of data (to avoid the API becoming a monolithic monster) - specifically:
    • Account Data
    • End-to-end Encryption
    • To-device messages
    • Ephemeral events (to be done)
    • Presence (to be done)
  • We added support for spaces!
  • We implemented it in matrix-js-sdk (which merged a few weeks ago!)
  • …and have a WIP implementation in matrix-rust-sdk too.

But most importantly, we’ve also been busy implementing Sliding Sync in Element Web itself so we can start using it for real. Now, this is still a work in progress, but as of today it’s just getting to the point where one can experiment with it as a daily driver (although it’s definitely alpha and we’re still squishing bugs like crazy!) - and we can see just how well it really performs in real life.

For instance, here’s a video of my account (4055 rooms, redacted for privacy) logging in on an entirely fresh browser via Sliding Sync - the actual sync takes roughly 1 second (at 00:18 in the video). And if we’d started the sync operation while the user is setting up E2E encryption, it would have completed in the background before they even got to the main screen, giving instant login(!). Given my account typically takes ~10 minutes to initial sync (plus even more time for encryption to sync up), this is at least a real-life 600x improvement. Moreover, the sync response is only 20KB (a ~5000x improvement) - a huge win for low-bandwidth Matrix situations.

Then, having logged in, the client subsequently launches pretty much instantly, no matter how long you’ve been offline. Total launch time is roughly 4 seconds, most of which is loading the app’s assets - which in turn could well be improved by progressively loading the app. It could also be sped up even more if we cached state locally - currently the implementation simply reloads from the server every time the app launches rather than maintaining a local cache.

As you can see, this is palpably coming together, but there’s still a bunch of work to be done before we can encourage folks to try it, including:

  • Switching the RoomList to be fully backed by sliding sync (currently the v2 roomlist is jury-rigged up to the sliding sync API, causing some flakey bugs such as duplicate rooms)
  • Spec and hook up typing / receipts / presence extensions
  • Hook up favourites, low_priority, knocked and historical rooms
  • Adding back in loading room members
  • Apply quality-of-service to to-device messages so we prioritise ones relevant to the current sliding window
  • Sync encrypted rooms in the background to search for notifications (and for indexing).
  • More local caching to speed up operations which now require checking the server (e.g. Ctrl/Cmd-K room switching)

We also need to determine whether it’s viable to run the sliding-sync proxy against matrix.org for general production use, or whether we’ll need native support in Synapse before we can turn it on by default for everyone. But these are good problems to have!!

matrix-rust-sdk and Element X

Meanwhile, over in the land of Rust, we’ve been making huge progress in maturing and stabilising matrix-rust-sdk and exercising it in Element X: the codename for the next generation of native Element mobile apps. Most excitingly, we literally just got the first (very alpha) cut of Sliding Sync working in matrix-rust-sdk and hooked up to Element X on iOS - you can see Ștefan’s demo from last week here:

matrix-rust-sdk itself is now getting a steady stream of releases - including long-awaited official node bindings, providing excellent and performant encryption support via the newly audited vodozemac Rust implementation of Olm. It’s also great to see loads of major contributions to matrix-rust-sdk from across the wider Matrix community - particularly from Ruma, Fractal, Famedly and others - thank you!! As a result the SDK is shaping up to be much more healthy and heterogeneous than the original matrix-{js,ios,android}-sdk projects.

On Element X itself: matrix-rust-sdk is being used first on iOS in Element X iOS - aiming first for launching a stable “barbecue” feature set (i.e. personal messaging) asap, followed by adding on “banquet” features (i.e. team collaboration) such as spaces and threads afterwards. We’ve shamelessly misappropriated the barbecue & banquet terminology from Tobias Bernard’s excellent blog post “Banquets and Barbecues” - although, ironically, unlike the post, our plan is still to have a single app which incrementally discloses the banquet functionality as the user’s barbecue starts to sprawl. We’ve just published the brand new development roadmap for Element X from the rust-sdk perspective on GitHub. Above all else, the goal of Element X is to be the fastest mobile messenger out there in terms of launch and sync time, thanks to Sliding Sync. Not just for Matrix - but the fastest messenger, full stop :D Watch this space to see how we do!

Finally: Element is getting a major redesign of the core UI on both iOS and Android - both for today’s Element and Element X. I’m not going to spoil the final result (which is looking amazing) given it’ll have a proper glossy launch in a few weeks, but you can get a rough idea based on the earlier design previewed by Amsha back in June:

In addition to the upcoming overall redesign, Element also landed a complete rework of the login and registration flows last week on iOS and Android - you can see all about it over on the Element blog.

Fast Remote Joins

In terms of performance, the other area that we’re reworking at the protocol level is room joins.

One of the most glaring shortcomings of Matrix happens when a new server admin excitedly decides to join the network, installs a homeserver, tries to join a large room like #matrix:matrix.org, and then looks on in horror as it takes 10+ minutes to join the room, promptly despairs of Matrix being slow and complains bitterly about it all over HN and Reddit :)

The reason for the current behaviour is that the Matrix rooms are replicated between the servers who participate in them - and in the initial design of the protocol we made that replication atomic. In other words, a new server joining a room picks a server from which to acquire the room (typically the one in the room’s alias), and gets sent a copy of all the state events (i.e. structural data) about the room, as well as the last 20 or so messages. For a big room like Matrix HQ, this can be massive - right now, there are 79,969 state events in the room - and 126,510 auth_chain events (i.e. the events used to justify the existence of the state events). The reason there are so many is typically that the act of a user joining or parting the room is described by a state event - and in the naive implementation, the server needs to know all current state events in the room (e.g. including parted users), in order to keep in sync with the other servers in the room and faithfully authorise each new event which they receive for that room.

However, each event is typically around 500 bytes in size, and so the act of joining a big room could require generating, transmitting, receiving, authenticating and storing up to 100MB of JSON 😱. This is why joining big rooms for the first time is so painfully slow.

Happily, there is an answer: much as Sliding Sync lets clients synchronise with the bare minimum of data required to render their UI, we’ve created MSC3706 (and its precursor MSC2775) in order to rework the protocol to let servers receive the bare minimum of state needed to join a room in order to participate. Practically speaking, we only really care about events relevant to the users who are currently participating in the room; the 40,000 other lurkers can be incrementally synced in the background so that our membership list is accurate - but it shouldn’t block us from being able to join or read (or peek) the room. We already have membership lazyloading in the client-server API to support incrementally loaded membership data, after all.

The problem with this change is that Synapse was written from the outset to assume that each room’s state should be atomically complete: in other words, room state shouldn’t incrementally load in the background. So the work for Faster Joins has been an exercise in auditing the entirety of Synapse for this assumption, and generally reviewing and hardening the whole state management system. This has been loads of work that has been going on since the end of last year - but the end is in sight: you can see the remaining issues here.

As of right now, faster joins work (although aren’t enabled by default) - with the main proviso that you can’t speak in the room yet until the background sync has completed, and the new implementation has not yet been optimised. However, thanks to all the preparation work, this should be relatively straightforward, so the end is in sight on this one too.

In terms of performance: right now, joining Matrix HQ via the unoptimised implementation of faster joins completes on a fresh server in roughly 30 seconds - so a ~25x improvement over the ~12 minutes we’ve seen previously. However, the really exciting news is that this only requires synchronising 45 state events and 107 auth_chain events to the new server - a ~1400x improvement! So there should be significant scope for further optimising the calculation of these 152 events, given 30 seconds to come up with 152 events is definitely on the high side. In our ideal world, we’d be getting joins down to sub-second performance, no matter how big the room is - once again, watch this space to see how we do.

Finally, alongside faster remote joins, we’re also working on faster local joins. This work overlaps a bit with the optimisation needed to speed up the faster remote join logic - given we are seeing relatively simple operations unexpectedly taking tens of seconds in both instances. Some of this is needing to batch database activity more intelligently, but we also have some unknown pauses which we’re currently tracking down. Profiling is afoot, as well as copious Jaeger and OpenTracing instrumentation - the hunt is on!

Ratcheting up testing

All the work above describes some pretty bold changes to speed up Matrix and improve usability - but in order to land these changes with confidence, avoiding regressions both now and in future, we have really levelled up our testing this year.

Looking at matrix-react-sdk as used by Element Web/Desktop: all PRs made to matrix-js-sdk must now pass 80% unit test coverage for new code (measured using Sonarqube, enforced as a GitHub PR check). All matrix-react-sdk PRs must be accompanied by a mix of unit tests, end-to-end tests (via Cypress) and screenshot tests (via percy.io). All regressions (in both nightly and stable) are retro’d to ensure fixed things stay fixed (usually via writing new tests), and we have converted fully to typescript for full type safety.

Concretely, since May, we’ve increased js-sdk unit test coverage by ~10% globally, increased react-sdk coverage by ~17%, and added ever more Cypress integration tests to cover the broad strokes. Cypress now completely replaces our old Puppeteer-based end-to-end tests, and Sliding Sync work in matrix-react-sdk is being extensively tested by Cypress from the outset (the Sliding Sync PR literally comes with a Cypress test suite).

In mobile land, the situation is more complex given our long-term strategy is to deprecate matrix-ios-sdk and matrix-android-sdk2 in favour of matrix-rust-sdk. matrix-rust-sdk has always had excellent coverage, and in particular, adopting the crypto module in the current matrix-{js,ios,android}-sdk will represent a night and day improvement for quality (not to mention perf!). We’ll also be adopting PR checks, and screenshot testing for the mobile SDKs.

On the backend, we continue to build out test cases for our new integration tester Complement (in Golang), alongside the original sytest integration test suite (in Perl). In particular, we can now test Synapse in worker mode. The intention with Complement is that it should be homeserver agnostic so that any homeserver implementation can benefit. Indeed the project was initiated by Kegan wearing his Dendrite hat.

Finally, we’ve had a huge breakthrough with true multi-client end-to-end testing in the form of Michael Kaye’s brand new Traffic Light project. For the first time, we can fully test things like cross signing and verification and VoIP calls end-to-end across completely different platforms and different clients. It’s early days yet, but this really will be a game changer, especially for crypto and VoIP.

Next up, we will turn our attention to a performance testing framework so that we can reliably track performance improvements and regressions in an automated fashion - heavily inspired by Safari’s Page Load Test approach. This will be essential as we build out new clients like Element X.

A whole new world

All the stuff above is focused on improving the core performance and usability of Matrix - but in parallel we have also been making enormous progress on entirely new features and capabilities. The following isn’t a comprehensive list, but we wanted to highlight a few of the areas where new development is progressing at a terrifying rate…

Native VoIP Conferencing

2022 is turning out to be the year that Matrix finally gets fully native voice/video conferencing. After speccing MSC3401 at the end of last year, Element Call Beta 1 launched as a reference implementation back in March, followed by enabling E2EE, spatial audio and walkie-talkie mode in Element Call Beta 2 in June.

However, the catch was that Element Call beta 1 and 2 only ever implemented “full mesh” conferencing - where every participant calls every other participant simultaneously, limiting the size of the conference to ~7 participants on typical hardware, and wasting lots of bandwidth (given you end up sending the same upstream multiple times over for all the other participants). Element Call has been working surprisingly well in spite of this, but the design of MSC3401 was always to have “foci” (the plural of ‘focus’ - i.e. conference servers) to optionally sit alongside homeservers in order to aggregate the participating calls, a bit like this:

MSC3401 Architecture

With foci, clients only need to send their upstream to their local focus, rather than replicating it across all the other participants - and the focus can then fan it out to other foci or clients as required. In fact, if no other clients are even watching your upstream, then your client can skip sending an upstream to its focus entirely!

Most importantly, foci are decentralised, just like Matrix: there is no single conference server as a single point of control or failure responsible for powering the group call - users connect to whichever focus is closest to them, and so you automatically get a standards-based heterogeneous network-split-resilient geographically distributed cascading conferencing system with end-to-end-encryption, powered by a potentially infinite number of different implementations. To the best of our knowledge, this is the first time someone’s proposed an approach like this for decentralised group calling (watch out, Zoom, we’re coming for you!)

Now, the VoIP team have been busy polishing Element Call (e.g. chasing down end-to-end encryption edge cases and reliability), and also figuring out how to embed it into Element and other Matrix clients as a quick way to get excellent group VoIP (more on that later). As a result, work on building out foci for scalable conferencing had to be pushed down the line.

But in the last few months this completely changed, thanks to an amazing open source contribution from Sean DuBois, project lead over at Pion - the excellent Golang WebRTC implementation. Inspired by our initial talk about MSC3401 at CommCon, Sean independently decided to see how hard it’d be to build a Selective Forwarding Unit (SFU) focus that implemented MSC3401 semantics using Pion - and published it at https://github.com/sean-der/sfu-to-sfu (subsequently donated to github.com/matrix-org). In many ways this was a flag day for Matrix: it’s the first time that a core MSC from the core team has been first implemented from outside the core team (let alone outside the Matrix community!). It’s the VoIP equivalent of Synapse starting off life as a community contribution rather than being written by the core team.

Either way: Sean’s SFU work has opened the floodgates to making native Matrix conferencing actually scale, with Šimon Brandner and I jumping in to implement SFU support in matrix-js-sdk… and as of a few weeks ago we did the first ever SFU-powered Matrix call - which worked impressively well for 12 participants!

12 person Element Call

Now, this isn’t released yet, and there is still work to be done, including:

  • We actually need to select the subset of streams we care about from the focus
  • We need to support thumbnail streams as well as high-res streams
  • We need rate control to ensure clients on bad connections don’t get swamped
  • We need to hook up cascading between foci (although the SFU already supports it!)
  • We need E2EE via insertable streams
  • Faster signalling for switching between streams

You can see the full todo list for basic and future features over on GitHub. However, we’re making good progress thanks to Šimon’s work and Sean’s help - but with any luck beta 3 of Element Call might showcase SFU support!

Meanwhile it’s worth noting that Element Call is not the only MSC3401 implementation out there - the Hydrogen team has added native support to Hydrogen SDK too (skipping over the old 1:1 calling), so expect to see Element <-> Hydrogen calling in the near future. The Hydrogen implementation is also what powers Third Room (see below…)

Matryoshka VoIP Embedding

Elsewhere on VoIP, we’ve also been hard at work figuring out how to embed Element Call into Matrix clients in general, starting with Element Web, iOS & Android. Given MSC3401 is effectively a superset of native 1:1 Matrix VoIP calling, we’d ideally like to replace the current 1:1-only VoIP implementation in Element with an embedded instance of Element Call (not least so we don’t have to maintain it in triplicate over Web/iOS/Android, and because WebRTC-in-a-webview really isn’t very different to native WebRTC). To do this efficiently however, the embedded Element Call needs to share the same underlying Matrix client as the parent Element client (otherwise you end up wasting resources and devices and E2EE overhead between the two). Effectively Element Call ends up needing to parasite off the parent’s client. We call this approach “matryoshka embedding”, given it resembles nested Russian dolls. 🪆

In practice, we do this by extending the Widget API to let Matrix clients within the widget share the parent’s Matrix client for operations such as sending and receiving to-device messages and accessing TURN servers (c.f. MSC3819 and MSC3846). This in turn has been implemented in the matrix-widget-api helper library for widget implementers - and then a few days ago Robin demonstrated the world’s first ever matryoshka embedded Element Call call, looking like this:

Matryoshka embedded Element Call

Note that the MSC3401 events are happening in the actual room where the widget has been added, sent by the right users from Element Web rather than from Element Call, and so are subject to all the normal Matrix access control and encryption semantics. This is a huge step forwards from embedding Jitsi widgets, where the subsequent call membership and signalling happens in an entirely separate system (XMPP via Prosody, ironically) - instead: this is proper native Matrix calling at last.

Moreover, the same trick could be used to efficiently embed other exotic Matrix clients such as Third Room or TheBoard - giving the user the choice either to use the app standalone or within the context of their existing Matrix client. Another approach could be to use OIDC scopes to transparently log the embedded client in using the parent’s identity; this has the advantage of no code changes being needed on the embedded client - but has the disadvantage that you needlessly end up running two Matrix clients for the same account side by side, and adding another device to your account, which isn’t ideal for a performance sensitive app like Element Call or Third Room.

Matryoshka embedding isn’t live yet, but between scalable calls via SFU and native Element Call in Element Web/iOS/Android, the future is looking incredibly exciting for native Matrix VoIP. We hope to finish embedding Element Call in Element Web/iOS/Android in Sept/Oct - and if we get lucky perhaps the SFU will be ready too and then Element Call can exit beta!

Finally, we also added Video Rooms to Element Web - adding the user interface for an “always on” video room that you can hop into whenever you want. You can read about it over on the Element blog - the initial implementation uses Jitsi, but once Element Call and Matryoshka embedding is ready, we’ll switch over to using Element Call instead (and add Voice Rooms too!)

Video Rooms

Third Room

Just as MSC3401 and Element Call natively adds decentralised voice/video conferences to boring old textual Matrix chatrooms, MSC3815 and Third Room go the whole enchilada and adds a full decentralised 3D spatial collaboration environment into your Matrix room - letting you turn your Matrix rooms into a full blown interconnected virtual world.

I can’t overstate how exciting this is: one of the key origins of Matrix was back in Oct 2013 when Amandine and myself found ourselves in Berlin after TechCrunch Disrupt, debating why Second Life hadn’t been more successful - and wondering what you’d have to do to build an immersive 3D social environment which would be as positive and successful as a wildly popular chat network. Our conclusion was that the first key ingredient you’d need would be a kick-ass open decentralised communication protocol to build it on - providing truly open communication primitives that anyone could build on, much like the open web… and that was what got us thinking about how to build Matrix.

Fast forward 9 years, and Third Room is making spectacular progress in building out this dream, thanks to the incredibly hard work of Robert, Nate and Ajay. The goal of Third Room is to be an open platform layered directly on Matrix for spatial collaboration of any kind: effectively a blank canvas to let folks create freeform collaborative 3D (and in future 2D, VR or AR) experiences, either by using existing assets or building their own user-generated content and functionality. Just like the open web itself, this unlocks a literally infinite range of possibilities, but some of the obvious uses include: spatial telepresence, social VR, 3D visualisation of GIS or weather data, 3D simulated environments, search and rescue and disaster response operations (imagine streaming LIDAR from a drone surveying hurricane devastation into Third Room, where you can then overlay and collaborate on GIS data in realtime), and of course 3D gaming in all its various forms.

Now, we’re hoping to give Third Room a proper launch in a few weeks, so I’m not going to spoil too much right now - but the final pieces which are currently coming together include:

  • Finalising the initial version of Manifold, the multi-threaded game engine which powers Third Room (built on Three.JS, bitECS and Rapier), using SharedArrayBuffers as triple-buffers to synchronise between the various threads. See this update for a bit more detail on how the engine works.
  • Finalising the Matrix client interface itself, powered by Hydrogen SDK in order to be as lightweight as possible
  • Adding in full spatial audio and game networking via MSC3401 and Hydrogen SDK (currently full mesh, but will go SFU as soon as SFUs land!)
  • Adding in animated avatars (currently using Mixamo animations)
  • Adding in name tags and object labels
  • Adding in 3D Tile support in order to incrementally load 3D map tiles à la Google Earth
  • Building an asset pipeline from Unity and Blender through to the glTF assets which Third Room uses.
  • Initial framework for an in-world direct-manipulation editor
  • Lightmap support for beautiful high-performance static lighting and shadows
  • Full post-processing pipeline (bloom, depth-of-field, anti-aliasing etc)
  • Integrating with OIDC for login, registration, and account management (see OIDC below)

As a quick teaser - here’s an example of a Unity asset exported into Third Room, showing off lightmaps (check out the light and shadows cast by the strip lighting inside, or the shadow on the ground outside). Ignore the blurry HDR environment map of Venice in the background, which is just there to give the metals something to reflect. Check out the stats on the right-hand side: on Robert’s M1 Macbook Pro we’re getting a solid 60fps at 2000x1244px, with 13.12ms of unused gametime available for every 16.67ms frame, despite already showing a relatively complicated asset!

Meanwhile, here are some shots of Robert and Nate chasing each other around the UK City demo environment (also exported from Unity), showing off blended Mixamo animations and throwing around crates thanks to the Rapier physics engine.


And don't forget, it's just a Matrix client - with no infrastructure required other than a normal Matrix server:

Third Room Overlay

As you can see, we are rapidly approaching the point where we’ll need support from technical artists to help create beautiful scenes and avatars and assets in order to make it all come to life - especially once the Blender and Unity pipelines, and/or the Third Room editor are finished. If you’re interested in getting involved come chat at #thirdroom-dev:matrix.org!

WYSIWYG

Back in the real world, a recent new project that we haven’t spoken about much yet is adding consistent WYSIWYG (What You See Is What You Get) editing to the message composer in matrix-{react,ios,android}-sdk as used by Element Web/iOS/Android - as well as publishing the resulting WYSIWYG editor for the greater glory of the wider ecosystem.

This is a bit of a contentious area, because we’ve tried several times over the years to add a rich text editor to matrix-react-sdk - firstly with the Draft.js implementation by Aviral (which we abandoned after Facebook de-staffed Draft), and then later with a Slate implementation by me (which we abandoned thanks to the maintenance burden of keeping up with Slate’s API changes). Finally, burnt by the experience with third party solutions, Bruno wrote his own editor called CIDER, which was a great success and is what Element Web uses today to author messages including ‘pills’ for structured rooms/users etc… but this deliberately didn’t provide full WYSIWYG functionality. Meanwhile, Slack added WYSIWYG, forced it on, and screwed it up - and apps like WhatsApp and Discord seem to get by fine without WYSIWYG.

However, given that users are now used to WYSIWYG in Teams and Slack, we’ve now decided to have another go at it, inspired by CIDER’s success - and with the novel twist that the heavy lifting of modelling and versioning the document and handling Unicode + CJK voodoo will be provided by a cross-platform native library written in Rust, ensuring that matrix-{react,ios,android}-sdk (and in future matrix-rust-sdk-based apps like Element X) all have precisely the same consistent semantics, and we don’t spend our lives fixing per-platform WYSIWYG bugs unless it really is a platform-specific issue with the user interface provided on that particular platform.

The project is fairly young but developing fast, and lives over at https://github.com/matrix-org/matrix-wysiwyg (better name suggestions welcome ;) - we’re aiming to get it into clients by the end of October. The editor itself is not Matrix specific at all, so it’ll be interesting to see if other projects pick it up at all - and meanwhile, if we’ve done a good job, it’ll be interesting to see if this can be used to power Matrix-backed collaborative-editing solutions in future…

Update: we should have mentioned that the WYSIWYG editor project is being built out by staff at Element, who very kindly have been sponsored to work on it by one of Element's Big Public Sector Customers in order to get to parity with Teams. Thank you!!

Are We OIDC Yet?

On the other hand, a project we recently yelled about a lot is Matrix’s transition to Open ID Connect for standards-based authentication and account management. We announced this at the end of the year and the project has built up huge momentum subsequently, culminating with the release of https://areweoidcyet.com last week to track the progress and remaining work.

Our plan is to use native OIDC in production for the first time to provide all the login, registration and account management for Third Room when it launches in a few weeks (using a branded Keycloak instance as the identity provider, for convenience). After all, the last thing we wanted to do was to waste time building fiddly Matrix-specific login/registration UI in Third Room when we’re about to move to OIDC! This will be an excellent case study to see how it works, and how it feels, and inform the rest of the great OIDC experiment and proposed migration.

Dendrite + P2P

Meanwhile, the Next Generation team has continued to focus on their mission to make Dendrite as efficient and usable as possible. Within recent months, Dendrite has matured dramatically, with a considerable list of bugs fixed, performance significantly improved and new features added - push notifications, history visibility and presence to name a few notable additions.

Neil Alexander, Kegan and Till have continued to streamline the Dendrite architecture and to refactor areas of the codebase which have long needed attention, as well as moving from Kafka to NATS JetStream, an all-new caching model and some other fairly major architectural changes. We’ve also seen an increase of code contributions from the community and outside organisations, which is exciting, and the gomatrixserverlib library which underpins much of Dendrite is also seeing more active development and attention thanks to its use in the Complement integration testing suite.

With the most recent 0.9.3 release, we are proud to announce that Dendrite now passes 90% of Client-Server API tests and 95% of Server-Server API tests and has support for all specced room versions in use today. We have a growing community of users who are (quite successfully) trialling using Dendrite homeservers day-to-day, as well as our own public dendrite.matrix.org homeserver, which is open to the public for registration for anyone who wants to experiment with Dendrite without running their own deployment.

Dendrite plays an important role in our future strategy as it is also the homeserver implementation used for embedded homeservers, P2P development and experimentation. In addition to being able to scale up, we have also successfully scaled down, with the Element P2P demos proving that an embedded Dendrite homeserver can run comfortably on an iOS or Android device.

Research on the Pinecone overlay network for P2P Matrix has also continued, with Devon and Neil having experimented with a number of protocol iterations and spent considerable time bringing the Pinecone Simulator up to scratch to help us to test our designs more rapidly. Our work in this area is helping us to form a better direction and strategy for P2P Matrix as a whole, which is moving more towards a hybridised model with the current Matrix federation — a little different to our original vision, but will hopefully result in a much smoother transition path for existing users whilst solving some potential scaling problems. The arewep2pyet.com site is a living page which contains a high level overview of our goals and all the progress being made.

What’s left?

Comparing all of the above with the predictions for 2022 section of the end-of-year blog post, we’re making very strong progress in a tonne of areas - and the list above isn’t comprehensive. For instance, we haven’t called out all the work that the Trust & Safety team are doing to roll out advanced moderation features by default to all communities - or the work that Eric has been doing to close the remaining gap between Gitter and Matrix by creating new static archives throughout Matrix. Hydrogen has also been beavering away to provide a tiny but perfectly formed web client suitable for embedding, including the new embeddable Hydrogen SDK. We haven’t spoken about the work that the Cryptography team have been doing to adopt vodozemac and matrix-rust-sdk-crypto throughout matrix-{js,ios,android}-sdk, or improve encryption stability and security throughout. We’ve also not spoken about the new initiative to fix long-term chronic bugs (outside of the work above) in general - or all the work being done around Digital Markets Act interoperability

Other things left on the menu for this year include getting Threads out of beta: we’ve had a bit of an adventure here figuring out how to get the right semantics for notification badges and unread state in rooms with threads (especially if you use a mix of clients which support and don’t support threads), and once that’s done we’ll be returning to Spaces (performance, group permissions etc).

Matrix 2.0?

Looking through this post (and congratulations if you’re still reading it at this point ;P), it really feels that Matrix is on the verge of shifting into a new phase. Much as MacOS X started off as a promising but distinctly unoptimised operating system, and then subsequently got unrecognisably faster year by year (even on the same hardware!) as Apple diligently worked away optimising the kernel… similarly: we are now landing the architectural changes to completely transform how Matrix performs.

Between protocol changes like Sliding Sync, Faster Joins, native OIDC and native VoIP conferencing all landing at roughly the same time - and alongside new implementations like matrix-rust-sdk and vodozemac, let alone Third Room - it feels unquestionably like we have an unrecognisable step change on the horizon. Our aim is to land as much of this as possible by the end of the year, and if we pull it off, I’m tempted to suggest we call the end result Matrix 2.0.

To the future! 🚀

Matthew, Amandine & the whole core team.

Welcoming Rocket.Chat to Matrix!

30.05.2022 19:04 — General Matthew Hodgson
Last update: 30.05.2022 16:16
Rocket.Chat ♥️ Matrix

Hi folks,

We just wanted to take a moment to welcome Rocket.Chat to Matrix, given the recent announcement that they are switching to using Matrix for standards-based interoperable federation! This is incredible news: Rocket.Chat is one of the leading open source collaboration platforms with over 12 million users, and they will all shortly have the option to natively interoperate with the wider Matrix network: the feature has already landed (in alpha) in Rocket.Chat 4.7.0!

We’d like to thank the whole Rocket.Chat team for putting their faith in Matrix and joining the network: the whole idea of Matrix is that by banding together, different independent organisations can build an open decentralised network which is far stronger and more vibrant than any closed communication platform. The more organisations that join Matrix, the more useful and valuable the network becomes for everyone, and the more momentum there is to further refine and improve the protocol. Our intention is that Matrix will grow into a massive open ecosystem and industry, akin to the open Web itself… and that every organisation participating, be that Rocket.Chat, Element, Gitter, Beeper, Famedly or anyone else will benefit from being part of it. We are stronger together!

Rocket.Chat’s implementation follows the “How do you make an existing chat system talk Matrix?” approach we published based on our experiences of linking Gitter into Matrix. Looking at the initial pull request, the implementation lets Rocket.Chat act as a Matrix Application Service, effectively acting as a bridge to talk to an appropriate Matrix homeserver. From chatting with the team, it sounds like next steps will involve adding in encryption via our upcoming matrix-sdk-crypto node bindings - and then looking at ways to transparently embed a homeserver like Dendrite, sharing data as much as possible between RC and Matrix, so Rocket.Chat deployments can transparently sprout Matrix interoperability without having to run a separate homeserver. Super exciting!

You can see a quick preview of a Rocket.Chat user chatting away with an Element user on matrix.org via Matrix here:

So, exciting times ahead - needless to say we’ll be doing everything we can to support Rocket.Chat and ensure their Matrix integration is a success. And at this rate, they might be distinctly ahead of the curve if they start shipping Dendrite! Meanwhile, we have to wonder who will be next? Nextcloud? Mattermost? Place your bets… ;)

--Matthew

Update

Aaron from Rocket.Chat just published an excellent guide & video tour for how to actually set up your Rocket.Chat instance with Dendrite to get talking Matrix!

Independent public audit of Vodozemac, a native Rust reference implementation of Matrix end-to-end encryption

16.05.2022 17:04 — General Matthew Hodgson
Last update: 16.05.2022 16:02

Hi all,

It’s been quite a while since our last independent E2EE review, and so we’re incredibly proud to present an entirely new independent security audit of vodozemac, our next generation native Rust implementation of Matrix’s Olm and Megolm E2EE protocols. The audit has been conducted by Least Authority, who specialise in comprehensive audits of security-sensitive decentralized technologies - and we are very grateful to gematik, who kindly shared the costs of funding the audit as part of their commitment to Matrix for healthcare in Germany.

This audit was a bit of a whirlwind, as while we were clearly overdue an audit of Matrix’s E2EE implementations, we decided quite late in the day to focus on bringing vodozemac to auditable production quality rather than simply doing a refresh of the original libolm audit. However, we got there in time, thanks to a monumental sprint from Damir and Denis over Christmas. The reason we went this route is that vodozemac is an enormous step change forwards in quality over libolm, and vodozemac is now the reference Matrix E2EE implementation going forward. Just as libolm went live with NCC’s security review back in 2016, similarly we’re kicking off the first stable release of vodozemac today with Least Authority’s audit. In fact, vodozemac just shipped as the default E2EE library in matrix-rust-sdk 0.5, released at the end of last week!

The main advantages of vodozemac over libolm include:

  • Native Rust - type, memory and thread safety is preserved both internally, and within the context of larger Rust programs (such as those building on matrix-rust-sdk). This is particularly important given the memory bugs which libolm sprouted, despite our best efforts to the contrary.
  • Performance - vodozemac benchmarks roughly 5-6x faster than libolm on typical hardware
  • Better primitives - vodozemac is built on the best practice cryptographic primitives from the Rust community, rather than the generic Brad Conte primitives used by libolm.

Also, we’ve finally fixed one of the biggest problems with libolm - which was that the hardest bit of implementing E2EE in Matrix wasn’t necessarily the encryption protocol implementation itself, but how you glue it into your Matrix client’s SDK. It turns out ~80% of the code and complexity needed to securely implement encryption ends up being in the SDK rather than the Olm implementation - and each client SDK ended up implementing its own independent state machine and glue for E2EE, leading to a needlessly large attack & bug surface.

To address this problem, vodozemac is designed to plug into matrix-sdk-crypto - an SDK-independent Rust library which abstracts away the complexities and risks of implementing E2EE, designed to plug into existing SDKs in any language. For instance, Element Android already supports delegating its encryption to matrix-sdk-crypto; Element iOS got this working too last week, and we’re hard at work adding it to Element Web too. (This set of projects is codenamed Element R). Meanwhile, Element X (the project to switch Element iOS and Element Android to use matrix-rust-sdk entirely) obviously benefits from it too, as matrix-rust-sdk now leans on matrix-sdk-crypto for its encryption.

Therefore we highly recommend that developers using libolm migrate over to vodozemac - ideally by switching to matrix-sdk-crypto as a way of managing the interface between Matrix and the E2EE implementation. Vodozemac does also provides a similar API to libolm with bindings for JS and Python (and C++ in progress) if you want to link directly against it - e.g. if you’re using libolm for something other than Matrix, for example XMPP, ActivityPub or Jitsi. We’ll continue to support and maintain libolm for now though, at least until the majority of folks have switched to vodozemac.

In terms of the audit itself - we recommend reading it yourself, but the main takeaway is that Least Authority identified 10 valid concerns, of which we addressed 8 during the audit process. The remaining two are valid but lower priority, and we’ll fix them as part of our maintenance backlog. All the issues identified are excellent valid points, and we’re very glad that Least Authority have added huge value here by highlighting some subtle gotchas which we’d missed. (If you write Rust, you’ll particularly want to check out their zeroisation comments).

So: exciting times! Vodozemac should be landing in a Matrix client near you in the near future - we’ll yell about it loudly once Element switches over. In the meantime, if you have any questions, please head over to #e2ee:matrix.org.

Thanks again to gematik for helping fund the audit, and to Least Authority for doing an excellent job - and being patient and accommodating beyond the call of duty when we suddenly switched the scope from libolm to vodozemac at the last minute ;)

Next up: we’re going to get the Rust matrix-sdk-crypto independently audited (once this burndown is complete) so that everyone using the matrix-sdk-crypto state machine for Matrix E2EE can have some independent reassurance too - a huge step forward from the wild west of E2EE SDK implementations today!

– Matthew

Technical FAQ on the Digital Markets Act

30.03.2022 14:13 — General Matthew Hodgson

Hi all,

We've been flooded with questions about the DMA since it was announced last week, and have spotted some of the gatekeepers jumping to the wrong conclusions about what it might entail. Just in case you don't want to wade through yesterday's sprawling blog post, we've put together a quick FAQ to cover the most important points based on our understanding.

What does DMA mean for the gatekeepers?

The gatekeepers will have to open and document their existing APIs, so that alternative clients and/or bridges can connect to their service. The DMA requires that the APIs must expose the same level of privacy for remote users as for local users. So, if their service is end-to-end-encrypted (E2EE), the APIs must expose the same E2EE semantics (e.g. so that an alternative client connecting would have to implement E2EE too). For E2EE-capable APIs to work, the gatekeeper will likely have to model remote users as if they were local in their system. In the short term (one year horizon) this applies only to 1:1 chats and file transfers. In the long term (three year horizon) this applies to group chats and voip calls/conferences too.

Who counts as a “gatekeeper”?

The DMA defines any tech company worth over €75B or with over €7.5B of turnover as a gatekeeper, who must open their communication APIs. This means only the tech giants are in scope (e.g. as of today that includes Meta, Apple, Google, Microsoft, Salesforce/Slack - not Signal, Telegram, Discord, Twitter).

Does this mean the gatekeepers are being forced to implement an open standard such as Matrix or XMPP?

No. They can keep their existing implementations and APIs. For interoperability with other service providers, they will need to use a bridge (which could bridge via a common language such as Matrix or XMPP).

Are bridges secure?

If the service lacks end-to-end-encryption (Slack, Teams, Google Chat, non-secret chats on Facebook Messenger, Instagram, Google Messages etc) then the bridge does not reduce security or privacy, beyond the fact that bridged conversations by definition will be visible to the bridge and to the service you are interoperating with.

If the service has E2EE (WhatsApp, iMessage, secret chats on Messenger) then the bridge will necessarily have to decrypt (and reencrypt, where possible) the data when passing it to the other service. This means the conversation is no longer E2EE, and thus less secure (the bridge could be compromised and inspect or reroute their messages) - and so gatekeepers must warn the user that their conversation is leaving their platform and is no longer E2EE with something like this:


Why is the DMA good?

The upside is that the user has the freedom to use an infinite number of services (bots, virtual assistants, CRMs, translation services, etc) as well as speak to any other user in the world, regardless what platform they use.

It also puts much-needed pressure on the gatekeepers to innovate and differentiate rather than rely on their network effects to attract new users - creating a much more vibrant, open, competitive marketplace for users.

See "What's driven the DMA?" for more details.

If the DMA requires that remote users have the same security as local users, how can bridges work?

The DMA requires that the APIs expose the same level of security as for local users - ie E2EE must be exposed. If the users in a conversation choose to use a bridge and thus reencrypt the messages, then it is their choice to tradeoff encryption in favour of interoperability for a given conversation.

Does this undermine the gatekeepers’ current encryption?

Absolutely not. Users talking to other users within the same E2EE-capable gatekeeper will still be E2EE (assuming the gatekeeper doesn’t pull that rug from under its users) - and in fact it gives the gatekeepers an excellent way to advertise the selling point that E2EE is only guaranteed when you speak to other users on the same platform.

But why do we need bridges? If everyone spoke a common protocol, you wouldn’t ever have to decrypt messages to convert them between protocols.

Practically speaking, we don’t expect the gatekeepers to throw away their existing stacks (or implement multihead messengers that also speak open protocols). It’s true that if they natively spoke Matrix or XMPP then the reencryption problem would go away, but it’s more realistic to focus on opening the existing APIs than interpret the legislation as a mandate to speak Matrix. Perhaps in future players will adopt Matrix of their own volition.

Where do these bridges come from?

There is already a vibrant community of developers who build unofficial bridges to the gatekeepers - eg Element, Beeper and hundreds of open source developers in the Matrix and XMPP communities. Historically these bridges have been hampered by having to use unofficial and private APIs, making them a second class citizen - but with open documented APIs guaranteed by the DMA we eagerly anticipate an explosion of high quality transparent bridges which will be invisible to the end user.

Can you run E2EE bridges clientside to make them safer?

Maybe. For instance, current iMessage bridges work by running iMessage on a local iPhone or Mac and then reencrypting the messages there for interoperability. Given the messages are already exposed on the client anyway, this means that E2EE is not broken - and avoids decrypting them on a server. There is lots of development in this space currently, and with open APIs guaranteed by DMA the pace should speed up significantly.

How can you tell what service you should use to talk to a given remote user?

For 1:1 chats this is easy: you can simply ask the user which service they want to use to talk to a given user, if that user is not already on that service.

For group chats it is harder, and this is why the deadline for group chats is years away. The problem is that you need a way to verify the identity of arbitrary numbers of remote users on different platforms - effectively looking up their identity in a secure manner which stops services from maliciously spoofing identities.

One possible way to solve this would be for users to explicitly link their identity on one service with that on the gatekeeper’s service - eg “Alice on AliceChat is talking in the same room as Bob on BobChat; Bob will be asked to prove to AliceChat that he is the real Bob” - and so if AliceChat has already validated Bob’s identity, then this can be used to spot him popping up on other services. It also gives Bob a way to block themselves from ever being unwittingly bridged to AliceChat.

There are many other approaches too - and the onus is on the industry to figure out the best solution for decentralised identity in the next 3-4 years in order to realise the most exciting benefits of the DMA.


For a much deeper dive into the above, please check out our post at "How do you implement interoperability in a DMA world?"

How do you implement interoperability in a DMA world?

29.03.2022 17:57 — General Matthew Hodgson
Last update: 29.03.2022 09:23

With last week’s revelation that the EU Digital Markets Act will require tech gatekeepers (companies valued at over $75B or with over $7.5B of turnover) to open their communication APIs for the purposes of interoperability, there’s been a lot of speculation on what this could mean in practice, To try to ground the conversation a bit, we’ve had a go at outlining some concrete proposals for how it could work.

However, before we jump in, we should review how the DMA has come to pass.

What’s driven the DMA?

Today’s gatekeepers all began with a great product, which got more and more popular until it grew to such a size that one of the biggest reasons to use the service is not necessarily the product any more, but the benefits of being able to talk to a large network of users. This rapidly becomes anti-competitive, however: the user becomes locked into the network and can’t switch even if they want to. Even when people have a really good reason to move provider (e.g. WhatsApp’s terms of use changing to share user data with Facebook, Apple doing a 180 on end-to-end encrypting iCloud backups, or Telegram not actually being end-to-end encrypted), in practice hardly anything changes - because the users are socially obligated to keep using the service in order to talk to the rest of the users on it.

As a result, it’s literally harmful to the users. Even if a new service launches with a shiny new feature, there is enormous inertia that must be overcome for users to switch, thanks to the pull of their existing network - and even if they do, they’ll just end up with their conversations haphazardly fragmented over multiple apps. This isn’t accepted for email; it isn’t accepted for the phone network; it isn’t accepted on the Internet itself - and it shouldn’t be accepted for messaging apps either.

Similarly: the closed networks of today’s gatekeepers put a completely arbitrary limit on how users can extend and enrich their own conversations. On email, if you want to use a fancy new client like Superhuman - you can. If you want to hook up a digital assistant or translation service to help you wrangle your email - you can. If you want to hook up your emails to a CRM to help you run your business - you can. But with today’s gatekeepers, you have literally no control: you’re completely at the mercy of the service provider - and for something like WhatsApp or iMessage the options are limited at best.

Finally - all the users’ conversation metadata for that service (who talks to who, and when) ends up centralised in the gatekeepers’ databases, which then become an incredibly valuable and sensitive treasure trove, at risk of abuse. And if the service provider identifies users by phone number, the user is forced to disclose their phone number (a deeply sensitive personal identifier) to participate, whether they want to or not. Meanwhile the user is massively incentivised not to move away: effectively they are held hostage by the pull of the service’s network of users.

So, the DMA exists as a strategy to improve the situation for users and service providers alike by building a healthier dynamic ecosystem for communication apps; encouraging products to win by producing the best quality product rather than the biggest network. To quote Cédric O (Secretary of State for the Digital Sector of France), the strategy of the legislation came from Washington advice to address the anticompetitive behaviour of the gatekeepers “not by breaking them up… but by breaking them open.” By requiring the gatekeepers to open their APIs, the door has at last been opened to give users the option to pick whatever service they prefer to use, to choose who they trust with their data and control their conversations as they wish - without losing the ability to talk to their wider audience.

However, something as groundbreaking as this is never going to be completely straightforward. Of course while some basic use cases (i.e. non-E2EE chat) are easy to implement, they initially may not have a UX as smooth as a closed network which has ingested all your address book; and other use cases(eg E2EE support) may require some compromises at first. It’s up to the industry to figure out how to make the most of that challenge, and how to do it in a way which minimises the impact on privacy - especially for end-to-end encrypted services.

What problems need to be solved?

We’ve already written about this from a Matrix perspective, but to recap - the main challenge is the trade-off between interoperability and privacy for gatekeepers who provide end-to-end encryption, which at a rough estimate means: WhatsApp, iMessage, secret chats in Facebook Messenger, and Google Messages. The problem is that even with open APIs which correctly expose the end-to-end encrypted semantics (as DMA requires), the point where you interoperate with a different system inevitably means that you’ll have to re-encrypt the messages for that system, unless they speak precisely the same protocol - and by definition you end up trusting the different system to keep the messages safe. Therefore this increases the attack surface on the conversations, putting the end-to-end encryption at risk.

Alex Stamos (ex-CISO at Facebook) said that “WhatsApp rolling out mandatory end-to-end encryption was the largest improvement in communications privacy in human history” – and we agree. Guaranteed end-to-end encrypted conversations on WhatsApp is amazing, and should be protected at all costs. If users are talking to other users on WhatsApp (or any set of users communicating within the same E2EE messenger), E2EE should and must be maintained - and there is nothing in the DMA which says otherwise.

But what if the user consciously wants to prioritise interoperability over encryption? What if the user wants to hook their WhatsApp messages into a CRM, or run them through a machine translation service, or try to start a migration to an alternative service because they don’t trust Meta? Should privacy really come so spectacularly at the expense of user freedom?

We also have the problem of figuring out how to reference users on other platforms. Say that I want to talk to a user with a given phone number, but they’re not on my platform - how do I locate them? What if my platform only knows about phone numbers, but you’re trying to talk to a user on a platform which uses a different format for identifiers?

Finally, we have the problem of mitigating abuse: opening up APIs makes it easier for bad actors to try to spam or phish or otherwise abuse users within the gatekeepers. There are going to have to be changes in anti-abuse services/software, and some signals that the gatekeeper platforms currently use are going to go away or be less useful, but that doesn't mean the whole thing is intractable. It will require changes and innovative thinking, but we’ve been making steady progress (e.g. the work done by Element’s trust and safety team). Meanwhile, the gatekeepers already have massive anti-abuse systems in place to handle the billions of users within their walled gardens, and unofficial APIs are already widespread: adding official APIs does not change the landscape significantly (assuming interoperability is implemented in such a way that the existing anti-abuse mechanisms still apply).

In the past, gatekeepers dismissed the effort of interop as not being worthwhile - after all, the default course of action is to build a walled garden, and having built one, the temptation is to try to trap as many users as possible. It was also not always clear that there were services worth interoperating with (thanks to the chilling effects of the gatekeepers themselves, in terms of stifling innovation for communication startups). Nowadays this situation has fundamentally changed however: there is a vibrant ecosystem of open communication startups out there, and a huge appetite to build a vibrant open ecosystem for interoperable communication, but like the open web itself.

What are the requirements?

Before going further in considering solutions, we need to review the actual requirements of the DMA. Our best understanding at this point is that the DMA will mandate that:

  • Gatekeepers will have to provide open and documented APIs to their services, on request, in order to facilitate interoperability (i.e. so that other services can communicate with their users).
  • These APIs must preserve the same level of end-to-end encryption (if any) to remote users as is available to local users.
  • This applies to 1:1 messaging and file transfer in the short term, and group messaging, file-transfer, 1:1 VoIP and group VoIP in the longer term.

So, what could this actually look like?

The DMA legislation deliberately doesn’t focus on implementation, instead letting the industry figure out how this could actually work in practice. There are many different possible approaches, and so from our point of view as Matrix we’ve tried to sketch out some options to make the discussion more concrete. Please note these are preliminary thoughts, and are far from perfect - but hopefully useful as a starting point for discussion.

Finding Bob

Imagine that you have a user Alice on an existing gatekeeper, which we’ll call AliceChat, who runs an E2EE messaging service which identifies users using phone numbers. Say that they want to start a 1-to-1 conversation with Bob, who doesn’t use AliceChat, but Alice knows he is a keen user of BobChat. Today, you’d have no choice but to send them an SMS and nag them to join AliceChat (sucks to be them if they don’t want to use that service, or if they’re unable to for whatever reason - e.g. their platform isn’t supported, or their government has blocked access, etc), or join BobChat yourself.


However, imagine if instead the gatekeeper app had a user experience where the app prompted you to talk to the user via a different platform instead. It’d be no different to your operating system prompting you to pick which app to use to open a given file extension (rather than the OS vendor hardcoding it to one of their own apps - another win for user rights led by the EU!).


Now, the simplest approach in the short term would be for each gatekeeper to pre-provision a set of options of possible alternative networks. (The DMA says that, on request, other service providers can ask to have access to the gatekeeper’s APIs for the purposes of interoperability, so the gatekeeper knows who the alternative networks may be). “Bob is not on AliceChat - do you want to try to reach him instead on BobChat, CharlieChat, DaveChat (etc)”.

Much like users can configure their preferred applications for file extensions in an operating system today, users would also be able to add their own preferred service providers - simply specifying their domain name.

Connecting to Bob

Now, AliceChat itself needs to figure out how to query the remote service provider to see if Bob actually exists there. Given the DMA requires that gatekeepers provide open APIs with the same level of security to remote users as their local ones using today’s private APIs - and very deliberately doesn’t mandate specific protocols for interoperability - they will need to locate a bridge which can connect to the other system.

In this thought experiment, the bridge used would be up to the destination provider. For instance, bobchat.com could announce that AliceChat users should connect to it via alicechat-bridge.bobchat.com using the AliceChat protocol(or matrix-bridge.bobchat.com via Matrix or xmpp-bridge.bobchat.com via XMPP) by a simple HTTP API or even a .well-known URL. Users might also be able to override the bridge used to connect to them (e.g. to point instead at a client-side bridge), and could sign the advertisement to prove that it hadn’t been tampered with.

AliceChat would then connect to the discovered bridge using AliceChat’s vendor-specific newly opened API, and would then effectively treat Bob as if they were a real AliceChat user and client to all intents and purposes. In other words, Bob would effectively be a “ghost user” on AliceChat, and subject to all their existing anti-abuse mechanisms.

Meanwhile, the other side of the bridge converts through to whatever the target system is - be that XMPP, Matrix, a different proprietary API, etc. For Matrix, it’d be chatting away to a homeserver via the Application Service API (using End-to-Bridge Encryption via MSC3202). It’s also worth noting that the target might not even be a bridge - it could be a system which already natively speaks AliceChat’s end-to-end encrypted API, thus preserving end-to-end encryption without any need to re-encrypt. It’s also worth noting that while historically bridges have had a bad reputation as being a second class (often a second class afterthought), Matrix has shown that by considering them as a first class citizen and really focusing on mapping the highest common denominator between services rather than lowest common denominator, it’s possible for them to work transparently in practice. Beeper is a great example of Matrix bridging being used for real in the wild (rather amusingly they just shipped emoji reactions for WhatsApp on iOS via their WhatsApp<->Matrix bridge before WhatsApp themselves did…)

Architecturally, it could look like this:

Or, more likely (given a dedicated bridge between two proprietary services would be a bit of a special case, and you’d have to solve the dilemma of who hosts the bridge), both services could run a bridge to a common open standard protocol like Matrix or XMPP instead (thus immediately enabling interoperability with everyone else connected to that network):

Please note that while these examples show server-side bridges, in practice it would be infinitely preferable to use client-side bridges when connecting to E2EE services - meaning that decrypted message data would only ever be exposed on the client (which obviously has access to the decrypted data already). Client-side bridges are currently complicated by OS limits on background tasks and push notification semantics (on mobile, at least), but one could envisage a scenario where you install a little stub AliceChat client on your phone which auths you with AliceChat and then sits in the background receiving messages and bridging them through to Matrix or XMPP, like this:

Another possible architecture could be for the E2EE gatekeeper to expose their open APIs on the clients, rather than the server. DMA allows this, to the best of our knowledge - and would allow other apps on the device to access the message data locally (with appropriate authorisation, of course) - effectively doing a form of realtime data liberation from the closed service to an open system, looking something like this:

Finally, it's worth noting that when peer-to-peer decentralised protocols like P2P Matrix enter production, clientside bridges could bridge directly into a local communication server running on the handset - thus avoiding metadata being exposed on Matrix or XMPP servers providing a common language between the service providers.

Locating users

Now, the above describes the simplest and most naive directory lookup system imaginable - the problem of deciding which provider to use to connect to each user is shouldered by the users. This isn’t that unreasonable - after all, users may have strong feelings about what providers to use to talk to a given user. Alice might be quite happy to talk to Bob via BobChat, but might be very deliberately avoiding talking to him on DaveChat, for whatever ominous reasons.

However, it’s likely in future we will also see other directory services appear in order to map phone numbers (or other identities) to providers - whether these piggyback on top of existing identity providers (gatekeepers, DNS, telcos, SSO providers, governments) or are decentralised through some other mechanism. For instance, Bob could send AliceChat a blinded proof that he authorises them to automatically route traffic to him over at BobChat, with BobChat maintaining a matching proof that Bob is who he claims to be (having gone through BobChat’s auth process) - and the proofs could be linked using a temporary key such that Bob doesn’t even need to maintain a long-term one. (Thanks to James Monaghan for suggesting this one!)

Another alternative to having the user decide where to find each other could be to use a decentralised Keybase-style identity system to track verified mappings of identities (phone numbers, email addresses etc) through to service providers - perhaps something like IDX might fit the bill? While this decentralised identity lookups have historically been a hard problem, there is a lot of promising work happening in this space currently and the future looks promising.

Talking to Bob

Meanwhile, Alice still needs to talk to Bob. As already discussed, unless everyone speaks the same end-to-end encrypted protocol (be it Matrix, WhatsApp or anything else), we inevitably have a trade-off here between interoperability and privacy if Bob is not on the same system as Alice (assuming AliceChat is end-to-end encrypted) - and we will need to clearly warn Alice that the conversation is no longer end-to-end encrypted:


To be clear: right now, today, if Bob were on AliceChat, he could be copy-pasting all your messages into (say) Google Translate in a frantic effort to workaround the fact that his closed E2EE chat platform has no way to do machine translation. However, in a DMA world, Bob could legitimately loop a translation bot into the conversation… and Alice would be warned that the conversation was no longer secure (given the data is now being bridged over to Google).

This is a clear improvement in user experience and transparency. Likewise, if I’m talking to a bridged user today on one of these platforms, I have no way of telling that they have chosen to prioritise interop over E2EE - which is frankly terrifying. If I’m talking to someone on WhatsApp today I blindly assume that they are E2EE as they are on the same platform - and if they’re using an unofficial app or bridge, I have no way to tell. Whereas in a DMA world, you would expect the gatekeeper to transparently expose it.

If anything, this is good news for the gatekeeper in that it consciously advertises a big selling point for them: that for full E2EE, users need to talk to other users in the same walled garden (unless of course the platform speaks the same protocol). No more need for bus shelter adverts to remind everywhere that WhatsApp is E2EE - instead they can remind the user every time they talk to someone outside the walled garden!

Just to spell it out: the DMA does not require or encourage any reduction in end-to-end encryption for WhatsApp or similar: full end-to-end encryption will still be there for users in the same platform, including through to users on custom clients (assuming the gatekeeper doesn’t flex and turn it off for other reasons).

Obviously, this flow only considers the simple case of Alice inviting Bob. The flow is of course symmetrical for Bob inviting Alice; AliceChat will need to advertise bridges which can be used to connect to its users. As Bob pops up from BobChat, the bridge would use AliceChat’s newly open APIs to provision a user for him, authing him as per any other user (thus ensuring that AliceChat doesn’t need to have trusted BobChat to have authenticated the user). The bridge then sends/receives messages on Bob’s behalf within AliceChat.

Group communication

This is all very well for 1:1 chats - which are the initial scope of the DMA. However, over the coming years, we expect group chats to also be in scope. The good news is that the same general architecture works for group chats too. We need a better source of identity though: AliceChat can’t possibly independently authenticate all the new users which might be joining via group conversations on other servers (especially if they join indirectly via another server). This means adopting one of the decentralised identity lookup approaches outlined earlier to determine whether Charlie on CharlieChat is the real Charlie or an imposter.

Another problem which emerges with group chats which span multiple service providers is that of indirect routing, especially if the links between the providers use different protocols. What if AliceChat has a direct bridge to BobChat (a bit like AIM and ICQ both spoke OSCAR), BobChat and CharlieChat are connected by Matrix bridges, and AliceChat and CharlieChat are connected via XMPP bridges? We need a way for the bridges to decide who forwards traffic for each network, and who bridges the users for which network. If they were all on Matrix or XMPP this would happen automatically, but with mixed protocols we’d probably have to extend the lookup protocol to establish a spanning tree for each conversation to prevent forwarding loops.

Here’s a deliberately twisty example to illustrate the above thought experiment:

There is also a risk of bridge proliferation here - in the worst case, every service would have to source bridges to directly connect to every other service who came along, creating a nightmarish n-by-m problem. But in practice, we expect direct proprietary-to-proprietary bridges to be rare: instead, we already have open standard communication protocols like Matrix and XMPP which provide a common language between bridges - so in practice, you could just end up in a world where each service has to find a them-to-Matrix or them-to-XMPP bridge (which could be run by them, or whatever trusted party they delegate to).

Conclusion

A mesh of bridges which connect together the open APIs of proprietary vendors by converting them into open standards may seem unwieldy at first - but it’s precisely the sort of ductwork which links both phone networks and the Internet together in practice. As long as the bridging provides for highest common denominator fidelity at the best impedance ratio, then it’s conceptually no different to converting circuit switched phone calls to VoIP, or wired to wireless Ethernet, or any of the other bridges which we take entirely for granted in our lives thanks to their transparency.

Meanwhile, while this means a bit more user interface in the communication apps in order to select networks and warn about trustedness, the benefits to users are enormous as they put the user squarely back in control of their conversations. And the UX will improve as the tech evolves.

The bottom line is, we should not be scared of interoperability, just because we’ve grown used to a broken world where nothing can interconnect. There are tractable ways to solve it in a way that empowers and informs the user - and the DMA has now given the industry the opportunity to demonstrate that it can work.

Interoperability without sacrificing privacy: Matrix and the DMA

25.03.2022 22:39 — General Matthew Hodgson
Last update: 25.03.2022 18:01

Yesterday the EU Parliament & Council agreed on the contents of the Digital Markets Act - new legislation from the EU intended to limit anticompetitive behaviour from tech “gatekeepers”, i.e. big tech companies (those with market share larger than €75B or with more than €7.5B a year of revenue).

This is absolutely landmark legislation, where the EU has decided not to break the gatekeepers up in order to create a more competitive marketplace - but instead to “break them open”. This is unbelievably good news for the open Internet, as it is obligating the gatekeepers to provide open APIs for their communication services. In other words: no longer will the tech giants be able to arbitrarily lock their users inside their walled gardens - there will be a legal requirement for them to expose APIs to other services.

While the formal outcomes of yesterday’s agreement haven’t been published yet (beyond this press release), our understanding is that the DMA will mandate:

  • Gatekeepers will have to provide open and documented APIs to their services, on request, in order to facilitate interoperability (i.e. so that other services can communicate with their users).
  • These APIs must preserve the same level of end-to-end encryption (if any) to remote users as is available to local users.
  • This applies to 1:1 messaging and file transfer in the short term, and group messaging, file-transfer, 1:1 VoIP and group VoIP in the longer term.

This is the best possible outcome imaginable for the open internet. Never again will a big tech company be able to hold their users hostage in a walled garden, or arbitrarily close down or sabotage their APIs.

So, what’s the catch?

Since the DMA announcement on Thursday, there’s been quite a lot of yelling from some very experienced voices that mandating interoperability via open APIs is going to irrevocably undermine end-to-end encrypted messengers like WhatsApp. This seems to mainly be born out of a concern that the DMA is somehow trying to subvert end-to-end encryption, despite the fact that the DMA explicitly mandates that the APIs must expose the same level of security, including end-to-end encryption, that local users are using. (N.B. Signal doesn’t qualify as a gatekeeper, so none of this is relevant to Signal).

So, for WhatsApp, it means that the API would expose both the message-passing interface as well as the key management APIs required to interoperate with WhatsApp using your own end-to-end-encrypted WhatsApp client - E2EE would be preserved.

However, this does mean that if you were to actively interoperate between providers (e.g. if Matrix turned up and asked WhatsApp, post DMA, to expose an API we could use to write bridges against), then that bridge would need to convert between WhatsApp’s E2EE’d payloads and Matrix’s E2EE’d payloads. (Even though both WhatsApp and Matrix use the Double Ratchet, the actual payloads within the encryption are completely different and would need to be converted). Therefore such a bridge has to re-encrypt the traffic - which means that the plaintext is exposed on the bridge, putting it at risk and breaking the end-to-end encryption guarantee.

There are solutions to this, however:

  • We could run the bridge somewhere relatively safe - e.g. the user’s client. There’s a bunch of work going on already in Matrix to run clientside bridges, so that your laptop or phone effectively maintains a connection over to iMessage or WhatsApp or whatever as if it were logged in… but then relays the messages into Matrix once re-encrypted. By decentralising the bridges and spreading them around the internet, you avoid them becoming a single honeypot that bad actors might look to attack: instead it becomes more a question of endpoint compromise (which is already a risk today).

  • The gatekeeper could switch to a decentralised end-to-end encrypted protocol like Matrix to preserve end-to-end encryption throughout. This is obviously significant work on the gatekeeper’s side, but we shouldn’t rule it out. For instance, making the transition for a non-encrypted service is impressively little work, as we proved with Gitter. (We’d ideally need to figure out decentralised/federated identity-lookup first though, to avoid switching from one centralised identity database to another).

  • Worst case, we could flag to the user that their conversation is insecure (the chat equivalent of a scary TLS certificate warning). Honestly, this is something communication apps (including Matrix-based ones!) should be doing anyway: as a user you should be able to tell what 3rd parties (bots, integrations etc) have been added to a given conversation. Adding this sort of semantic actually opens up a much richer set of communication interactions, by giving the user the flexibility over who to trust with their data, even if it breaks the platonic ideal of pure E2E encryption.

On balance, we think that the benefits of mandating open APIs outweigh the risks that someone is going to run a vulnerable large-scale bridge and undermine everyone’s E2EE. It’s better to have the option to be able to get at your data in the first place than be held hostage in a walled garden.

Other considerations

One other complaint which has come up a bunch is around speed of innovation: the idea that WhatsApp or similar would be seriously slowed down by having to (effectively) maintain stable documented federation APIs, and figure out how to do backwards compatibility for new features. It’s true that this will take a bit more effort (similar to how adding GDPR compliance takes some effort), but the ends make it more than worth it. Plus, if the rag-tag Matrix ecosystem can do it, it doesn’t seem unreasonable to think that a $600B company like Meta can figure it out too...

Another consideration is that it might make it too easy to build malicious 3rd party clients - e.g. building your own "special" version of Signal which connects to the official service, but deliberately or otherwise has security flaws. The fact is that we're already in this position though: there are illicit alternative clients flying around all over the place, and the onus is on the app stores to protect their users from installing malware. This isn't reason to throw the baby of interoperability out with the bathwater of bootleg clients.

The final complaint is about moderation and abuse: while open APIs are good news for consumer choice, they can also be used by spammers, phishers and other miscreants to cause problems for the users within the gatekeeper. Much like a mediaeval citadel; opening up your walled garden means that both good and bad people can turn up. And much like real life, this is a solvable problem, even if it’s unfortunate: the benefits of free trade massively outweigh the downsides of having to police strangers more effectively. Frankly, moderation and anti-abuse approaches on the Internet today are infamously broken, with centralised moderation by gatekeepers producing increasingly erratic results. By opening the walled gardens, we are forcing a much-needed opportunity to review how to empower users and admins to filter unwanted content on their own terms. There’s a recent write-up of the proposed approach for Matrix at https://element.io/blog/moderation-needs-a-radical-change/, which outlines one strategy - but there are many others. Honestly, having to improve moderation tooling is a worthwhile price to pay for the benefits of open APIs.

So, there you have it. Hopefully you’ll agree that the benefits here outweigh the risks: without open APIs we wouldn't even have the option to talk about interoperability. We should be celebrating a new dawn for open access, rather than fearing that the sky is falling and this is nefarious attempt to undermine end-to-end encryption.

The Mega Matrix Holiday Special 2021

22.12.2021 23:27 — General Matthew Hodgson
Last update: 22.12.2021 17:54

Hi all,

If you’re reading this - congratulations; you made it through another year :) Every winter we sit down and review Matrix’s progress over the last twelve months, and look forward to the next - for it’s all too easy to get lost in the day-to-day development and fail to realise how much the overall project is evolving, especially when it’s one as large and ambitious as Matrix!

Looking back at 2021, it’s unbelievable how much stuff has been going on in the core team (as you can tell by the length of this post - sorry!). There’s been a really interesting mix of activity too - between massive improvements to the core functionality and baseline features that Matrix provides, and also major breakthroughs on next generation work. But first, let’s check out what’s been happening in the wider ecosystem…

The Matrix Ecosystem

Over 2021 the Matrix ecosystem has expanded unrecognisably. This time last year we were aware of 2 governments who were seriously adopting Matrix at scale (France and Germany), with the UK and US starting to roll out initial deployments. 12 months later, and we are now aware of 12 governments who are adopting Matrix in various capacities - and we hope to be able to talk about at least some of them in public in 2022! The UK and US have both progressed significantly too.

Meanwhile, one of the most exciting new public sector stories this year has been gematik: Germany’s national healthcare agency, who announced Matrix as the basis for interoperable secure messaging throughout the whole healthcare sector. This is a genuine step change for Matrix: rather than a government putting out tenders for “a secure messaging solution”, instead we are seeing tenders for Matrix solution providers. The Matrix industry is real; it exists today, and we’re seeing more and more new providers such as Famedly (building on the Flutter/Dart stack which powers FluffyChat) and Folivonet (building on the Trixnity Kotlin Multiplatform stack) stepping up to get involved - as well as many more big incumbents. We created Matrix in order to bootstrap a new decentralised communication industry, and frankly it’s amazing to see it actually taking shape.

Another big step change has been the number of existing chat providers looking to become part of the wider Matrix network. Back in September our friends at Rocket.Chat announced that they’re working on Matrix support for federation, perhaps inspired by our case study in making Gitter speak Matrix - and meanwhile Matrix comes up a lot in the context of Twitter’s Bluesky initiative, and a few big players we can’t yet mention have also been in touch wanting to natively talk Matrix too.

We’ve also seen a huge shift in big enterprises adopting Matrix for self-sovereign secure communication (although we can’t drop any names yet 😔). This may have been spurred on by such misadventures as Electronic Arts being compromised via a leaked Slack access token, but it feels like many of the biggest organisations now realise that unquestioningly handing their data to Slack or Teams is a bad idea, when they could have an end-to-end encrypted deployment of their own instead.

There has also been a turning point in legislation in favour of Matrix - with the EU Digital Markets Act pushing hard for interoperability for ‘big tech’ communication services in the EU (see Amandine’s take here), and meanwhile Eric Migicovsky, CEO at Beeper has been busy testifying to US Congress on the merits of interoperability too. It’s not inconceivable that we will soon live in a world where governments mandate that the walled gardens will finally have to open up, and we may see a whole new level of interest in communication providers wanting to join Matrix!

Communities themselves have also been embracing Matrix more and more over the last year: we were incredibly proud to host FOSDEM 2021, the world’s biggest open source conference via Matrix (all 35K attendees!) - and we’re gearing up to do it again in February for FOSDEM 2022 (this time with our very first FOSDEM Matrix dev room!). We were also really glad that Libera.chat let us point a dedicated homeserver and IRC bridge at their new IRC network (meaning you can join anywhere on Libera from Matrix as #channel:libera.chat, and talk to anyone as @nick:libera.chat). High profile open source projects have been adopting Matrix all over the place - Debian, Fedora, NixOS, Arch, Tor, Ansible, WHATWG and many others (check out this list!) now have their own Matrix servers and spaces. (You know things are busy when we haven’t had time to do a big blog post to announce folk as important as these joining the network!)

Finally, there has been an explosion of new projects and milestones in the wider community - Conduit entered beta as a super exciting lightweight Rust homeserver implementation; FluffyChat hit 1.0 with an impressively polished Flutter-based experience; Beeper pre-launched to huge amounts of mainstream excitement; Cinny exploded out of the blue as an incredibly elegant next-generation web client; Keanu materialised from The Guardian Project as their glossy Matrix client suite; Commune appeared as a hybrid messageboard/chatroom interface; Nheko has matured significantly with huge E2EE improvements and feature and VoIP polish; NeoChat and libQuotient development is progressing solidly; Fractal is busy with the fractal-next rewrite to move everything over to matrix-rust-sdk and GTK 4; Syphon continues to forge ahead as a privacy-focused Flutter-based client, and non-chat clients like The Board, Populus and Matrix Highlight have started to appear in earnest too! We also had a super successful Google Summer of Code this year, with a record number of 7 students participating in both core team and community projects.

Please note this is just a random sample of all the community news over the last year - to get more colour on what’s been going on, we highly recommend flipping through the This Week In Matrix archives!

The Matrix Spec

The Matrix spec is the single source of truth of what Matrix actually is, and this year it got some major improvements thanks to a beautiful new website at https://spec.matrix.org thanks to Will Bamberg, formerly of MDN (and who’s now back fighting the good fight with the MDN team at OWD).

Aside from the new spec site, we also released our first official point release in a while - Matrix 1.1, and we’re going to aim to keep regular releases happening once a quarter from here on in. It’s also worth noting that it’s very much a feature and not a bug that spec releases lag behind the various spec proposals which fly around as the core team and community experiment with new features like spaces, threads, etc. We very deliberately only merge change proposals to the spec which have been proven to work in real life implementations, and which have fully passed the spec review process (along with any dependencies they might have!).

Talking of which, in 2021 we saw a record 109 Matrix Spec Change proposals (MSCs) created. Even better, we closed 62 MSCs - so while the backlog is still growing, we’re still making very concrete progress. Of the 109 new MSCs: 34 were from the wider Matrix community, 34 were from ex-community contributors who are now part of the core team, 13 were from the founding Matrix team, and 28 were from folks hired to work on Matrix by Element on behalf of the Matrix.org Foundation. This feels like a pretty healthy blend of contributions, and while it’s true that spec work could always progress faster, things do seem to be heading in the right direction.

The latest MSC stats

In the new year, the Spec Core Team (responsible for reviewing MSCs and voting on what gets merged to the spec) is going to make a concerted effort to carve out more dedicated time for spec work - thankfully one of the side-effects of Matrix growing is that there are now a lot more people around with whom we can share other work, hopefully meaning that we can put significantly more hours into keeping the spec growing healthily.

Synapse

Synapse is the primary homeserver implementation published by the Matrix core team, and its maturity is unrecognisable from where we were a year ago. One of the big breakthroughs has been stabilising memory usage through caching improvements - the Matrix.org synapse now reliably only uses 2-3GB of RAM on its main process, despite its activity having more than doubled over the last year (up from 513K monthly active users to 1.11M!).

Synapse memory performance fixes

Further signs of maturity include Synapse’s radically improved new documentation and the new module API, the fact that mypy type-safety coverage has improved from ~75% to over 89.7% (across 151,903 lines of code!), and the fact that Open Tracing support has matured to the point that visualising complex cross-worker behaviour is nowadays a genuine pleasure. Frankly, Synapse should be feeling robust and stable these days - if you see folks claiming otherwise, please check they’re not basing that on outdated info (or failing that, get them to file bug reports that we can jump on!).

Meanwhile, on the feature side, we’ve landed a huge spate of long-awaited core functionality. Probably the best way to track it is by the Matrix Spec Change proposals (MSCs) which have been implemented (although I dare you to also go and check out Synapse’s changelog, all 675KB of it, which is frankly a thing of beauty and will take you down a rabbithole all the way back to v0.0.0 in Aug 2014 if you so desire ;P). Major MSCs which we’ve landed include:

  • Spaces! It’s hard to overstate how positive this has been for Matrix’s usability: at last, we can group our rooms together however we please, both for our own edification and to share with others - and we can view space hierarchies over federation, complete with pagination (MSC2946) as well as specify who can join a room based on whether they’re a member of a given space/room (MSC3083).
  • Threads! Yes, that’s right - coming any day now to a Matrix client close to you, we have ‘classic’ threaded messaging landing, providing sidebars of conversation through the new m.thread relation type (MSC3440), building on Matrix’s existing aggregation API as used for edits and reactions. We’ve chosen to prioritise single-level-deep-threads rather than arbitrarily-deep-trees (MSC2836) as it maps more easily to a chat UX, although the two approaches are not mutually exclusive.
  • Aggregations! Everyone’s favourite bête noire in Matrix tends to be that aggregations for edits & reactions predate today’s Matrix Spec Change process and went mainstream without using a vendor prefix before their spec had been stabilised. Better late than never, we’ve taken advantage of Threads to go back and fix what once went wrong - and now MSC2674 and MSC2675 and friends are hopefully on a much better track to provide a basis for how aggregations work - both in the spec and in the reference implementation in Synapse.

Anatomy of a bête noire

  • Social Login via multiple SSO providers (MSC2858) - almost 50% of new registrations on the Matrix.org homeserver now use social login! Interestingly the split of SSO usage is roughly 70% Google, 12% GitHub, 11% Apple, 6% Facebook and 1% GitLab. Make of that what you will!
  • Knocking (MSC2403)! Huge thanks to Sorunome and Anoa, we now support the ability to knock to ask to join a room if not yet invited. If this sounds unfamiliar, it may be because it hasn’t landed in Element yet, but expect it to land next year.
  • Refresh tokens (MSC2918)! At last, we have a standard way for clients to refresh their access tokens, so that if your access token leaks it will not give access to your account indefinitely. (This also has yet to land in Element, but has been proved on a branch on Hydrogen).

Finally, last but not least, Eric from Gitter has been fearlessly hacking his way through some of Matrix’s gnarliest problems in his quest to bring Matrix+Element up to full feature parity with Gitter. In practice, this means adding the ability to incrementally import old history into existing Matrix rooms (MSC2716), so we can expose the vast amounts of knowledge in Gitter’s archives directly into Matrix - and in future provide bridging in general of existing archives (Slack, Discord, mailing lists, newsgroups, forums, etc.) into Matrix.

This is a tough problem, as Matrix rooms are fundamentally immutable - events sent into a room cannot be changed. However, we can bend time a bit and add old chapters of history to the room as if we’d just discovered them down the back of the sofa - and this is what MSC2716 does. The (rewritten!) spec proposal is a thing of beauty and well worth a look, and you can see an early preview in action back on Matrix Live in June. Over the last few months it’s been merging and maturing in Synapse and we should see it in the wild in the near future! And for bonus points Eric’s also just added in Jump-to-date support (MSC3030), letting clients jump around room history by timestamp - another Gitter feature that we sorely need, and will also help us publish excellent Gitter-style online chat archives in future. You can see it in action in last week’s Matrix Live!

Element

Meanwhile, on the client side, Element continues to act as a flagship client to drive the development of the official client SDKs we ship as the Matrix.org Foundation - and our focus more than ever before has been to ensure that Matrix can be used to create mainstream-usable polished glossy apps. After all, Matrix will only succeed if clients emerge which can punch their weight against the enormous incumbents - be they Slack, Teams, WhatsApp or Discord.

This year, improving UX quality has been front and center - and hopefully the shift has been obvious in the app (and huge thanks to everyone who tweeted/tooted/enthused about improvements when they saw them!). Part of this has been ensuring that all new features are built in a design- and product-led fashion by folks who are explicitly focused on product engineering - with product design involved from the outset and with coordination and focus provided by product management folks. This is far from the typical way that FOSS operates, but if we’re to succeed against the incumbents we have to beat them at their own game (just as, for instance, Mozilla wields conventional product management in their browser wars).

More recently, there’s also been a major shift towards structured user testing in order to evaluate new features and analyse how users trip over the app in general, including radically improved analytics (for those who opt in!) to help visualise which bits of the app aren’t working. In the new year, the expectation is to double down on user testing: quite simply, if you can hand Element to a casual mainstream user and they can get the core jobs done (sign up, chat to someone, call someone, etc.) without tripping over, then mission successful :)

The Element blog covers the work this year from the Element side, but from the Matrix side, the key changes include:

  • finalising Spaces as a way to group together rooms - providing the equivalent of Discord servers or Slack workspaces, or alternatively letting you gather your own rooms together into a private space.
  • building out Threads (available in labs; launching soon!)
  • Social Login!
  • radically improving Element’s Information Architecture (i.e. the layout of the UI, so that the panels and buttons are correctly semantically grouped together in a visual hierarchy)
  • adding Voice Messages as a really beautiful polished feature powered by MSC3245
  • adding Location Share (available in labs; launching soon!) powered by MSC3488 (and in future MSC3489 for live-location sharing - in dev on iOS right now!)
  • adding Chat Export, thanks to the amazing GSOC work by Jaiwanth
  • adding Polls via MSC3381.

Spaces in all their glory

From a spec perspective, it’s been particularly exciting to be finally using Extensible Events (MSC1767) for many of these new features: voice messages, location sharing and polls are all experimenting with this new idiom for expressing richer structured data over Matrix while presenting a consistent and useful ‘fallback’ representation for clients which don’t know how to natively render the richer data.

We’ve also done a huge amount of work this year in improving 1:1 VoIP - both via MSC2746 and within the JS, iOS & Android Matrix SDKs. If you haven’t tried doing a 1:1 call via Matrix recently we’d highly recommend giving it a go - probably the main remaining bug at this point is that we need to find a better default ringtone for Element(!). Huge thanks go to Šimon Brandner both for his community contributions to VoIP and across all of Element Web - including proper screensharing for 1:1 (and group!) VoIP calls. This has also laid excellent groundwork for native Group VoIP/Video over Matrix - more on that later.

On Element Mobile, work on all the above features has been balanced by fighting against the various platform’s quirks, and lots of under-the-hood work improving performance. iOS has gone through a long journey to get back to stability after iOS’s push notification API changes, while also improving incremental sync performance by rearchitecting the local cache in the client. Android meanwhile has been working away improving the app; reworking Notifications, migrating to Kotlin coroutines and Hilt, and closing over 690 GH issues. Android has also had its fair share of dramas, including some recent long Play Store review times, but we’ve come through the other side intact.

However, we’ve been thinking more and more about the nightmarish pain point that is the amount of time we spend implementing the same features across the three different platforms. This becomes particularly apparent for security-sensitive features such as end-to-end encryption, or major API changes such as aggregations, spaces or sync v3 (more on that later). Or simply rapidly sharing improvements to implementation best practices between platforms.

Historically we consciously built platform-native Matrix SDKs in order to provide entirely idiomatic SDKs for other Matrix developers to use - and also to better dogfood the protocol and ensure that the heterogenous implementations could interoperate successfully. However, in practice, relatively few third party projects other than Element build on top of matrix-ios-sdk and matrix-android-sdk2 - and meanwhile there are more than enough other Matrix clients out there nowadays to dogfood interoperability against (including alternative experimental clients from the core team such as Hydrogen).

So, we’ve been thinking increasingly seriously about how to solve this…

A new hope: matrix-rust-sdk

matrix-rust-sdk is an attempt to build a new reference client SDK for Matrix which can be used by as many platforms as possible - hopefully forever stopping us from reimplementing the wheel more than we need to. Work began towards the end of 2019, building on top of Ruma’s excellent Matrix rust crates, and poljar has been working away solidly at it ever since. We teased matrix-rust-sdk in last year’s update, but as of this year it is properly coming of age and we’ve started using it in earnest - beginning by swapping out Element Android’s encryption implementation for matrix-rust-sdk-crypto (the E2EE cryptography crate provided by the SDK).

If you’re not familiar with Rust, the main benefits we get here are a heavy emphasis on safety and security without compromising performance; while providing a single codebase which can be used equally from iOS, native Desktop apps such as Fractal, Android (with native bindings) and even Web (via WASM, in future). While technically this results in a “non-native” SDK relative to matrix-js-sdk, matrix-ios-sdk and matrix-android-sdk - in practice, it’s become so common to depend on native-code shared libraries (outside the web, at least) that it’s not really a problem.

Initial results look wildly promising here: “Element R” (formerly known as Corroded Element - the codename for the Rust-enhanced version of Element Android) builds are now out there, and out-perform the kotlin E2EE implementation by roughly 10x, thanks to using native code and Rust’s improved parallelisation.

Our next step is to start using it on iOS, and we’ll be experimenting with a next-generation of Element iOS shortly in the new year with the SDK provided exclusively by matrix-rust-sdk. Element will also be funding more people to work fulltime on matrix-rust-sdk itself, and to see what the developer experience is like when you use it seriously on the Web - watch this space!

Bridges, Bots, Widgets and Integration Managers

Elsewhere in Matrix, the Bridge Crew has busy polishing bridges like crazy - working away on encrypted application services (MSC3202), massively improving the IRC bridge (particularly in the fallout of the great Freenode->Libera migration), stabilising and extending matrix-bifrost (our XMPP-and-more bridge), getting libpurple bridging working properly in bifrost, getting matrix-appservice-slack and matrix-appservice-discord stable enough to be hosted by EMS, experimenting with matrix-bot-sdk as an alternative bridging API, and even looking at adding matrix-rust-sdk-crypto into matrix-bot-sdk as an elegant way to power robust encrypted bridges (thus replacing Panatalaimon for that use case).

There’s also a new kid in town: matrix-hookshot (formerly known as matrix-github) is a new all-singing-all-dancing general purpose integration built on matrix-bot-sdk, coming soon to an integration manager near you, which can bridge through to GitHub, GitLab, JIRA and freeform webhooks! Check it out a few weeks ago on Matrix Live. matrix-hookshot is primarily Node, but is also getting in on the Rust action with some functions being implemented in native code.

Meanwhile, change is afoot for integration managers, which have been screaming out for an overhaul for years. There was a cheeky hint in last week’s Matrix Live where Dimension did an unexpected cameo looking particularly swish… All shall be revealed next year!

A whole new Dimension

Dendrite, Low bandwidth Matrix and Peer-to-Peer Matrix

Dendrite is our next-generation homeserver implementation written in Go, and having shipped the first beta in Oct 2020, we’ve cut another 11 releases over the course of this year - adding in features such as E2EE key backups, cross-signing, support for room versions 7, 8 and 9 (knocking and restricted join rules), massive state resolution performance improvements, an entirely new state storage implementation that uses ~15x less disk space, sync filtering, experimental support for peeking-over-federation (MSC2444) - not to mention huge numbers of bug fixes. Even more excitingly, we’re in the process of ditching Kafka in favour of native-Go message queuing in the form of NATS!

However, it’s been a bit of a weird year as the team has been repeatedly pulled onto other projects due to competing priorities - and there’s still a bunch of stuff left which is keeping us in beta. Some of this is plain old missing features (search, push rules/notifications, room upgrades, presence etc) - but we’ve also run up against some problems over the last few months while implementing new room versions and similar thanks to the sheer number of different microservices which Dendrite is made out of. In retrospect, it feels like Dendrite has ended up too granular, and when hacking on it you get slowed down badly by all the boilerplate required to glue the various services together. Therefore, we’ve just started to merge some of the services together - still preserving horizontal scaling of course, but refactoring the architecture a bit while we’re still in beta to help speed up development again. So far things are looking promising! We’re also really looking forwards to s7evinK joining the team to work on Dendrite fulltime in the coming weeks :)

Talking of competing priorities, there have been three other big missions going on at the same time as Dendrite dev: firstly - formalising Low Bandwidth Matrix. LB Matrix is super important for maximising battery life on mobile, as well as (obviously) supporting worse network conditions - and it’s effectively a prerequisite for P2P Matrix. We did a bunch of experiments around it back in 2019, but earlier in the year we needed it for real and MSC3079 was the result. The low bandwidth dialect which we’ve proposed in the MSC is designed for use on the real Internet using standard IETF protocols (CoAP + DTLS + CBOR) and so isn’t quite as exotic as the 2019 version, but still gives a ~10-20x bandwidth improvement over normal HTTP+JSON based Matrix. It hasn’t made it to Element yet, but if you’re interested go check out the blog post!

Secondly, we’ve been sidetracked by the entirety of P2P Matrix. This is our long-term mission to let Matrix run peer-to-peer without the need for any servers (or indeed Internet connectivity, thanks to Bluetooth Low Energy) by embedding servers such as Dendrite into clients such as Element and so let each Matrix Client have its own personal local homeserver. We’ve made massive progress over the course of the year on P2P - the biggest breakthroughs being Pinecone as an entirely new P2P overlay network, with the novel SNEK (sequentially networked edwards key) routing topology. (The animation below shows a P2P network arranging itself into a SNEK!)

SNEK

You can read all about it in the blog post, but suffice it to say that Pinecone outperformed all the other P2P overlay networks in meshnet-lab’s Mobility2 test:

Pinecone perf benchmarks

You can play with P2P Matrix today on iOS and Android (head over to #p2p:matrix.org for builds), but there is some major work still to be done:

  • We need to bridge to today’s Matrix network. Right now, having a weird experimental test network for P2P means that in practice nobody actually uses it other than for demos - whereas if you could actually talk to everything else in Matrix, it’d be way more compelling and interesting to use and dogfood. We’re currently thinking about how best to do this!
  • We need to standardise the actual transport to be used over Pinecone. Currently it uses HTTPS over μTP (purely because empirically it handled packet loss and congestion well, and LB Matrix wasn’t ready at the time). We’re currently experimenting with switching to LB Matrix using our own CoAP implementation called PineCoAP (potentially using pCoCoA congestion control, given CoAP doesn’t provide any congestion control out of the box), but this is early days.
  • We still need to finalise store-and-forward: if your destination is offline, do you buffer your transactions in the network somehow, or do you use another Matrix node to buffer them?
  • Relatedly, we need to tweak federation so that if events get lost, federation for a room can recover more gracefully than it does today - for instance, by bundling redundant auth events on transactions, or by providing more recovery mechanisms.
  • We still need to spec and implement multihomed accounts, so that your identity on your phone is not divorced from your identity on your laptop.
  • …and obviously, we need a robust post-beta Dendrite to act as the local homeserver!

Right now focus is going back to Dendrite for a bit, but P2P work will resume again in the new year :)

Finally, the third big distraction from Dendrite has been… sync v3.

Sync v3

Sync v3 is shaping up to be the single most significant improvement to Matrix since we began.

Syncing data from the homeserver to the client is obviously fundamental to Matrix - and the current behaviour (sync v2) is far from perfect, as it’s designed around the assumption that your client wants to receive information for every room that it’s in. In the early days of Matrix, this was fine: a typical user might be in tens of conversations, and it’s useful to have them all available for offline access. Nowadays, however, it’s a disaster: users can easily accumulate hundreds or thousands of rooms - especially with rooms used to describe spaces or profiles and other structured realtime data. Moreover, the number of rooms you’re in typically increases linearly over time, unbounded, as nobody wants to archive their old conversations.

So, the idea of sync v3 is that you only sync the strict subset of data that your client actually cares about to display in its UI - effectively making both initial and incremental sync instant, incredibly low bandwidth, and completely independent of the number of rooms you’re in (just as filesystem performance should be independent of the number of directories or files present).

For instance, the full initial sync for @matthew:matrix.org in sync v2 is 417MB of JSON uncompressed - or ~100MB if gzipped, and takes about 5 minutes to calculate on matrix.org (during which it murders the sync worker responsible and hammers the database like crazy). By contrast, sync v3’s initial sync is 15KB uncompressed, or 5106 bytes compressed - and synced in 250 microseconds from a local sync-v3 server. Yes folks, that’s somewhere between a 30,000x to 1,200,000x improvement over sync v2, depending on how you count it.

Sync v3 gets this unbelievable performance by the client defining a sliding window into the server’s datasets, sized and ordered as needed for the client’s UI. This effectively performs real-time serverside pagination, so that as the client scrolls around or filters their roomlist or membership list, the client requests new views from the server. Meanwhile the server sends incremental updates to the client if they intersect with the sliding window. This may sound unwieldy, but in practice it works fine (although we’ll have some interesting challenges when we get around to encrypting state events, given serverside ordering and filtering will become distinctly harder). It also doesn’t design out offline access, as the client caches its view of the world so even if you do go offline you can still work with all the data that has sent to your client so far (and the client could even proactively paginate in other content, if it wanted to, similar to an email client synchronising for offline access).

Sync v3 exists today as a proxy called sync-v3 which sits between any existing homeserver and a sync-v3-capable Matrix client. It’s very early days, but Hydrogen has basic v3 support on a branch which we’ve been using to experiment with the API and flesh it out - and you can see a demo and intro talk in last week’s Matrix Live!

The API itself is still in flux, but those interested can see the initial spec design at https://github.com/matrix-org/sync-v3/blob/main/api.md and also an MSC is emerging at MSC3575. Next steps will be to finish hooking up to Hydrogen (including filtering the room list), finish the MSC, and then start thinking about implementing it in other clients and servers!

Fast Joins over Federation

While we’re on the subject of speeding up Matrix… it’s all very well being able to sync your client instantly, but the other big complaint everyone has about Matrix is how long it takes to join rooms - especially big ones. As most people will know, it can easily take 5-10 minutes to join a large room like Matrix HQ on a new homeserver - and given this is the first experience most users have of running their own homeserver, it can prove pretty disastrous and we are determined to fix it. It will become even more relevant when we implement peeking over federation, as the last thing you want is to have to wait 5 minutes to temporarily dip into some random federated room to see if you want to join it or not (or to sniff its room state for things like extensible profiles or MSC2313 reputation rooms).

So, to address this, we’re currently in the middle of experimenting with MSC2775 (Lazyloading over Federation) in Synapse. This MSC lets servers participate in a room before they’ve received the full room state by defining a subset of state which is mandatory for participation, and then letting the rest get added lazily. It’s quite a violent change as it means the assumption that room state is complete (to the best of the server’s knowledge) is no longer true - but given Matrix already has to handle incomplete room state, it’s not necessarily a showstopper.

Watch this space for how well it works in practice, but we’re hoping for a ~20x speed improvement in joining Matrix HQ.

Hydrogen

2021 has been a busy year for Hydrogen - our ultra-lightweight Matrix Client, which provides a small but perfectly formed progressive web app for us to experiment on! There have been no fewer than 56 releases over the course of the year, with loads of contributions from Bruno, Midhun (who joined first as a GSOCcer and then as a fulltime Element employee) and also Danila who interned at Element on Hydrogen over the summer.

People often ask why Hydrogen exists as well as Element Web - and the reason is because Element Web is (for now at least) very far from a progressive web app and is stuffed full of features, whereas Hydrogen is intended to be as lightweight and simple and efficient as possible while also targeting as wide a range of web browsers as possible (even Internet Explorer!). It also provides a simpler platform for experimenting with new approaches such as sync v3 or OIDC without getting entangled in the constant hive of activity around Element Web. Finally, it gives us a playground to experiment with embeddable chat clients thanks to Hydrogen’s strict MVVM component model.

In terms of features, 2021 has seen huge steps forwards as Hydrogen converges on feature parity with Element - proper mentions and replies; rich formatted linkified messages; reactions; redactions; memberlist; member info; webpush notifications; proper image, video & file uploads; SSO login; sync v3(!) and so much more. Can’t wait to see what 2022 will bring!

End-to-End Encryption

2021 saw the long-awaited creation of a dedicated cryptography team to focus exclusively on improving encryption in Matrix: previously encryption expertise was split across various different areas, meaning that it could prove hard to carve out time to tackle the bigger remaining encryption challenges.

So far the team has been busy digging deep into the few remaining causes of UISIs (undecryptable messages), including automated UISI reporting and tracing E2EE flows end-to-end (from client to server to server to client). There’s also been an initial wave of UX work - with much more to come next year as we overhaul cross-signing and device backups to make it way more user friendly.

Meanwhile, on the more foundational side of things, we’re continuing to define Decentralised MLS as a potential next-generation form of end-to-end encryption, building on the IETF’s MLS work - providing much better scalability for large chat rooms and potentially helping with some causes of encryption failures. Hubert (uhoreg) has been leading the charge here, with his latest thoughts emerging here alongside a brand new demo showing his DMLS simulator - which under the hood is actually sending real Matrix events over DMLS!

DMLS

Otherwise, the team has had three big projects: adding matrix-rust-sdk-crypto into Element Android (which we already covered above), arranging a fresh security audit of Matrix’s end-to-end encryption (due to complete January 2022)… and, most excitingly: vodozemac.

Vodozemac (pronounced roughly vod-oz-eh-matz) is an entirely new implementation of our Olm and Megolm end-to-end encryption system, written from scratch in pure Rust, aiming to replace the original reference C/C++11 implementation in libolm. Originally written as an experiment for matrix-rust-sdk at the beginning of the year, in the last week it’s received a huge explosion of attention from poljar and dkasak to bring it up to production quality… for we decided that if we are doing a full E2EE audit for Matrix, we should target the new and future codebase rather than burn money on re-auditing the legacy libolm library (much as the original 2016 review of libolm happened when the library was fresh and new).

The motivation for vodozemac in general is to benefit from the intrinsic type and memory safety and fearless parallelism provided by Rust - and also maintain full type & memory safety throughout the matrix-rust-sdk stack, including encryption. Over the last year we’ve been taking more and more of a careful look at libolm, and despite our best efforts a few memory management bugs have crept in - which vodozemac should be immune to. Vodozemac will solve another embarrassing problem with libolm: that its default cryptography primitives are designed for correctness rather than performance or safety. By switching to Rust’s ed25519-dalek and rustCrypto AES primitives we should be in a much better position in terms of performance and safety.

Next up, we’ll be fully integrating vodozemac into matrix-rust-sdk, and figuring out how best to provide it as a libolm replacement in general.

Matrix Security

Alongside the new Cryptography team we’ve also established a new dedicated Security team for Matrix, led by dkasak. As well as fuzzing excursions into libolm and similar research, Denis has been handling all our security disclosure policy submissions, managing the Intigriti bug bounty programme, helping coordinate all our security releases, and coordinating the upcoming external independent security audit of vodozemac, matrix-rust-sdk, Element and Synapse. It’s a huge step forwards to be able to fund full-time infosec researchers to focus exclusively on Matrix, and this is just the beginning!

Trust and Safety

Another place where we’ve created a dedicated team this year is around Trust & Safety: building tools to fight spam and abuse on our own servers, while also empowering the wider network of users, moderators and admins to manage abuse as they see fit. This includes lots of work on Mjolnir, our primary moderation bot, but also defining MSCs such as MSC3215 (Aristotle: Moderation in all things) and MSC3531 (Letting moderators hide messages pending moderation) and internal tooling as we experiment with different approaches.

We’ll have more updates on this in the coming year as we release the tools we’ve been working on, but suffice it to say that the goal is to empower mainstream users in the wider Matrix network to apply their own rules as they see fit, directly from the comfort of their favourite Matrix client - without having to know what a Mjolnir is (or how to run one), and without having to be a moderation expert.

OpenID Connect

A new project brewing throughout 2021 has been the investigation into replacing the entirety of Matrix’s authentication APIs with industry standard OpenID Connect. Spearheaded by Quentin, this has proved to be a fascinating and challenging endeavour, but we’re starting to see some really interesting results. The problem we’re trying to solve here is:

  • As Matrix grows, we’re seeing more and more clients and services appearing which you might want to log into with your Matrix account. But do you really want to trust each app with your account password? And what if you only want to give it access to a small subset of your account?
  • Similarly, we’re seeing more and more login mechanisms used to access Matrix - it’s no longer just a matter just a username + password; many servers use single-sign-on (e.g. mozilla.org) or social login (fosdem.org, matrix.org), or layer on 2FA or MFA hardware tokens and similar to access their accounts via an SSO provider. We also see passwordless login on the horizon.

So, do we really want to mandate each new Matrix client to have to implement custom flows to handle this explosion of login/registration mechanisms? And is it even really the client’s problem in the first place? You’re securing access to your account on your chosen server, which isn’t really a client-specific thing at all.

The real turning point for the project however has been our recent experiences building out a new wave of single-use domain-specific clients (see below) for video conferencing, whiteboarding, metaverse-browsing etc… where by far the most painful bit of the project has been hooking up the UI for login, registration, guest access, incremental signup, password reset, email verification, CAPTCHA, SSO, etc. And that’s even when building on top of matrix-react-sdk, which theoretically has it all already thanks to Element Web!

Frankly, it has become blindingly obvious that it’s crazy for clients to reimplement this every time, and they should instead chuck the user over to a sign-on portal provided by their homeserver - just like Google and everyone else’s single-sign-on does. And rather than inventing our own homebrew way of doing that, we should just use the existing industry standard SSO best practices defined by OpenID Connect.

The main objections which have come up against this are: “what if my Matrix client doesn’t have a web browser, or what if I want to provide my own native login UI”, and “does this design out the idea of using a single password to access your account as well as your E2EE history”? In both instances, we have workarounds: in practice, there are so many Matrix clients around that we won’t be removing today’s legacy login/registration APIs any time soon (just like HTTP Basic Auth is still very much a thing on the web!). And in terms of “cryptographic login”, there are ways we could daisychain the auth required to unlock your E2EE storage to also authenticate you with your server - although this would be a major extension (much as cryptographic login is already today!)

The current status is that we’ve defined a set of initial MSCs (MSC2964, MSC2965, MSC2966 and MSC2967), and are implementing an initial Open ID Connect auth server (in Rust!) called matrix-authentication-service (better name suggestions welcome!) designed to sit alongside your homeserver, and we’re experimenting with hooking Hydrogen (and some of the new domain-specific clients) up to see how it feels. But if it goes as well as we think it might, folks should prepare for 2022 to be the year where Matrix’s authentication system finally gets fixed!

Native Matrix Video/VoIP Conferencing

One of the most anticipated features in Matrix over the years has been the prospect of native, decentralised, end-to-end encrypted video and voice conferencing. Today, voice and video conferencing in Matrix works by embedding Jitsi as a third party centralised service into your chatroom. This works fairly well - but Jitsi is an entirely separate service with lots of moving parts, and its own concept of users and access control (provided by XMPP!) and its megolm-based end-to-end-encryption doesn’t actually integrate with Matrix’s own Olm identities, verification or cross-signing. The fact that the conference is then logically centralised on whoever is hosting the Jitsi service also misses one Matrix’s main goals - that users should be able to hold a conversation without being dependent on any single service or provider. Plus it’s really confusing that Matrix has proper native 1:1 calls for DMs… but then switches to a totally different system in group chats.

So, this year we set out to fix it - and succeeded :D The solution hinges around MSC3401 - a spec proposal that describes how to extend native 1:1 calls to work for groups, while providing real flexibility on how to actually mix the calls together. At the simplest extreme, it defines how full mesh calls work (where every client simply calls every other client simultaneously) - but then also defines how you can mix calls together either using a single focus (conferencing server) or multiple foci run by different parties, where foci can either be Selective Forwarding Units (SFUs, like Jitsi) or Multipoint Conferencing Units (MCUs, like FreeSWITCH). The end result is to give us decentralised, cascading, end-to-end encrypted conferencing which even has direct compatibility with today’s 1:1 Matrix calling, letting you easily hook in bots and bridges which already support 1:1 Matrix calls!

Robert Long has been frantically hacking away at the initial implementation over the last few months, fleshing out full-mesh conferencing at first and getting it running in as many browsers as possible (including Mobile Safari and Chrome Android!). We were hoping to fully unveil the end result in time for Christmas, but in practice we hit some last minute snags (turns out Matthew forgot guest users can’t use TURN, who knew? so much for incremental login! 😰) which have pushed the launch to early next year. But hopefully in a few weeks, you’ll be able to start jumping on a native group call in Matrix!

Meanwhile, those interested can see all the gory details from our CommCon 2021 talk a few weeks ago, complete with a demo of the shape of things to come…

Next up, we’ll be working on building an MSC3401-compatible SFU so we can go beyond full mesh (which typically supports a maximum of ~7 callers). Our candidates right now are mediasoup, ion-sfu, janus and signal-calling-service - we’ll let you know how it goes! Also, if you’re interested in helping us build this out quicker, we are frantically searching for more WebRTC & VoIP gurus to join the team at Element working on this.

Applications Beyond Chat

Finally, 2021 was the year where we seriously started building out functionality on Matrix which goes far beyond plain old chat rooms.

Work began in the summer as a research project led by Ryan, formerly tech lead for Element Web - looking at ways to store hierarchical structured data into Matrix while preserving real-time semantics; effectively using Matrix as a collaborative decentralised object tree, providing CRDT (Conflict-free Replicated Data Types) to allow richer applications to be built on Matrix. This journey led him to create Patience as a test environment for building out these sort of clients, and meanwhile Timo (famous of The Board) joined the team to build out Full Screen Widgets in Element, providing a much better UI for beyond-chat experiments.

Meanwhile, Matthew Weidner and the Composable Systems Lab at CMU stunned us all by presenting a complete CRDT solution using Matrix named Collabs at Strange Loop 2021. This is really impressive stuff - the brave of heart can go and embed a Matrix-powered end-to-end-encrypted collaborative markdown editor straight into Element via Collabs by following the instructions here. In practice, Collabs works by serialising the CRDT updates as base64 blobs inside Matrix timeline events (hello Wave, is that you?), but we’re now investigating how you might reconcile this with maintaining a proper realtime object tree in Matrix.

It’s hard to overstate how powerful storing freeform tree CRDTs in Matrix would be. It could open up everything from decentralised encrypted collaborative document editing to collaborative whiteboarding and collaborative Figma-style (or Penpot- or Blender-style) design. You could even start storing an HTML DOM into a room, alongside its binary assets, giving you a multiplayer DOM to build on… and then imagine if you could store the syntax tree of the code operating on that DOM alongside it, in the same room. Before you know it, we will have created kind of some incredible Smalltalk / Croquet / Alan Kay nirvana where code is data and data is code and it’s all running live in some kind of decentralised encrypted multiplayer Metaverse :D

While we’ve been looking at storing object trees in Matrix, another obvious angle that has emerged is to use Matrix for encrypted decentralised file storage. MSC3089 is a proposal on how you might represent hierarchies of files in Matrix - where each room acts effectively as a directory of files, with spaces forming a directory structure (much as they do already in today’s Matrix), leveraging Matrix’s existing decentralised access control mechanisms to control who can access what. Combine such a file storage system with the collaborative editing capabilities mentioned above, and suddenly a really exciting proposition starts to emerge. We’re investigating this right now, and all will be revealed early next year…

Finally, and last but not least, Robert Long has been building on top of our shiny new Native Matrix Voice/Video Conferencing capabilities to use Matrix as the communication backbone for a truly open, equitable and interoperable vision of the Metaverse. The best way of describing it is to look at his awesome Third Room demo from the Open Metaverse Interoperability Group demo session in September:

Now, some folks will recall that since day one (in fact, since before day one) the hope for Matrix was that it might end up as the communications fabric of the Metaverse. We were about 4 years early when we first starting enthusing about this, and then still ahead of our time when we did the world’s first 3D Video calling over Matrix. However, it now feels like the world has finally caught up - and we’re in grave danger of being overtaken by a dystopia where the big tech companies balkanize the Metaverse into a series of closed proprietary user-exploiting walled gardens, much like today’s incumbent chat silos - but even worse.

This is our chance to fix it before it’s too late, and Element is funding a small but highly targeted team to focus exclusively on building out open interoperable Metaverse over Matrix - ensuring that collaboration in 3D (and 2D) spatial environments in future is decentralised, secure and standards-based. This obviously ties in directly with the rest of the Beyond Chat projects listed above: it’s early days, but it’s incredibly exciting to imagine where we could end up if this works!

Finally, a question which has kept coming up while working on Beyond Chat projects has been whether to implement this new functionality as Matrix widgets, bake them into existing Matrix clients, or build them as domain-specific dedicated Matrix clients. But perhaps we’re thinking about this all wrong: what if your Matrix client was just a browser for Matrix rooms? Some of these could be chatrooms. Some of these could be VoIP/Video conferences or Discord-style voice/video rooms. Some of these could be message boards or mailing lists. Some of these could be collaborative editors or whiteboards. Some of these could be 3D views into the metaverse. Some of these could be rendered via widgets; some could be rendered natively if the client knows how. And some of these could even be good old web pages(!!!).

Imagine if your Matrix client was effectively a genuine browser of arbitrary decentralised realtime content? If your view into a Matrix room was just that: a full window view into that room, be it textual or 2D or 3D - and your Matrix client was just a browser which added the necessary chrome and navigation to help you tab between rooms, login and logout, manage your encryption, track who’s in the room, track your notifications, etc.?

Meanwhile, if you’re in a web browser, you might hop into a lightweight single-page domain-specific webapp which happens to use Matrix for collaboration. Or if you’re in a Matrix client/browser, you could hop to the same matrix URL to get at the same functionality with all the supporting chrome and UI overlays sliding in as needed…

Perhaps the vision of Matrix as the missing communication layer of the open Web is more literal than we ever thought. Eitherway, it will be fascinating to see how Applications Beyond Chat evolves over the next year.

2022

Now, I dare you to cross-reference all of the above with last year’s predictions for 2021 to see how we did :D In practice, the only things from the list we haven’t got to are peeking-over-federation (although arguably fast joins are a key part of that), account portability, and restoring incremental sign-up (although our new clients have it!).

So, here go the predictions for 2022 (keeping it short, otherwise it’ll be 2023 before this blog post gets finished…):

  • Client polish and performance - our prime directive is to ensure that Matrix clients can be built with UX polish and quality which exceeds our centralised alternatives. In practice, this means:
    • Element must spark joy. Ensuring Element’s Information Architecture continues to be simplified and refined, and that nobody who knows how to use a computer hits a WTF moment when first using the app. Never again do we want to see someone on Twitter saying “I have no idea how to use Matrix”.
    • Instant launch. With Sync v3 and matrix-rust-sdk we hope to make Element launch instantly on all platforms - including initial sync.
    • Fast joins. We should never get bored while waiting to join a room or accept an invite.
    • Spaces. While Spaces are already a huge improvement in letting users organise and discover rooms, there’s still much more to be done:
      • Flair - Users who are members of a space should be able to announce it loud and proud with a Flair badge on their avatar, like we used to with the old pre-spaces Communities feature (MSC3219 being the potential proposal).
      • Synchronising access controls - You should be able to apply access controls based on whether a user is a member of a given group (so that if you invite them to #moderators:example.com, they automatically get made moderator in all the rooms in a given space). It looks likely that this will be implemented at last using joepie91’s MSC3216 proposal for Synchronized access control for Spaces (rather than Matthew’s original MSC2962 - an excellent example of the community steering the spec process :)
      • Bulk joins - It should be a one-button operation to join all the rooms in a space.
      • **Subspaces **- as more and more spaces emerge, the ability to navigate them as a hierarchy becomes more and more useful. We want to get to the point where we can turn off the Matrix.org public rooms list, and instead present a Space tree of all the good rooms we know about in Matrix… delegating over curation to the wider community; building a huge USENET-style hierarchy of where to go in Matrix. To do that, we need subspaces to sing!
      • Removing communities/groups, which will then be entirely superseded by spaces.
    • **Threads **go-live!
    • Location share go-live
    • Pinned messages, so the most important messages are always visible to everyone n the room
    • Starred messages, so you never lose a message ever again
    • Custom emoji, finally merging in all the custom emoji work from the community.
  • matrix-rust-sdk
    • Element iOS on rust-sdk
    • Element Android on rust-sdk-crypto
    • …and experiment to see how matrix-rust-sdk feels on Web? It’s a real shame that Daydream got archived…
  • Encryption
    • Vodozemac in matrix-rust-sdk, maybe even elsewhere.
    • **Updated E2EE Audit **spanning vodozemac, olm+megolm, matrix-rust-sdk… and a representative sample of a typical Element+Synapse deployment.
    • DMLS - getting to the point where we can experiment with it in real clients.
    • Encryption Agility - the ability to migrate encrypted history is going to become really important as we evolve our E2EE, whether that’s by adding in post-quantum algorithms, or moving from Megolm to MLS, or any other shifts. We will need to start thinking about it in 2022.
  • Next-generation MSCs
    • Aggregations - finalising the foundational MSCs for aggregations, at last
    • Extensible events - finalising the foundational MSCs for extensible events, at last
    • **Sync v3 **- finalising the MSC and implementing it in matrix-rust-sdk
    • Fast joins - getting them implemented in Synapse and Dendrite
    • Peeking over federation - getting them implemented in Synapse and Dendrite
    • Extensible profiles - who needs a facebook wall when you have a profile room on Matrix?
    • Open ID Connect - using OIDC as an alternative auth mechanism for new clients.
  • Gitter parity
    • Importing the Gitter archives into Matrix via MSC2716
    • Implementing excellent public static Matrix archives (replacing both view.matrix.org and gitter.im’s static views)
    • Transfiguring Gitter into a Gitter-themed Element
  • Dendrite
    • **Parity with Synapse **- and out of beta, with any luck!
  • P2P Matrix
    • Exposing the normal Matrix network via P2P!
    • Multihomed accounts
    • Store and forward (if only by relaying via other P2P Matrix nodes)
    • **Low bandwidth transports **- via PineCoAP or similar
    • Making federation robust in a highly disconnected network.
  • Hydrogen
    • **Daily Driver **- making sure that Hydrogen can be readily used as a daily driver Matrix client, even if it lacks full parity with Element.
    • **Embeddable Hydrogen **- making the most of Hydrogen as a tiny lightweight PWA to embed it into existing websites.
  • Bots and Bridges
    • **Landing End-to-Bridge-Encryption **for all existing matrix-appservice-bridge based bridges
    • All the integrations!
    • First-class UI for configuring integrations!
  • Trust & Safety
    • Empower users to manage abuse within their communities.
  • Native Group Voice/Video Conferencing
    • Launch a standalone conferencing app!
    • Build a MSC3401-capable SFU
    • Add native group calling to Element
  • Border gateways
    • Something we didn’t mention in 2021 is the increasing interest in building border gateways and hardware cross domain gateways to safely link different Matrix federations together. We expect to see a lot of activity in this space in 2022, and there should be some new MSCs too :)
  • Beyond Chat
    • **Metaverse on Matrix **- building out the dream as per above!
    • **Collaborative editing **- extending Matrix to store trees of events, and collaborate on them in realtime - starting with a collaborative editor!
    • File storage in Matrix - building out real-life file storage on top of Matrix.

So, there you have it. If you’ve got this far… it’s incredible; you’re amazing: thank you for reading! The sheer length of this update shows just how much Matrix has grown in 2021 relative to previous years; it’s frankly terrifying to imagine how long the equivalent post will be next year. We may have to change the format a little :)

And that’s a wrap for 2021: we hope you stay safe and have an excellent end of the year. Huge thanks for flying Matrix and supporting the project - we literally wouldn’t be here without you.

- Matthew, Amandine & the whole Matrix core team.

Disclosing CVE-2021-40823 and CVE-2021-40824: E2EE vulnerability in multiple Matrix clients

13.09.2021 17:29 — Security Denis Kasak

Today we are disclosing a critical security issue affecting multiple Matrix clients and libraries including Element (Web/Desktop/Android), FluffyChat, Nheko, Cinny, and SchildiChat. Element on iOS is not affected.

Specifically, in certain circumstances it may be possible to trick vulnerable clients into disclosing encryption keys for messages previously sent by that client to user accounts later compromised by an attacker.

Exploiting this vulnerability to read encrypted messages requires gaining control over the recipient’s account. This requires either compromising their credentials directly or compromising their homeserver.

Thus, the greatest risk is to users who are in encrypted rooms containing malicious servers. Admins of malicious servers could attempt to impersonate their users' devices in order to spy on messages sent by vulnerable clients in that room.

This is not a vulnerability in the Matrix or Olm/Megolm protocols, nor the libolm implementation. It is an implementation bug in certain Matrix clients and SDKs which support end-to-end encryption (“E2EE”).

We have no evidence of the vulnerability being exploited in the wild.

This issue was discovered during an internal audit by Denis Kasak, a security researcher at Element.

Remediation and Detection

Patched versions of affected clients are available now; please upgrade as soon as possible — we apologise sincerely for the inconvenience. If you are unable to upgrade, consider keeping vulnerable clients offline until you can. If vulnerable clients are offline, they cannot be tricked into disclosing keys. They may safely return online once updated.

Unfortunately, it is difficult or impossible to retroactively identify instances of this attack with standard logging levels present on both clients and servers. However, as the attack requires account compromise, homeserver administrators may wish to review their authentication logs for any indications of inappropriate access.

Similarly, users should review the list of devices connected to their account with an eye toward missing, untrusted, or non-functioning devices. Because an attacker must impersonate an existing or historical device, exploiting this vulnerability would either break an existing login on the user’s account, or a historical device would be re-added and flagged as untrusted.

Lastly, if you have previously verified the users / devices in a room, you would witness the safety shield on the room turn red during the attack, indicating the presence of an untrusted and potentially malicious device.

Affected Software

Given the severity of this issue, Element attempted to review all known encryption-capable Matrix clients and libraries so that patches could be prepared prior to public disclosure.

Known vulnerable software:

We believe the following software is not vulnerable:

We believe the following are not vulnerable due to not implementing key sharing:

Background

Matrix supports the concept of “key sharing”, letting a Matrix client which lacks the keys to decrypt a message request those keys from that user's other devices or the original sender's device.

This was a feature added in 2016 in order to address edge cases where a newly logged-in device might not have the necessary keys to decrypt historical messages. Specifically, if other devices in the room are unaware of the new device due to a network partition, they have no way to encrypt for it—meaning that the only way the new device will be able to decrypt history is if the recipient's other devices share the necessary keys with it.

Other situations where key sharing is desirable include when the recipient hasn't backed up their keys (either online or offline) and needs them to decrypt history on a new login, or when facing implementation bugs which prevent clients from sending keys correctly. Requesting keys from a user's other devices sidesteps these issues.

Key sharing is described here in the Matrix E2EE Implementation Guide, which contains the following paragraph:

In order to securely implement key sharing, clients must not reply to every key request they receive. The recommended strategy is to share the keys automatically only to verified devices of the same user.

This is the approach taken in the original implementation in matrix-js-sdk, as used in Element Web and others, with the extension of also letting the sending device service keyshare requests from recipient devices. Unfortunately, the implementation did not sufficiently verify the identity of the device requesting the keyshare, meaning that a compromised account can impersonate the device requesting the keys, creating this vulnerability.

This is not a protocol or specification bug, but an implementation bug which was then unfortunately replicated in other independent implementations.

While we believe we have identified and contacted all affected E2EE client implementations: if your client implements key sharing requests, we strongly recommend you check that you cryptographically verify the identity of the device which originated the key sharing request.

Next Steps

The fact that this vulnerability was independently introduced so many times is a clear signal that the current wording in the Matrix Spec and the E2EE Implementation Guide is insufficient. We will thoroughly review the related documentation and revise it with clear guidelines on safely implementing key sharing.

Going further, we will also consider whether key sharing is still a necessary part of the Matrix protocol. If it is not, we will remove it. As discussed above, key sharing was originally introduced to make E2EE more reliable while we were ironing out its many edge cases and failure modes. Meanwhile, implementations have become much more robust, to the point that we may be able to go without key sharing completely. We will also consider changing how we present situations in which you cannot decrypt messages because the original sender was not aware of your presence. For example, undecryptable messages could be filed in a separate conversation thread, or those messages could require that keys are shared manually, effectively turning a bug into a feature.

We will also accelerate our work on matrix-rust-sdk as a portable reference implementation of the Matrix protocol, avoiding the implicit requirement that each independent library must necessarily reimplement this logic on its own. This will have the effect of reducing attack surface and simplifying audits for software which chooses to use matrix-rust-sdk.

Finally, we apologise to the wider Matrix community for the inconvenience and disruption of this issue. While Element discovered this vulnerability during an internal audit of E2EE implementations, we will be funding an independent end-to-end audit of the reference Matrix E2EE implementations (not just Olm + libolm) in the near future to help mitigate the risk from any future vulnerabilities. The results of this audit will be made publicly available.

Timeline

Ultimately, Element took two weeks from initial discovery to completing an audit of all known, public E2EE implementations. It took a further week to coordinate disclosure, culminating in today's announcement.

  • Monday, 23rd August — Discovery that Element Web is exploitable.
  • Thursday, 26th August — Determination that Element Android is exploitable with a modified attack.
  • Wednesday, 1 September — Determination that Element iOS fails safe in the presence of device changes.
  • Friday, 3 September — Determination that FluffyChat and Nheko are exploitable.
  • Tuesday, 7th September — Audit of Matrix clients and libraries complete.
  • Wednesday, 8th September — Affected software authors contacted, disclosure timelines agreed.
  • Friday, 10th September — Public pre-disclosure notification. Downstream packagers (e.g., Linux distributions) notified via Matrix and e-mail.
  • Monday, 13th September — Coordinated releases of all affected software, public disclosure.