Yesterday the EU Parliament & Council agreed on the contents of the Digital
Markets Act - new legislation from the EU intended to limit anticompetitive
behaviour from tech “gatekeepers”, i.e. big tech companies (those with market
share larger than €75B or with more than €7.5B a year of revenue).
This is absolutely landmark legislation, where the EU has decided not to break
the gatekeepers up in order to create a more competitive marketplace - but
instead to “break them open”. This is unbelievably good news for the open
Internet, as it is obligating the gatekeepers to provide open APIs for their
communication services. In other words: no longer will the tech giants be
able to arbitrarily lock their users inside their walled gardens - there will
be a legal requirement for them to expose APIs to other services.
While the formal outcomes of yesterday’s agreement haven’t been published yet
(beyond this press release),
our understanding is that the DMA will mandate:
Gatekeepers will have to provide open and documented APIs to their
services, on request, in order to facilitate interoperability (i.e. so
that other services can communicate with their users).
These APIs must preserve the same level of end-to-end encryption (if any)
to remote users as is available to local users.
This applies to 1:1 messaging and file transfer in the short term, and
group messaging, file-transfer, 1:1 VoIP and group VoIP in the longer
term.
This is the best possible outcome imaginable for the open internet. Never
again will a big tech company be able to hold their users hostage in a walled
garden, or arbitrarily close down or sabotage their APIs.
So, what’s the catch?
Since the DMA announcement on Thursday, there’s been quite a lot
of yelling from some very
experienced voices that mandating interoperability via open APIs is going to
irrevocably undermine end-to-end encrypted messengers like WhatsApp. This
seems to mainly be born out of a concern that the DMA is somehow trying to
subvert end-to-end encryption, despite the fact that the DMA explicitly
mandates that the APIs must expose the same level of security, including
end-to-end encryption, that local users are using. (N.B. Signal doesn’t
qualify as a gatekeeper, so none of this is relevant to Signal).
So, for WhatsApp, it means that the API would expose both the message-passing
interface as well as the key management APIs required to interoperate with
WhatsApp using your own end-to-end-encrypted WhatsApp client - E2EE would be
preserved.
However, this does mean that if you were to actively interoperate between
providers (e.g. if Matrix turned up and asked WhatsApp, post DMA, to expose
an API we could use to write bridges against), then that bridge would need to
convert between WhatsApp’s E2EE’d payloads and Matrix’s E2EE’d payloads.
(Even though both WhatsApp and Matrix use the Double Ratchet, the actual
payloads within the encryption are completely different and would need to be
converted). Therefore such a bridge has to re-encrypt the traffic - which
means that the plaintext is exposed on the bridge, putting it at risk and
breaking the end-to-end encryption guarantee.
There are solutions to this, however:
We could run the bridge somewhere relatively safe - e.g. the user’s client.
There’s a bunch of work going on already in Matrix to run clientside
bridges, so that your laptop or phone effectively maintains a connection
over to iMessage or WhatsApp or whatever as if it were logged in… but then
relays the messages into Matrix once re-encrypted. By decentralising the
bridges and spreading them around the internet, you avoid them becoming a
single honeypot that bad actors might look to attack: instead it becomes
more a question of endpoint compromise (which is already a risk today).
The gatekeeper could switch to a decentralised end-to-end encrypted protocol
like Matrix to preserve end-to-end encryption throughout. This is
obviously significant work on the gatekeeper’s side, but we shouldn’t rule
it out. For instance, making the transition for a non-encrypted service is
impressively little work, as we proved with Gitter.
(We’d ideally need to figure out decentralised/federated identity-lookup
first though, to avoid switching from one centralised identity database
to another).
Worst case, we could flag to the user that their conversation is insecure
(the chat equivalent of a scary TLS certificate warning). Honestly, this
is something communication apps (including Matrix-based ones!) should be
doing anyway: as a user you should be able to tell what 3rd parties
(bots, integrations etc) have been added to a given conversation. Adding
this sort of semantic actually opens up a much richer set of communication
interactions, by giving the user the flexibility over who to trust with
their data, even if it breaks the platonic ideal of pure E2E encryption.
On balance, we think that the benefits of mandating open APIs outweigh the
risks that someone is going to run a vulnerable large-scale bridge and
undermine everyone’s E2EE. It’s better to have the option to be able to get
at your data in the first place than be held hostage in a walled garden.
Other considerations
One other complaint which has come up a bunch is around speed of innovation:
the idea that WhatsApp or similar would be seriously slowed down by having
to (effectively) maintain stable documented federation APIs, and figure out
how to do backwards compatibility for new features. It’s true that this will
take a bit more effort (similar to how adding GDPR compliance takes some
effort), but the ends make it more than worth it. Plus, if the rag-tag
Matrix ecosystem can do it, it doesn’t seem unreasonable to think that a
$600B company like Meta can figure it out too...
Another consideration is that it might make it too easy to build malicious 3rd
party clients - e.g. building your own "special" version of Signal which
connects to the official service, but deliberately or otherwise has security
flaws. The fact is that we're already in this position though: there are
illicit alternative clients flying around all over the place, and the onus is
on the app stores to protect their users from installing malware. This isn't
reason to throw the baby of interoperability out with the bathwater of
bootleg clients.
The final complaint is about moderation and abuse: while open APIs are good
news for consumer choice, they can also be used by spammers, phishers and
other miscreants to cause problems for the users within the gatekeeper. Much
like a mediaeval citadel; opening up your walled garden means that both good
and bad people can turn up. And much like real life, this is a solvable problem,
even if it’s unfortunate: the benefits of free trade massively outweigh the
downsides of having to police strangers more effectively. Frankly,
moderation and anti-abuse approaches on the Internet today are infamously
broken, with centralised moderation by gatekeepers producing increasingly
erratic results. By opening the walled gardens, we are forcing a much-needed
opportunity to review how to empower users and admins to filter unwanted
content on their own terms. There’s a recent write-up of the proposed
approach for Matrix at
https://element.io/blog/moderation-needs-a-radical-change/,
which outlines one strategy - but there are many others. Honestly, having to improve
moderation tooling is a worthwhile price to pay for the benefits of open
APIs.
So, there you have it. Hopefully you’ll agree that the benefits here outweigh
the risks: without open APIs we wouldn't even have the option to talk about
interoperability. We should be celebrating a new dawn for open access,
rather than fearing that the sky is falling and this is nefarious attempt to
undermine end-to-end encryption.
If you’re reading this - congratulations; you made it through another year :) Every winter we sit down and review Matrix’s progress over the last twelve months, and look forward to the next - for it’s all too easy to get lost in the day-to-day development and fail to realise how much the overall project is evolving, especially when it’s one as large and ambitious as Matrix!
Looking back at 2021, it’s unbelievable how much stuff has been going on in the core team (as you can tell by the length of this post - sorry!). There’s been a really interesting mix of activity too - between massive improvements to the core functionality and baseline features that Matrix provides, and also major breakthroughs on next generation work. But first, let’s check out what’s been happening in the wider ecosystem…
The Matrix Ecosystem
Over 2021 the Matrix ecosystem has expanded unrecognisably. This time last year we were aware of 2 governments who were seriously adopting Matrix at scale (France and Germany), with the UK and US starting to roll out initial deployments. 12 months later, and we are now aware of 12 governments who are adopting Matrix in various capacities - and we hope to be able to talk about at least some of them in public in 2022! The UK and US have both progressed significantly too.
Meanwhile, one of the most exciting new public sector stories this year has been gematik: Germany’s national healthcare agency, who announced Matrix as the basis for interoperable secure messaging throughout the whole healthcare sector. This is a genuine step change for Matrix: rather than a government putting out tenders for “a secure messaging solution”, instead we are seeing tenders for Matrix solution providers. The Matrix industry is real; it exists today, and we’re seeing more and more new providers such as Famedly (building on the Flutter/Dart stack which powers FluffyChat) and Folivonet (building on the Trixnity Kotlin Multiplatform stack) stepping up to get involved - as well as many more big incumbents. We created Matrix in order to bootstrap a new decentralised communication industry, and frankly it’s amazing to see it actually taking shape.
Another big step change has been the number of existing chat providers looking to become part of the wider Matrix network. Back in September our friends at Rocket.Chat announced that they’re working on Matrix support for federation, perhaps inspired by our case study in making Gitter speak Matrix - and meanwhile Matrix comes up a lot in the context of Twitter’s Bluesky initiative, and a few big players we can’t yet mention have also been in touch wanting to natively talk Matrix too.
We’ve also seen a huge shift in big enterprises adopting Matrix for self-sovereign secure communication (although we can’t drop any names yet 😔). This may have been spurred on by such misadventures as Electronic Arts being compromised via a leaked Slack access token, but it feels like many of the biggest organisations now realise that unquestioningly handing their data to Slack or Teams is a bad idea, when they could have an end-to-end encrypted deployment of their own instead.
There has also been a turning point in legislation in favour of Matrix - with the EU Digital Markets Act pushing hard for interoperability for ‘big tech’ communication services in the EU (see Amandine’s take here), and meanwhile Eric Migicovsky, CEO at Beeper has been busy testifying to US Congress on the merits of interoperability too. It’s not inconceivable that we will soon live in a world where governments mandate that the walled gardens will finally have to open up, and we may see a whole new level of interest in communication providers wanting to join Matrix!
Communities themselves have also been embracing Matrix more and more over the last year: we were incredibly proud to host FOSDEM 2021, the world’s biggest open source conference via Matrix (all 35K attendees!) - and we’re gearing up to do it again in February for FOSDEM 2022 (this time with our very first FOSDEM Matrix dev room!). We were also really glad that Libera.chat let us point a dedicated homeserver and IRC bridge at their new IRC network (meaning you can join anywhere on Libera from Matrix as #channel:libera.chat, and talk to anyone as @nick:libera.chat). High profile open source projects have been adopting Matrix all over the place - Debian, Fedora, NixOS, Arch, Tor, Ansible, WHATWG and many others (check out this list!) now have their own Matrix servers and spaces. (You know things are busy when we haven’t had time to do a big blog post to announce folk as important as these joining the network!)
Finally, there has been an explosion of new projects and milestones in the wider community - Conduit entered beta as a super exciting lightweight Rust homeserver implementation; FluffyChat hit 1.0 with an impressively polished Flutter-based experience; Beeper pre-launched to huge amounts of mainstream excitement; Cinny exploded out of the blue as an incredibly elegant next-generation web client; Keanu materialised from The Guardian Project as their glossy Matrix client suite; Commune appeared as a hybrid messageboard/chatroom interface; Nheko has matured significantly with huge E2EE improvements and feature and VoIP polish; NeoChat and libQuotient development is progressing solidly; Fractal is busy with the fractal-next rewrite to move everything over to matrix-rust-sdk and GTK 4; Syphon continues to forge ahead as a privacy-focused Flutter-based client, and non-chat clients like The Board, Populus and Matrix Highlight have started to appear in earnest too! We also had a super successful Google Summer of Code this year, with a record number of 7 students participating in both core team and community projects.
Please note this is just a random sample of all the community news over the last year - to get more colour on what’s been going on, we highly recommend flipping through the This Week In Matrix archives!
The Matrix Spec
The Matrix spec is the single source of truth of what Matrix actually is, and this year it got some major improvements thanks to a beautiful new website at https://spec.matrix.org thanks to Will Bamberg, formerly of MDN (and who’s now back fighting the good fight with the MDN team at OWD).
Aside from the new spec site, we also released our first official point release in a while - Matrix 1.1, and we’re going to aim to keep regular releases happening once a quarter from here on in. It’s also worth noting that it’s very much a feature and not a bug that spec releases lag behind the various spec proposals which fly around as the core team and community experiment with new features like spaces, threads, etc. We very deliberately only merge change proposals to the spec which have been proven to work in real life implementations, and which have fully passed the spec review process (along with any dependencies they might have!).
Talking of which, in 2021 we saw a record 109 Matrix Spec Change proposals (MSCs) created. Even better, we closed 62 MSCs - so while the backlog is still growing, we’re still making very concrete progress. Of the 109 new MSCs: 34 were from the wider Matrix community, 34 were from ex-community contributors who are now part of the core team, 13 were from the founding Matrix team, and 28 were from folks hired to work on Matrix by Element on behalf of the Matrix.org Foundation. This feels like a pretty healthy blend of contributions, and while it’s true that spec work could always progress faster, things do seem to be heading in the right direction.
In the new year, the Spec Core Team (responsible for reviewing MSCs and voting on what gets merged to the spec) is going to make a concerted effort to carve out more dedicated time for spec work - thankfully one of the side-effects of Matrix growing is that there are now a lot more people around with whom we can share other work, hopefully meaning that we can put significantly more hours into keeping the spec growing healthily.
Synapse
Synapse is the primary homeserver implementation published by the Matrix core team, and its maturity is unrecognisable from where we were a year ago. One of the big breakthroughs has been stabilising memory usage through caching improvements - the Matrix.org synapse now reliably only uses 2-3GB of RAM on its main process, despite its activity having more than doubled over the last year (up from 513K monthly active users to 1.11M!).
Further signs of maturity include Synapse’s radically improved new documentation and the new module API, the fact that mypy type-safety coverage has improved from ~75% to over 89.7% (across 151,903 lines of code!), and the fact that Open Tracing support has matured to the point that visualising complex cross-worker behaviour is nowadays a genuine pleasure. Frankly, Synapse should be feeling robust and stable these days - if you see folks claiming otherwise, please check they’re not basing that on outdated info (or failing that, get them to file bug reports that we can jump on!).
Meanwhile, on the feature side, we’ve landed a huge spate of long-awaited core functionality. Probably the best way to track it is by the Matrix Spec Change proposals (MSCs) which have been implemented (although I dare you to also go and check out Synapse’s changelog, all 675KB of it, which is frankly a thing of beauty and will take you down a rabbithole all the way back to v0.0.0 in Aug 2014 if you so desire ;P). Major MSCs which we’ve landed include:
Spaces! It’s hard to overstate how positive this has been for Matrix’s usability: at last, we can group our rooms together however we please, both for our own edification and to share with others - and we can view space hierarchies over federation, complete with pagination (MSC2946) as well as specify who can join a room based on whether they’re a member of a given space/room (MSC3083).
Threads! Yes, that’s right - coming any day now to a Matrix client close to you, we have ‘classic’ threaded messaging landing, providing sidebars of conversation through the new m.thread relation type (MSC3440), building on Matrix’s existing aggregation API as used for edits and reactions. We’ve chosen to prioritise single-level-deep-threads rather than arbitrarily-deep-trees (MSC2836) as it maps more easily to a chat UX, although the two approaches are not mutually exclusive.
Aggregations! Everyone’s favourite bête noire in Matrix tends to be that aggregations for edits & reactions predate today’s Matrix Spec Change process and went mainstream without using a vendor prefix before their spec had been stabilised. Better late than never, we’ve taken advantage of Threads to go back and fix what once went wrong - and now MSC2674 and MSC2675 and friends are hopefully on a much better track to provide a basis for how aggregations work - both in the spec and in the reference implementation in Synapse.
Social Login via multiple SSO providers (MSC2858) - almost 50% of new registrations on the Matrix.org homeserver now use social login! Interestingly the split of SSO usage is roughly 70% Google, 12% GitHub, 11% Apple, 6% Facebook and 1% GitLab. Make of that what you will!
Knocking (MSC2403)! Huge thanks to Sorunome and Anoa, we now support the ability to knock to ask to join a room if not yet invited. If this sounds unfamiliar, it may be because it hasn’t landed in Element yet, but expect it to land next year.
Refresh tokens (MSC2918)! At last, we have a standard way for clients to refresh their access tokens, so that if your access token leaks it will not give access to your account indefinitely. (This also has yet to land in Element, but has been proved on a branch on Hydrogen).
Finally, last but not least, Eric from Gitter has been fearlessly hacking his way through some of Matrix’s gnarliest problems in his quest to bring Matrix+Element up to full feature parity with Gitter. In practice, this means adding the ability to incrementally import old history into existing Matrix rooms (MSC2716), so we can expose the vast amounts of knowledge in Gitter’s archives directly into Matrix - and in future provide bridging in general of existing archives (Slack, Discord, mailing lists, newsgroups, forums, etc.) into Matrix.
This is a tough problem, as Matrix rooms are fundamentally immutable - events sent into a room cannot be changed. However, we can bend time a bit and add old chapters of history to the room as if we’d just discovered them down the back of the sofa - and this is what MSC2716 does. The (rewritten!) spec proposal is a thing of beauty and well worth a look, and you can see an early preview in action back on Matrix Live in June. Over the last few months it’s been merging and maturing in Synapse and we should see it in the wild in the near future! And for bonus points Eric’s also just added in Jump-to-date support (MSC3030), letting clients jump around room history by timestamp - another Gitter feature that we sorely need, and will also help us publish excellent Gitter-style online chat archives in future. You can see it in action in last week’s Matrix Live!
Element
Meanwhile, on the client side, Element continues to act as a flagship client to drive the development of the official client SDKs we ship as the Matrix.org Foundation - and our focus more than ever before has been to ensure that Matrix can be used to create mainstream-usable polished glossy apps. After all, Matrix will only succeed if clients emerge which can punch their weight against the enormous incumbents - be they Slack, Teams, WhatsApp or Discord.
This year, improving UX quality has been front and center - and hopefully the shift has been obvious in the app (and huge thanks to everyone who tweeted/tooted/enthused about improvements when they saw them!). Part of this has been ensuring that all new features are built in a design- and product-led fashion by folks who are explicitly focused on product engineering - with product design involved from the outset and with coordination and focus provided by product management folks. This is far from the typical way that FOSS operates, but if we’re to succeed against the incumbents we have to beat them at their own game (just as, for instance, Mozilla wields conventional product management in their browser wars).
More recently, there’s also been a major shift towards structured user testing in order to evaluate new features and analyse how users trip over the app in general, including radically improved analytics (for those who opt in!) to help visualise which bits of the app aren’t working. In the new year, the expectation is to double down on user testing: quite simply, if you can hand Element to a casual mainstream user and they can get the core jobs done (sign up, chat to someone, call someone, etc.) without tripping over, then mission successful :)
The Element blog covers the work this year from the Element side, but from the Matrix side, the key changes include:
finalising Spaces as a way to group together rooms - providing the equivalent of Discord servers or Slack workspaces, or alternatively letting you gather your own rooms together into a private space.
building out Threads (available in labs; launching soon!)
Social Login!
radically improving Element’s Information Architecture (i.e. the layout of the UI, so that the panels and buttons are correctly semantically grouped together in a visual hierarchy)
adding Location Share (available in labs; launching soon!) powered by MSC3488 (and in future MSC3489 for live-location sharing - in dev on iOS right now!)
adding Chat Export, thanks to the amazing GSOC work by Jaiwanth
From a spec perspective, it’s been particularly exciting to be finally using Extensible Events (MSC1767) for many of these new features: voice messages, location sharing and polls are all experimenting with this new idiom for expressing richer structured data over Matrix while presenting a consistent and useful ‘fallback’ representation for clients which don’t know how to natively render the richer data.
We’ve also done a huge amount of work this year in improving 1:1 VoIP - both via MSC2746 and within the JS, iOS & Android Matrix SDKs. If you haven’t tried doing a 1:1 call via Matrix recently we’d highly recommend giving it a go - probably the main remaining bug at this point is that we need to find a better default ringtone for Element(!). Huge thanks go to Šimon Brandner both for his community contributions to VoIP and across all of Element Web - including proper screensharing for 1:1 (and group!) VoIP calls. This has also laid excellent groundwork for native Group VoIP/Video over Matrix - more on that later.
On Element Mobile, work on all the above features has been balanced by fighting against the various platform’s quirks, and lots of under-the-hood work improving performance. iOS has gone through a long journey to get back to stability after iOS’s push notification API changes, while also improving incremental sync performance by rearchitecting the local cache in the client. Android meanwhile has been working away improving the app; reworking Notifications, migrating to Kotlin coroutines and Hilt, and closing over 690 GH issues. Android has also had its fair share of dramas, including some recent long Play Store review times, but we’ve come through the other side intact.
However, we’ve been thinking more and more about the nightmarish pain point that is the amount of time we spend implementing the same features across the three different platforms. This becomes particularly apparent for security-sensitive features such as end-to-end encryption, or major API changes such as aggregations, spaces or sync v3 (more on that later). Or simply rapidly sharing improvements to implementation best practices between platforms.
Historically we consciously built platform-native Matrix SDKs in order to provide entirely idiomatic SDKs for other Matrix developers to use - and also to better dogfood the protocol and ensure that the heterogenous implementations could interoperate successfully. However, in practice, relatively few third party projects other than Element build on top of matrix-ios-sdk and matrix-android-sdk2 - and meanwhile there are more than enough other Matrix clients out there nowadays to dogfood interoperability against (including alternative experimental clients from the core team such as Hydrogen).
So, we’ve been thinking increasingly seriously about how to solve this…
A new hope: matrix-rust-sdk
matrix-rust-sdk is an attempt to build a new reference client SDK for Matrix which can be used by as many platforms as possible - hopefully forever stopping us from reimplementing the wheel more than we need to. Work began towards the end of 2019, building on top of Ruma’s excellent Matrix rust crates, and poljar has been working away solidly at it ever since. We teased matrix-rust-sdk in last year’s update, but as of this year it is properly coming of age and we’ve started using it in earnest - beginning by swapping out Element Android’s encryption implementation for matrix-rust-sdk-crypto (the E2EE cryptography crate provided by the SDK).
If you’re not familiar with Rust, the main benefits we get here are a heavy emphasis on safety and security without compromising performance; while providing a single codebase which can be used equally from iOS, native Desktop apps such as Fractal, Android (with native bindings) and even Web (via WASM, in future). While technically this results in a “non-native” SDK relative to matrix-js-sdk, matrix-ios-sdk and matrix-android-sdk - in practice, it’s become so common to depend on native-code shared libraries (outside the web, at least) that it’s not really a problem.
Initial results look wildly promising here: “Element R” (formerly known as Corroded Element - the codename for the Rust-enhanced version of Element Android) builds are now out there, and out-perform the kotlin E2EE implementation by roughly 10x, thanks to using native code and Rust’s improved parallelisation.
Our next step is to start using it on iOS, and we’ll be experimenting with a next-generation of Element iOS shortly in the new year with the SDK provided exclusively by matrix-rust-sdk. Element will also be funding more people to work fulltime on matrix-rust-sdk itself, and to see what the developer experience is like when you use it seriously on the Web - watch this space!
Bridges, Bots, Widgets and Integration Managers
Elsewhere in Matrix, the Bridge Crew has busy polishing bridges like crazy - working away on encrypted application services (MSC3202), massively improving the IRC bridge (particularly in the fallout of the great Freenode->Libera migration), stabilising and extending matrix-bifrost (our XMPP-and-more bridge), getting libpurple bridging working properly in bifrost, getting matrix-appservice-slack and matrix-appservice-discord stable enough to be hosted by EMS, experimenting with matrix-bot-sdk as an alternative bridging API, and even looking at adding matrix-rust-sdk-crypto into matrix-bot-sdk as an elegant way to power robust encrypted bridges (thus replacing Panatalaimon for that use case).
There’s also a new kid in town: matrix-hookshot (formerly known as matrix-github) is a new all-singing-all-dancing general purpose integration built on matrix-bot-sdk, coming soon to an integration manager near you, which can bridge through to GitHub, GitLab, JIRA and freeform webhooks! Check it out a few weeks ago on Matrix Live. matrix-hookshot is primarily Node, but is also getting in on the Rust action with some functions being implemented in native code.
Meanwhile, change is afoot for integration managers, which have been screaming out for an overhaul for years. There was a cheeky hint in last week’s Matrix Live where Dimension did an unexpected cameo looking particularly swish… All shall be revealed next year!
Dendrite, Low bandwidth Matrix and Peer-to-Peer Matrix
Dendrite is our next-generation homeserver implementation written in Go, and having shipped the first beta in Oct 2020, we’ve cut another 11 releases over the course of this year - adding in features such as E2EE key backups, cross-signing, support for room versions 7, 8 and 9 (knocking and restricted join rules), massive state resolution performance improvements, an entirely new state storage implementation that uses ~15x less disk space, sync filtering, experimental support for peeking-over-federation (MSC2444) - not to mention huge numbers of bug fixes. Even more excitingly, we’re in the process of ditching Kafka in favour of native-Go message queuing in the form of NATS!
However, it’s been a bit of a weird year as the team has been repeatedly pulled onto other projects due to competing priorities - and there’s still a bunch of stuff left which is keeping us in beta. Some of this is plain old missing features (search, push rules/notifications, room upgrades, presence etc) - but we’ve also run up against some problems over the last few months while implementing new room versions and similar thanks to the sheer number of different microservices which Dendrite is made out of. In retrospect, it feels like Dendrite has ended up too granular, and when hacking on it you get slowed down badly by all the boilerplate required to glue the various services together. Therefore, we’ve just started to merge some of the services together - still preserving horizontal scaling of course, but refactoring the architecture a bit while we’re still in beta to help speed up development again. So far things are looking promising! We’re also really looking forwards to s7evinK joining the team to work on Dendrite fulltime in the coming weeks :)
Talking of competing priorities, there have been three other big missions going on at the same time as Dendrite dev: firstly - formalising Low Bandwidth Matrix. LB Matrix is super important for maximising battery life on mobile, as well as (obviously) supporting worse network conditions - and it’s effectively a prerequisite for P2P Matrix. We did a bunch of experiments around it back in 2019, but earlier in the year we needed it for real and MSC3079 was the result. The low bandwidth dialect which we’ve proposed in the MSC is designed for use on the real Internet using standard IETF protocols (CoAP + DTLS + CBOR) and so isn’t quite as exotic as the 2019 version, but still gives a ~10-20x bandwidth improvement over normal HTTP+JSON based Matrix. It hasn’t made it to Element yet, but if you’re interested go check out the blog post!
Secondly, we’ve been sidetracked by the entirety of P2P Matrix. This is our long-term mission to let Matrix run peer-to-peer without the need for any servers (or indeed Internet connectivity, thanks to Bluetooth Low Energy) by embedding servers such as Dendrite into clients such as Element and so let each Matrix Client have its own personal local homeserver. We’ve made massive progress over the course of the year on P2P - the biggest breakthroughs being Pinecone as an entirely new P2P overlay network, with the novel SNEK (sequentially networked edwards key) routing topology. (The animation below shows a P2P network arranging itself into a SNEK!)
You can read all about it in the blog post, but suffice it to say that Pinecone outperformed all the other P2P overlay networks in meshnet-lab’s Mobility2 test:
You can play with P2P Matrix today on iOS and Android (head over to #p2p:matrix.org for builds), but there is some major work still to be done:
We need to bridge to today’s Matrix network. Right now, having a weird experimental test network for P2P means that in practice nobody actually uses it other than for demos - whereas if you could actually talk to everything else in Matrix, it’d be way more compelling and interesting to use and dogfood. We’re currently thinking about how best to do this!
We need to standardise the actual transport to be used over Pinecone. Currently it uses HTTPS over μTP (purely because empirically it handled packet loss and congestion well, and LB Matrix wasn’t ready at the time). We’re currently experimenting with switching to LB Matrix using our own CoAP implementation called PineCoAP (potentially using pCoCoA congestion control, given CoAP doesn’t provide any congestion control out of the box), but this is early days.
We still need to finalise store-and-forward: if your destination is offline, do you buffer your transactions in the network somehow, or do you use another Matrix node to buffer them?
Relatedly, we need to tweak federation so that if events get lost, federation for a room can recover more gracefully than it does today - for instance, by bundling redundant auth events on transactions, or by providing more recovery mechanisms.
We still need to spec and implement multihomed accounts, so that your identity on your phone is not divorced from your identity on your laptop.
…and obviously, we need a robust post-beta Dendrite to act as the local homeserver!
Right now focus is going back to Dendrite for a bit, but P2P work will resume again in the new year :)
Finally, the third big distraction from Dendrite has been… sync v3.
Sync v3
Sync v3 is shaping up to be the single most significant improvement to Matrix since we began.
Syncing data from the homeserver to the client is obviously fundamental to Matrix - and the current behaviour (sync v2) is far from perfect, as it’s designed around the assumption that your client wants to receive information for every room that it’s in. In the early days of Matrix, this was fine: a typical user might be in tens of conversations, and it’s useful to have them all available for offline access. Nowadays, however, it’s a disaster: users can easily accumulate hundreds or thousands of rooms - especially with rooms used to describe spaces or profiles and other structured realtime data. Moreover, the number of rooms you’re in typically increases linearly over time, unbounded, as nobody wants to archive their old conversations.
So, the idea of sync v3 is that you only sync the strict subset of data that your client actually cares about to display in its UI - effectively making both initial and incremental sync instant, incredibly low bandwidth, and completely independent of the number of rooms you’re in (just as filesystem performance should be independent of the number of directories or files present).
For instance, the full initial sync for @matthew:matrix.org in sync v2 is 417MB of JSON uncompressed - or ~100MB if gzipped, and takes about 5 minutes to calculate on matrix.org (during which it murders the sync worker responsible and hammers the database like crazy). By contrast, sync v3’s initial sync is 15KB uncompressed, or 5106 bytes compressed - and synced in 250 microseconds from a local sync-v3 server. Yes folks, that’s somewhere between a 30,000x to 1,200,000x improvement over sync v2, depending on how you count it.
Sync v3 gets this unbelievable performance by the client defining a sliding window into the server’s datasets, sized and ordered as needed for the client’s UI. This effectively performs real-time serverside pagination, so that as the client scrolls around or filters their roomlist or membership list, the client requests new views from the server. Meanwhile the server sends incremental updates to the client if they intersect with the sliding window. This may sound unwieldy, but in practice it works fine (although we’ll have some interesting challenges when we get around to encrypting state events, given serverside ordering and filtering will become distinctly harder). It also doesn’t design out offline access, as the client caches its view of the world so even if you do go offline you can still work with all the data that has sent to your client so far (and the client could even proactively paginate in other content, if it wanted to, similar to an email client synchronising for offline access).
Sync v3 exists today as a proxy called sync-v3 which sits between any existing homeserver and a sync-v3-capable Matrix client. It’s very early days, but Hydrogen has basic v3 support on a branch which we’ve been using to experiment with the API and flesh it out - and you can see a demo and intro talk in last week’s Matrix Live!
The API itself is still in flux, but those interested can see the initial spec design at https://github.com/matrix-org/sync-v3/blob/main/api.md and also an MSC is emerging at MSC3575. Next steps will be to finish hooking up to Hydrogen (including filtering the room list), finish the MSC, and then start thinking about implementing it in other clients and servers!
Fast Joins over Federation
While we’re on the subject of speeding up Matrix… it’s all very well being able to sync your client instantly, but the other big complaint everyone has about Matrix is how long it takes to join rooms - especially big ones. As most people will know, it can easily take 5-10 minutes to join a large room like Matrix HQ on a new homeserver - and given this is the first experience most users have of running their own homeserver, it can prove pretty disastrous and we are determined to fix it. It will become even more relevant when we implement peeking over federation, as the last thing you want is to have to wait 5 minutes to temporarily dip into some random federated room to see if you want to join it or not (or to sniff its room state for things like extensible profiles or MSC2313 reputation rooms).
So, to address this, we’re currently in the middle of experimenting with MSC2775 (Lazyloading over Federation) in Synapse. This MSC lets servers participate in a room before they’ve received the full room state by defining a subset of state which is mandatory for participation, and then letting the rest get added lazily. It’s quite a violent change as it means the assumption that room state is complete (to the best of the server’s knowledge) is no longer true - but given Matrix already has to handle incomplete room state, it’s not necessarily a showstopper.
Watch this space for how well it works in practice, but we’re hoping for a ~20x speed improvement in joining Matrix HQ.
Hydrogen
2021 has been a busy year for Hydrogen - our ultra-lightweight Matrix Client, which provides a small but perfectly formed progressive web app for us to experiment on! There have been no fewer than 56 releases over the course of the year, with loads of contributions from Bruno, Midhun (who joined first as a GSOCcer and then as a fulltime Element employee) and also Danila who interned at Element on Hydrogen over the summer.
People often ask why Hydrogen exists as well as Element Web - and the reason is because Element Web is (for now at least) very far from a progressive web app and is stuffed full of features, whereas Hydrogen is intended to be as lightweight and simple and efficient as possible while also targeting as wide a range of web browsers as possible (even Internet Explorer!). It also provides a simpler platform for experimenting with new approaches such as sync v3 or OIDC without getting entangled in the constant hive of activity around Element Web. Finally, it gives us a playground to experiment with embeddable chat clients thanks to Hydrogen’s strict MVVM component model.
In terms of features, 2021 has seen huge steps forwards as Hydrogen converges on feature parity with Element - proper mentions and replies; rich formatted linkified messages; reactions; redactions; memberlist; member info; webpush notifications; proper image, video & file uploads; SSO login; sync v3(!) and so much more. Can’t wait to see what 2022 will bring!
End-to-End Encryption
2021 saw the long-awaited creation of a dedicated cryptography team to focus exclusively on improving encryption in Matrix: previously encryption expertise was split across various different areas, meaning that it could prove hard to carve out time to tackle the bigger remaining encryption challenges.
So far the team has been busy digging deep into the few remaining causes of UISIs (undecryptable messages), including automated UISI reporting and tracing E2EE flows end-to-end (from client to server to server to client). There’s also been an initial wave of UX work - with much more to come next year as we overhaul cross-signing and device backups to make it way more user friendly.
Meanwhile, on the more foundational side of things, we’re continuing to define Decentralised MLS as a potential next-generation form of end-to-end encryption, building on the IETF’s MLS work - providing much better scalability for large chat rooms and potentially helping with some causes of encryption failures. Hubert (uhoreg) has been leading the charge here, with his latest thoughts emerging here alongside a brand new demo showing his DMLS simulator - which under the hood is actually sending real Matrix events over DMLS!
Otherwise, the team has had three big projects: adding matrix-rust-sdk-crypto into Element Android (which we already covered above), arranging a fresh security audit of Matrix’s end-to-end encryption (due to complete January 2022)… and, most excitingly: vodozemac.
Vodozemac (pronounced roughly vod-oz-eh-matz) is an entirely new implementation of our Olm and Megolm end-to-end encryption system, written from scratch in pure Rust, aiming to replace the original reference C/C++11 implementation in libolm. Originally written as an experiment for matrix-rust-sdk at the beginning of the year, in the last week it’s received a huge explosion of attention from poljar and dkasak to bring it up to production quality… for we decided that if we are doing a full E2EE audit for Matrix, we should target the new and future codebase rather than burn money on re-auditing the legacy libolm library (much as the original 2016 review of libolm happened when the library was fresh and new).
The motivation for vodozemac in general is to benefit from the intrinsic type and memory safety and fearless parallelism provided by Rust - and also maintain full type & memory safety throughout the matrix-rust-sdk stack, including encryption. Over the last year we’ve been taking more and more of a careful look at libolm, and despite our best efforts a few memory management bugs have crept in - which vodozemac should be immune to. Vodozemac will solve another embarrassing problem with libolm: that its default cryptography primitives are designed for correctness rather than performance or safety. By switching to Rust’s ed25519-dalek and rustCrypto AES primitives we should be in a much better position in terms of performance and safety.
Next up, we’ll be fully integrating vodozemac into matrix-rust-sdk, and figuring out how best to provide it as a libolm replacement in general.
Matrix Security
Alongside the new Cryptography team we’ve also established a new dedicated Security team for Matrix, led by dkasak. As well as fuzzing excursions into libolm and similar research, Denis has been handling all our security disclosure policy submissions, managing the Intigriti bug bounty programme, helping coordinate all our security releases, and coordinating the upcoming external independent security audit of vodozemac, matrix-rust-sdk, Element and Synapse. It’s a huge step forwards to be able to fund full-time infosec researchers to focus exclusively on Matrix, and this is just the beginning!
Trust and Safety
Another place where we’ve created a dedicated team this year is around Trust & Safety: building tools to fight spam and abuse on our own servers, while also empowering the wider network of users, moderators and admins to manage abuse as they see fit. This includes lots of work on Mjolnir, our primary moderation bot, but also defining MSCs such as MSC3215 (Aristotle: Moderation in all things) and MSC3531 (Letting moderators hide messages pending moderation) and internal tooling as we experiment with different approaches.
We’ll have more updates on this in the coming year as we release the tools we’ve been working on, but suffice it to say that the goal is to empower mainstream users in the wider Matrix network to apply their own rules as they see fit, directly from the comfort of their favourite Matrix client - without having to know what a Mjolnir is (or how to run one), and without having to be a moderation expert.
OpenID Connect
A new project brewing throughout 2021 has been the investigation into replacing the entirety of Matrix’s authentication APIs with industry standard OpenID Connect. Spearheaded by Quentin, this has proved to be a fascinating and challenging endeavour, but we’re starting to see some really interesting results. The problem we’re trying to solve here is:
As Matrix grows, we’re seeing more and more clients and services appearing which you might want to log into with your Matrix account. But do you really want to trust each app with your account password? And what if you only want to give it access to a small subset of your account?
Similarly, we’re seeing more and more login mechanisms used to access Matrix - it’s no longer just a matter just a username + password; many servers use single-sign-on (e.g. mozilla.org) or social login (fosdem.org, matrix.org), or layer on 2FA or MFA hardware tokens and similar to access their accounts via an SSO provider. We also see passwordless login on the horizon.
So, do we really want to mandate each new Matrix client to have to implement custom flows to handle this explosion of login/registration mechanisms? And is it even really the client’s problem in the first place? You’re securing access to your account on your chosen server, which isn’t really a client-specific thing at all.
The real turning point for the project however has been our recent experiences building out a new wave of single-use domain-specific clients (see below) for video conferencing, whiteboarding, metaverse-browsing etc… where by far the most painful bit of the project has been hooking up the UI for login, registration, guest access, incremental signup, password reset, email verification, CAPTCHA, SSO, etc. And that’s even when building on top of matrix-react-sdk, which theoretically has it all already thanks to Element Web!
Frankly, it has become blindingly obvious that it’s crazy for clients to reimplement this every time, and they should instead chuck the user over to a sign-on portal provided by their homeserver - just like Google and everyone else’s single-sign-on does. And rather than inventing our own homebrew way of doing that, we should just use the existing industry standard SSO best practices defined by OpenID Connect.
The main objections which have come up against this are: “what if my Matrix client doesn’t have a web browser, or what if I want to provide my own native login UI”, and “does this design out the idea of using a single password to access your account as well as your E2EE history”? In both instances, we have workarounds: in practice, there are so many Matrix clients around that we won’t be removing today’s legacy login/registration APIs any time soon (just like HTTP Basic Auth is still very much a thing on the web!). And in terms of “cryptographic login”, there are ways we could daisychain the auth required to unlock your E2EE storage to also authenticate you with your server - although this would be a major extension (much as cryptographic login is already today!)
The current status is that we’ve defined a set of initial MSCs (MSC2964, MSC2965, MSC2966 and MSC2967), and are implementing an initial Open ID Connect auth server (in Rust!) called matrix-authentication-service (better name suggestions welcome!) designed to sit alongside your homeserver, and we’re experimenting with hooking Hydrogen (and some of the new domain-specific clients) up to see how it feels. But if it goes as well as we think it might, folks should prepare for 2022 to be the year where Matrix’s authentication system finally gets fixed!
Native Matrix Video/VoIP Conferencing
One of the most anticipated features in Matrix over the years has been the prospect of native, decentralised, end-to-end encrypted video and voice conferencing. Today, voice and video conferencing in Matrix works by embedding Jitsi as a third party centralised service into your chatroom. This works fairly well - but Jitsi is an entirely separate service with lots of moving parts, and its own concept of users and access control (provided by XMPP!) and its megolm-based end-to-end-encryption doesn’t actually integrate with Matrix’s own Olm identities, verification or cross-signing. The fact that the conference is then logically centralised on whoever is hosting the Jitsi service also misses one Matrix’s main goals - that users should be able to hold a conversation without being dependent on any single service or provider. Plus it’s really confusing that Matrix has proper native 1:1 calls for DMs… but then switches to a totally different system in group chats.
So, this year we set out to fix it - and succeeded :D The solution hinges around MSC3401 - a spec proposal that describes how to extend native 1:1 calls to work for groups, while providing real flexibility on how to actually mix the calls together. At the simplest extreme, it defines how full mesh calls work (where every client simply calls every other client simultaneously) - but then also defines how you can mix calls together either using a single focus (conferencing server) or multiple foci run by different parties, where foci can either be Selective Forwarding Units (SFUs, like Jitsi) or Multipoint Conferencing Units (MCUs, like FreeSWITCH). The end result is to give us decentralised, cascading, end-to-end encrypted conferencing which even has direct compatibility with today’s 1:1 Matrix calling, letting you easily hook in bots and bridges which already support 1:1 Matrix calls!
Robert Long has been frantically hacking away at the initial implementation over the last few months, fleshing out full-mesh conferencing at first and getting it running in as many browsers as possible (including Mobile Safari and Chrome Android!). We were hoping to fully unveil the end result in time for Christmas, but in practice we hit some last minute snags (turns out Matthew forgot guest users can’t use TURN, who knew? so much for incremental login! 😰) which have pushed the launch to early next year. But hopefully in a few weeks, you’ll be able to start jumping on a native group call in Matrix!
Meanwhile, those interested can see all the gory details from our CommCon 2021 talk a few weeks ago, complete with a demo of the shape of things to come…
Next up, we’ll be working on building an MSC3401-compatible SFU so we can go beyond full mesh (which typically supports a maximum of ~7 callers). Our candidates right now are mediasoup, ion-sfu, janus and signal-calling-service - we’ll let you know how it goes! Also, if you’re interested in helping us build this out quicker, we are frantically searching for more WebRTC & VoIP gurus to join the team at Element working on this.
Applications Beyond Chat
Finally, 2021 was the year where we seriously started building out functionality on Matrix which goes far beyond plain old chat rooms.
Work began in the summer as a research project led by Ryan, formerly tech lead for Element Web - looking at ways to store hierarchical structured data into Matrix while preserving real-time semantics; effectively using Matrix as a collaborative decentralised object tree, providing CRDT (Conflict-free Replicated Data Types) to allow richer applications to be built on Matrix. This journey led him to create Patience as a test environment for building out these sort of clients, and meanwhile Timo (famous of The Board) joined the team to build out Full Screen Widgets in Element, providing a much better UI for beyond-chat experiments.
Meanwhile, Matthew Weidner and the Composable Systems Lab at CMU stunned us all by presenting a complete CRDT solution using Matrix named Collabs at Strange Loop 2021. This is really impressive stuff - the brave of heart can go and embed a Matrix-powered end-to-end-encrypted collaborative markdown editor straight into Element via Collabs by following the instructions here. In practice, Collabs works by serialising the CRDT updates as base64 blobs inside Matrix timeline events (hello Wave, is that you?), but we’re now investigating how you might reconcile this with maintaining a proper realtime object tree in Matrix.
It’s hard to overstate how powerful storing freeform tree CRDTs in Matrix would be. It could open up everything from decentralised encrypted collaborative document editing to collaborative whiteboarding and collaborative Figma-style (or Penpot- or Blender-style) design. You could even start storing an HTML DOM into a room, alongside its binary assets, giving you a multiplayer DOM to build on… and then imagine if you could store the syntax tree of the code operating on that DOM alongside it, in the same room. Before you know it, we will have created kind of some incredible Smalltalk / Croquet / Alan Kay nirvana where code is data and data is code and it’s all running live in some kind of decentralised encrypted multiplayer Metaverse :D
While we’ve been looking at storing object trees in Matrix, another obvious angle that has emerged is to use Matrix for encrypted decentralised file storage. MSC3089 is a proposal on how you might represent hierarchies of files in Matrix - where each room acts effectively as a directory of files, with spaces forming a directory structure (much as they do already in today’s Matrix), leveraging Matrix’s existing decentralised access control mechanisms to control who can access what. Combine such a file storage system with the collaborative editing capabilities mentioned above, and suddenly a really exciting proposition starts to emerge. We’re investigating this right now, and all will be revealed early next year…
Finally, and last but not least, Robert Long has been building on top of our shiny new Native Matrix Voice/Video Conferencing capabilities to use Matrix as the communication backbone for a truly open, equitable and interoperable vision of the Metaverse. The best way of describing it is to look at his awesome Third Room demo from the Open Metaverse Interoperability Group demo session in September:
Now, some folks will recall that since day one (in fact, since before day one) the hope for Matrix was that it might end up as the communications fabric of the Metaverse. We were about 4 years early when we first starting enthusing about this, and then still ahead of our time when we did the world’s first 3D Video calling over Matrix. However, it now feels like the world has finally caught up - and we’re in grave danger of being overtaken by a dystopia where the big tech companies balkanize the Metaverse into a series of closed proprietary user-exploiting walled gardens, much like today’s incumbent chat silos - but even worse.
This is our chance to fix it before it’s too late, and Element is funding a small but highly targeted team to focus exclusively on building out open interoperable Metaverse over Matrix - ensuring that collaboration in 3D (and 2D) spatial environments in future is decentralised, secure and standards-based. This obviously ties in directly with the rest of the Beyond Chat projects listed above: it’s early days, but it’s incredibly exciting to imagine where we could end up if this works!
Finally, a question which has kept coming up while working on Beyond Chat projects has been whether to implement this new functionality as Matrix widgets, bake them into existing Matrix clients, or build them as domain-specific dedicated Matrix clients. But perhaps we’re thinking about this all wrong: what if your Matrix client was just a browser for Matrix rooms? Some of these could be chatrooms. Some of these could be VoIP/Video conferences or Discord-style voice/video rooms. Some of these could be message boards or mailing lists. Some of these could be collaborative editors or whiteboards. Some of these could be 3D views into the metaverse. Some of these could be rendered via widgets; some could be rendered natively if the client knows how. And some of these could even be good old web pages(!!!).
Imagine if your Matrix client was effectively a genuine browser of arbitrary decentralised realtime content? If your view into a Matrix room was just that: a full window view into that room, be it textual or 2D or 3D - and your Matrix client was just a browser which added the necessary chrome and navigation to help you tab between rooms, login and logout, manage your encryption, track who’s in the room, track your notifications, etc.?
Meanwhile, if you’re in a web browser, you might hop into a lightweight single-page domain-specific webapp which happens to use Matrix for collaboration. Or if you’re in a Matrix client/browser, you could hop to the same matrix URL to get at the same functionality with all the supporting chrome and UI overlays sliding in as needed…
Perhaps the vision of Matrix as the missing communication layer of the open Web is more literal than we ever thought. Eitherway, it will be fascinating to see how Applications Beyond Chat evolves over the next year.
2022
Now, I dare you to cross-reference all of the above with last year’s predictions for 2021 to see how we did :D In practice, the only things from the list we haven’t got to are peeking-over-federation (although arguably fast joins are a key part of that), account portability, and restoring incremental sign-up (although our new clients have it!).
So, here go the predictions for 2022 (keeping it short, otherwise it’ll be 2023 before this blog post gets finished…):
Client polish and performance - our prime directive is to ensure that Matrix clients can be built with UX polish and quality which exceeds our centralised alternatives. In practice, this means:
Element must spark joy. Ensuring Element’s Information Architecture continues to be simplified and refined, and that nobody who knows how to use a computer hits a WTF moment when first using the app. Never again do we want to see someone on Twitter saying “I have no idea how to use Matrix”.
Instant launch. With Sync v3 and matrix-rust-sdk we hope to make Element launch instantly on all platforms - including initial sync.
Fast joins. We should never get bored while waiting to join a room or accept an invite.
Spaces. While Spaces are already a huge improvement in letting users organise and discover rooms, there’s still much more to be done:
Flair - Users who are members of a space should be able to announce it loud and proud with a Flair badge on their avatar, like we used to with the old pre-spaces Communities feature (MSC3219 being the potential proposal).
Synchronising access controls - You should be able to apply access controls based on whether a user is a member of a given group (so that if you invite them to #moderators:example.com, they automatically get made moderator in all the rooms in a given space). It looks likely that this will be implemented at last using joepie91’s MSC3216 proposal for Synchronized access control for Spaces (rather than Matthew’s original MSC2962 - an excellent example of the community steering the spec process :)
Bulk joins - It should be a one-button operation to join all the rooms in a space.
**Subspaces **- as more and more spaces emerge, the ability to navigate them as a hierarchy becomes more and more useful. We want to get to the point where we can turn off the Matrix.org public rooms list, and instead present a Space tree of all the good rooms we know about in Matrix… delegating over curation to the wider community; building a huge USENET-style hierarchy of where to go in Matrix. To do that, we need subspaces to sing!
Removing communities/groups, which will then be entirely superseded by spaces.
**Threads **go-live!
Location share go-live
Pinned messages, so the most important messages are always visible to everyone n the room
Starred messages, so you never lose a message ever again
Custom emoji, finally merging in all the custom emoji work from the community.
matrix-rust-sdk
Element iOS on rust-sdk
Element Android on rust-sdk-crypto
…and experiment to see how matrix-rust-sdk feels on Web? It’s a real shame that Daydream got archived…
Encryption
Vodozemac in matrix-rust-sdk, maybe even elsewhere.
**Updated E2EE Audit **spanning vodozemac, olm+megolm, matrix-rust-sdk… and a representative sample of a typical Element+Synapse deployment.
DMLS - getting to the point where we can experiment with it in real clients.
Encryption Agility - the ability to migrate encrypted history is going to become really important as we evolve our E2EE, whether that’s by adding in post-quantum algorithms, or moving from Megolm to MLS, or any other shifts. We will need to start thinking about it in 2022.
Next-generation MSCs
Aggregations - finalising the foundational MSCs for aggregations, at last
Extensible events - finalising the foundational MSCs for extensible events, at last
**Sync v3 **- finalising the MSC and implementing it in matrix-rust-sdk
Fast joins - getting them implemented in Synapse and Dendrite
Peeking over federation - getting them implemented in Synapse and Dendrite
Extensible profiles - who needs a facebook wall when you have a profile room on Matrix?
Open ID Connect - using OIDC as an alternative auth mechanism for new clients.
Gitter parity
Importing the Gitter archives into Matrix via MSC2716
Implementing excellent public static Matrix archives (replacing both view.matrix.org and gitter.im’s static views)
Transfiguring Gitter into a Gitter-themed Element
Dendrite
**Parity with Synapse **- and out of beta, with any luck!
P2P Matrix
Exposing the normal Matrix network via P2P!
Multihomed accounts
Store and forward (if only by relaying via other P2P Matrix nodes)
**Low bandwidth transports **- via PineCoAP or similar
Making federation robust in a highly disconnected network.
Hydrogen
**Daily Driver **- making sure that Hydrogen can be readily used as a daily driver Matrix client, even if it lacks full parity with Element.
**Embeddable Hydrogen **- making the most of Hydrogen as a tiny lightweight PWA to embed it into existing websites.
Bots and Bridges
**Landing End-to-Bridge-Encryption **for all existing matrix-appservice-bridge based bridges
All the integrations!
First-class UI for configuring integrations!
Trust & Safety
Empower users to manage abuse within their communities.
Something we didn’t mention in 2021 is the increasing interest in building border gateways and hardware cross domain gateways to safely link different Matrix federations together. We expect to see a lot of activity in this space in 2022, and there should be some new MSCs too :)
Beyond Chat
**Metaverse on Matrix **- building out the dream as per above!
**Collaborative editing **- extending Matrix to store trees of events, and collaborate on them in realtime - starting with a collaborative editor!
File storage in Matrix - building out real-life file storage on top of Matrix.
So, there you have it. If you’ve got this far… it’s incredible; you’re amazing: thank you for reading! The sheer length of this update shows just how much Matrix has grown in 2021 relative to previous years; it’s frankly terrifying to imagine how long the equivalent post will be next year. We may have to change the format a little :)
And that’s a wrap for 2021: we hope you stay safe and have an excellent end of the year. Huge thanks for flying Matrix and supporting the project - we literally wouldn’t be here without you.
Today we are disclosing a critical security issue affecting multiple Matrix clients and libraries including Element (Web/Desktop/Android), FluffyChat, Nheko, Cinny, and SchildiChat. Element on iOS is not affected.
Specifically, in certain circumstances it may be possible to trick vulnerable clients into disclosing encryption keys for messages previously sent by that client to user accounts later compromised by an attacker.
Exploiting this vulnerability to read encrypted messages requires gaining control over the recipient’s account. This requires either compromising their credentials directly or compromising their homeserver.
Thus, the greatest risk is to users who are in encrypted rooms containing malicious servers. Admins of malicious servers could attempt to impersonate their users' devices in order to spy on messages sent by vulnerable clients in that room.
This is not a vulnerability in the Matrix or Olm/Megolm protocols, nor the libolm implementation. It is an implementation bug in certain Matrix clients and SDKs which support end-to-end encryption (“E2EE”).
We have no evidence of the vulnerability being exploited in the wild.
This issue was discovered during an internal audit by Denis Kasak, a security researcher at Element.
Remediation and Detection
Patched versions of affected clients are available now; please upgrade as soon as possible — we apologise sincerely for the inconvenience. If you are unable to upgrade, consider keeping vulnerable clients offline until you can. If vulnerable clients are offline, they cannot be tricked into disclosing keys. They may safely return online once updated.
Unfortunately, it is difficult or impossible to retroactively identify instances of this attack with standard logging levels present on both clients and servers. However, as the attack requires account compromise, homeserver administrators may wish to review their authentication logs for any indications of inappropriate access.
Similarly, users should review the list of devices connected to their account with an eye toward missing, untrusted, or non-functioning devices. Because an attacker must impersonate an existing or historical device, exploiting this vulnerability would either break an existing login on the user’s account, or a historical device would be re-added and flagged as untrusted.
Lastly, if you have previously verified the users / devices in a room, you would witness the safety shield on the room turn red during the attack, indicating the presence of an untrusted and potentially malicious device.
Affected Software
Given the severity of this issue, Element attempted to review all known encryption-capable Matrix clients and libraries so that patches could be prepared prior to public disclosure.
fractal-next is not vulnerable due to depending on a recent enough commit.
weechat-matrix-rs, a rewrite of the original weechat-matrix script, had no tagged releases, but some commits did depend on vulnerable versions of the matrix-rust-sdk. Users should ensure to update to the latest master of weechat-matrix-rs, presently 2b093a7.
Matrix supports the concept of “key sharing”, letting a Matrix client which lacks the keys to decrypt a message request those keys from that user's other devices or the original sender's device.
This was a feature added in 2016 in order to address edge cases where a newly logged-in device might not have the necessary keys to decrypt historical messages. Specifically, if other devices in the room are unaware of the new device due to a network partition, they have no way to encrypt for it—meaning that the only way the new device will be able to decrypt history is if the recipient's other devices share the necessary keys with it.
Other situations where key sharing is desirable include when the recipient hasn't backed up their keys (either online or offline) and needs them to decrypt history on a new login, or when facing implementation bugs which prevent clients from sending keys correctly. Requesting keys from a user's other devices sidesteps these issues.
Key sharing is described here in the Matrix E2EE Implementation Guide, which contains the following paragraph:
In order to securely implement key sharing, clients must not reply to every key request they receive. The recommended strategy is to share the keys automatically only to verified devices of the same user.
This is the approach taken in the original implementation in matrix-js-sdk, as used in Element Web and others, with the extension of also letting the sending device service keyshare requests from recipient devices. Unfortunately, the implementation did not sufficiently verify the identity of the device requesting the keyshare, meaning that a compromised account can impersonate the device requesting the keys, creating this vulnerability.
This is not a protocol or specification bug, but an implementation bug which was then unfortunately replicated in other independent implementations.
While we believe we have identified and contacted all affected E2EE client implementations: if your client implements key sharing requests, we strongly recommend you check that you cryptographically verify the identity of the device which originated the key sharing request.
Next Steps
The fact that this vulnerability was independently introduced so many times is a clear signal that the current wording in the Matrix Spec and the E2EE Implementation Guide is insufficient. We will thoroughly review the related documentation and revise it with clear guidelines on safely implementing key sharing.
Going further, we will also consider whether key sharing is still a necessary part of the Matrix protocol. If it is not, we will remove it. As discussed above, key sharing was originally introduced to make E2EE more reliable while we were ironing out its many edge cases and failure modes. Meanwhile, implementations have become much more robust, to the point that we may be able to go without key sharing completely. We will also consider changing how we present situations in which you cannot decrypt messages because the original sender was not aware of your presence. For example, undecryptable messages could be filed in a separate conversation thread, or those messages could require that keys are shared manually, effectively turning a bug into a feature.
We will also accelerate our work on matrix-rust-sdk as a portable reference implementation of the Matrix protocol, avoiding the implicit requirement that each independent library must necessarily reimplement this logic on its own. This will have the effect of reducing attack surface and simplifying audits for software which chooses to use matrix-rust-sdk.
Finally, we apologise to the wider Matrix community for the inconvenience and disruption of this issue. While Element discovered this vulnerability during an internal audit of E2EE implementations, we will be funding an independent end-to-end audit of the reference Matrix E2EE implementations (not just Olm + libolm) in the near future to help mitigate the risk from any future vulnerabilities. The results of this audit will be made publicly available.
Timeline
Ultimately, Element took two weeks from initial discovery to completing an audit of all known, public E2EE implementations. It took a further week to coordinate disclosure, culminating in today's announcement.
Monday, 23rd August — Discovery that Element Web is exploitable.
Thursday, 26th August — Determination that Element Android is exploitable with a modified attack.
Wednesday, 1 September — Determination that Element iOS fails safe in the presence of device changes.
Friday, 3 September — Determination that FluffyChat and Nheko are exploitable.
Tuesday, 7th September — Audit of Matrix clients and libraries complete.
Big news today: Element, the startup founded by the team who created Matrix,
just raised $30M of Series B funding in order to further accelerate Matrix
development and improve Element, the flagship Matrix app. The round is led by
our friends at Protocol Labs and Metaplanet,
the fund established by Jaan Tallinn (co-founder of
Skype and Kazaa). Both Protocol Labs and Metaplanet are spectacularly on
board our decentralised communication quest, and you couldn't really ask for
a better source of funding to help take Matrix to the next level. Thank you
for believing in Matrix and leading Element's latest funding!
You can read all about it from the Element perspective over at the
Element Blog,
but suffice it to say that this is enormous news for the Matrix ecosystem as a
whole. In addition to transforming the Element app, on the Matrix side this
means that there is now concrete funding secured to:
finish building out P2P Matrix
and get it live (including finishing Dendrite, given our P2P work builds on Dendrite!)
adding Threading to Element (yes, it's finally happening!)
speeding up room joins over federation
creating 'sync v3' to lazy-load all content and make the API super-snappy
lots of little long-overdue fun bits and pieces (yes, custom emoji, we're looking at you).
If you're wondering whether Protocol Labs' investment means that we'll be
seeing more overlap between IPFS and Matrix, then yes -
where it makes tech sense to do so, we're hoping to work more closely
together; for instance collaborating with the libp2p team on our P2P work
(we still need to experiment properly with gossipsub!), or perhaps giving
MSC2706
some attention. However, there are no plans to use cryptocurrency incentives
in Matrix or Element any time soon.
So, exciting times ahead! We'd like to inordinately thank everyone who has
supported Matrix over the years - especially our Patreon supporters, whose
donations pay for all the matrix.org infrastructure while inspiring others to
open their cheque books; the existing investors at Element (especially Notion
and Automattic, who have come in again on this round); all the large scale
Matrix deployments out there which are effectively turning Matrix into an
industry (hello gematik!) -
and everyone who has ever run a Matrix server, contributed code, used the
spec to make their own Matrix-powered creation, or simply chatted on Matrix.
Needless to say, Matrix wouldn't exist without you: the protocol and network
would have fizzled out long ago were it not for all the people supporting it
(the matrix.org server can now see over 35.5M addressable users on the
network!) - and meanwhile the ever-increasing energy of the community and the
core team combines to keep the protocol advancing forwards faster than ever.
We will do everything we possibly can to succeed in creating the long-awaited
secure communication layer of the open Web, and we look forward to large
amounts of Element's new funding being directed directly into core Matrix
development :)
It's only been two weeks since Dendrite 0.4 landed, but there's already a
significant new release with Dendrite 0.4.1 (it's amazing how much work we
can do on Dendrite when not off chasing low-bandwidth and P2P Matrix!)
This release further improves memory performance and radically improves
state resolution performance (rumour has it that it's a 10x speed-up).
Meanwhile, SS API sytest coverage is up to 91%(!!) and CS API is now at 63%.
We're going to try to keep the pressure up over the coming weeks - and once
sytest is at 100% coverage (and we're not missing any big features which sytest
doesn't cover yet) we'll be declaring a 1.0 :)
If you're running Dendrite, please upgrade. If not, perhaps this would be a
good version to give it a try? You can get it, as always from,
https://github.com/matrix-org/dendrite/releases/tag/v0.4.1. The changelog
follows:
Features
Support for room version 7 has been added
Key notary support is now more complete, allowing Dendrite to be used as a notary server for looking up signing keys
State resolution v2 performance has been optimised further by caching the create event, power levels and join rules in memory instead of parsing them repeatedly
The media API now handles cases where the maximum file size is configured to be less than 0 for unlimited size
The initial_state in a /createRoom request is now respected when creating a room
Code paths for checking if servers are joined to rooms have been optimised significantly
Fixes
A bug resulting in cannot xref null state block with snapshot during the new state storage migration has been fixed
Invites are now retired correctly when rejecting an invite from a remote server which is no longer reachable
The DNS cache cache_lifetime option is now handled correctly (contributed by S7evinK)
Invalid events in a room join response are now dropped correctly, rather than failing the entire join
The prev_state of an event will no longer be populated incorrectly to the state of the current event
Receiving an invite to an unsupported room version will now correctly return the M_UNSUPPORTED_ROOM_VERSION error code instead of M_BAD_JSON (contributed by meenal06)
We’re incredibly excited to officially announce that the national agency for
the digitalisation of the healthcare system in Germany (gematik)
has selected Matrix as the open standard on which to base all its
interoperable instant messaging standard - the TI-Messenger.
gematik has released a concept paper
that explains the initiative in full.
TL;DR
With the TI-Messenger, gematik is creating a nationwide decentralised private
communication network - based on Matrix - to support potentially more than
150,000 healthcare organisations within Germany’s national healthcare system.
It will provide end-to-end encrypted VoIP/Video and messaging for the whole
healthcare system, as well as the ability to share healthcare based data,
images and files.
Initially every healthcare provider (HCP) with an HBA (HPC ID card) will be
able to choose their own TI-Messenger provider. The homesever for HCP
accounts will be hosted by the provider’s datacentre. The homeserver for
institutions can be hosted by TI-Messenger providers, or on-premise.
Each organisation and individual will therefore retain complete ownership and
control of their communication data - while being able to share it securely
within the healthcare system with end-to-end encryption by default. All
servers in the Matrix-based private federation will be hosted within
Germany.
Needless to say, security is key when underpinning the entire nation’s
healthcare infrastructure and safeguarding sensitive patient data. As such,
the entire implementation will be accredited by BSI
(Federal Office for Information Security) and BfDI
(Federal Commissioner for Data Protection and Freedom of Information).
The full context...
Germany’s digital care modernisation law (“Digitale Versorgung und Pflege
Modernisierungs Gesetz” or DVPMG), which came into force in June 2021, spells
out the need for an instant messaging solution.
The urgency has increased by a significant rise in the use of instant
messaging and video conferencing within the healthcare system - for instance,
the amount of medical practices using messenger services doubled in 2020
compared to 2018 (much of this using insecure messaging solutions).
gematik, majority-owned by Germany’s
Federal Ministry of Health,
is responsible for the standardised digital transformation of Germany’s
healthcare sector. It focuses on improving efficiency and introducing new
ways of working by setting, testing and certifying healthcare technology
including electronic health cards, electronic patient records and
e-prescriptions.
TI-Messenger is gematik’s
technical specification for an interoperable secure instant messaging
standard. The healthcare industry will be able to build a wide range of apps
based on TI-Messenger specifications knowing that, being built on Matrix, all
those apps will interoperate.
More than 150,000 organisations - ranging from local doctors to clinics,
hospitals, and insurance companies - can potentially standardise on instant
messaging thanks to gematik’s TI-Messenger initiative.
The road to interoperability
By 1 October 2021, TI-Messenger will initially specify how communication
should work in practice between healthcare professionals (HCPs). Physicians
will be able to find and communicate with each other via TI-Messenger
approved apps - specifications include secure authentication mechanisms with
electronic health professional cards (eHBAs), electronic institution cards
(SMC-B) and a central FHIR directory. The first
compliant apps for HCPs are expected to be licensed by Q2 2022.
Eric Grey (product manager for TI-Messenger at gematik), reckons there will
initially be around 10-15 TI-Messenger compliant Matrix-based apps for HCP
communications available from different vendors.
Healthcare professionals will be able to choose a TI-Messenger provider, who
will be hosting their personal accounts and provide the messenger-client.
Healthcare organisations will choose a TI-Messenger provider to build the
dedicated homeserver infrastructure (on prem or in a data center), provide
the client and ongoing support.
What does this mean for the Matrix community?
Matrix is already integral to huge parts of the public sector; from the French
government’s Tchap platform, to Bundeswehr’s use of BwMessenger and adoption
by universities and schools across Europe.
Germany’s healthcare system standardising on Matrix takes this to entirely the
next level - and we can’t wait to see the rest of Europe (and the world!)
converge on Matrix for healthcare!
We'll have more info about TI-Messenger on this week's Matrix Live, out on
Friday - stay tuned!
Over the last few days we've seen a distributed spam attack across the public Matrix network, where large numbers of spambots have been registered across servers with open registration and then used to flood abusive traffic into rooms such as Matrix HQ.
The spam itself has been handled by temporarily banning the abused servers. However, on Monday and Tuesday the volume of traffic triggered performance problems for the homeservers participating in targeted rooms (e.g. memory explosions, or very delayed federation). This was due to a combination of factors, but one of the most important ones was Synapse issue #9490: that one busy room could cause head-of-line blocking, starving your server from processing events in other rooms, causing all traffic to fall behind.
We're happy to say that Synapse 1.37.1 fixes this and we now process inbound federation traffic asynchronously, ensuring that one busy room won't impact others. First impressions are that this has significantly improved federation performance and end-to-end encryption stability — for instance, new E2EE keys from remote users for a given conversation should arrive immediately rather than being blocked behind other traffic.
Please upgrade to Synapse 1.37.1 as soon as possible, in order to increase resilience to any other traffic spikes.
Also, we highly recommend that you disable open registration or, if you keep it enabled, use SSO or require email validation to avoid abusive signups. Empirically adding a CAPTCHA is not enough. Otherwise you may find your server blocked all over the place if it is hosting spambots.
Finally, if your server has open registration, PLEASE check whether spambots have been registered on your server, and deactivate them. Once deactivated, you will need to contact [email protected] to request that blocks on your server are removed.
Your best bet for spotting and neutralising dormant spambots is to review signups on your homeserver over the past 3-5 days and deactivate suspicious users. We do not recommend relying solely on lists of suspicious IP addresses for this task, as the distributed nature of the attack means any such list is likely to be incomplete or include shared proxies which may also catch legitimate users.
To ease review, we're working on an auditing script in #10290; feedback on whether this is useful would be appreciated. Problematic accounts can then be dealt with using the Deactivate Account Admin API.
Meanwhile, over to Dan for the Synapse 1.37 release notes.
Synapse 1.37 Release Announcement
Synapse 1.37 is now available!
**Note: ** The legacy APIs for Spam Checker extension modules are now considered deprecated and targeted for removal in August. Please see the module docs for information on updating.
This release also removes Synapse's built-in support for the obsolete ACMEv1 protocol for automatically obtaining TLS certificates. Server administrators should place Synapse behind a reverse proxy for TLS termination, or switch to a standalone ACMEv2 client like certbot.
Knock, knock?
After nearly 18 months and 129 commits, Synapse now includes support for MSC2403: Add "knock" feature and Room Version 7! This feature allows users to directly request admittance to private rooms, without having to track down an invitation out-of-band. One caveat: Though the server-side foundation is there, knocking is not yet implemented in clients.
A Unified Interface for Extension Modules
Third party modules can customize Synapse's behavior, implementing things like bespoke media storage providers or user event filters. However, Synapse previously lacked a unified means of enumerating and configuring third-party modules. That changes with Synapse 1.37, which introduces a new, generic interface for extensions.
This new interface consolidates configuration into one place, allowing for more flexibility and granularity by explicitly registering callbacks with specific hooks. You can learn more about the new module API in the docs linked above, or in Matrix Live S6E29, due out this Friday, July 2nd.
Safer Reauthentication
User-interactive authentication ("UIA") is required for potentially dangerous actions like removing devices or uploading cross-signing keys. However, Synapse can optionally be configured to provide a brief grace period such that users are not prompted to re-authenticate on actions taken shortly after logging in or otherwise authenticating.
This improves user experience, but also creates risks for clients which rely on UIA as a guard against actions like account deactivation. Synapse 1.37 protects users by exempting especially risky actions from the grace period. See #10184 for details.
Smaller Improvements
We've landed a number of smaller improvements which, together, make Synapse more responsive and reliable. We now:
More efficiently respond to key requests, preventing excessive load (#10221, #10144)
Render docs for each vX.Y Synapse release, starting with v1.37 (#10198)
Ensure that log entries from failures during early startup are not lost (#10191)
Have a notion of database schema "compatibility versions", allowing for more graceful upgrades and downgrades of Synapse (docs)
We've also resolved two bugs which could cause sync requests to immediately return with empty payloads (#8518), producing a tight loop of repeated network requests.
As many know, over the years we've experimented with how to let users locate
and curate sets of users and rooms in Matrix. Back in Nov
2017
we added 'groups' (aka 'communities') as a custom mechanism for this -
introducing identifiers beginning with a + symbol to represent sets of rooms
and users, like +matrix:matrix.org.
However, it rapidly became obvious that Communities had some major
shortcomings. They ended up being an extensive and entirely new API surface
(designed around letting you dynamically bridge the membership of a group
through to a single source of truth like LDAP) - while in practice groups
have enormous overlap with rooms: managing membership, inviting by email,
access control, power levels, names, topics, avatars, etc. Meanwhile the
custom groups API re-invented the wheel for things like pushing updates
to the client (causing a whole suite of
problems). So clients
and servers alike ended up reimplementing large chunks of similar
functionality for both rooms and groups.
And so almost before Communities were born, we started thinking about whether
it would make more sense to model them as a special type of room, rather than
being their own custom primitive.
MSC1215 had the first
thoughts on this in 2017, and then a formal proposal emerged at
MSC1772 in Jan 2019. We
started working on this in earnest at the end of 2020, and christened the new
way of handling groups of rooms and users as... Spaces!
Spaces work as follows:
You can designate specific rooms as 'spaces', which contain other rooms.
You can have a nested hierarchy of spaces.
You can rapidly navigate around that hierarchy using the new 'space summary'
(aka space-nav) API - MSC2946.
Spaces can be shared with other people publicly, or invite-only, or private
for your own curation purposes.
Rooms can appear in multiple places in the hierarchy.
You can have 'secret' spaces where you group your own personal rooms and
spaces into an existing hierarchy.
Today, we're ridiculously excited to be launching Space support as a beta in
matrix-react-sdk and matrix-android-sdk2 (and thus Element Web/Desktop and
Element Android) and Synapse
1.34.0 - so head
over to your nearest Element, make sure it's connected to the latest Synapse
(and that Synapse has Spaces enabled in its config) and find some Space to
explore! #community:matrix.org
might be a good start :)
The beta today gives us the bare essentials: and we haven't yet finished
space-based access controls such as setting powerlevels in rooms based on
space membership
(MSC2962)
or limiting who can join a room based on their space membership
(MSC3083) -
but these will be coming asap. We also need to figure out how to implement
Flair on top of Spaces rather than Communities.
This is also a bit of a turning point in Matrix's architecture: we are now
using rooms more and more as a generic way of modelling new features in
Matrix. For instance, rooms could be used as a structured way of storing
files (MSC3089);
Reputation data
(MSC2313) is stored in
rooms; Threads can be stored in rooms
(MSC2836); Extensible
Profiles are proposed as rooms too
(MSC1769). As such,
this pushes us towards ensuring rooms are as lightweight as possible in Matrix -
and that things like sync and changing profile scale independently of the
number of rooms you're in. Spaces effectively gives us a way of creating a
global decentralised filesystem hierarchy on top of Matrix - grouping the
existing rooms of all flavours into an epic multiplayer tree of realtime data.
It's like USENET had a baby with the Web!
For lots more info from the Element perspective, head over to the Element
blog.
Finally, the point of the beta is to gather feedback and fix bugs - so please
go wild in Element reporting your first impressions and help us make Spaces as
awesome as they deserve to be!
Just over a week ago we had the honour of using Matrix to host FOSDEM: the world's largest free & open source software conference. It's taken us a little while to write up the experience given we had to recover and catch up on business as usual... but better late than never, here's an overview of what it takes to run a ~30K attendee conference on Matrix!
[confetti and firework easter-eggs explode over the closing keynote of FOSDEM 2021]
First of all, a quick (re)introduction to Matrix for any newcomers: Matrix is an open source project which defines an open standard protocol for decentralised communication. The global Matrix network makes up at least 28M Matrix IDs spread over around 60K servers. For FOSDEM, we set up a fosdem.org server to host newcomers, provided by Element Matrix Services (EMS) - Element being the startup formed by the Matrix core team to help fund Matrix development.
The most unique thing about Matrix is that conversations get replicated across all servers whose users are present in the conversation, so there's never a single point of control or failure for a conversation (much as git repositories get replicated between all contributors). And so hosting FOSDEM in Matrix meant that everyone already on Matrix (including users bridged to Matrix from IRC, XMPP, Slack, Discord etc) could attend directly - in addition to users signing up for the first time on the FOSDEM server. Therefore the chat around FOSDEM 2021 now exists for posterity on all the Matrix servers whose users who participated; and we hope that the fosdem.org server will hang around for the benefit of all the newcomers for the foreseeable so they don't lose their accounts!
Talking of which: the vital stats of the weekend were as follows:
We saw almost 30K local users on the FOSDEM server + 4K remote users from elsewhere in Matrix.
There were 24,826 guests (read-only invisible users) on the FOSDEM server.
There were 8,060 distinct users actively joined to the public FOSDEM rooms...
...of which 3,827 registered on the FOSDEM server. (This is a bit of an eye-opener: over 50% of the actively participating attendees for FOSDEM were already on Matrix!)
These numbers don't count users who were viewing the livestreams directly, but only those who were attending via Matrix.
Given last year's FOSDEM had roughly 8,500 in-person attendees at the Université libre de Bruxelles, this feels like a pretty good outcome :)
Graphwise, local user activity on the FOSDEM server looked like this:
How was it built?
There were four main components on the Matrix side:
A horizontally-scalable Matrix server deployment (Synapse hosted in EMS)
A Jitsi cluster for the video conferencing, used to host all the Q&A sessions, hallway sessions, stands, and other adhoc video conferences
An elastically scalable Jibri cluster used to livestream the Jitsi conferences both to the official FOSDEM livestreams and to provide a local preview of the conference on Matrix (to avoid the Jitsis getting overloaded with folks who just want to view)
conference-bot - a Matrix bot which orchestrated the overall conference on Matrix, written from scratch for FOSDEM by TravisR, consuming the schedule from FOSDEM and maintaining all the necessary rooms with the correct permissions, widgets, invites, etc.
Architecturally, it looked like this:
On the clientside, we made heavy use of widgets: the ability to embed arbitrary web content as iframes into Matrix chatrooms. (Widgets currently exist as a set of proposals for the Matrix spec, which have been preemptively implemented in Element.)
For instance, the conference-bot created Matrix rooms for all the FOSDEM devrooms with a predefined widget for viewing the official FOSDEM livestream for that room, pointing at the appropriate HLS stream at stream.fosdem.org - which looked like this:
Each devroom also had a schedule widget available on the righthand side, visualising the schedule of that room - huge thanks to Hato and Steffen and folks at Nordeck for putting this together at the last minute; it enormously helped navigate the devrooms (and even had a live countdown to help you track where you were at in the schedule!)
Each devroom was also available via IRC on Freenode via a dedicated bridge (#fosdem-...) and via XMPP.
The bot also created rooms for each and every talk at FOSDEM (all 666 of them), as the space where the speaker and host could hang out in advance; watch the talk together, and then broadcast the Q&A session. At the end of the talk slot, the bot then transformed the talk room into a 'hallway' for the talk, and advertised it to the audience in the devroom, so folks could pose follow-on questions to the speaker as so often happens in real life at FOSDEM. The speaker's view of the talk rooms looked like this:
On the right-hand side you can see a "scoreboard" - a simple widget which tracked which messages in the devroom had been most upvoted, to help select questions for the Q&A session. On the left-hand side you can see a hybrid Jitsi/livestream widget used to coordinate between the speaker & host. By default, the widget showed the local livestream of the video call - if you clicked 'join conference' you'd join the Jitsi itself. This stopped view-only users from overloading the Jitsi once the room became public.
The widgets themselves were hosted by the bot (you can see them at https://github.com/matrix-org/conference-bot/tree/main/web). Meanwhile the chat.fosdem.org webclient itself ended up being identical to mainline Element Web 1.7.19, other than FOSDEM branding and being configured to hook the 'video call' button up to the hybrid Jitsi/livestream widget rather than a plain Jitsi.
Meanwhile, for conferencing we hosted an off-the-shelf Jitsi cluster sized to ~100 concurrent conferences, and for the Jibri livestreaming we set up an elastic scalable cluster using AWS Auto Scaling Groups. Jibri is essentially a Chromium which views the Jitsi webapp, running in a headless X server whose framebuffer and ALSA audio is hooked up to an ffmpeg process which livestreams to the appropriate destination - so we chose to run a separate VM for every concurrent livestream to keep them isolated from each other. The Jibri ffmpegs compressed the livestream to RTMP and relayed it to our nginx, which in turn relayed it to FOSDEM's livestreaming infrastructure for use in the official stream, as well as relaying it back to the local video preview in the Matrix livestream/video widget.
Here's a screengrab of the Jitsi/Jibri Grafana dashboard during the first day of the conference, showing 46 concurrent conferences in action, with 25 spare jibris in the scaling group cluster ready for action if needed :)
There was also an explosion of changes to Element itself to try to make things go as smoothly as possible. Probably the most important one was implementing Social Login - giving single-click registration for attendees who were happy to piggyback on existing identity providers (GitHub, GitLab, Google, Apple and Facebook) rather then signing up natively in Matrix:
This was a real epic to get together (and is also an important part of achieving parity between Gitter and Element) - and seems to have been surprisingly successful for FOSDEM. Almost 50% of users who signed up on the FOSDEM server did so via social login! We should also be turning it on for the matrix.org server this week.
Finally, on the Matrix server side, we ran a cluster of synapse worker processes (1 federation inbound, reader and sender, 1 pusher, 1 initial sync worker, 10 synchrotrons, 1 event persister, 1 event creator, 4 general purpose client readers, 1 typing worker and 1 user directory) within Kubernetes on EMS. These were hooked up for horizontal scalability as follows:
The sort of traffic we saw (from day 2) looked like this:
How did it go?
Overall, people seem to have had a good time. Some folks have even been kind enough to call it the best online event they've been to :) This probably reflects the fact that FOSDEM rocks no matter what - and that Matrix is an inherently social medium, built by and for open source communities (after all, the whole Matrix ecosystem is developed over Matrix!). Also, Matrix being an open network means that folks could join from all over, so the social dynamics already present in Matrix spilled over into FOSDEM - and we even saw a bunch of people spin up their own servers to participate; literally sharing the hosting responsibility themselves. Finally, having critical infrastructure rooms available such as #beerevent:fosdem.org, #cafe:fosdem.org and #food-trucks:fosdem.orgprobably helped as well.
That said, we did have some production incidents which impacted the event. The most serious one was on Saturday morning, where it transpired that some of the endpoints hosted on the main Synapse process were taking way more CPU than we'd anticipated - most importantly the /groups endpoints which handle traffic relating to communities (the legacy way of defining groups of rooms in Matrix). One of the last things we'd done to prepare for the conference on Friday night was to create a +fosdem:fosdem.org community which spanned all 1000 public rooms in the conference, as well as add the +staff:fosdem.org community to all of those rooms - and unfortunately we didn't anticipate how popular these would be. As a result we had to do some emergency rebalancing of endpoints, spinning up new workers and reconfiguring the loadbalancer to relieve load off the main process.
Ironically the Matrix server was largely working okay during this timeframe, given event-sending no longer passes through the main process - but the most serious impact was that the conference bot was unable to boot due to hitting a wide range of endpoints on startup as it syncs with the conference, some of which were timing out. This in turn impacted widgets, which had been hosted by the bot for convenience, meaning that the Jitsi conferences for stands and talk Q&A were unavailable (even though the Jitsi/Jibri cluster was fine). This was solved by lunchtime on Saturday: we are really sorry for folks whose Q&As or conferences got caught in this. On the plus side, we spotted that many affected rooms just added their own widgets for their own Jitsis or BBBs to continue with minimal distraction - effectively manually taking over from the bot.
The other main incident was briefly first thing on Sunday morning, where two Jibri livestreams ended up trying to broadcast video to the same RTMP URL (potentially due to a race when rapidly removing and re-adding the jitsi/livestream widget for one of the stands). This caused a cascading failure which briefly impacted all RTMP streams, but was solved within about 30 minutes. We also had a more minor problem with the active speaker recognition malfunctioning in Jitsi on Sunday (apparently a risk when using SCTP rather than Websockets as a transport within Jitsi) - this was solved around lunchtime. Again, we're really sorry if you were impacted by this. We've learned a lot from the experience, and if we end up doing this again we will make sure these failure modes don't repeat!
Other things we'd change if we have the chance to do it again include:
Providing a to-the-second countdown via a widget in the talk room so the speaker & host can see precisely when they're going 'live' in the devroom (and when precisely they're going to be cut in favour of the next talk)
Providing a scratch-pad of some kind in the talk room so the host & speaker can track which questions they want to answer, and which they've already answered
Keep the questions scoreboard and scratchpad visible to the speaker/host after their Q&A has finished so they can keep answering the questions in the per-talk room, and advertise the per-talk room more effectively.
Use Spaces rather than Communities to group the rooms together and automatically provide a structured room directory! (Like this!)
Use threads (once they land!) to help structure conversations in the devroom (perhaps these could even replace the hallway rooms?)
Make the schedule widgets easier to find, and have more of them around the place
Make room directory easier to find.
Give the option of recording the video in the per-talk and stands for posterity
Provide more tools to stands to help organise demos etc.
So, there you have it. We hope that this shows that it's possible to host successful large-scale conferences on Matrix using an entirely open source stack, and we hope that other events will be inspired to go online via Matrix! We should give a big shout out to HOPE, who independently trailblazed running conferences on Matrix last year and inspired us to make FOSDEM work.
Finally, huge thanks to FOSDEM for letting Matrix host the social side of the conference. This was a big bet, starting from scratch with our offer to help back in September, and we hope it paid off. Also, thanks to all the folks at Element who bust a gut to pull it together - and to the FOSDEM organisers, who were a real pleasure to work with.
Let's hope that FOSDEM 2022 will be back in person at ULB - but whatever happens, the infrastructure we built this year will be available if ever needed in future.
Imagine you could physically step into your favourite FOSS projects’ chatrooms, mailing lists or forums and talk in person to other community members, contributors or committers? Imagine you could see project leads show off their latest work in front of a packed audience, and then chat and brainstorm with them afterwards (and maybe grab a beer)? Imagine, as a developer, you could suddenly meet a random subset of your users, to hear and understand their joys and woes in person?
This is FOSDEM, Europe’s largest Free and Open Source conference, where every year thousands of people (last year, ~8,500) take over the Solbosch campus of the Université Libre de Bruxelles in Belgium for a weekend and turn it into both a cathedral and bazaar for FOSS, with over 800+ talks organised over 50+ tracks, hundreds of exhibitor stands, and the whole campus generally exploding into a physical manifestation of the Internet. The event is completely non-commercial and volunteer run, and is a truly unique and powerful (if slightly overwhelming!) experience to attend. Ever since we began Matrix in 2014, FOSDEM has been thefocalpointofouryear as we’ve rushed to demonstrate our latest work and catch up with the wider community and sync with other projects.
This year, things are of course different. Thankfully FOSDEM 2020 snuck in a few weeks before the COVID-19 pandemic went viral, but for FOSDEM 2021 on Feb 6/7th the conference will inevitably happen online. When this was announced a few months back, we reached out to FOSDEM to see if we could help: we’d just had a lot of fun helping HOPE go online, and meanwhile a lot of the work that’s gone into Matrix and Element in 2020 has been around large-scale community collaboration due to COVID - particularly thanks to all the development driven by Element’s German Education work. Meanwhile, we obviously love FOSDEM and want it to succeed as much as possible online - and we want to attempt to solve the impossible paradox of faithfully capturing the atmosphere and community of an event which is “online communities, but in person!”... but online.
And so, over the last few weeks we’ve been hard at work with the FOSDEM team to figure out how to make this happen, and we wanted to give an update on how things are shaping up (and to hopefully reassure folks that things are on track, and that devrooms don’t need to make their own plans!).
Firstly, FOSDEM will have its own dedicated Matrix server at fosdem.org (hosted by EMS along with a tonne of Jitsi’s) acting as the social backbone for the event. Matrix is particularly well suited for this, because:
We’re an open standard comms protocol with an open network run under a non-profit foundation with loads of open source implementations (including the reference ones): folks can jump on board and participate via their own servers, clients, bridges, bots etc.
We provide official bridges through to IRC and XMPP (and most other chat systems), giving as much openness and choice as possible - if folks want to participate via Freenode and XMPP they can!
We’re built with large virtual communities in mind (e.g. Mozilla, KDE, Matrix itself) - for instance, we’ve worked a lot on moderation recently.
We’ve spent a lot of time improving widgets recently: these give the ability to embed arbitrary webapps into chatrooms - letting you add livestreams, video conferences, schedules, Q&A dashboards etc, augmenting a plain old chatroom into a much richer virtual experience that can hopefully capture the semantics and requirements of an event like FOSDEM.
We’re currently in the middle of setting up the server with a dedicated Element as the default client, but what we’re aiming for is:
Attendees can lurk as read-only guests in devrooms without needing to set up accounts (or they can of course use their existing Matrix/IRC/XMPP accounts)
Every devroom and track will have its own chatroom, where the audience can hang out and view the livestream of that particular devroom (using the normal FOSDEM video livestream system). There’ll also be a ‘backstage’ room per track for coordination between the devroom organisers and the speakers.
The talks themselves will be prerecorded to minimise risk of disaster, but each talk will have a question & answer session at the end which will be a live Jitsi broadcast from the speaker and a host who will relay questions from the devroom.
Each talk will have a dedicated room too, where after the official talk slot the audience can pop in and chat to the speaker more informally if they’re available (by text and/or by moderated jitsi). During the talk, this room will act as the ‘stage’ for the speaker & host to watch the livestream and conduct the question & answer session.
Every stand will also have its own chatroom and optional jitsi+livestream, as will BOFs or other adhoc events, so folks can get involved both by chat and video, to get as close to the real event as possible (although it’s unlikely we’ll capture the unique atmospheric conditions of K building, which may or may not be a bug ;)
There’ll also be a set of official support, social etc rooms - and of course folks can always create their own! Unfortunately folks will have to bring their own beer though :(
All of this will be orchestrated by a Matrix bot (which is rapidly taking shape over at https://github.com/matrix-org/conference-bot), responsible for orchestrating the hundreds of required rooms, setting up the right widgets and permissions, setting up bridges to IRC & XMPP, and keeping everything in sync with the official live FOSDEM schedule.
N.B. This is aspirational, and is all still subject to change, but that said - so far it’s all coming together pretty well, and hopefully our next update will be opening up the rooms and the server so that folks can get comfortable in advance of the event.
Huge thanks go to the FOSDEM team for trusting us to sort out the social/chat layer of FOSDEM 2021 - we will do everything we can to make it as successful and as inclusive as we possibly can! :)
P.S. We need help!
FOSDEM is only a handful of weeks away, and we have our work cut out to bring this all together in time. There are a few areas where we could really do with some help:
Folks on XMPP often complain that the Bifröst Matrix<->XMPP bridge doesn’t support MAMs - meaning that if XMPP users lose connection, they lose scrollback. We’re not going to have time to fix this ourselves in time, so this would be a great time for XMPP folks who grok xmpp.js to come get involved and help to ensure the best possible XMPP experience! (Similarly on other bifrost shortcomings).
It’d be really nice to be able to render nice schedule widgets for each devroom, and embed the overall schedule in the support rooms etc. The current HTML schedules at https://fosdem.org/2021/schedule/day/saturday/ and (say) https://fosdem.org/2021/schedule/room/vcollab/ don’t exactly fit - if someone could write a thing which renders them at (say) 2:5 aspect ratio so they can fit nicely down the side of a chatroom then that could be awesome!
While we’ll bridge all the official rooms over to Freenode, it’d be even nicer if people could just hop straight into any room on the FOSDEM server (or beyond) via IRC - effectively exposing the whole thing as an IRC network for those who prefer IRC. We have a project to do this: matrix-ircd, but it almost certainly needs more love and polish before it could be used for something as big as this. If you like Rust and know Matrix, please jump in and get involved!
If you just want to follow along or help out, then we’ve created a general room for discussion over at #fosdem-matrix:fosdem.org. It’d be awesome to have as many useful bots & widgets as possible to help things along.