Matthew Hodgson

162 posts tagged with "Matthew Hodgson" (See all authors)

How we hosted FOSDEM 2021 on Matrix

2021-02-15 — Events, FOSDEM — Matthew Hodgson

Hi all,

Just over a week ago we had the honour of using Matrix to host FOSDEM: the world's largest free & open source software conference. It's taken us a little while to write up the experience given we had to recover and catch up on business as usual... but better late than never, here's an overview of what it takes to run a ~30K attendee conference on Matrix!

[confetti and firework easter-eggs explode over the closing keynote of FOSDEM 2021]

First of all, a quick (re)introduction to Matrix for any newcomers: Matrix is an open source project which defines an open standard protocol for decentralised communication. The global Matrix network makes up at least 28M Matrix IDs spread over around 60K servers. For FOSDEM, we set up a fosdem.org server to host newcomers, provided by Element Matrix Services (EMS) - Element being the startup formed by the Matrix core team to help fund Matrix development.

The most unique thing about Matrix is that conversations get replicated across all servers whose users are present in the conversation, so there's never a single point of control or failure for a conversation (much as git repositories get replicated between all contributors). And so hosting FOSDEM in Matrix meant that everyone already on Matrix (including users bridged to Matrix from IRC, XMPP, Slack, Discord etc) could attend directly - in addition to users signing up for the first time on the FOSDEM server. Therefore the chat around FOSDEM 2021 now exists for posterity on all the Matrix servers whose users who participated; and we hope that the fosdem.org server will hang around for the benefit of all the newcomers for the foreseeable so they don't lose their accounts!

Talking of which: the vital stats of the weekend were as follows:

We saw almost 30K local users on the FOSDEM server + 4K remote users from elsewhere in Matrix.
There were 24,826 guests (read-only invisible users) on the FOSDEM server.
There were 8,060 distinct users actively joined to the public FOSDEM rooms...
...of which 3,827 registered on the FOSDEM server. (This is a bit of an eye-opener: over 50% of the actively participating attendees for FOSDEM were already on Matrix!)
These numbers don't count users who were viewing the livestreams directly, but only those who were attending via Matrix.

Given last year's FOSDEM had roughly 8,500 in-person attendees at the Université libre de Bruxelles, this feels like a pretty good outcome :)

Graphwise, local user activity on the FOSDEM server looked like this:

🔗How was it built?

There were four main components on the Matrix side:

A horizontally-scalable Matrix server deployment (Synapse hosted in EMS)
A Jitsi cluster for the video conferencing, used to host all the Q&A sessions, hallway sessions, stands, and other adhoc video conferences
An elastically scalable Jibri cluster used to livestream the Jitsi conferences both to the official FOSDEM livestreams and to provide a local preview of the conference on Matrix (to avoid the Jitsis getting overloaded with folks who just want to view)
conference-bot - a Matrix bot which orchestrated the overall conference on Matrix, written from scratch for FOSDEM by TravisR, consuming the schedule from FOSDEM and maintaining all the necessary rooms with the correct permissions, widgets, invites, etc.

Architecturally, it looked like this:

On the clientside, we made heavy use of widgets: the ability to embed arbitrary web content as iframes into Matrix chatrooms. (Widgets currently exist as a set of proposals for the Matrix spec, which have been preemptively implemented in Element.)

For instance, the conference-bot created Matrix rooms for all the FOSDEM devrooms with a predefined widget for viewing the official FOSDEM livestream for that room, pointing at the appropriate HLS stream at stream.fosdem.org - which looked like this:

Each devroom also had a schedule widget available on the righthand side, visualising the schedule of that room - huge thanks to Hato and Steffen and folks at Nordeck for putting this together at the last minute; it enormously helped navigate the devrooms (and even had a live countdown to help you track where you were at in the schedule!)

Each devroom was also available via IRC on Freenode via a dedicated bridge (#fosdem-...) and via XMPP.

The bot also created rooms for each and every talk at FOSDEM (all 666 of them), as the space where the speaker and host could hang out in advance; watch the talk together, and then broadcast the Q&A session. At the end of the talk slot, the bot then transformed the talk room into a 'hallway' for the talk, and advertised it to the audience in the devroom, so folks could pose follow-on questions to the speaker as so often happens in real life at FOSDEM. The speaker's view of the talk rooms looked like this:

On the right-hand side you can see a "scoreboard" - a simple widget which tracked which messages in the devroom had been most upvoted, to help select questions for the Q&A session. On the left-hand side you can see a hybrid Jitsi/livestream widget used to coordinate between the speaker & host. By default, the widget showed the local livestream of the video call - if you clicked 'join conference' you'd join the Jitsi itself. This stopped view-only users from overloading the Jitsi once the room became public.

The widgets themselves were hosted by the bot (you can see them at https://github.com/matrix-org/conference-bot/tree/main/web). Meanwhile the chat.fosdem.org webclient itself ended up being identical to mainline Element Web 1.7.19, other than FOSDEM branding and being configured to hook the 'video call' button up to the hybrid Jitsi/livestream widget rather than a plain Jitsi.

Meanwhile, for conferencing we hosted an off-the-shelf Jitsi cluster sized to ~100 concurrent conferences, and for the Jibri livestreaming we set up an elastic scalable cluster using AWS Auto Scaling Groups. Jibri is essentially a Chromium which views the Jitsi webapp, running in a headless X server whose framebuffer and ALSA audio is hooked up to an ffmpeg process which livestreams to the appropriate destination - so we chose to run a separate VM for every concurrent livestream to keep them isolated from each other. The Jibri ffmpegs compressed the livestream to RTMP and relayed it to our nginx, which in turn relayed it to FOSDEM's livestreaming infrastructure for use in the official stream, as well as relaying it back to the local video preview in the Matrix livestream/video widget.

Here's a screengrab of the Jitsi/Jibri Grafana dashboard during the first day of the conference, showing 46 concurrent conferences in action, with 25 spare jibris in the scaling group cluster ready for action if needed :)

There was also an explosion of changes to Element itself to try to make things go as smoothly as possible. Probably the most important one was implementing Social Login - giving single-click registration for attendees who were happy to piggyback on existing identity providers (GitHub, GitLab, Google, Apple and Facebook) rather then signing up natively in Matrix:

This was a real epic to get together (and is also an important part of achieving parity between Gitter and Element) - and seems to have been surprisingly successful for FOSDEM. Almost 50% of users who signed up on the FOSDEM server did so via social login! We should also be turning it on for the matrix.org server this week.

Finally, on the Matrix server side, we ran a cluster of synapse worker processes (1 federation inbound, reader and sender, 1 pusher, 1 initial sync worker, 10 synchrotrons, 1 event persister, 1 event creator, 4 general purpose client readers, 1 typing worker and 1 user directory) within Kubernetes on EMS. These were hooked up for horizontal scalability as follows:

The sort of traffic we saw (from day 2) looked like this:

🔗How did it go?

Overall, people seem to have had a good time. Some folks have even been kind enough to call it the best online event they've been to :) This probably reflects the fact that FOSDEM rocks no matter what - and that Matrix is an inherently social medium, built by and for open source communities (after all, the whole Matrix ecosystem is developed over Matrix!). Also, Matrix being an open network means that folks could join from all over, so the social dynamics already present in Matrix spilled over into FOSDEM - and we even saw a bunch of people spin up their own servers to participate; literally sharing the hosting responsibility themselves. Finally, having critical infrastructure rooms available such as #beerevent:fosdem.org, #cafe:fosdem.org and #food-trucks:fosdem.org probably helped as well.

That said, we did have some production incidents which impacted the event. The most serious one was on Saturday morning, where it transpired that some of the endpoints hosted on the main Synapse process were taking way more CPU than we'd anticipated - most importantly the /groups endpoints which handle traffic relating to communities (the legacy way of defining groups of rooms in Matrix). One of the last things we'd done to prepare for the conference on Friday night was to create a +fosdem:fosdem.org community which spanned all 1000 public rooms in the conference, as well as add the +staff:fosdem.org community to all of those rooms - and unfortunately we didn't anticipate how popular these would be. As a result we had to do some emergency rebalancing of endpoints, spinning up new workers and reconfiguring the loadbalancer to relieve load off the main process.

Ironically the Matrix server was largely working okay during this timeframe, given event-sending no longer passes through the main process - but the most serious impact was that the conference bot was unable to boot due to hitting a wide range of endpoints on startup as it syncs with the conference, some of which were timing out. This in turn impacted widgets, which had been hosted by the bot for convenience, meaning that the Jitsi conferences for stands and talk Q&A were unavailable (even though the Jitsi/Jibri cluster was fine). This was solved by lunchtime on Saturday: we are really sorry for folks whose Q&As or conferences got caught in this. On the plus side, we spotted that many affected rooms just added their own widgets for their own Jitsis or BBBs to continue with minimal distraction - effectively manually taking over from the bot.

The other main incident was briefly first thing on Sunday morning, where two Jibri livestreams ended up trying to broadcast video to the same RTMP URL (potentially due to a race when rapidly removing and re-adding the jitsi/livestream widget for one of the stands). This caused a cascading failure which briefly impacted all RTMP streams, but was solved within about 30 minutes. We also had a more minor problem with the active speaker recognition malfunctioning in Jitsi on Sunday (apparently a risk when using SCTP rather than Websockets as a transport within Jitsi) - this was solved around lunchtime. Again, we're really sorry if you were impacted by this. We've learned a lot from the experience, and if we end up doing this again we will make sure these failure modes don't repeat!

Other things we'd change if we have the chance to do it again include:

Providing a to-the-second countdown via a widget in the talk room so the speaker & host can see precisely when they're going 'live' in the devroom (and when precisely they're going to be cut in favour of the next talk)
Providing a scratch-pad of some kind in the talk room so the host & speaker can track which questions they want to answer, and which they've already answered
Keep the questions scoreboard and scratchpad visible to the speaker/host after their Q&A has finished so they can keep answering the questions in the per-talk room, and advertise the per-talk room more effectively.
Use Spaces rather than Communities to group the rooms together and automatically provide a structured room directory! (Like this!)
Use threads (once they land!) to help structure conversations in the devroom (perhaps these could even replace the hallway rooms?)
Make the schedule widgets easier to find, and have more of them around the place
Make room directory easier to find.
Give the option of recording the video in the per-talk and stands for posterity
Provide more tools to stands to help organise demos etc.

So, there you have it. We hope that this shows that it's possible to host successful large-scale conferences on Matrix using an entirely open source stack, and we hope that other events will be inspired to go online via Matrix! We should give a big shout out to HOPE, who independently trailblazed running conferences on Matrix last year and inspired us to make FOSDEM work.

If you want to know more, we also did a talk about FOSDEM-on-Matrix in this month's Open Tech Will Save Us meetup and the Building Massive Virtual Communities on Matrix talk at FOSDEM went into more detail too. Our historical Taking FOSDEM online via Matrix blog has been somewhat overtaken by events but gives further context still.

Finally, huge thanks to FOSDEM for letting Matrix host the social side of the conference. This was a big bet, starting from scratch with our offer to help back in September, and we hope it paid off. Also, thanks to all the folks at Element who bust a gut to pull it together - and to the FOSDEM organisers, who were a real pleasure to work with.

Let's hope that FOSDEM 2022 will be back in person at ULB - but whatever happens, the infrastructure we built this year will be available if ever needed in future.

Taking FOSDEM online via Matrix

2021-01-04 — Events, FOSDEM — Matthew Hodgson

Imagine you could physically step into your favourite FOSS projects’ chatrooms, mailing lists or forums and talk in person to other community members, contributors or committers? Imagine you could see project leads show off their latest work in front of a packed audience, and then chat and brainstorm with them afterwards (and maybe grab a beer)? Imagine, as a developer, you could suddenly meet a random subset of your users, to hear and understand their joys and woes in person?

This is FOSDEM, Europe’s largest Free and Open Source conference, where every year thousands of people (last year, ~8,500) take over the Solbosch campus of the Université Libre de Bruxelles in Belgium for a weekend and turn it into both a cathedral and bazaar for FOSS, with over 800+ talks organised over 50+ tracks, hundreds of exhibitor stands, and the whole campus generally exploding into a physical manifestation of the Internet. The event is completely non-commercial and volunteer run, and is a truly unique and powerful (if slightly overwhelming!) experience to attend. Ever since we began Matrix in 2014, FOSDEM has been the focal point of our year as we’ve rushed to demonstrate our latest work and catch up with the wider community and sync with other projects.

This year, things are of course different. Thankfully FOSDEM 2020 snuck in a few weeks before the COVID-19 pandemic went viral, but for FOSDEM 2021 on Feb 6/7th the conference will inevitably happen online. When this was announced a few months back, we reached out to FOSDEM to see if we could help: we’d just had a lot of fun helping HOPE go online, and meanwhile a lot of the work that’s gone into Matrix and Element in 2020 has been around large-scale community collaboration due to COVID - particularly thanks to all the development driven by Element’s German Education work. Meanwhile, we obviously love FOSDEM and want it to succeed as much as possible online - and we want to attempt to solve the impossible paradox of faithfully capturing the atmosphere and community of an event which is “online communities, but in person!”... but online.

And so, over the last few weeks we’ve been hard at work with the FOSDEM team to figure out how to make this happen, and we wanted to give an update on how things are shaping up (and to hopefully reassure folks that things are on track, and that devrooms don’t need to make their own plans!).

Firstly, FOSDEM will have its own dedicated Matrix server at fosdem.org (hosted by EMS along with a tonne of Jitsi’s) acting as the social backbone for the event. Matrix is particularly well suited for this, because:

We’re an open standard comms protocol with an open network run under a non-profit foundation with loads of open source implementations (including the reference ones): folks can jump on board and participate via their own servers, clients, bridges, bots etc.
We provide official bridges through to IRC and XMPP (and most other chat systems), giving as much openness and choice as possible - if folks want to participate via Freenode and XMPP they can!
We’re built with large virtual communities in mind (e.g. Mozilla, KDE, Matrix itself) - for instance, we’ve worked a lot on moderation recently.
We’ve spent a lot of time improving widgets recently: these give the ability to embed arbitrary webapps into chatrooms - letting you add livestreams, video conferences, schedules, Q&A dashboards etc, augmenting a plain old chatroom into a much richer virtual experience that can hopefully capture the semantics and requirements of an event like FOSDEM.

We’re currently in the middle of setting up the server with a dedicated Element as the default client, but what we’re aiming for is:

Attendees can lurk as read-only guests in devrooms without needing to set up accounts (or they can of course use their existing Matrix/IRC/XMPP accounts)
Every devroom and track will have its own chatroom, where the audience can hang out and view the livestream of that particular devroom (using the normal FOSDEM video livestream system). There’ll also be a ‘backstage’ room per track for coordination between the devroom organisers and the speakers.
The talks themselves will be prerecorded to minimise risk of disaster, but each talk will have a question & answer session at the end which will be a live Jitsi broadcast from the speaker and a host who will relay questions from the devroom.
Each talk will have a dedicated room too, where after the official talk slot the audience can pop in and chat to the speaker more informally if they’re available (by text and/or by moderated jitsi). During the talk, this room will act as the ‘stage’ for the speaker & host to watch the livestream and conduct the question & answer session.
Every stand will also have its own chatroom and optional jitsi+livestream, as will BOFs or other adhoc events, so folks can get involved both by chat and video, to get as close to the real event as possible (although it’s unlikely we’ll capture the unique atmospheric conditions of K building, which may or may not be a bug ;)
There’ll also be a set of official support, social etc rooms - and of course folks can always create their own! Unfortunately folks will have to bring their own beer though :(
All of this will be orchestrated by a Matrix bot (which is rapidly taking shape over at https://github.com/matrix-org/conference-bot), responsible for orchestrating the hundreds of required rooms, setting up the right widgets and permissions, setting up bridges to IRC & XMPP, and keeping everything in sync with the official live FOSDEM schedule.

N.B. This is aspirational, and is all still subject to change, but that said - so far it’s all coming together pretty well, and hopefully our next update will be opening up the rooms and the server so that folks can get comfortable in advance of the event.

Huge thanks go to the FOSDEM team for trusting us to sort out the social/chat layer of FOSDEM 2021 - we will do everything we can to make it as successful and as inclusive as we possibly can! :)

🔗P.S. We need help!

FOSDEM is only a handful of weeks away, and we have our work cut out to bring this all together in time. There are a few areas where we could really do with some help:

Folks on XMPP often complain that the Bifröst Matrix<->XMPP bridge doesn’t support MAMs - meaning that if XMPP users lose connection, they lose scrollback. We’re not going to have time to fix this ourselves in time, so this would be a great time for XMPP folks who grok xmpp.js to come get involved and help to ensure the best possible XMPP experience! (Similarly on other bifrost shortcomings).
It’d be really nice to be able to render nice schedule widgets for each devroom, and embed the overall schedule in the support rooms etc. The current HTML schedules at https://fosdem.org/2021/schedule/day/saturday/ and (say) https://fosdem.org/2021/schedule/room/vcollab/ don’t exactly fit - if someone could write a thing which renders them at (say) 2:5 aspect ratio so they can fit nicely down the side of a chatroom then that could be awesome!
While we’ll bridge all the official rooms over to Freenode, it’d be even nicer if people could just hop straight into any room on the FOSDEM server (or beyond) via IRC - effectively exposing the whole thing as an IRC network for those who prefer IRC. We have a project to do this: matrix-ircd, but it almost certainly needs more love and polish before it could be used for something as big as this. If you like Rust and know Matrix, please jump in and get involved!
If you just want to follow along or help out, then we’ve created a general room for discussion over at #fosdem-matrix:fosdem.org. It’d be awesome to have as many useful bots & widgets as possible to help things along.

The Matrix Holiday Special 2020

2020-12-25 — General, Holiday Special — Matthew Hodgson

Hi all,

Over the years it’s become a tradition to write an end-of-year wrap-up on Christmas Eve, reviewing all the things the core Matrix team has been up over the year, and looking forwards to the next (e.g. here’s last year’s edition). These days there’s so much going on in Matrix it’s impossible to cover it all (and besides, we now have This Week In Matrix and better blogging in general to cover events as they happen). So here’s a quick overview of the highlights:

Introducing Cerulean

2020-12-18 — General — Matthew Hodgson

Hi all,

We have a bit of an unexpected early Christmas present for you today…

Alongside all the normal business-as-usual Matrix stuff, we’ve found some time to do a mad science experiment over the last few weeks - to test the question: “Is it possible to build a serious Twitter-style decentralised microblogging app using Matrix?”

It turns out the answer is a firm “yes” - and as a result we’d like to present a very early sneak preview of Cerulean: a highly experimental new microblogging app for Matrix, complete with first-class support for arbitrarily nested threading, with both Twitter-style (“vertical”) and HN/Reddit-style (“horizontal”) layout… and mobile web support!

Cerulean is unusual in many ways:

It’s (currently) a very minimal javascript app - only 2,500 lines of code.
It has zero dependencies (other than React).
- This is to show just how simple a fairly sophisticated Matrix client can be...
- ...and so the code can be easily understood by folks unfamiliar with Matrix...
- ...and so we can iterate fast while figuring out threading...
- ...and because none of the SDKs support threading yet :D
It relies on MSC2836: Threading - our highly experimental Matrix Spec Change to extend relationships (as used by reaction & edit aggregations) to support free-form arbitrary depth threading.
As such, it only works on Dendrite, as that’s where we’ve been experimenting with implementing MSC2836. (We’re now running an official public Dendrite server instance at dendrite.matrix.org though, which makes it easy to test - and our test Cerulean instance https://cerulean.matrix.org points at it by default).

This is **very much a proof of concept. **We’re releasing it today as a sneak preview so that intrepid Matrix experimenters can play with it, and to open up the project for contributions! (PRs welcome - it should be dead easy to hack on!). Also, we give no guarantees about data durability: both Cerulean and dendrite.matrix.org are highly experimental; do not trust them yet with important data; we reserve the right to delete it all while we iterate on the design.

🔗What can it do?

So for the first cut, we’ve implemented the minimal features to make this something you can just about use and play with for real :)

Home view (showing recent posts from folks you follow)
Timeline view (showing the recent posts or replies from a given user)
Thread view (showing a post and its replies as a thread)
Live updating (It’s Matrix, after all! We’ve disabled it for guests though.)
Posting plain text and images
Fully decentralised thanks to Matrix (assuming you’re on Dendrite)
Twitter-style “Vertical” threading (replies form a column; you indent when someone forks the conversation)
HN/Reddit/Email-style “Horizontal” threading (each reply is indented; forks have the same indentation)
Basic Registration & Login
Guest support (slightly faked with non-guest users, as Dendrite’s guest support isn’t finished yet)
Super-experimental proof-of-concept support for decentralised reputation filtering(!)

Obviously, there’s a huge amount of stuff needed for parity with a proper Twitter-style system:

Configurable follows. Currently the act of viewing someone’s timeline automatically follows them. This is because Dendrite doesn’t peek over federation yet (but it’s close), so you have to join a room to view its contents - and the act of viewing someone’s timeline room is how you follow them in Cerulean.
Likes (i.e. plain old Matrix reactions, although we might need to finally sort out federating them as aggregations rather than individually, if people use them like they use them on Twitter!)
Retweets (dead easy)
Pagination / infinite scrolling (just need to hook it up)
Protect your posts (dead easy; you just switch your timeline room to invite-only!)
Show (some) replies to messages in the Home view
Show parent and sibling context as well as child context in the Thread view
Mentions (we need to decide how to notify folks when they’re mentioned - perhaps Matrix’s push notifications should be extended to let you subscribe to keywords for public rooms you’re not actually in?)
Notifications (although this is just because Dendrite doesn’t do notifs yet)
Search (again, just needs to be implemented in Dendrite - although how do you search beyond the data in your current homeserver? Folks are used to global search)
Hashtags (it’s just search, basically)
Symlinks (see below)
Figure out how to handle lost unthreaded messages (see below)
Offline support? (if we were using a proper Matrix SDK, we’d hopefully get this for free, but currently Cerulean doesn’t store any state locally at all).

🔗How does it work?

Every message you send using Cerulean goes into two Matrix rooms, dubbed the "timeline" room and the "thread" room. The "timeline" room (with an alias of #@matthew:dendrite.matrix.org or whatever your matrix id is) is a room with all of your posts and no one else's. The "thread" room is a normal Matrix room which represents the message thread itself. Creating a new "Post" will create a new "thread" room. Replying to a post will join the existing "thread" room and send a message into that room. MSC2836 is used to handle threading of messages in the "thread” room - the replies refer to their parent via an m.relationship field in the event.

These semantics play nicely with existing Matrix clients, who will see one room per thread and a flattened chronological view of the thread itself (unless the client natively supports MSC2836, but none do yet apart from Cerulean). However, as Cerulean only navigates threaded messages with an m.reference relationship (eg it only ever uses the new /event_relationships API rather than /messages to pull in history), normal messages sent by Matrix into a thread or timeline room will not yet show up in Cerulean.

In this initial version, Cerulean literally posts the message twice into both rooms - but we’re also experimenting with the idea of adding “symlinks” to Matrix, letting the canonical version of the event be in the timeline room, and then the instance of the event in the thread room be a ‘symlink’ to the one in the timeline. This means that the threading metadata could be structured in the thread room, and let the user do things like turn their timeline private (or vice versa) without impacting the threading metadata. We could also add an API to both post to timeline and symlink into a thread in one fell swoop, rather than manually sending two events. It’d look something like this:

Cerulean diagram

We also experimented with cross-room threading (letting Bob’s timeline messages directly respond to Alice’s timeline messages and vice versa), but it posed some nasty problems - for instance, to find out what cross-room replies a message has, you’d need to store forward references somehow which the replier would need permission to create. Also, if you didn’t have access to view the remote room, the thread would break. So we’ve punted cross-room threading to a later MSC for now.

Needless to say, once we’re happy with how threading works at the protocol level, we’ll be looking at getting it into the UX of Element and mainstream Matrix chat clients too!

🔗What’s with the decentralised reputation button?

Cerulean is very much a test jig for new ideas (e.g. threading, timeline rooms, peeking), and we’re taking the opportunity to also use it as an experiment for our first forays into publishing and subscribing to reputation greylists; giving users the option to filter out content by default they might not want to see… but doing so on their own terms by subscribing to whatever reputation feed they prefer, while clearly visualising the filtering being applied. In other words, this is the first concrete experimental implementation of the work proposed in the second half of Combating Abuse in Matrix without Backdoors. This is super early days, and we haven’t even published a proto-MSC for the event format being used, but if you’re particularly interested in this domain it’s easy enough to figure out - just head over to #nsfw:dendrite.matrix.org (warning: not actually NSFW, yet) and look in /devtools to see what’s going on.

So, there you have it - further evidence that Matrix is not just for Chat, and a hopefully intriguing taste of the shape of things to come! Please check out the demo at https://cerulean.matrix.org or try playing with your own from https://github.com/matrix-org/cerulean, and then head over to #cerulean:matrix.org and let us know what you think! :)

Gitter now speaks Matrix!

2020-12-07 — General — Matthew Hodgson

Hi all,

It’s been just over 2 months since we revealed that Gitter was going to join Matrix - and we are incredibly proud to announce that Gitter has officially turned on true native Matrix connectivity: all public Gitter rooms are now available natively via Matrix, and all Gitter users now natively exist on Matrix. So, if you wanted to join the official Node.js language support room at https://gitter.im/nodejs/node from Matrix, just head over to #nodejs_node:gitter.im and *boom*, you’re in!

This means Gitter is now running a Matrix homeserver at gitter.im which exposes all the active public rooms - so if you go to the the room directory in Element (for instance) and select gitter.im as a homeserver, you can jump straight in:

Gitter room directory

Once you’re in, you can chat back and forth transparently between users on the Gitter side and the Matrix side, and you no longer have the ugly “Matrixbot” user faking the messages back and forth - these are ‘real’ users talking directly to one another, and every public msg in every public room is now automatically exposed into Matrix.

Gitter and Matrix going native!

So, suddenly all the developer communities previously living only in Gitter (Scala, Node, Webpack, Angular, Rails and thousands of others) are now available to anyone anywhere on Matrix - alongside communities bridged from Freenode and Slack; the native Matrix communities for Mozilla, KDE, GNOME communities etc. We’re hopeful that glueing everything together via Matrix will usher in a new age of open and defragmented dev collaboration, a bit like we used to have on IRC, back in the day.

This is also great news for mobile Gitter users - as the original mobile Gitter clients have been in a holding pattern for over a year, and native Matrix support for Gitter means they are now officially deprecated in favour of Element (or indeed any other mobile Matrix client).

🔗What features are ready?

Now, this is the first cut of native Matrix support in Gitter: much of the time since Gitter joined Element has been spent migrating stuff over from Gitlab to Element, and it’s only really been a month of work so far in hooking up Matrix. As a result: all the important features work, but there’s also stuff that’s yet to land:

Features ready today:

Ability to join rooms from Matrix via #org_repo:gitter.im
Bridging Edits, Replies (mapped to Threads on Gitter), Deletes, File transfer
Bridging Markdown & Emoji

What remains:

Ability to send/receive Direct Messages
Ability to plumb existing Matrix rooms into Gitter natively
Ability to view past Gitter history from Matrix. This is planned thanks to https://github.com/matrix-org/matrix-doc/pull/2716
Synchronising the full Gitter membership list to Matrix. Currently the membership syncs incrementally as people speak
Turning off the old Gitter bridge
Bridging emotes (/me support) (almost landed!)
Bridging read receipts
Synchronising room avatars
Bridge LaTeX

Stuff we’re not planning to support:

Ability to join arbitrary rooms on Matrix from Gitter. This could consume huge resources on Gitter, and we’re not in a rush to mirror all of Matrix into Gitter. This will get addressed when Gitter merges with Element into a pure Matrix client.
Bridging Reactions. Gitter doesn’t have these natively today, and rather than adding them to Gitter, we’d rather work on merging Gitter & Element together.

For more details, we strongly recommend checking out the native Matrix epic on Gitlab for the unvarnished truth straight from the coal-face!

🔗How do you make an existing chat system talk Matrix?

In terms of the work which has gone into this - Gitter has been an excellent case study of how you can easily plug an existing large established chat system into Matrix.

At high level, the core work needed was as simple as:

Add ‘virtual users’, so remote Matrix users can be modelled and represented in Gitter correctly: https://gitlab.com/gitterHQ/webapp/-/merge_requests/2027/diffs.
- This can be accomplished by simply adding a virtualUser property to your chat message/post/tweet schema which holds the mxid, displayName, and avatar as an alternative to your author field. Then display the virtualUser whenever available over the author.
Add an application service to Gitter to bridge traffic in & out of Matrix: https://gitlab.com/gitterHQ/webapp/-/merge_requests/2041/diffs
- This "application service" comes pre-packaged for you in many cases, so for example you can simply drop in a library like matrix-appservice-bridge in a Node.js application, and all of the Matrix talking complexity is handled for you.
Polish it!

In practice, Eric (lead Gitter dev) laid out the waypoints of the full journey:

First big step was to add the concept of virtual users to Gitter. We could also have created a new Gitter user for every new matrix ID that appears, but tagging them as virtual users is a bit cleaner.
Figuring out how to balance the Matrix traffic coming into/out of Gitter.
1. Spreading the inbound load comes for free via our existing load-balancer setup (ELB) where we already have 8 webapp servers running the various services of gitter.im. We just run the Matrix bridge on those servers alongside each web and api process, and then the load-balancer’s matrix.gitter.im spreads out to the servers.
2. Events from Matrix then hit the load balancer and reach one of the servers (no duplication when processing events).
3. If something on Gitter happens, the action occurs on one server and we just propagate it over to Matrix (no duplication or locking needed).
We have realtime websockets and Faye subscriptions already in the app which are backed by Mongoose database hooks whenever something changes. We just tapped into the same thing to be able to bridge across new information to Matrix as we receive it on Gitter.
Hooking up the official Matrix bridging matrix-appservice-bridge library to use Gitter’s existing MongoDB for storage instead of nedb.
Figuring out how to namespace the mxids of the gitter users:
1. It’s nice to have the mxid as human readable as possible instead of just the numerical userId in your service.
2. But if people can change their username in your service, you can’t change your mxid on Matrix. In the future, we’ll have portable accounts in Matrix to support this (MSC2787) but sadly these are still vapourware at this point.
3. If you naively just switch the user’s mxid when they rename their username, then you could end up leaking conversation history between mxids(!)
4. So we went with @username-userid:gitter.im for the Matrix ID to make it a bit more human readable but also unique so any renames can happen without affecting anything.
For room aliases, we decided to change our community/room URI syntax to underscores for the room aliases, #community_room:gitter.im
Figuring out how to bridge features correctly;
1. Emoji - mapping between :shortcode: and unicode emojis
2. Mapping between Gitter threaded conversations <-> Matrix replies
3. Mapping between Matrix mentions and Gitter mentions
Keeping users and room data in sync
1. We haven’t gotten there yet, but the data comes through the same Mongoose hook and we can update the bridged data as they change on the Gitter end.

Meanwhile, the Matrix side of gitter.im is hosted by Element Matrix Services and is a plain old Synapse, talking through to Gitter via the Application Service API. An alternative architecture would be to have got Gitter directly federating with Matrix by embedding a “homeserver library” into it (e.g. embedding Dendrite). However, given Dendrite is still beta and assumes it is storing its data itself (rather than persisting in an existing backend such as Gitter’s mongodb), we went for the simpler option to start with.

It’s been really interesting to see how this has played out week by week in the Gitter updates in This Week in Matrix: you can literally track the progress and see how the integration came to life between Oct 9, Oct 23, Nov 6, Nov 27 and finally Dec 4.

Huge thanks go to Eric Eastwood, the lead dev of Gitter and mastermind behind the project - and also to Half-Shot and Christian who’ve been providing all the support and review from the Matrix bridging team.

🔗What’s next?

First and foremost we’re going to be working through the “What remains” section of the list above: killing off the old bridge, sorting out plumbed rooms, hooking up DMs, importing old Gitter history into Matrix, etc. This should then give us an exceptionally low impedance link between Gitter & Matrix.

Then, as per our original announcement, the plan is:

In the medium/long term, it’s simply not going to be efficient for the combined Element/Gitter team to split our efforts maintaining two high-profile Matrix clients. Our plan is instead to merge Gitter’s features into Element (or next generations of Element) itself and then - if and only if Element has achieved parity with Gitter - we expect to upgrade the deployment on gitter.im to a Gitter-customised version of Element. The inevitable side-effect is that we’ll be adding new features to Element rather than Gitter going forwards.

Now, that means implementing some features in Matrix/Element to match...

Instant live room peeking (less than a second to load the webapp into a live-view of a massive room with 20K users!!)
Seamless onboarding thanks to using GitLab & GitHub for accounts
Curated hierarchical room directory
Magical creation of rooms on demand for every GitLab and GitHub project ever
GitLab/GitHub activity as a first-class citizen in a room’s side-panel
Excellent search-engine-friendly static content and archives
KaTeX support for Maths communities
Threads!

...and this work is in full swing:

We have a proposal for fast peeking (via lazy-loading state over federation) at MSC2775 and the new peek APIs at MSC2753 and MSC2444 (and even implemented by Dendrite)
Social Login for seamless onboarding via GitLab, GitHub & friends is almost landed in Element: https://github.com/matrix-org/matrix-react-sdk/pull/5426
Spaces gives a curated hierarchical room directory: https://github.com/vector-im/roadmap/issues/2
KaTeX support landed in Element Web a few weeks ago thanks to akissinger and thosgood
Threads are on the horizon, thanks to MSC2836.

The only bits which aren’t already progressing yet are tighter GL/GH integration, and better search engine optimised static archives.

So, the plan is to get cracking on the rest of the feature parity, then merge Gitter & Element together - and meanwhile continue getting the rest of the world into Matrix :)

We live in exciting times: open standards-based interoperable communication is on the rise again, and we hope Gitter’s new life in Matrix is the beginning of a new age of cross-project developer collaboration, at last escaping the fragmentation we’ve suffered over the last few years.

Finally, please do give feedback via Gitter or Matrix (or mail!) on the integration and where you’d like to see it go next!

Thanks for flying Matrix and Gitter,

-- The Matrix Team

Dendrite 0.3.0 released

2020-11-16 — Releases — Matthew Hodgson

Hi all,

Heads up that we just cut another beta release of Dendrite - now at 0.3.0!

This is a really fun release given almost all the changes are contributed from the wider community - so huge thanks to S7evinK, MayeulC and felix!

The main new feature is full Read Receipt support thanks to S7evinK, which makes an enormous perceptual improvement when using Dendrite - so especial thanks are due there :)

So, if you're interested in helping us test, please spin up a copy from https://github.com/matrix-org/dendrite and let us know how it goes - and if you're already running one, now is an excellent time to upgrade!

Full changelog (including 0.2.1, which we forgot to blog about) follows:

🔗Dendrite 0.3.0 (2020-11-16)

🔗Features

Read receipts (both inbound and outbound) are now supported (contributed by S7evinK)
Forgetting rooms is now supported (contributed by S7evinK)
The -version command line flag has been added (contributed by S7evinK)

🔗Fixes

User accounts that contain the = character can now be registered
Backfilling should now work properly on rooms with world-readable history visibility (contributed by MayeulC)
The gjson dependency has been updated for correct JSON integer ranges
Some more client event fields have been marked as omit-when-empty (contributed by S7evinK)
The build.sh script has been updated to work properly on all POSIX platforms (contributed by felix)

🔗Dendrite 0.2.1 (2020-10-22)

🔗Fixes

Forward extremities are now calculated using only references from other extremities, rather than including outliers, which should fix cases where state can become corrupted (#1556)
Old state events will no longer be processed by the sync API as new, which should fix some cases where clients incorrectly believe they have joined or left rooms (#1548)
More SQLite database locking issues have been resolved in the latest events updater (#1554)
Internal HTTP API calls are now made using H2C (HTTP/2) in polylith mode, mitigating some potential head-of-line blocking issues (#1541)
Roomserver output events no longer incorrectly flag state rewrites (#1557)
Notification levels are now parsed correctly in power level events (gomatrixserverlib#228, contributed by Pestdoktor)
Invalid UTF-8 is now correctly rejected when making federation requests (gomatrixserverlib#229, contributed by Pestdoktor)

How we fixed Synapse's scalability!

2020-11-03 — Releases — Matthew Hodgson

Hi all,

We had a major break-through in Synapse 1.22 which we want to talk about in more detail: Synapse now scales horizontally across multiple python processes.

Horizontal scaling means that you can support more users and traffic by adding in more python processes (spread over more machines, if necessary) without there being a single bottleneck which all the traffic is passing through - as opposed to vertical scaling where you make things go faster overall by making the bottleneck go faster.

After many years of having to vertically scale Synapse (by trying to make the main process go faster) we’re now finally at the point where you can configure Synapse so that messages no longer flow through the main process - eliminating the bottleneck entirely. What’s more, the Matrix.org homeserver has now been successfully running in this config and enjoying the massive scalability improvements for the last 2 weeks! Huge kudos goes to Erik and the wider Synapse team for pulling this off.

Some readers might wonder how this ties in with Dendrite entering beta, given one of Dendrite’s design goals is full horizontal scalability. The answer is that we’re very much using Dendrite for experimentation and next-gen stuff at the moment (currently focused more on scaling downwards for P2P rather than scaling upwards for megaservers) - while Synapse is the stable and long-term supported option.

So, that’s the context - now over to Erik with more than you could possibly ever want to know about how we actually did it...

🔗Background

Synapse started life off back in 2014 as a single monolithic python process, and for quite a while we made it scale by adding more and more in-memory caches to speed things up by avoiding hitting the database (at the expense of RAM). It looked like this:

Eventually the caches stopped helping and we needed more than one thread of execution in order to spread CPU across multiple cores. Python’s Global Interpreter Lock (GIL) means that Python can mainly only use one CPU core at a time, so starting more threads doesn’t help with scalability - you have to run multiple processes.

Now, the vast majority of the work that Synapse does is related to “streams”. These are append only sequences of rows, such as the events stream, typing stream, receipts stream, etc. When a new event arrives (for example) we write it to the events stream, and then notify anything waiting that there has been an update. The /sync endpoint, for instance, will wait for updates to streams and send them down to long-polling Matrix clients.

Streams support being added to concurrently, so have a concept of the “persisted up-to position”. This is the point where all rows before that point have finished persisting. Readers only read up to the current “persisted up-to position”, so that they don’t skip updates that haven’t finished persisting at that point. (E.g. if two events A and B get assigned positions 5 and 6, but B finishes persisting first, then the persisted up to position will remain at 4 until A finishes persisting and then it jumps to 6).

To split any meaningful amount of work into separate processes, we need to add a mechanism where processes can be told that updates to streams have happened (otherwise they’d have to repeatedly poll the DB, which would be deeply inefficient). The architecture ended up being one where we had the “main” process that streams updates via a custom replication protocol (initially long-polling HTTP; later custom TCP) to any number of “worker” processes. This meant that we could move sync stream handling (and other read apis) off the main process and onto workers, but also that all database writes had to go through the single main process (as it was a star topology, the main process could talk to all workers but workers could only talk to the main process and not each other).

As an aside: cache invalidations also had to be streamed down the replication connections, which has the side effect that we could only cache things that would only be invalidated on the main process.

We continued to move more and more read APIs out onto separate workers. We also added workers in front of the main process that would e.g. handle the creation of the new events, authenticating, etc, and then call out to the main process with the event for it to persist the event.

🔗Moving writes off the main process

Eventually we ran out of stuff to move out of the main process that didn’t involve writing to the DB. To write stuff from other processes we needed a way for the workers to stream updates to each other. The easiest and most obvious way was to just use Redis and its pub/sub support.

This almost allowed us to move writing of a particular stream to a different worker, except writing to streams generally also meant invalidating caches which in itself requires writing to a stream. We needed a way of writing to the cache invalidation stream from multiple workers at once.

Sharding the cache invalidation thankfully turned out to be easy, as workers would simply call the cache invalidation function whenever they get an invalidation notice over replication. In particular, the ordering of invalidations from different workers doesn’t matter and so there isn’t a need to calculate a single “persisted up-to position”. Sharding then just becomes a case of adding the name of the worker that is writing the update to the replication stream, and then workers reading from it can basically treat the cache stream the same as if they were multiple streams, one per worker.

This then unlocks the ability to move writing of streams off the main process and onto different workers - and so we added the “event persister” worker for offloading the main event stream off the main process:

🔗Sharding the events stream

Eventually the worker responsible for doing nothing but persisting events started maxing out CPU. This meant that we had to look at sharding the events stream, i.e. writing to it from multiple workers.

This is more complicated than sharding the cache invalidation stream as the ordering of the events does matter; we send them down sync streams, in order, with a token that indicates where the sync stream is up to in the events stream. This means that workers need to be able to calculate a “persisted up-to position” when getting updates from different workers.

The easiest way of doing that is to simply set the persisted up-to position as the minimum position received over federation from all active writers. This works, except events would only be processed after all other writers have subsequently written events (to advance the persisted position past the point at which the event was written), which can add a lot of latency depending on how often events are written.

A refinement is to note that if you have a persisted up-to position of 10, then receive updates at sequential positions 11, 12, 13 and 14, you know that everything between 10 and 14 has finished persisting (as you received updates about them), and so can set the persisted up-to position to 14. Annoyingly, it’s not required that positions are sequential without gaps (due to various technical considerations), and so in the worst case this still has the same problems as the naïve solution.

To avoid these problems we change the persisted up-to position to be a vector clock of positions; tracking a vector of positions - one per writer. This still allows answering the query of “get all events after token X” (as events are written with the position and the name of the writer). The persisted up-to position is then calculated by just tracking the last position seen to arrive over replication from each writer.

This allows writing events from multiple workers, while ensuring that other workers can correctly keep track of a “persisted up-to position”. Then it's just a matter of inspecting the code to ensure that it does not assume that it is the only writer to the stream. In the case of writing to the events stream, we note that the function persisting events assumes it's the only writer for a given room, so when sharding we have to ensure that there are no concurrent writes to the same room. This is most easily done by sharding based on room ID, and ensuring that the mapping of room ID to worker does not change (without coordination).

The only thing left is to then encode the vector clock position into the sync tokens. We want to ensure that these tokens are not too long, as they get included as query string parameters (e.g. the since= parameter of /sync). By assigning persistent unique integer IDs to workers the vector clock can be persisted as a sequence of pairs of integers, which is relatively few bytes so long as we don’t have too many workers writing to the events stream. We can further reduce the size of the tokens by calculating an integer “persisted up-to position” as we did before, encoding that and only including positions for workers that are larger than the integer persisted upto position. (The idea here is that most of the time only a small number of workers will be ahead of the calculated persisted up-to position, and so we only need to encode those).

And this is what we have today:

The major limitation of the current situation is that you can’t dynamically add/remove workers which persist events, as the sharding by room ID is calculated at startup, and so changing it requires restarting the whole system. This could be replaced by any system that allowed coordination over which persister is allowed to write to a room at any given point. However this is likely tricky to get right in practice, but would allow dynamic auto scaling of deployments, or automatically recovering from a worker that gets wedged/dies.

Finally, it’s worth noting that sharding event persisters isn’t the only performance work that’s been going on - switching everything over to python 3 and async twisted has helped, along with lots of smaller optimisations on the hot paths, and further rebalancing workers (e.g. moving background jobs off the master process to dedicated workers). We’ve also benefited a lot from the maintainability of rolling out mypy typing throughout the codebase. And next up, we’ll be going back to speeding up the codebase as a whole - starting with algorithmic state resolution improvements! 🎉

🔗Performance

So, how does it stack up?

Here’s the send time heatmap on Matrix.org showing the step change on Oct 16th when we rolled out the second event persister (full disclosure: this also coincides with moving background processes off the main Synapse process to a background worker). As you can see, we go from messages being spread over a huge range of durations (up to several seconds) to the sweet spot being 50ms or less - a spectacular improvement!

Meanwhile, here’s the actual CPU utilisation as we split the traffic from a single event persister (yellow) to two persisters (one yellow, one blue), showing the sharding beautifully horizontally balancing CPU between the two active/active worker processes:

We’ve yet to loadtest to see just how fast we can go now (before we start hitting bottlenecks on the postgres cluster), but it sure feels good to have all our CPU headroom back on Matrix.org again, ready for the next wave of users to arrive.

🔗Conclusion

So there you have it: folks running massive homeservers (50K+ concurrent users) like Matrix.org (and cough various high profile public sector deployments) are no longer held hostage by the bottleneck of the main synapse process and should feel free to experiment with setting up event persister workers to handle high traffic loads. Otherwise, if you can spread your users over smaller servers, that’s also a good bet (assuming they don’t have massively overlapping room membership, like we see on Matrix.org.)

The current worker documentation is up-to-date, although does assume you are already very familiar with how to administer Synapse. It’s also very much subject to change, as we keep adding new workers and improving the architecture. However, now is a pretty good time to get involved if you’re interested in large-scale Matrix deployments.

-- The Synapse Team

Dendrite 0.2.0 released

2020-10-20 — Releases — Matthew Hodgson

Hi all,

It's been over a week since our next-generation homeserver Dendrite entered beta, and it's been a wild rollercoaster ride as the team has been frantically zapping all the initial teething issues that came up - mostly around room federation getting 'stuck' due to needing to fix bugs in how room state is managed. Huge huge thanks to everyone who has spun up a Dendrite to experiment and report bugs!

We're now in an impressively better place, and it's feeling way more stable now (but please don't trust it with your data yet). So we've skipped 0.1.x and jumped straight to 0.2.0.

Now would be a great time for more intrepid explorers to try spinning up a server from https://github.com/matrix-org/dendrite and see how it feels - the more feedback the better. And if you got scared off by weird bugs in 0.1.0, now's the right time to try it again!

Full changelog follows:

🔗Dendrite 0.2.0 (2020-10-20)

🔗Important

This release makes breaking changes for polylith deployments, since they now use the multi-personality binary rather than separate binary files
- Users of polylith deployments should revise their setups to use the new binary - see the Features section below
This release also makes breaking changes for Docker deployments, as are now publishing images to Docker Hub in separate repositories for monolith and polylith
- New repositories are as follows: matrixdotorg/dendrite-monolith and matrixdotorg/dendrite-polylith
- The new latest tag will be updated with the latest release, and new versioned tags, e.g. v0.2.0, will preserve specific release versions
- Sample Compose configs have been updated - if you are running a Docker deployment, please review the changes
- Images for the client API proxy and federation API proxy are no longer provided as they are unsupported - please use nginx (or another reverse proxy) instead

🔗Features

Dendrite polylith deployments now use a special multi-personality binary, rather than separate binaries
- This is cleaner, builds faster and simplifies deployment
- The first command line argument states the component to run, e.g. ./dendrite-polylith-multi roomserver
Database migrations are now run at startup
Invalid UTF-8 in requests is now rejected (contributed by Pestdoktor)
Fully read markers are now implemented in the client API (contributed by Lesterpig)
Missing auth events are now retrieved from other servers in the room, rather than just the event origin
m.room.create events are now validated properly when processing a /send_join response
The roomserver now implements KindOld for handling historic events without them becoming forward extremity candidates, i.e. for backfilled or missing events

🔗Fixes

State resolution v2 performance has been improved dramatically when dealing with large state sets
The roomserver no longer processes outlier events if they are already known
A SQLite locking issue in the previous events updater has been fixed
The client API /state endpoint now correctly returns state after the leave event, if the user has left the room
The client API /createRoom endpoint now sends cumulative state to the roomserver for the initial room events
The federation API /send endpoint now correctly requests the entire room state from the roomserver when needed
Some internal HTTP API paths have been fixed in the user API (contributed by S7evinK)
A race condition in the rate limiting code resulting in concurrent map writes has been fixed
Each component now correctly starts a consumer/producer connection in monolith mode (when using Kafka)
State resolution is no longer run for single trusted state snapshots that have been verified before
A crash when rolling back the transaction in the latest events updater has been fixed
Typing events are now ignored when the sender domain does not match the origin server
Duplicate redaction entries no longer result in database errors
Recursion has been removed from the code path for retrieving missing events
QueryMissingAuthPrevEvents now returns events that have no associated state as if they are missing
Signing key fetchers no longer ignore keys for the local domain, if retrieving a key that is not known in the local config
Federation timeouts have been adjusted so we don't give up on remote requests so quickly
create-account no longer relies on the device database (contributed by ThatNerdyPikachu)

🔗Known issues

Old events can incorrectly appear in /sync as if they are new when retrieving missing events from federated servers, causing them to appear at the bottom of the timeline in clients
Memory can explode when catching up after a federation outage.

Combating abuse in Matrix - without backdoors.

2020-10-19 — General — Matthew Hodgson

🔗UPDATE: Nov 9th 2020

Not only are UK/US/AU/NZ/CA/IN/JP considering mandating backdoors, but it turns out that the Council of the European Union is working on it too, having created an advanced Draft Council Resolution on Encryption as of Nov 6th, which could be approved by the Council as early as Nov 25th if it passes approval. This doesn't directly translate into EU legislation, but would set the direction for subsequent EU policy.

Even though the Draft Council Resolution does not explicitly call for backdoors, the language used...

Competent authorities must be able to access data in a lawful and targeted manner

...makes it quite clear that they are seeking the ability to break encryption on demand: i.e. a backdoor.

Please help us spread the word that backdoors are fundamentally flawed - read on for the rationale, and an alternative approach to combatting online abuse.

Hi all,

Last Sunday (Oct 11th 2020), the UK Government published an international statement on end-to-end encryption and public safety, co-signed by representatives from the US, Australia, New Zealand, Canada, India and Japan. The statement is well written and well worth a read in full, but the central point is this:

We call on technology companies to [...] enable law enforcement access to content in a readable and usable format where an authorisation is lawfully issued, is necessary and proportionate, and is subject to strong safeguards and oversight.

In other words, this is an explicit request from seven of the biggest governments in the world to mandate a backdoor in end-to-end encrypted (E2EE) communication services: a backdoor to which the authorities have a secret key, letting them view communication on demand. This is big news, and is of direct relevance to Matrix as an end-to-end encrypted communication protocol whose core team is currently centred in the UK.

Now, we sympathise with the authorities’ predicament here: we utterly abhor child abuse, terrorism, fascism and similar - and we did not build Matrix to enable it. However, trying to mitigate abuse with backdoors is, unfortunately, fundamentally flawed.

Backdoors necessarily introduce a fatal weak point into encryption for everyone, which then becomes the ultimate high value target for attackers. Anyone who can determine the secret needed to break the encryption will gain full access, and you can be absolutely sure the backdoor key will leak - whether that’s via intrusion, social engineering, brute-force attacks, or accident. And even if you unilaterally trust your current government to be responsible with the keys to the backdoor, is it wise to unilaterally trust their successors? Computer security is only ever a matter of degree, and the only safe way to keep a secret like this safe is for it not to exist in the first place.
End-to-end encryption is nowadays a completely ubiquitous technology; an attempt to legislate against it is like trying to turn back the tide or declare a branch of mathematics illegal. Even if Matrix did compromise its encryption, users could easily use any number of other approaches to additionally secure their conversations - from PGP, to OTR, to using one-time pads, to sharing content in password-protected ZIP files. Or they could just switch to a E2EE chat system operating from a jurisdiction without backdoors.
Governments protect their own data using end-to-end encryption, precisely because they do not want other governments being able to snoop on them. So not only is it hypocritical for governments to argue for backdoors,** it immediately puts their own governmental data at risk of being compromised**. Moreover, creating infrastructure for backdoors sets an incredibly bad precedent to the rest of the world - where less salubrious governments will inevitably use the same technology to the massive detriment of their citizens’ human rights.
Finally, in Matrix’s specific case: Matrix is an encrypted decentralised open network powered by open source software, where anyone can run a server. Even if the Matrix core team were obligated to add a backdoor, this would be visible to the wider world - and there would be no way to make the wider network adopt it. It would just damage the credibility of the core team, push encryption development to other countries, and the wider network would move on irrespectively.

In short, we need to keep E2EE as it is so that it benefits the 99.9% of people who are good actors. If we enforce backdoors and undermine it, then the bad 0.1% percent simply will switch to non-backdoored systems while the 99.9% are left vulnerable.

We’re not alone in thinking this either: the GDPR (the world-leading regulation towards data protection and privacy) explicitly calls out robust encryption as a necessary information security measure. In fact, the risk of US governmental backdoors explicitly caused the European Court of Justice to invalidate the Privacy Shield for EU->US data. The position of the seven governments here (alongside recent communications by the EU commissioner on the ‘problem’ of encryption) is a significant step back on the protection of the fundamental right of privacy.

So, how do we solve this predicament for Matrix?

Thankfully: there is another way.

This statement from the seven governments aims to protect the general public from bad actors, but it clearly undermines the good ones. What we really need is something that empowers users and administrators to identify and protect themselves from bad actors, without undermining privacy.

What if we had a standard way to let users themselves build up and share their own views of whether other users, messages, rooms, servers etc. are obnoxious or not? What if you could visualise and choose which filters to apply to your view of Matrix?

Just like the Web, Email or the Internet as a whole, there is literally no way to unilaterally censor or block content in Matrix. But what we can do is provide first-class infrastructure to let users (and room/community moderators and server admins) make up their own mind about who to trust, and what content to allow. This would also provide a means for authorities to publish reputation data about illegal content, providing a privacy-respecting mechanism that admins/mods/users can use to keep illegal content away from their servers/clients.

The model we currently have in mind is:

Anyone can gather reputation data about Matrix rooms / users / servers / communities / content, and publish it to as wide or narrow an audience as they like - providing their subjective score on whether something in Matrix is positive or negative in a given context.
This reputation data is published in a privacy preserving fashion - i.e. you can look up reputation data if you know the ID being queried, but the data is stored pseudonymised (e.g. indexed by a hashed ID).
Anyone can subscribe to reputation feeds and blend them together in order to inform how they filter their content. The feeds might be their own data, or from their friends, or from trusted sources (e.g. a fact-checking company). Their blended feed can be republished as their own.
To prevent users getting trapped in a factional filter bubble of their own devising, we’ll provide UI to visualise and warn about the extent of their filtering - and make it easy and fun to shift their viewpoint as needed.
Admins running servers in particular jurisdictions then have the option to enforce whatever rules they need on their servers (e.g. they might want to subscribe to reputation feeds from a trusted source such as the IWF, identifying child sexual abuse content, and use it to block it from their server).
This isn’t just about combating abuse - but the same system can also be used to empower users to filter out spam, propaganda, unwanted NSFW content, etc on their own terms.

This forms a relative reputation system. As uncomfortable as it may be, one man’s terrorist is another man’s freedom fighter, and different jurisdictions have different laws - and it’s not up to the Matrix.org Foundation to play God and adjudicate. Each user/moderator/admin should be free to make up their own mind and decide which reputation feeds to align themselves with. That is not to say that this system would help users locate extreme content - the privacy-preserving nature of the reputation data means that it’s only useful to filter out material which would otherwise already be visible to you - not to locate new content.

In terms of how this interacts with end-to-end-encryption and mitigating abuse: the reality is that the vast majority of abuse in public networks like Matrix, the Web or Email is visible from the public unencrypted domain. Abusive communities generally want to attract/recruit/groom users - and that means providing a public front door, which would be flagged by a reputation system such as the one proposed above. Meanwhile, communities which are entirely private and entirely encrypted typically still have touch-points with the rest of the world - and even then, the chances are extremely high that they will avoid any hypothetical backdoored servers. In short, investigating such communities requires traditional infiltration and surveillance by the authorities rather than an ineffective backdoor.

Now, this approach may sound completely sci-fi and implausibly overambitious (our speciality!) - but we’ve actually started successfully building this already, having been refining the idea over the last few years. MSC2313 is a first cut at the idea of publishing and subscribing to reputation data - starting off with simple binary ban rules. It’s been implemented and in production for over a year now, and is used to maintain shared banlists used by both matrix.org and mozilla.org communities. The next step is to expand this to support a blendable continuum of reputation data (rather than just binary banlists), make it privacy preserving, and get working on the client UX for configuring and visualising them.

Finally: we are continuing to hire a dedicated Reputation Team to work full time on building this (kindly funded by Element). This is a major investment in the future of Matrix, and frankly is spending money that we don’t really have - but it’s critical to the long-term success of the project, and perhaps the health of the Internet as a whole. There’s nothing about a good relative reputation system which is particularly specific to Matrix, after all, and many other folks (decentralised and otherwise) are clearly in desperate need of one too. We are actively looking for funding to support this work, so if you’re feeling rich and philanthropic (or a government wanting to support a more enlightened approach) we would love to hear from you at [email protected]!

Here’s to a world where users have excellent tools to protect themselves online - and a world where their safety is not compromised by encryption backdoors.

-- The Matrix.org Core Team

*Comments at HN, lobste.rs, and r/linux, LWN

Dendrite is entering Beta!

2020-10-08 — Releases — Matthew Hodgson

Hi all,

We’re very excited to announce that Dendrite, the next-generation Matrix homeserver from the core Matrix team, is at last exiting alpha development and entering beta testing!

The path we’ve taken to get here has been quite a curious one, and it’s worth recapping to give context on why it’s taken reality a little while to catch up with the dream. :)

The Dendrite project has its roots in 2016 as Dendron: an attempt to write a next-generation homeserver in Golang rather than Python, in order to benefit from Go’s stronger typing, ease of profiling (no twisted stack-shredding via deferredInlineCallbacks), multithreading and faster GC performance. The idea for Dendron was to do a strangler pattern rewrite of Synapse - where we’d insert Dendron in front of Synapse as a load balancer, and incrementally replace Synapse’s API endpoints with ones implemented by Dendron.

However, as the project started to progress, it became clear that this was going to end up with many of Synapse’s architectural choices being baked into the project - particularly the DB schema and data flow architecture, such that the new endpoints could interoperate with the existing Python ones. We got as far as putting Dendron live on matrix.org and moving some of the login/registration APIs over to it… but then work fizzled out due to Synapse demanding more urgent attention as traffic grew on Matrix.org, combined with concerns about whether Dendron was the right approach in general.

So, towards the end of 2016 (after the rush to launch ~~Vector~~ ~~Riot~~ Element that summer), we went back to the drawing board to devise Dendrite—“Dendron done right!”—as opposed to Dendron, which in retrospect was Dendrite done wrong. ;) The new vision was:

Build a massively horizontally scalable architecture, such that large Matrix deployments like matrix.org and big government deployments could run smoothly without the constant scalability headaches we were seeing at the time with Synapse
Do so by splitting the server into well-defined microservice components, each of which could independently horizontally scale, each with its own DB (if desired)
Connect the components together with a set of append-only logs via Kafka or similar, easily letting components shard and maintain their databases from the logs, allowing rolling upgrades, possibly schema upgrades, and all sorts of other niceties. The logs effectively become a primary source of truth rather than putting all the onus on a massive monolithic ever-growing database

Rather than Dendron’s top-down approach, instead Dendrite started bottom-up with the very hardest bit: gomatrixserverlib, a standalone Go library implementing the state resolution algorithms and performing federation requests (such that it might also someday be used as a general purpose way to add Matrix federation support to an existing Go codebase).

Then we started building out the various components to implement the various services, starting with the roomserver (the service which models the history and state of one or more rooms in the server), then the syncserver (the service which implements the /sync API to let clients receive messages), etc. We even implemented a simplified in-memory version of Kafka named naffka—useful for glueing together the microservice components when running them all within a single binary.

Things were looking pretty positive by the summer of 2017: we had the server sending/receiving messages, federating with Synapse, and looking tantalisingly close:

We just sent the first ever synapse->dendrite federated traffic, including full dendrite media API (thumbnailing, fed, etc)!!! :D :D :D pic.twitter.com/sBcM2jMAr6
— Matrix (@matrixdotorg) June 8, 2017

However, we then hit three fairly major obstacles:

Matrix lost its funding
In the ensuing uncertainty, the two lead developers (Mjark & Kegan) went to work elsewhere
Meanwhile, Matrix uptake was starting to explode and Synapse was failing to scale to handle the traffic on matrix.org (and elsewhere)

At first, having formed what would become New Vector (now Element) to keep the rest of the core team hired, we pushed to see if we could get Dendrite finished fast enough to replace Synapse, with Erik & richvdh jumping over from Synapse to pick up the remaining work. However, it became clear that we urgently needed a quicker solution to address all the overloaded Synapses out there, and so they swung back to focus on improving Synapse (taking inspiration from some of the design of Dendrite - e.g. offloading endpoints onto worker processes connected via replication streams, and using OpenTracing to debug traffic as it flows over the various services).

At this point, Dendrite maintenance was in effect valiantly taken over by the community, with Brendan and later Anoa keeping the ball going in 2017, joined by APWhitehat in GSoC 2018 and cnly in GSoC 2019. The fact that Dendrite is now here today is thanks in no small part to their work to keep the project alive in its “wilderness years” between Sept 2017 and Dec 2019.

Meanwhile, it became clear that we were overdue getting Matrix itself out of beta - and the last thing we wanted to do was to split and dilute the implementation work of Matrix 1.0 over both Synapse and Dendrite - so we consciously made the decision to focus all our effort on Synapse for solving the remaining bugs and challenges.

Then, in July 2019, Matrix and Synapse exited beta, and we finally started to see light at the end of the tunnel. In October we started dusting off Dendrite again - looking to use it as a relatively simple and flexible codebase for experimenting with Peer-to-Peer Matrix, not least because being Go it can compile to WebAssembly and run clientside, and because even though Dendrite was originally built with massive deployments in mind, it turns out the elastic scaling means it can also scale down pretty small too—as a part of the iOS P2P demo, we’ve even ran full Dendrite homeservers on iPhones embedded into Element iOS! :)

In Dec 2019, we finally got to the point where Element could fund full-time dedicated development on Dendrite once again, with Neil Alexander joining the project and focusing fulltime on getting Dendrite out of alpha and getting it working for P2P and embedded usage (adding libp2p as a federation transport, and adding SQLite support) - and in Jan 2020 we got Dendrite successfully running clientside in a WASM service worker (just in time for FOSDEM!). Then, in Feb 2020, Kegan returned to the project to work fulltime on Dendrite - and the race began in earnest to get Dendrite ready for beta!

Here’s a pretty picture courtesy of GitHub to visualise the progress:

Throughout 2020 there’s been a huge amount of stabilisation work and polish:

Refactoring much of Dendrite’s foundation to make the codebase more maintainable
Created all-new user server, key server, signing key server microservices
Moving some work from existing microservices (ultimately superseding the former currentstateserver, publicroomsapi and typingserver microservices altogether)
Developing new testing infrastructure:
- Complement - our brand new Golang Matrix integration test harness
- Are We Synapse Yet - an aggregator which parses sytest/complement output to compare how close Dendrite is to passing
All the Matrix 1.0 work - particularly state res v2 & room version support
Making it work with more P2P transports for all the exciting P2P experiments
Supporting backfill and fetching missing events
Fixing up SQLite support to make it work as a first class citizen (with shared storage code where we can!)
Supporting both sending and rejecting invites (even over federation)
E2E encryption support (one-time keys, device lists, send-to-device support)
Improved federation sender logic (resend retries, backoffs, blacklisting, metrics, resetting backoffs when receiving transactions)
Handling both inbound and outbound redactions
User interactive authentication (and implemented on various ‘sudo’ endpoints e.g. deleting devices and changing passwords)
Respecting server ACLs
Rejecting / soft-failing events properly
Support for database schema upgrades

... which brings us at last to the present day (Oct 2020), as we declare Dendrite sufficiently stable that we consider it ready for beta testing!

In practice, this means **Dendrite is now ready for experimentation by adventurous Matrix sysadmins. It is NOT ready for production usage yet, but we need folks to test it and help us iron out the remaining bugs! **Please do not trust it with sensitive data yet, and we don’t recommend trying to run it at scale yet as we haven’t done any serious optimisation work yet.

That said, we do provide the following guarantees:

We’re providing versioned releases from here on in, beginning with 0.1.0
We don’t expect any major breaking changes to the config or architecture before 1.0
Ready for early adopters to try running Dendrite without experiencing ~daily breaking churn
The database schema is now stable and will upgrade itself going forwards - your database should now be here to stay! (assuming we don’t hit any nasty data loss bugs during beta)

In terms of comparison with Synapse, the main things you should get excited about are:

Dendrite aims to provide an efficient, reliable and scalable alternative to Synapse:
- Efficient: A small memory footprint with better baseline performance than an out-of-the-box Synapse
- Reliable: Implements the Matrix specification as written, using the same test suite as Synapse as well as a brand new Go test suite
- Scalable: can run on multiple machines and eventually scale to massive homeserver deployments
This means significantly less memory usage than Synapse (depends on joined rooms, often between 50MB - 400MB resident memory) - although we haven’t tuned this at all yet!
All-new database model, where every microservice instance has its own database tables, letting them scale arbitrarily wide
The ability to efficiently use all your available CPU cores without needing to split into separate processes, thanks to Go and our extensive use of goroutines. No more Python global interpreter lock! :)
Future experimental MSCs are likely to land in Dendrite before Synapse (e.g MSC2753 Peeking via /sync and MSC2444 Peeking over Federation are already being prototyped (#1370 and #1391) in Dendrite rather than Synapse!)

The provisos you should know about however are:

We’re not feature complete yet: sytest reports 56% CS API coverage and 77% Federation coverage. NB: these are always going to be underestimates of how much Dendrite actually performs due to how the tests are spread out, in actuality it’s likely more 70% CS, 95% Fed.
No read receipts, membership lazy-loading, presence, push notifications, search, event context, key backups, cross-signing. See changelog for full limitations.
Not battle-tested in the wild by many people (there are probably only ~10 dendrites on the open network today!) - so there’s likely to be a broad spectrum of bugs at first.
Clients that require more exotic features, like lazy loading, may not behave properly yet
Please use Postgres rather than SQLite wherever possible—it’s faster and has fewer issues regarding concurrency (some requests on SQLite Dendrites may 500 with ‘database is locked’ - though we’ve worked hard to eliminate most of these)
Dendrite can run in either “monolith” or “polylith” mode. In monolith, all the microservices are linked into a single binary - and we recommend running in this configuration wherever possible for now. Monolith mode is extremely capable as it is and has fewer moving parts for things to go wrong and will be the right choice for the majority of beta deployments!
Whilst Dendrite is nearly 100% federation compatible, there may still be situations where it will split-brain and disagree with the current room state that Synapse has calculated. We expect these issues to resolve as we get more user feedback.

Architecture-wise, this is what Dendrite looks like under the hood today:

To get up and running, please install Go and head on over to the Get Started guide at https://github.com/matrix-org/dendrite#get-started to join the fun :)

In terms of where we’re going next:

Read receipts. It’s a major missing feature and impacts UX significantly.
100% Federation coverage (according to sytest). It’s crucial that Dendrite instances play nicely with other servers. This will be the best metric we have for asserting that we are just as capable as Synapse at the fed level.
Optimisation—Dendrite has not been optimised yet for speed or resource utilisation!
- We plan to add benchmarks which will stress test different microservices in the presence of many different scaling factors (number of users, number of rooms, size of room, number of devices per user, number of sync requests, etc). This will hopefully allow us to identify early on bottlenecks and slow algorithms
- Good old fashioned pprof with known slow scenarios to see what’s consuming CPU/memory and fixing issues ad-hoc (which we’ve already done a bit of pre-beta). This may involve adding additional in-memory caches, with a healthy respect for the complexities it may introduce (which Synapse has been bitten by)
We plan to add first class feature flag support for experimental MSCs—experimentation is one thing which makes Dendrite notably different from Synapse, and supporting it more thoroughly going forwards will be important. This may mean adding additional hooks; potentially a dedicated microservice to cleanly separate experiments, we don’t know yet
P2P work will continue with vigour now we have a working, featureful, and relatively stable HS to embed and play with

Longer term, it’s pretty hard to say right now when we expect to exit beta (it took Synapse 5 years to exit beta, after all ;) - but obviously we’ll need Dendrite to have parity with Synapse and have no known serious bugs.

Finally: you’re probably wondering what this means for Synapse. Synapse is here to stay - with tens of thousands of deployments around the world serving tens of millions of users. The majority of the core team is still focused on improving and optimising Synapse, and we’ll be keeping improving it for the foreseeable.

However, we’ll certainly be experimenting with new stuff on Dendrite first - whether that’s P2P, portable accounts, new-style communities, peeking etc. We expect Synapse to be the stable long-term-supported solution, while Dendrite (particularly while in beta) will be the more unstable and experimental platform. In the longer term we’ll provide ways of migrating from Synapse to Dendrite however (probably via portable accounts), and perhaps in future new deployments may choose to use Dendrite - a bit like you might choose to use nginx rather than Apache for a new web server these days. But this will be a long transition—meanwhile we expect to see more and more next-generation homeservers like Conduit, Mascarene or Construct coming of age too.

So, there you have it. If you’re an intrepid sysadmin please spin up a Dendrite and start filing bugs! :)

— Matthew, Neil Alexander, Kegan and the whole Matrix team.

Here’s the official changelog:

🔗Client-Server API Features

🔗Account registration and management

Registration: By password only.
Login: By password only. No fallback.
Logout: Yes.
Change password: Yes.
Link email/msisdn to account: No.
Deactivate account: Yes.
Check if username is available: Yes.
Account data: Yes.
OpenID: No.

🔗Rooms

Room creation: Yes, including presets.
Joining rooms: Yes, including by alias or ?server_name=.
Event sending: Yes, including transaction IDs.
Aliases: Yes.
Published room directory: Yes.
Kicking users: Yes.
Banning users: Yes.
Inviting users: Yes, but not third-party invites.
Forgetting rooms: No.
Room versions: All (v1 - v6)
Tagging: Yes.

🔗User management

User directory: Basic support.
Ignoring users: No.
Groups/Communities: No.

🔗Device management

Creating devices: Yes.
Deleting devices: Yes.
Send-to-device messaging: Yes.

🔗Sync

Filters: Timeline limit only. Rest unimplemented.
Deprecated /events and /initialSync: No.

🔗Room events

Typing: Yes.
Receipts: No.
Read Markers: No.
Presence: No.
Content repository (attachments): Yes.
History visibility: No, defaults to joined.
Push notifications: No.
Event context: No.
Reporting content: No.

🔗End-to-End Encryption

Uploading device keys: Yes.
Downloading device keys: Yes.
Claiming one-time keys: Yes.
Querying key changes: Yes.
Cross-Signing: No.

🔗Misc

Server-side search: No.
Guest access: Partial.
Room previews: No, partial support for Peeking via MSC2753.
Third-Party networks: No.
Server notices: No.
Policy lists: No.

🔗Federation Features

Querying keys (incl. notary): Yes.
Server ACLs: Yes.
Sending transactions: Yes.
Joining rooms: Yes.
Inviting to rooms: Yes, but not third-party invites.
Leaving rooms: Yes.
Content repository: Yes.
Backfilling / get_missing_events: Yes.
Retrieving state of the room (/state and /state_ids): Yes.
Public rooms: Yes.
Querying profile data: Yes.
Device management: Yes.
Send-to-Device messaging: Yes.
Querying/Claiming E2E Keys: Yes.
Typing: Yes.
Presence: No.
Receipts: No.
OpenID: No.