Breaking the 100bps barrier with Matrix, meshsim & coap-proxy

Hi all,

Last month at FOSDEM 2019 we gave a talk about a new experimental ultra-low-bandwidth transport for Matrix which swaps our baseline HTTPS+JSON transport for a custom one built on CoAP+CBOR+Noise+Flate+UDP.  (CoAP is the RPC protocol; CBOR is the encoding; Noise powers the transport layer encryption; Flate compresses everything uses predefined compression maps).

The challenge here was to see if we could demonstrate Matrix working usably over networks running at around 100 bits per second of throughput (where it’d take 2 minutes to send a typical 1500 byte ethernet packet!!) and very high latencies.  You can see the original FOSDEM talk below, or check out the slides here.

Now, it’s taken us a little while to find time to tidy up the stuff we demo’d in the talk to be (relatively) suitable for public consumption, but we’re happy to finally release the four projects which powered the demo:

In order to get up and running, the meshsim README has all the details.

It’s important to understand that this is very much a proof of concept, and shouldn’t be used in production yet, and almost certainly has some glaring bugs.  In fact, it currently assumes you are running on a trusted private network rather than the public Matrix network in order to get away with some of the bandwidth optimisations performed – see coap-proxy’s Limitations section for details.  Particularly, please note that the encryption is homemade and not audited or fully reviewed or tested yet.  Also, we’ve released the code for the low-bandwidth transport, but we haven’t released the “fan-out routing” implementation for Synapse as it needs a rethink to be applicable to the public Matrix network.  You’ll also want to run Riot/Web in low-bandwidth mode if you really wind down the bandwidth (suppressing avatars, read receipts, typing notifs and presence to avoid wasting precious bandwidth).

We also don’t have an MSC for the CoAP-based transport yet, mainly due to lack of time whilst wanting to ensure the limitations are addressed first before we propose it as a formal alternative Matrix transport.  (We also first need to define negotiation mechanisms for entirely alternative CS & SS transports!).  However, the quick overview is:

  • JSON is converted directly into CBOR (with a few substitutions made to shrink common patterns down)
  • HTTP is converted directly into CoAP (mapping the verbose API endpoints down to single-byte endpoints)
  • TLS is swapped out for Noise Pipes (XX + IK noise handshakes).  This gives us 1RTT setup (XX) for the first connection to a host, and 0RTT (IK) for all subsequent connections, and provides trust-on-first-use semantics when connecting to a server.  You can see the Noise state machine we maintain in go-coap’s noise.go.
  • The CoAP headers are hoisted up above the Noise payload, letting us use them for framing the noise pipes without having duplicated framing headers at the CoAP & Noise layers.  We also frame the Noise handshake packets as CoAP with custom message types (250, 251 and 252).  We might be better off using OSCORE for this, however, rather than hand-wrapping a custom encrypted transport…
  • The CoAP payload is compressed via Flate using preshared compression tables derived from compressing large chunks of representative Matrix traffic. This could be significantly improved in future with streaming compression and dynamic tables (albeit seeded from a common set of tables).

The end result is that you end up taking about 90 bytes (including ethernet headers!) to send a typical Matrix message (and about 70 bytes to receive the acknowledgement).  This breaks down as as:

  • 14 bytes of Ethernet headers
  • 20 bytes of IP headers
  • 8 bytes of UDP headers
  • 16 bytes of Noise AEAD
  • 6 bytes of CoAP headers
  • ~26 bytes of compressed and encrypted CBOR

The Noise handshake on connection setup would take an additional 128 bytes (4x 32 byte Curve25519 DH values), either spread over 1RTT for initial setup or 0RTT for subsequent setups.

At 100bps, 90 bytes takes 90*8/100 = 7.2s to send… which is just about usable in an extreme life and death situation where you can only get 100bps of connectivity (e.g. someone at the bottom of a ravine trying to trickle data over one bar of GPRS to the emergency services).  In practice, on a custom network, you could ditch the Ethernet and UDP/IP headers if on a point-to-point link for CS API, and ditch the encryption if the network physical layer was trusted – at which point we’re talking ~32 bytes per request (2.5s to send at 100bps).  Then, there’s still a whole wave of additional work that could be investigated, including…

  • Smarter streaming compression (so that if a user says ‘Hello?’ three times in a row, the 2nd and 3rd messages are just references to the first pattern)
  • Hoisting Matrix transaction IDs up to the CoAP layer (reusing the CoAP msgId+token rather than passing around new Matrix transaction IDs, at the expense of requiring one Matrix txn per request)
  • Switching to CoAP OBSERVE for receiving data from the server (currently we long-poll /sync to receive data)
  • Switching access_tokens for PSKs or similar

…all of which could shrink the payload down even further.  That said, even in its current state, it’s a massive improvement – roughly ~65x better than the equivalent HTTPS+JSON traffic.

In practice, further work on low-bandwidth Matrix is dependent on finding a sponsor who’s willing to fund the team to focus on this, as otherwise it’s hard to justify spending time here in addition to all the less exotic business-as-usual Matrix work that we need to keep the core of Matrix evolving (finishing 1.0, finishing E2E encryption, speeding up Synapse, finishing Dendrite, rewriting Riot/Android etc).  However, the benefits here should be pretty obvious: massively reduced bandwidth and battery-life; resilience to catastrophic network conditions; faster sync times; and even a protocol suitable for push notifications (Matrix as e2e encrypted, decentralised, push!).  If you’re interested in supporting this work, please contact support at matrix.org.

Experiments with payments over Matrix

Hi all,

Heads up that Modular.im (the paid hosting Matrix service provided by New Vector, the company who employs much of the Matrix core team) launched a pilot today for paid Matrix integrations in the form of paid sticker packs.  Yes kids, it’s true – for only $0.50 you can slap Matrix and Riot hex stickers all over your chatrooms. It’s a toy example to test the payments infrastructure and demonstrate the concept – the proceeds go towards funding development work on Matrix.org :)  You can read more about over on Modular’s blog.

We wanted to elaborate on this a bit from the Matrix.org perspective, specifically:

  • We are categorically not baking payments or financial incentives as a first class citizen into Matrix, and we’re not going to start moving stuff behind paywalls or similar.
  • This demo is a proof-of-concept to illustrate how folks could do this sort of thing in general in Matrix – it’s not a serious product in and of itself.
  • What it shows is that an Integration Manager like Modular can be used as a way to charge for services in Matrix – whether that’s digital content within an integration, or bots/bridges/etc.
  • While Modular today gathers payments via credit-card (Stripe), it could certainly support other mechanisms (e.g. cryptocurrencies) in future.
  • The idea in future is for Modular to provide this as a mechanism that anyone can use to charge for content on Matrix – e.g. if you have your own sticker pack and want to sell it to people, you’ll be able to upload it and charge people for it.

Meanwhile, there’s a lot of interesting stuff on the horizon with integration managers in general – see MSC1236 and an upcoming MSC from TravisR (based around https://github.com/matrix-org/matrix-doc/issues/1286) proposing new integration capabilities.  We’re also hoping to implement inline widgets soon (e.g. chatbot buttons for voting and other semantic behaviour) which should make widgets even more interesting!

So, feel free to go stick some hex stickers on your rooms if you like and help test this out.  In future there should be more useful things available :)

Welcome to Matrix, KDE!

Hi all,

We’re very excited to officially welcome the KDE Community on to Matrix as they announce that KDE Community is officially adopting Matrix as a chat platform, and kde.org now has an official Matrix homeserver!

You can see the full announcement and explanation over at https://dot.kde.org/2019/02/20/kde-adding-matrix-its-im-framework.  It is fantastic to see one of the largest Free Software communities out there proactively adopting Matrix as an open protocol, open network and FOSS project, rather than drifting into a proprietary centralised chat system.  It’s also really fun to see Riot 1.0 finally holding its own as a chat app against the proprietary alternatives!

This doesn’t change the KDE rooms which exist in Matrix today or indeed the KDE Freenode IRC channels – so many of the KDE community were already using Matrix, all the rooms already exist and are already bridged to the right places.  All it means is that there’s now a shiny new homeserver (powered by Modular.im) on which KDE folk are welcome to grab an account if they want, rather than sharing the rather overloaded public matrix.org homeserver.  The rooms have been set up on the server to match their equivalent IRC channels – for instance, #kde:kde.org is the same as #kde on Freenode; #kde-devel:kde.org is the same as #kde-devel etc.  The rooms continue to retain their other aliases (#kde:matrix.org, #freenode_#kde:matrix.org etc) as before.

There’s also a dedicated Riot/Web install up at https://webchat.kde.org, and instructions on connecting via other Matrix clients up at https://community.kde.org/Matrix.

This is great news for the Matrix ecosystem in general – and should be particularly welcome for Qt client projects like Quaternion, Spectral and Nheko-Reborn, who may feel a certain affinity to KDE!

So: welcome, KDE!  Hope you have a great time, and please let us know how you get on, so we can make sure Matrix kicks ass for you :)

– the Matrix Core Team

 

Matrix at FOSDEM 2019

Hi all,

We just got back from braving the snow in Brussels at FOSDEM 2019 – Europe’s biggest Open Source conference. I think it’s fair to say we had an amazing time, with more people than ever before wanting to hang out and talk Matrix and discuss their favourite features (and bugs)!

The big news is that we released r0.1 of Matrix’s Server-Server API late on Friday night – our first ever formal stable release of Matrix’s Federation API, having addressed the core of the issues which have kept Federation in beta thus far. We’ll go into more detail on this in a dedicated blog post, but this marks the first ever time that all of Matrix’s APIs have had an official stable release.  All that remains before we declare Matrix out of beta is to release updates of the CS API (0.5) and possibly the IS API (0.2) and then we can formally declare the overall combination as Matrix 1.0 :D

We spoke about SS API r0.1 at length in our main stage FOSDEM talk on Saturday – as well as showing off the Riot Redesign, the E2E Encryption Endgame and giving an update on the French Government deployment of Matrix and the focus it’s given us on finally shipping Matrix 1.0! For those who weren’t there or missed the livestream, here’s the talk!  Slides are available here.

Then, on Sunday we had the opportunity to have a quick 20 minute talk in the Real Time Comms dev room, where we gave a tour of some of the work we’ve been doing recently to scale Matrix down to working on incredibly low bandwidth networks (100bps or less).  It’s literally the opposite of the Matrix 1.0 / France talk in that it’s a quick deep dive into a very specific problem area in Matrix – so, if you’ve been looking forward to Matrix finally having a better transport than HTTPS+JSON, here goes!  Slides are available here.

Huge thanks to everyone who came to the talks, and everyone who came to the stand or grabbed us for a chat! FOSDEM is an amazing way to be reminded in person that folks care about Matrix, and we’ve come away feeling more determined than ever to make Matrix as great as possible and provide a protocol+network which will replace the increasingly threatened proprietary communication silos. :)

Next up: Matrix 1.0…

The 2018 Matrix Holiday Special!

Hi all,

It’s that time again where we break out the mince pies and brandy butter (at least for those of us in the UK) and look back on the year to see how far Matrix has come, as well as anticipate what 2019 may bring!

Overview

It’s fair to say that 2018 has been a pretty crazy year.  We have had one overriding goal: to take the funding we received in January, stabilise and freeze the protocol and get it and the reference implementations out of beta and to a 1.0 – to provide a genuinely open and decentralised mainstream alternative to the likes of Slack, Discord, WhatsApp etc.  What’s so crazy about that, you might ask?

Well, in parallel with this we’ve also seen adoption of Matrix accelerating ahead of our dev plan at an unprecedented speed: with France selecting Matrix to power the communication infrastructure of its whole public sector – first trialling over the summer, and now confirmed for full roll-out as of a few weeks ago.  Meanwhile there are several other similar-sized projects on the horizon which we can’t talk about yet.  We’ve had the growing pains of establishing New Vector as a startup in order to hire the core team and support these projects.  We’ve launched Modular to provide professional-quality SaaS Matrix hosting for the wider community and help fund the team.  And most importantly, we’ve also been establishing the non-profit Matrix.org Foundation to formalise the open governance of the Matrix protocol and protect and isolate it from any of the for-profit work.

However: things have just about come together.  Almost all the spec work for 1.0 is done and we are now aiming to get a 1.0 released in time by the end of January (in time for FOSDEM).  Meanwhile Synapse has improved massively in terms of performance and stability (not least having migrated over to Python 3); Riot’s spectacular redesign is now available for testing right now; E2E encryption is more stable than ever with the usability rework landing as we speak.  And we’ve even got a full rewrite of Riot/Android in the wings.

But it’s certainly been an interesting ride.  Longer-term spec work has been delayed by needing to apply band-aids to mitigate abuse of the outstanding issues.  Riot redesign was pushed back considerably due to prioritising Riot performance over UX. The E2E UX work has forced us to consider E2E holistically… which does not always interact well with structuring the dev work into bite-sized chunks.  Dendrite has generally been idling whilst we instead pour most of our effort into getting to 1.0 on Synapse (rather than diluting 1.0 work across both projects). These tradeoffs have been hard to make, but hopefully we have chosen the correct path in the end.

Overall, as we approach 1.0, the project is looking better than ever – hopefully everyone has seen both Riot and Synapse using less RAM and being more responsive and stable, E2E being more reliable, and anyone who has played with the Riot redesign beta should agree that it is light-years ahead of yesterday’s Riot and something which can genuinely surpass today’s centralised proprietary incumbents. And that is unbelievably exciting :D

We’d like to thank everyone for continuing to support Matrix – especially our Patreon & Liberapay supporters, whose donations continue to be critical for helping fund the core dev team, and also those who are supporting the project indirectly by hosting homeservers with Modular.  We are going to do everything humanly possible to ship 1.0 over the coming weeks, and then the sky will be the limit!

Before going into what else 2019 will hold, however, let’s take the opportunity to give a bit more detail on the various core team projects which landed in 2018…

France

DINSIC (France’s Ministry of Digital, IT & Comms) have been busy building out their massive cross-government Matrix deployment and custom Matrix client throughout most of the year.  After the announcement in April, this started off with an initial deployment over the summer, and is now moving towards the full production rollout, as confirmed at the Paris Open Source Summit a few weeks ago by Mounir Mahjoubi, the Secretary of State of Digital.  All the press coverage about this ended up in French, with the biggest writeup being at CIO Online, but the main mention of Matrix (badly translated from French) is:

Denouncing the use of tools such as WhatsApp; a practice that has become commonplace within ministerial offices, Mounir Mahjoubi announced the launch in production of Tchap, based on Matrix and Riot: an instant messaging tool that will be provided throughout the administrations. So, certainly, developing a product can have a certain cost. Integrating it too. “Free is not always cheaper but it’s always more transparent,” admitted the Secretary of State.

The project really shows off Matrix at its best, with up to 60 different deployments spread over different ministries and departments; multiple clusters per Ministry; end-to-end encryption enabled by default (complete with e2e-aware antivirus scanning); multiple networks for different classes of traffic; and the hope of federating with the public Matrix network once the S2S API is finalised and suitable border gateways are available.  It’s not really our project to talk about, but we’ll try to share as much info as we can as roll-out continues.

The Matrix Specification

A major theme throughout the year has been polishing the Matrix Spec itself for its first full stable release, having had more than enough time to see which bits work in practice now and which bits need rethinking.  This all kicked off with the creation of the Matrix Spec Change process back in May, which provides a formal process for reviewing and accepting contributions from anyone into the spec.  Getting the balance right between agility and robustness has been quite tough here, especially pre-1.0 where we’ve needed to move rapidly to resolve the remaining long-lived sticking points.  However, a process like this risks encouraging the classic “Perfect is the Enemy of Good” problem, as all and sundry jump in to apply their particular brand of perfectionism to the spec (and/or the process around it) and risk smothering it to death with enthusiasm.  So we’ve ended up iterating a few times on the process and hopefully now converged on an approach which will work for 1.0 and beyond. If you haven’t checked out the current proposals guide please give it a look, and feel free to marvel at all the MSCs in flight.  You can also see a quick and dirty timeline of all the MSC status changes since we introduced the process, to get an idea of how it’s all been progressing.

In August we had a big sprint to cut stable “r0” releases of all the APIs of the spec which had not yet reached a stable release (i.e. all apart from the Client-Server API, which has been stable since Dec 2015 – hence in part the large number of usable independent Matrix clients relative to the other bits of the ecosystem).  In practice, we managed to release 3 out of the 4 remaining APIs – but needed more time to solve the remaining blocking issues with the Server-Server API. So since August (modulo operational and project distractions) we’ve been plugging away on the S2S API.  

The main blocking issue for a stable S2S API has been State Resolution. This is the fundamental algorithm used to determine the state of a given room whenever a race or partition happens between the servers participating in it.  For instance: if Alice kicks Bob on her server at the same time as Charlie ops Bob on his server, who should win? It’s vital that all servers reach the same conclusion as to the state of the room, and we don’t want servers to have to replicate a full copy of the room’s history (which could be massive) to reach a consistent conclusion.  Matrix’s original state resolution algorithm dates back to the initial usable S2S implementation at the beginning of 2015 – but over time deficiencies in the algorithm became increasingly apparent. The most obvious issue is the “Hotel California” bug, where users can be spontaneously re-joined to a room they’ve previously left if the room’s current state is merged with an older copy of the room on another server and the ‘wrong’ version wins the conflict – a so-called state-reset.

After a lot of thought we ended up proposing an all new State Resolution algorithm in July 2018, nicknamed State Resolution Reloaded.  This extends the original algorithm to consider the ‘auth chain’ of state events when performing state resolution (i.e. the sequence of events that a given state event cites as evidence of its validity) – as well as addressing a bunch of other issues.  For those wishing to understand in more detail, there’s the MSC itself, the formal terse description of the algorithm now merged into the unstable S2S spec – or alternatively there’s an excellent step-by-step explanation and guided example from uhoreg (warning: contains Haskell :)  Or you can watch Erik and Matthew try to explain it all on Matrix Live back in July.

Since the initial proposal in July, the algorithm has been proofed out in a test jig, iterated on some more to better specify how to handle rejected events, implemented in Synapse, and is now ready to roll.  The only catch is that to upgrade to it we’ve had to introduce the concept of room versioning, and to flush out historical issues we require you to re-create rooms to upgrade them to the new resolution algorithm. Getting the logistics in place for this is a massive pain, but we’ve got an approach now which should be sufficient. Clients will be free to smooth over the transition in the UI as gracefully as possible (and in fact Riot has this already hooked up).  So: watch this space as v2 rooms with all-new state resolution in the coming weeks!

Otherwise, there are a bunch of other issues in the S2S API which we are still working through (e.g. changing event IDs to be hashes of event contents to avoid lying about IDs, switching to use normal X.509 certificates for federation and so resolve problems with Perspectives, etc).  

Once these land and are implemented in Synapse over the coming weeks, we will be able to finally declare a 1.0!

Also on the spec side of things, it’s worth noting that a lot of effort went into improving performance for clients in the form of the Lazy Loading Members MSC.  This ended up consuming a lot of time over the summer as we updated Synapse and the various matrix-*-sdks (and thus Riot) to only calculate and send details to the clients about members who are currently talking in a room, whereas previously we sent the entire state of the room to the client (even including users who had left). The end result however is a 3-5x reduction in RAM on Riot, and a 3-5x speedup on initial sync.  The current MSC is currently being merged as we speak into the main spec (thanks Kitsune!) for inclusion in upcoming CS API 0.5.

The Matrix.org Foundation (CIC!)

Alongside getting the open spec process up and running, we’ve been establishing The Matrix.org Foundation as an independent non-profit legal entity responsible for neutrally safeguarding the Matrix spec and evolution of the protocol.  This kicked off in June with the “Towards Open Governance” blog post, and continued with the formal incorporation of The Matrix.org Foundation in October.  Since then, we’ve spent a lot of time with the non-profit lawyers evolving MSC1318 into the final Articles of Association (and other guidelines) for the Foundation.  This work is basically solved now; it just needs MSC1318 to be updated with the conclusions (which we’re running late on, but is top of Matthew’s MSC todo list).

In other news, we have confirmation that the Community Interest Company (CIC) status for The Matrix.org Foundation has been approved – this means that the CIC Regulator has independently reviewed the initial Articles of Association and approved that they indeed lock the mission of the Foundation to be non-profit and to act solely for the good of the wider online community.  In practice, this means that the Foundation will be formally renamed The Matrix.org Foundation C.I.C, and provides a useful independent safeguard to ensure the Foundation remains on track.

The remaining work on the Foundation is:

  • Update and land MSC1318, particularly on clarifying the relationship between the legal Guardians (Directors) of the Foundation and the technical members of the core spec team, and how funds of the Foundation will be handled.
  • Update the Articles of Association of the Foundation based on the end result of MSC1318
  • Transfer any Matrix.org assets over from New Vector to the Foundation.  Given Matrix’s code is all open source, there isn’t much in the way of assets – we’re basically talking about the matrix.org domain and website itself.
  • Introduce the final Guardians for the Foundation.

We’ll keep you posted with progress as this lands over the coming months.

Riot

2018 has been a bit of a chrysalis year for Riot: the vast majority of work has been going into the massive redesign we started in May to improve usability & cosmetics, performance, stability, and E2E encryption usability improvements.  We’ve consciously spent most of the year feature frozen in order to polish what we already have, as we’ve certainly been guilty in the past of landing way too many features without necessarily applying the needed amount of UX polish.

However, as of today, we’re super-excited to announce that Riot’s redesign is at the point where the intrepid can start experimenting with it – in fact, internally most of the team has switched over to dogfooding (testing) the redesign as of a week or so ago.  Just shut down your current copy of Riot/Web or Desktop and go to https://riot.im/experimental instead if you want to experiment (we don’t recommend running both at the same time).  Please note that it is still work-in-progress and there’s a lot of polish still to land and some cosmetic bugs still hanging around, but it’s definitely at the point of feeling better than the old app.  Most importantly, please provide feedback (by hitting the lifesaver-ring button at the bottom left) to let us know how you get on. See the Riot blog for more details!

Meanwhile, on the performance and stability side of things – Lazy Loading (see above) made a massive difference to performance on all platforms; shrinking RAM usage by 3-5x and similarly speeding up launch and initial sync times.  Ironically, this ended up pushing back the redesign work, but hopefully the performance improvements will have been noticeable in the interim.  We also switched the entire rich text composer from using Facebook’s Draft.js library to instead use Slate.js (which has generally been a massive improvement for stability and maintainability, although took *ages* to land – huge thanks to t3chguy for getting it over the line). Meanwhile Travis has been blitzing through a massive list of key “First Impression” bugs to ensure that as many of Riot’s most glaring usability gotchas are solved.

We also welcomed ever-popular Stickers to the fold – the first instance of per-account rather than per-room widgets, which ended up requiring a lot of new infrastructure in both Riot and the underlying integration manager responsible for hosting the widgets.  But judging by how popular they are, the effort seems to be worth it – and paves the way for much more exciting interactive widgets and integrations in future!

An unexpectedly large detour/distraction came in the form of GDPR back in May – we spent a month or so running around ensuring that both Riot and Matrix are GDPR compliant (including the necessary legal legwork to establish how GDPR even applies to a decentralised technology like Matrix).  If you missed all that fun, you can read about it here.  Hopefully we won’t have to do anything like that again any time soon…

And finally: on the mobile side, much of the team has been distracted helping out France with their Matrix deployment.  However, we’ve been plugging away on Riot/Mobile, keeping pace with the development on Riot/Web – but most excitingly, we’ve also found time to experiment with a complete rewrite of Riot/Android in Kotlin, using Realm and Rx (currently nicknamed Riot X).  The rewrite was originally intended as a test-jig for experimenting with the redesign on mobile, but it’s increasingly becoming a fully fledged Matrix client… which launches and syncs over 5x faster than today’s Riot/Android.  If you’re particularly intrepid you should be able to run the app by checking out the project in Android Studio and hitting ‘run’. We expect the rewrite to land properly in the coming months – watch this space for progress!

E2E Encryption

One of the biggest projects this year has been to get E2E encryption out of beta and turned on by default.  Now, whilst the encryption itself in Matrix has been cryptographically robust since 2016 – its usability has been minimal at best, and we’d been running around polishing the underlying implementation rather than addressing the UX.  However, this year that changed, as we opened a war on about 6 concurrent battlefronts to address the remaining issues. These are:

  • Holistic UX.  Designing a coherent, design-led UX across all of the encryption and key-management functionality.  Nad (who joined Matrix as a fulltime Lead UI/UX designer in August) has been leading the charge on this – you can see a preview of the end result here.  Meanwhile, Dave and Ryan are working through implementing it as we speak.
  • Decryption failures (UISIs).  Whenever something goes wrong with E2E encryption, the symptoms are generally the same: you find yourself unable to decrypt other people’s messages.  We’ve been plugging away chasing these down – for instance, switching from localStorage to IndexedDB in Riot/Web for storing encryption state (to make it harder for multiple tabs to collide and corrupt your encryption state); providing mechanisms to unwedge Olm sessions which have got corrupted (e.g. by restoring from backup); and many others.  We also added full telemetry to track the number of UISIs people are seeing in practice – and the good news is that over the course of the year their occurrence has been steadily reducing.  The bad news is that there are still some edge cases left: so please, whenever you fail to decrypt a message, please make sure you submit a bug report and debug logs from *both* the sender and receiving device.  The end is definitely in sight on these, however, and good news is other battlefronts will also help mitigate UISIs.
  • Incremental Key Backup.  Previously, if you only used one device (e.g. your phone) and you lost that phone, you would lose all your E2E history unless you were in the habit of explicitly manually backing up your keys.  Nowadays, we have the ability to optionally let users encrypt their keys with a passphrase (or recovery key) and constantly upload them to the server for safe keeping.  This was a significant chunk of work, but has actually landed already in Riot/Web and Riot/iOS, but is hidden behind a “Labs” feature flag in Settings whilst we test it and sort out the final UX.
  • Cross-signing Keys. Previously, the only way to check whether you were talking to a legitimate user or an imposter was to independently compare the fingerprints of the keys of the device they claim to be using.  In the near future, we will let users prove that they own their devices by signing them with a per-user identity key, so you only have to do this check once. We’ve actually already implemented one iteration of cross-signing, but this allowed arbitrary devices for a given user to attest each other (which creates a directed graph of attestations, and associated problems with revocations), hence switching to a simpler approach.
  • Improved Verification. Verifying keys by manually comparing elliptic key fingerprints is incredibly cumbersome and tedious.  Instead, we have proposals for using Short Authentication String comparisons and QR-code scanning to simplify the process.  uhoreg is currently implementing these as we speak :)
  • Search.  Solving encrypted search is Hard, but t3chguy did a lot of work earlier in the year to build out matrix-search: essentially a js-sdk bot which you can host, hand your keys to, and will archive your history using a golang full-text search engine (bleve) and expose a search interface that replaces Synapse’s default one.  Of all the battlefronts this one is progressing the least right now, but the hope is to integrate it into Riot/Desktop or other clients so that folks who want to index all their conversations have the means to do so.  On the plus side, all the necessary building blocks are available thanks to t3chguy’s hacking.

So, TL;DR: E2E is hard, but the end is in sight thanks to a lot of blood, sweat and tears.  It’s possible that we may have opened up too many battlefronts in finishing it off rather than landing stuff gradually.  But it should be transformative when it all lands – and we’ll finally be able to turn it on by default for private conversations.  Again, we’re aiming to pull this together by the end of January.

Separately, we’ve been keeping a close eye on MLS – the IETF’s activity around standardising scalable group E2E encryption.  MLS has a lot of potential and could provide algorithmic improvements over Olm & Megolm (whist paving the way for interop with other MLS-encrypted comms systems).  But it’s also quite complicated, and is at risk of designing out support of decentralised environments. For now, we’re obviously focusing on ensuring that Matrix is rock solid with Olm & Megolm, but once we hit that 1.0 we’ll certainly be experimenting a bit with MLS too.

Homeservers

The story of the Synapse team in 2018 has been one of alternating between solving scaling and performance issues to support the ever-growing network (especially the massive matrix.org server)… and dealing with S2S API issues; both in terms of fixing the design of State Resolution, Room Versioning etc (see the Spec section above) and doing stop-gap fixes to the current implementation.

Focusing on the performance side of things, the main wins have been:

  • Splitting yet more functionality out into worker processes which can scale independently of the master Synapse process.
  • Yet more profiling and optimisation (particularly caching).  Between this and the worker split-out we were able to resolve the performance ceiling that we hit over the summer, and as of right now matrix.org feels relatively snappy.
  • Lazy Loading Members.
  • Python 3.  As everyone should have seen by now, Synapse is now Python 3 by default as of 0.34, which provides significant RAM and CPU improvements across the board as well as a path forwards given Python 2’s planned demise at the end of 2019.  If you’re not running your Synapse on Python 3 today, you are officially doing it wrong.

There are also some major improvements which haven’t fully landed yet:

  • State compression.  We have a new algorithm for storing room state which is ~10x more efficient than the current one.  We’ll be migrating to it in by default in future. If you’re feeling particularly intrepid you can actually manually use it today (but we don’t recommend it yet).
  • Incremental state resolution.  Before we got sucked into redesigning state resolution in general, Erik came up with a proof that it’s possible to memoize state resolution and incrementally calculate it whenever state is resolved in a room rather than recalculate it from scratch each time (as is the current implementation).  This would be a significant performance improvement, given non-incremental state res can consume a lot of CPU (making everything slow down when there are lots of room extremities to resolve), and can consume a lot of RAM and has been one of the reasons that synapse’s RAM usage can sometimes spike badly. The good news is that the new state res algorithm was designed to also work in this manner.  The bad news is that we haven’t yet got back to implementing it yet, given the new algorithm is only just being rolled out now.
  • Chunks.  Currently, Synapse models all events in a room as being part of a single DAG, which can be problematic if you can see lots of disconnected sections of the DAG (e.g. due to intermittent connectivity somewhere in the network), as Synapse will frantically try to resolve all these disconnected sections of DAG together.  Instead, a better solution is to explicitly model these sections of DAG as separate entities called Chunks, and not try to resolve state between disconnected Chunks but instead consider them independent fragments of the room – and thus simplify state resolution calculations significantly. It also fixes an S2S API design flaw where previously we trusted the server to state the ordering (depth) of events they provided; with chunks, the receiving server can keep track of that itself by tracking a DAG of chunks (as well as the normal event DAG within the chunks).  Now, most of the work for this happened already, but is currently parked until new state res has landed.

Meanwhile, over on Dendrite, we made the conscious decision to get 1.0 out the door on Synapse first rather than trying to implement and iterate on the stuff needed for 1.0 on both Synapse & Dendrite simultaneously.  However, Dendrite has been ticking along thanks to work from Brendan, Anoa and APWhitehat – and the plan is to use it for more niche homeserver work at first; e.g. constrained resource devices (Dendrite uses 5-10x less RAM than Synapse on Py3), clientside homeservers, experimental routing deployments, etc.  In the longer term we expect it to grow into a fully fledged HS though!

Bridging

2018 was a bit of a renaissance for Bridging, largely thanks to Half-Shot coming on board in the Summer to work on IRC bridging and finally get to the bottom of the stability issues which plagued Freenode for the previous, uh, few years.  Meanwhile the Slack bridge got its first ever release – and more recently there’s some Really Exciting Stuff happening with matrix-appservice-purple; a properly usable bridge through to any protocol that libpurple can speak… and as of a few days ago also supports *native* XMPP bridging via XMPP.js.  There’ll probably be a dedicated blogpost about all of this in the new year, especially when we deploy it all on Matrix.org. Until then, the best bet is to learn more is to watch last week’s Matrix Live and hear it all first hand.

Modular

One of the biggest newcomers of 2018 was the launch of Modular.im in October – the world’s first commercial Matrix hosting service.  Whilst (like Riot), Modular isn’t strictly-speaking a Matrix.org project – it feels appropriate to mention it here, not least because it’s helping directly fund the core Matrix dev team.

So far Modular has seen a lot of interest from folks who want to spin up a super-speedy dedicated homeserver run by us rather than having to spend the time to build one themselves – or folks who have yet to migrate from IRC and want a better-than-IRC experience which still bridges well.  One of the nice bits is that the servers are still decentralised and completely operationally independent of one another, and there have been a bunch of really nice refinements since launch, including the ability to point your own DNS at the server; matrix->matrix migration tools; with custom branding and other goodness coming soon.  If you want one-click Matrix hosting, please give Modular a go :)

Right now we’re promoting Modular mainly to existing Matrix users, but once the Riot redesign is finished you should expect to see some very familiar names popping up on the platform :D

TWIM

Unless you were living under a rock, you’ll hopefully have also realised that 2018 was the year that brought us This Week In Matrix (TWIM) – our very own blog tracking all the action across the whole Matrix community on a weekly basis.  Thank you to everyone who contributes updates, and to Ben for editing it each week. Go flip through the archives to find out what’s been going on in the wider community over the course of the year!  (This blog post is already way too long without trying to cover the rest of the ecosystem too :)

Shapes of Things to Come

Finally, a little Easter egg (it is Christmas, after all) for anyone crazy enough to have read this far: The eagle-eyed amongst you might have noticed that one of our accepted talks for FOSDEM 2019 is “Breaking the 100bps barrier with Matrix” in the Real Time Communications devroom.  We don’t want to spoil the full surprise, but here’s a quick preview of some of the more exotic skunkworks we’ve been doing on low-bandwidth routing and transports recently.  Right now it shamelessly assumes that you’re running within a trusted network, but once we solve that it will of course be be proposed as an MSC for Matrix proper.  A full write-up and code will follow, but for now, here’s a mysterious video…

(If you’re interested in running Matrix over low-bandwidth networks, please get in touch – we’d love to hear from you…)

2019

So, what will 2019 bring?

In the short term, as should be obvious from the above, our focus is on:

  • r0 spec releases across the board (aka Matrix 1.0)
  • Implementing them in Synapse
  • Landing the Riot redesign
  • Landing all the new E2E encryption UX and features
  • Finalising the Matrix.org Foundation

However, beyond that, there’s a lot of possible options on the table in the medium term:

  • Reworking and improving Communities/Groups.
  • Reactions.
  • E2E-encrypted Search
  • Filtering. (empowering users to filter out rooms & content they’re not interested in).
  • Extensible events.
  • Editable messages.
  • Extensible Profiles (we’ve actually been experimenting with this already).
  • Threading.
  • Landing the Riot/Android rewrite
  • Scaling synapse via sharding the master process
  • Bridge UI for discovery of users/rooms and bridge status
  • Considering whether to do a similar overhaul of Riot/iOS
  • Bandwidth-efficient transports
  • Bandwidth-efficient routing
  • Getting Dendrite to production.
  • Inline widgets (polls etc)
  • Improving VoIP over Matrix.
  • Adding more bridges, and improving the current ones..
  • Account portability
  • Replacing MXIDs with public keys

In the longer term, options include:

  • Shared-code cross-platform client SDKs (e.g. sharing a native core library between matrix-{js,ios,android}-sdk)
  • Matrix daemons (e.g. running an always-on client as a background process in your OS which apps can connect to via a lightweight CS API)
  • Push notifications via Matrix (using a daemon-style architecture)
  • Clientside homeservers (i.e. p2p matrix) – e.g. compiling Dendrite to WASM and running it in a service worker.
  • Experimenting with MLS for E2E Encryption
  • Storing and querying more generic data structures in Matrix (e.g. object trees; scene graphs)
  • Alternate use cases for VR, IoT, etc.

Obviously we’re not remotely going to do all of that in 2019, but this serves to give a taste of the possibilities on the menu post-1.0; we’ll endeavour to publish a more solid roadmap when we get to that point.

And on that note, it’s time to call this blogpost to a close. Thanks to anyone who read this far, and thank you, as always, for flying Matrix and continuing to support the project.  The next few months should be particularly fun; all the preparation of 2018 will finally pay off :)

Happy holidays,

Matthew, Amandine & the whole Matrix.org team.