Breaking the 100bps barrier with Matrix, meshsim & coap-proxy

Hi all,

Last month at FOSDEM 2019 we gave a talk about a new experimental ultra-low-bandwidth transport for Matrix which swaps our baseline HTTPS+JSON transport for a custom one built on CoAP+CBOR+Noise+Flate+UDP.  (CoAP is the RPC protocol; CBOR is the encoding; Noise powers the transport layer encryption; Flate compresses everything using predefined compression maps).

The challenge here was to see if we could demonstrate Matrix working usably over networks running at around 100 bits per second of throughput (where it’d take 2 minutes to send a typical 1500 byte ethernet packet!!) and very high latencies.  You can see the original FOSDEM talk below, or check out the slides here.

Now, it’s taken us a little while to find time to tidy up the stuff we demo’d in the talk to be (relatively) suitable for public consumption, but we’re happy to finally release the four projects which powered the demo:

In order to get up and running, the meshsim README has all the details.

It’s important to understand that this is very much a proof of concept, shouldn’t be used in production yet, and almost certainly has some glaring bugs.  In fact, it currently assumes you are running on a trusted private network rather than the public Matrix network in order to get away with some of the bandwidth optimisations performed – see coap-proxy’s Limitations section for details.  In particular, please note that the encryption is home-grown and has not yet been audited, fully reviewed or tested.  Also, we’ve released the code for the low-bandwidth transport, but we haven’t released the “fan-out routing” implementation for Synapse as it needs a rethink to be applicable to the public Matrix network.  You’ll also want to run Riot/Web in low-bandwidth mode if you really wind down the bandwidth (suppressing avatars, read receipts, typing notifs and presence to avoid wasting precious bandwidth).

We also don’t have an MSC for the CoAP-based transport yet, mainly due to lack of time whilst wanting to ensure the limitations are addressed first before we propose it as a formal alternative Matrix transport.  (We also first need to define negotiation mechanisms for entirely alternative CS & SS transports!).  However, the quick overview is:

  • JSON is converted directly into CBOR (with a few substitutions made to shrink common patterns down)
  • HTTP is converted directly into CoAP (mapping the verbose API endpoints down to single-byte endpoints)
  • TLS is swapped out for Noise Pipes (XX + IK noise handshakes).  This gives us 1RTT setup (XX) for the first connection to a host, and 0RTT (IK) for all subsequent connections, and provides trust-on-first-use semantics when connecting to a server.  You can see the Noise state machine we maintain in go-coap’s noise.go.
  • The CoAP headers are hoisted up above the Noise payload, letting us use them for framing the noise pipes without having duplicated framing headers at the CoAP & Noise layers.  We also frame the Noise handshake packets as CoAP with custom message types (250, 251 and 252).  We might be better off using OSCORE for this, however, rather than hand-wrapping a custom encrypted transport…
  • The CoAP payload is compressed via Flate using preshared compression tables derived from compressing large chunks of representative Matrix traffic. This could be significantly improved in future with streaming compression and dynamic tables (albeit seeded from a common set of tables).  A minimal sketch of the encode-and-compress idea follows below.
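
To make the encode-and-compress idea concrete, here’s a minimal Node.js sketch (the real coap-proxy is written in Go, so this is purely illustrative). It assumes the npm cbor package, and uses Node’s built-in raw deflate with a hypothetical preset dictionary standing in for the preshared compression tables:

const cbor = require('cbor');  // npm install cbor
const zlib = require('zlib');

// Hypothetical preset dictionary – in the real transport this is derived
// from compressing large chunks of representative Matrix traffic.
const presetDictionary = Buffer.from(
    '{"type":"m.room.message","content":{"msgtype":"m.text","body":'
);

const event = {
    type: 'm.room.message',
    content: { msgtype: 'm.text', body: 'Hello World' },
};

const asJson = Buffer.from(JSON.stringify(event));
const asCbor = cbor.encode(event);
const compressed = zlib.deflateRawSync(asCbor, { dictionary: presetDictionary });

console.log(`JSON:               ${asJson.length} bytes`);
console.log(`CBOR:               ${asCbor.length} bytes`);
console.log(`CBOR + raw deflate: ${compressed.length} bytes`);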

The end result is that you end up taking about 90 bytes (including ethernet headers!) to send a typical Matrix message (and about 70 bytes to receive the acknowledgement).  This breaks down as:

  • 14 bytes of Ethernet headers
  • 20 bytes of IP headers
  • 8 bytes of UDP headers
  • 16 bytes of Noise AEAD
  • 6 bytes of CoAP headers
  • ~26 bytes of compressed and encrypted CBOR

The Noise handshake on connection setup would take an additional 128 bytes (4x 32 byte Curve25519 DH values), either spread over 1RTT for initial setup or 0RTT for subsequent setups.

At 100bps, 90 bytes takes 90*8/100 = 7.2s to send… which is just about usable in an extreme life and death situation where you can only get 100bps of connectivity (e.g. someone at the bottom of a ravine trying to trickle data over one bar of GPRS to the emergency services).  In practice, on a custom network, you could ditch the Ethernet and UDP/IP headers if on a point-to-point link for CS API, and ditch the encryption if the network physical layer was trusted – at which point we’re talking ~32 bytes per request (2.5s to send at 100bps).  Then, there’s still a whole wave of additional work that could be investigated, including…

  • Smarter streaming compression (so that if a user says ‘Hello?’ three times in a row, the 2nd and 3rd messages are just references to the first pattern)
  • Hoisting Matrix transaction IDs up to the CoAP layer (reusing the CoAP msgId+token rather than passing around new Matrix transaction IDs, at the expense of requiring one Matrix txn per request)
  • Switching to CoAP OBSERVE for receiving data from the server (currently we long-poll /sync to receive data)
  • Switching access_tokens for PSKs or similar

…all of which could shrink the payload down even further.  That said, even in its current state, it’s a massive improvement – roughly ~65x better than the equivalent HTTPS+JSON traffic.

In practice, further work on low-bandwidth Matrix is dependent on finding a sponsor who’s willing to fund the team to focus on this, as otherwise it’s hard to justify spending time here in addition to all the less exotic business-as-usual Matrix work that we need to keep the core of Matrix evolving (finishing 1.0, finishing E2E encryption, speeding up Synapse, finishing Dendrite, rewriting Riot/Android etc).  However, the benefits here should be pretty obvious: massively reduced bandwidth usage and improved battery life; resilience to catastrophic network conditions; faster sync times; and even a protocol suitable for push notifications (Matrix as e2e encrypted, decentralised push!).  If you’re interested in supporting this work, please contact support at matrix.org.

Bridging Matrix with WhatsApp running on a VM

This guide will live with the documentation at https://matrix.org/docs/guides/whatsapp-bridging-mautrix-whatsapp, but you can find the text below.


Matrix is:

an open standard for interoperable, decentralised, real-time communication.

In this article we’ll benefit from all three of these attributes:

  • interoperable: we’ll see how Matrix can be made to interact with WhatsApp
  • decentralised: you can perform this on your own server while still enjoying the benefits of being connected to the rest of the Matrix federation
  • real-time communication: we’ll see how to send and receive messages in real-time

Install your homeserver and install mautrix-whatsapp, the WhatsApp bridge

Firstly, you need to have a Matrix homeserver installed. If you don’t currently have one, take a look at the instructions at Installing Synapse, and also in the Synapse README.

Next, install mautrix-whatsapp by following the instructions at mautrix-whatsapp/wiki.

If you are starting from scratch, I suggest you take a look at matrix-docker-ansible-deploy, as this project will enable you to deploy Synapse, mautrix-whatsapp and other components easily.

For example, if you have an existing deployment using matrix-docker-ansible-deploy, you can add mautrix-whatsapp simply by adding the following line to your configuration file:

matrix_mautrix_whatsapp_enabled: true

… and re-running the setup:

ansible-playbook -i inventory/hosts setup.yml --tags=setup-all

Either way, you will soon have a functioning Matrix Synapse homeserver and mautrix-whatsapp installed with it. Next, we will set up an Android VM.

Set up an Android VM

The best way to run an Android Virtual Machine is to use the Android Studio tools from Google. First, install Android Studio, making sure to follow the post-install steps, as they will install additional tools we need, including AVD Manager.

Once installed, run AVD Manager by choosing Tools -> AVD Manager from the menu.

Follow the steps to create a new virtual machine. In this example I have a Nexus 5X running Android 9, but almost any configuration is fine here. Make sure that you give the device access to the Play Store.

Install WhatsApp and sign-in

Launch the Virtual Device, then open the Play Store and sign in. Now use the Play Store to install WhatsApp on the Virtual Device.

You will be asked to verify your phone number; use your number on another device to complete this step.


Setup mautrix-whatsapp bridge

Now that you have WhatsApp working in a VM, and Matrix working on your server, it’s time to bridge them together!

Per the instructions at mautrix-whatsapp/wiki, you must start a new chat with @whatsappbot:<yourdomain>. Type login to begin the authentication process.

mautrix-whatsapp operates by using the WhatsApp Web feature of WhatsApp – which means it uses a QR code that you must now scan on the device running WhatsApp – which in your case is the AVD. In order to scan the presented QR code, set your AVD camera to pass through to the camera device on your host machine – see the images below.


Once this is complete, you can type sync to start bridging contacts, and sync --create to automatically create room invites.

And that’s it! You may need to take a little time to watch the sync happen, particularly if you have a very large number of chats on the WhatsApp side, but there is no further configuration needed.

Demo!

 

 

Olm 3.0.0 released!

Olm 3.0.0 has been released, which features several big changes. It can be downloaded from https://git.matrix.org/git/olm/. The npm package for JavaScript can be downloaded from https://matrix.org/packages/npm/olm/olm-3.0.0.tgz

Python

The biggest change is the merge of poljar’s improved Python bindings. These bindings should be much easier to use for Python programmers, and are used by Zil0’s E2E support in the Matrix Python SDK.

Since the binding API has changed, existing Python code will need to be rewritten in order to work with this release.

poljar has also included comprehensive documentation for the new API.

CMake

mujx contributed support for building olm using CMake. This should allow for easier building on different platforms. Currently the library can be built using either make or CMake. In the future, make support may be removed.

JavaScript

The JavaScript bindings now use WebAssembly by default. WebAssembly is much faster than the previous asm.js build, and is supported by recent versions of the most popular browsers. For compatibility with browsers that do not support WebAssembly, the asm.js version is still provided.

Due to adding support for WebAssembly, the API had to be changed slightly.
There is now an init function that must be called before the library can be used. This function will return a promise that will resolve once the library is ready to be used. The matrix-js-sdk has not yet been updated to do this, so users of matrix-js-sdk should continue using olm 2.x until it has been updated.
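
As a rough sketch of the new flow (assuming the olm npm package linked above, installed with e.g. npm install https://matrix.org/packages/npm/olm/olm-3.0.0.tgz):

const Olm = require('olm');

// init() loads and compiles the WebAssembly module; nothing else in the
// library may be used until the returned promise resolves.
Olm.init().then(() => {
    const account = new Olm.Account();
    account.create();
    console.log(account.identity_keys()); // JSON string of ed25519/curve25519 keys
    account.free();                       // olm objects must be freed manually
});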

Key backups

The public key API has been updated to support the proposal for server-side key backups. More details on how to use these functions will be published in the future.

Usage of the matrix-js-sdk

We have a brand new, exciting guide page offering an introduction to matrix-js-sdk. This guide will live with the documentation at https://matrix.org/docs/guides/usage-of-the-matrix-js-sdk,  but you can find the text below.


Matrix allows open real-time communications over the Internet using HTTP and JSON. This makes developing clients to connect to Matrix servers really easy! Because it’s open, and uses simple syntax for messages, you can connect Matrix to anything that communicates over a standard HTTP interface – later projects in this series will explore ideas such as building bots, performing machine learning on message content, and connecting IoT devices such as Philips Hue lights.
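
To illustrate just how plain that HTTP+JSON surface is, here’s a tiny sketch using nothing but Node’s built-in https module to call the unauthenticated versions endpoint of the Client-Server API:

const https = require('https');

// Lists the Client-Server API versions supported by the matrix.org homeserver
https.get('https://matrix.org/_matrix/client/versions', (res) => {
    let body = '';
    res.on('data', (chunk) => body += chunk);
    res.on('end', () => console.log(JSON.parse(body)));
});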

Making a Matrix Client

Let’s explore how we would make a very simple Matrix client, with only the ability to perform an initial sync, and to get member lists and the timeline for rooms of our choice.

This article will explore the Matrix Client-Server API, making use of the matrix-js-sdk. Later articles may discuss making the underlying calls. Specifically we will cover:

  • login
  • simple syncing
  • listening for timeline events
  • accessing the MatrixInMemoryStore
  • sending messages to rooms
  • how to respond to specific messages, as a bot would

We’ll use Node.js as our environment, though the matrix-js-sdk can also be used directly in the browser.

Before we start, make sure you have Node.js and NPM installed: follow instructions at nodejs.org for your platform. Then create a new directory to work in:

mkdir my-first-matrix-client
cd my-first-matrix-client
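
Optionally, initialise a package.json in that directory so the dependency we’re about to install gets recorded (not strictly required for this guide):

npm init -y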

Let’s write JavaScript

Once you’re ready, the first thing to do is install the matrix-js-sdk from NPM:

npm install matrix-js-sdk

We include the SDK in our source exactly as expected:

import sdk from 'matrix-js-sdk';
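
If you’re running plain Node without Babel or a bundler, the equivalent CommonJS form works too:

const sdk = require('matrix-js-sdk');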

Login with an access token

Instantiate a new client object and use an access token to login:

const client = sdk.createClient({
    baseUrl: "https://matrix.org",
    accessToken: "....MDAxM2lkZW50aWZpZXIga2V5CjAwMTBjaWQgZ2Vu....",
    userId: "@USERID:matrix.org"
});
// note that we use the full MXID for the userId value

(jsdoc for createClient)

If you are logged into Riot, you can find an access token for the logged-in user on the Settings page.

If the homeserver you’re logging in to supports logging in with a password, you can also retrieve an access token programmatically using the API. To do this, create a new client with no authentication parameters, then call client.login() with "m.login.password":

const client = sdk.createClient("https://matrix.org");
client.login("m.login.password", {"user": "USERID", "password": "hunter2"}).then((response) => {
    console.log(response.access_token);
});

In any case, once logged in either with a password or an access token, the client can get the current access token via:

console.log(client.getAccessToken());

Note: it is essential to keep this access token safe, as it allows complete access to your Matrix account!

Sync and Listen

Next we start the client; this sets up the connection to the server and allows us to begin syncing:

client.startClient();

Perform a first sync, and listen for the response, to get the latest state from the homeserver:

client.once('sync', function(state, prevState, res) {
    console.log(state); // state will be 'PREPARED' when the client is ready to use
});
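
In a real script you would usually gate the rest of your logic on that state – something along these lines, where onPrepared is a hypothetical function of your own that adds listeners, sends messages, and so on:

client.once('sync', function(state, prevState, res) {
    if (state === 'PREPARED') {
        onPrepared(); // hypothetical: safe to use the client from here
    } else {
        console.log('Unexpected sync state:', state);
        process.exit(1);
    }
});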

Once the sync is complete, we can add listeners for events. We could listen to ALL incoming events, but that would be a lot of traffic, and too much for our simple example. If you want to listen to all events, you can add a listener as follows:

client.on("event", function(event){
    console.log(event.getType());
    console.log(event);
})

Instead, let’s just listen to events happening on the timeline of rooms for which our user is a member:

client.on("Room.timeline", function(event, room, toStartOfTimeline) {
    console.log(event.event);
});

Access the Store

When we created a new client with sdk.createClient(), an instance of the default store, MatrixInMemoryStore, was created and enabled. When we sync, or otherwise instruct our client to fetch data, the data is automatically added to the store.

To access the store, we use accessor methods. For example, to get a list of rooms in which our user is joined:

// client.getRooms() returns an array of room objects
var rooms = client.getRooms();
rooms.forEach(room => {
    console.log(room.roomId);
});

(jsdoc for client.getRooms)

More usefully, we could get a list of members for each of these rooms:

var rooms = client.getRooms();
rooms.forEach(room => {
    var members = room.getJoinedMembers();
    members.forEach(member => {
        console.log(member.name);
    });
});

For each room, we can inspect the timeline in the store:

var rooms = client.getRooms();
rooms.forEach(room => {
    room.timeline.forEach(t => {
        console.log(JSON.stringify(t.event.content));
    });
});

Send messages to rooms

To send a message, we create a content object, and specify a room to send to. In this case, I’ve taken the room ID of #test:matrix.org, and used it as an example:

var testRoomId = "!jhpZBTbckszblMYjMK:matrix.org";
 
var content = {
    "body": "Hello World",
    "msgtype": "m.text"
};
 
client.sendEvent(testRoomId, "m.room.message", content, "").then((res) => {
    // message sent successfully
}).catch((err) => {
    console.log(err);
});

(jsdoc for client.sendEvent)

Knowing this, we can put together message listening and message sending, to build a bot which just echoes back any message starting with a “!”:

var testRoomId = "!jhpZBTbckszblMYjMK:matrix.org";
 
client.on("Room.timeline", function(event, room, toStartOfTimeline) {
    // we know we only want to respond to messages
    if (event.getType() !== "m.room.message") {
        return;
    }
 
    // we are only interested in messages from the test room, which start with "!"
    if (event.getRoomId() === testRoomId && event.getContent().body[0] === '!') {
        sendNotice(event.event.content.body);
    }
});
 
function sendNotice(body) {
    var content = {
        "body": body.substring(1),
        "msgtype": "m.notice"
    };
    client.sendEvent(testRoomId, "m.room.message", content, "", (err, res) => {
        console.log(err);
    });
}

Take a look at the msgtype in the object above. In the previous example, we used “m.text” for this field, but now we’re using “m.notice”. Bots will often use “m.notice” to differentiate their messages. This allows the client to render notices differently; for example Riot, the most popular client, renders notices in a paler text colour.

Further

There is much, much more to Matrix, the Client-Server API and the matrix-js-sdk, but this guide should give some understanding of simple usage. In subsequent guides we’ll cover more detail and also explore projects you can build on top, such as IoT controls and chatbot interfaces. For now you can take a look at other examples in the matrix-js-sdk itself, and also the Matrix Client-Server API which it implements.

Pre-disclosure: Upcoming critical security fix for Synapse

Hi all,

During the ongoing work to finalise a stable release of Matrix’s Server-Server federation API, we’ve been doing a full audit of Synapse’s implementation and have identified a serious vulnerability which we are going to release a security update to address (Synapse 0.33.3.1) on Thursday Sept 6th 2018 at 12:00 UTC.

We are coordinating with package maintainers to ensure that patched versions of packages will be available at that time – meanwhile, if you run your own Synapse, please be prepared to upgrade as soon as the patched versions are released.  All previous versions of Synapse are affected, so everyone will want to upgrade.

Thank you for your time, patience and understanding while we resolve the issue,

signed_predisclosure.txt

Matrix Spec Update August 2018

Introducing Client Server API 0.4, and the first ever stable IS, AS and Push APIs spec releases!

Hi folks,

As many know, we’ve been on a massive sprint to improve the spec – both fixing omissions where features have been implemented in the reference servers but were never formalised in the spec, and fixing bugs where the spec has thinkos which stop us from being able to ratify it as stable and thus fit for purpose.

In practice, our target has been to cut stable releases of all the primary Matrix APIs by the end of August – effectively declaring Matrix out of beta, at least at the specification level.  For context: historically only one API has ever been released as stable – the Client Server API, which was the result of a similar sprint back in Jan 2016. This means that the Server Server (SS) API, Identity Service (IS) API, Application Service (AS) API and Push Gateway API have never had an official stable release – which has obviously been problematic for those implementing them.

However, as of the end of Friday Aug 31, we’re proud to announce the first ever stable releases of the IS, AS and Push APIs!


To the best of our knowledge, these API specs are now complete and accurately describe all the current behaviour implemented in the reference implementations (sydent, synapse and sygnal) and are fit for purpose. Any deviation from the spec in the reference implementations should probably be considered a bug in the impl. All changes take the form of filling in spec omissions and adding clarifications to the existing behaviour in order to get things to the point that an independent party can implement these APIs without having to refer to anything other than the spec.

This is the result of a lot of work which spans the whole Spec Core Team, but has been particularly driven by TravisR, who has taken the lead on this whole mission to improve the spec.  Huge thanks are due to Travis for his work here, and also massive thanks to everyone who has reviewed his PRs and contributed to the releases.  The spec is looking unrecognisably better for it – and Matrix 1.0 is feeling closer than ever!

Alongside the work on the IS/AS/Push APIs, there has also been a massive attempt to plug all the spec omissions in the Client Server API.  Historically the CS API releases have missed some of the newer APIs (and of course always miss the ones which postdate a given release), but we’ve released the APIs which /have/ been specified as stable in order to declare them stable.  However, in this release we’ve tried to go through and fill in as many remaining gaps as possible.

The result is the release of Client Server API version 0.4. This is a huge update – increasing the size of the CS API by ~40%. The biggest new stuff includes fully formalising support for end-to-end encryption (thanks to Zil0!), versioning for rooms (so we can upgrade rooms to new versions of the protocol), synchronised read markers, user directories, server ACLs, MSISDN 3rd party ids, and .well-known server discovery (not that it’s widely used yet), but for the full picture, best bet is to look at the changelog (now managed by towncrier!).  It’s probably fair to say that the CS API is growing alarmingly large at this point – Chrome says that it’d be 223 A4 pages if printed. Our solution to this will be to refactor it somehow (and perhaps switch to a more compact representation of the contents).

Some things got deliberately missed from the CS 0.4 release: particularly membership Lazy Loading (because we’re still testing it out and haven’t released it properly in the wild yet), the various GDPR-specific APIs (because they may evolve a bit as we refine them since the original launch), finalising ID grammars in the overall spec (because this is surprisingly hard and subtle and we don’t want to rush it) and finally Communities (aka Groups), as they are still somewhat in flux.

Meanwhile, on the Server to Server API, there has also been a massive amount of work.  Since the beginning of July it’s tripled in size as we’ve filled in the gaps, over the course of >200 commits (>150 of which from Travis).  If you take a look at the current snapshot it’s pretty unrecognisable from the historical draft; with the main changes being:

  • Adding the new State Resolution algorithm to address flaws in the original one.  This has been where much of our time has gone – see MSC1442 for full details.  Adopting the new algorithm requires rooms to be recreated; we’ll write more about this in the near future when we actually roll it out.
  • Adding room versioning so we can upgrade to the new State Resolution algorithm.
  • Everything is now properly expressed as Swagger (OpenAPI), just like the CS API
  • Adding all the details for E2E encryption (including dependencies like to-device messaging and device-list synchronisation)
  • Improvements in specifying how to authorize inbound events over federation
  • Document federation APIs such as /event_auth and /query_auth and /get_missing_events
  • Document 3rd party invites over federation
  • Document the /user/* federation endpoints
  • Document Server ACLs
  • Document read receipts over federation
  • Document presence over federation
  • Document typing notifications over federation
  • Document content repository over federation
  • Document room directory over federation
  • …and many many other minor bug fixes, omission fixes, and restructuring for coherency – see https://github.com/matrix-org/matrix-doc/issues/1464 for an even longer list :)

However, we haven’t finished it all: despite our best efforts we’re running slightly past the original target of Aug 31.  You can see the current state of play for the r0 release overall (in terms of pending issues) in the full breakdown over at the public Github project dashboard.

The main stuff we still have remaining on the Server/Server API at this point is:

  • Better specifying how we validate inbound events. See MSC1646 for details & progress.
  • Switching event IDs to be hashes. See MSC1640 for details and progress.
  • Various other remaining security considerations (e.g. how to handle malicious auth events in the DAG; how to better handle DoS situations).
  • Merging in the changes to authoring m.room.power_levels (as per MSC1304)
  • Formally specifying the remaining identifiers which lack a formal grammar – MSC1597 and particularly room aliases (MSC1608)

The plan here is to continue speccing and implementing these at top priority (with Travis continuing to work fulltime on spec work), and we’ll obviously keep you up-to-date on progress.  Some of the changes here (e.g. event IDs) are quite major and we definitely want to implement them before speccing them, so we’re just going to have to keep going as fast as we can. Needless to say we want to cut an r0 of the S2S API alongside the others asap and declare Matrix out of beta (at least at the spec level :)

In terms of visualising progress on this spec mission it’s interesting to look at the rate at which we’ve been closing PRs: this graph shows the total number of PRs which are in state ‘open’ or ‘closed’ on any given day:

…which clearly shows the original sprint to get the r0 of the CS API out the door at the end of 2015, and then a more leisurely pace until the beginning of July 2018, since when the pace has picked up massively.  Other ways of looking at it include the number of open issues…


…or indeed the number of commits per week…


…or the overall Github Project activity for August.  (It’s impressive to see Zil0 sneaking in there in second place on the commit count, thanks to all his GSoC work documenting E2E encryption in the spec as part of implementing it in matrix-python-sdk!)


Anyway, enough numerology.  It’s worth noting that all of the dev for r0 has generally followed the proposed Open Governance Model for Matrix, with the core spec team made up of both historical core team folk (erik, richvdh, dave & matthew), new core team folk (uhoreg & travis) and community folk (kitsune, anoa & mujx) working together to review and approve the changes – and we’ve been doing MSCs (albeit with an accelerated pace) for anything which we feel requires input from the wider community.  Once the Server/Server r0 release is out the door we’ll be finalising the open governance model and switching to a slightly more measured (but productive!) model of spec development as outlined there.

Meanwhile, Matrix 1.0 gets ever closer.  With (almost) all this spec mission done, our plan is to focus more on improving the reference implementations – particularly performance in Synapse, UX in matrix-{react,ios,android}-sdk as used by Riot (especially for E2E encryption), and then declare a 1.0 and get back to implementing new features (particularly Editable Messages and Reactions) at last.

We’d like to thank everyone for your patience whilst we’ve been playing catch up on the spec, and hope you agree it’s been worth the effort :)

Matthew & the core spec team.