General

142 posts tagged with "General" (See all categories)

Atom Category Atom Feed

A Call to Arms: Supporting Matrix!

2017-07-07 — GeneralMatthew Hodgson

Hi folks,

TL;DR: if you like Matrix (and especially if you're building stuff on it), please support us via Patreon or Liberapay to keep the core team able to work on it full-time, otherwise the project is going to be seriously impacted.  And if you're a company who is invested in Matrix (e.g. itching for Dendrite), please get in touch ASAP if you'd like to sponsor core development work from the team.  And if you're a philanthropic billionaire who believes in our ideals of decentralisation, encryption, and open communication as a basic human right - we'd love to hear from you too O:-)

I was expecting this blog post to be the Matrix Summer Special, focusing entirely on the incredible progress and updates we've made in the last few months in Matrix.  However, instead I'm going to talk about something different and literally critical to Matrix's success.

As many people know, Matrix.org development has historically been exclusively and very generously sponsored by a large multinational telecoms infrastructure company for whom most of the core team once built telco messaging apps.  However, despite the project progressing better than ever (more on that later), we have just had our funding dramatically cut by >60%. [Update: as of Aug 2017 it is effectively cut entirely, with enough $ left over to cover until end of October.]

We seem to be suffering from a darkly amusing paradox, as the rationale from our corporate overlords is essentially: “Wow! Matrix is doing great and growing well - and you seem to have all sorts of exciting people and companies using and building on it.  But we've been footing the whole development bill since the outset in May 2014, and this simply doesn't feel fair.  We're happy to keep funding though - but only if others do too!”.  In other words, in some ways we are a victim of our own success...

So we now find ourselves in the situation that despite the project looking better than ever and having a tonne of amazing stuff in the pipeline, we are suddenly missing the funding to keep the core team working on it.  And the team is quite sizeable - reflecting the ambition and size of Matrix: right now we have effectively 11 people working specifically on Matrix itself: 1 on Synapse, 1 on Dendrite, 1 on e2e crypto, 2 on matrix-react-sdk (which powers Riot/Web), 2 on matrix-ios-kit / matrix-ios-sdk, 2 on matrix-android-sdk, 1 on bridges, and me (Matthew) managing the overall project.  (This ignores folks who overlap the team who are working specifically on Riot stuff).

Over the last few years we've had countless people ask if they can financially support Matrix. We haven't been able to accept it for various reasons, but now is the time for us to step towards a more independent setup, and avoid a repeat of the situation we're currently facing by opening up to external support.

So we need help from the community to keep going!  Please head over to Patreon or Liberapay and put some money in the meter (or send some bitcoin to 1LxowEgsquZ3UPZ68wHf8v2MDZw82dVmAE or ether to 0xA5f9a4f9E024F6D727f7afdA9257e22329A97485). In return, you'll get to keep Matrix evolving at a decent rate, be a member of the upcoming +supporters:matrix.org group (complete with flair badges!), and other benefits like access to #matrix-supporters:matrix.org - a new dedicated room for prioritised support, discounted goodies from Riot once paid services arrive, access to a weekly supporters-only status podcast(!), and of course receive our eternal thanks. :)

Meanwhile, if you're a company who depends on Matrix: please get in touch. We have the option for you to sponsor core Matrix development (e.g. Dendrite) or for us to provide you with more targeted support or feature development.  We're already talking to several organisations who want to accelerate Dendrite specifically - and the more support we have there the faster we can go.

We'd also like to thank UpCloud for sponsoring hosting for the Matrix.org synapse instances - UpCloud has been coping impressively with the massive I/O and CPU/RAM requirements we have, and we recommend them unreservedly for folks looking to run their own homeservers.

Finally, one of the longer term plans to help fund Matrix is to get sponsorship from Riot, once Riot starts offering paid services. So, if you're an investor who's interested in the for-profit sides of Riot (paid integrations and paid Matrix hosting) then please get in touch with the Riot team ASAP!

Moving forward we are confident that we can secure funding, through sponsorship and Riot paid services, but in truth this decision caught us by surprise and so we need help both long term but also right now!

And whatever the funding situation, we're of course always looking for contributions for code, bug reports, or just spreading the word about the project too! :)

🔗Status Update

(or scroll to next section to see why this is bigger than "just" decentralised encrypted communication)

Despite the funding issue, the project really is going very well. Our vital stats (as seen through the lens of the matrix.org synapse) are looking like this:

And meanwhile, looking back at the last big update (Holiday Special 2016), we can compare our progress with our goals for 2017 thus far:

  • Getting E2E Encryption out of beta ASAP.
This has progressed massively - we haven't really yelled about it yet, but latest https://riot.im/develop/ now finally implements the ability to share message keys between clients to let them decrypt older history and fix “unable to decrypt” errors (Mobile coming soon).  Meanwhile various root causes of “unable to decrypt” errors have been gradually eliminated; I can't actually remember the last time I saw one! Once key-sharing and improved device verification UX is fully tested and tuned we should be able to declare E2E out of beta.

Work on fixing the final causes of "unable to decrypt" (UISI) errors in E2E is progressing well: here's a sneak preview of things to come!! pic.twitter.com/0oGJjm8ZHT

— Matrix (@matrixdotorg) May 25, 2017

  • Ensuring we can scale beyond Synapse – i.e. implement Dendrite
Likewise, Dendrite is on track: we've implemented all the Hard Stuff which forms the skeleton of Dendrite (core federation, message signing, /sync, message sending, media repository etc) - which takes us to over 50% of Phase 1. After phase 1, we will have an initial usable release for all the core functionality.  Synapse's performance has also improved enormously this year.

New milestone for Dendrite: sending messages over federation BOTH WAYS between dendrite & synapse! A bit more polish & we can cut an alpha!! pic.twitter.com/DWs6rFqZcQ

— Matrix (@matrixdotorg) June 23, 2017

  • Getting as many bots and bridges into Matrix as possible, and doing everything we can to support them, host them and help them be as high quality as possible – making the public federated Matrix network as useful and diverse as possible.
Bridges and bots continue - from the core team we have a ‘puppeting' Telegram bridge (matrix-appservice-tg), and from the wider community we have Discord, Skype, Signal, new Rocket.Chat and more.  Getting them polished and live is certainly an area where we need more manpower though.
  • Supporting Riot's leap to the mainstream, ensuring Matrix has at least one killer app.
Riot has been sprouting new releases every few weeks, with a huge emphasis on proving UX:
  • an entirely new streamlined sign-up process
  • the new concept of home pages
  • a user directory search that actually works
  • internationalised to 27 languages
  • compact layout
  • loads of desktop improvements
  • piwik analytics support; etc.

There is still a lot of UX work to be done, but it's converging fast on being a great entry point into the Matrix ecosystem, driving its growth across different groups and communities..

Meanwhile, a massive update to the iOS & Android apps just landed yesterday, switching to an entirely new UI layout to separate People from Rooms, synchronized Read Markers, and more!

  • Adding the final major missing features:
    • Customisable User Profiles (this is almost done, actually)
This is still hovering at ‘almost done', and will be needed for some of the implementation of Groups (see below)..
  • Groups (i.e. ability to define groups of users, and perform invites, powerlevels, etc. per-group as well as per-user)
Groups are also in testing in Synapse too!  These will probably be the single biggest change to Matrix that we've seen since E2E encryption landed: it changes the dynamic of the whole network, given users can explicitly declare allegiance to different groups, which in turn have their own home pages and directories etc.  It lets users form communities, and declare their participation in those communities (if desired), and also lets rooms be grouped together.  One of our single biggest requests has been “subrooms” and we're incredibly excited to see how well Groups solve this.
  • Threading
Sadly no progress on Threading so far this year.
  • Editable events (and Reactions)
We're hoping to get looking at this (at last!) once Groups are done.
  • Maturing and polishing the spec (we are way overdue a new release)
You'll have noticed that in the “how many people work on Matrix?” stats above, we didn't mention anyone working on the Spec.  Because right now there isn't anyone explicitly maintaining it, unfortunately; updates are done best-effort when everyone's primary responsibilities allow it.  That said, there's quite a lot of good stuff currently unreleased on HEAD. This is something which is obviously critical to fix once we have sustainable funding sorted again.  We can only apologise to folks like the Ruma developers who have suffered from the spec lag. :(
  • Improving VoIP – especially conferencing.
VoIP is improving lots on iOS, thanks to Denis Morozov's GSoC project, and meanwhile we have all new conferencing powered by Jitsi on the horizon in the next few weeks too.
  • Reputation/Moderation management (i.e. spam/abuse filtering).
Lots of thinking about this (see below), but no development yet.
  • Much-needed SDK performance work on matrix-{'{'}react,ios,android{'}'}-sdk.
About 40% of the desired performance work has happened here (although not all has gone live yet).
  • …and a few other things which would be premature to mention right now. :D
All will be revealed in the next week or two - but suffice it to say that Integrations are going to be getting a Lot More Useful™. :)

🔗Reflections

There are very very few people actually working professionally on trying to build general-purpose open communication networks and protocols.  There's us, some XMPP, IRCv3 and GNU Social/Mastodon folks, GNU Ring, Tox, Briar, Secure Scuttlebutt, IPFS, Status.im, Ricochet… and that's literally all the major projects I can think of (sorry if I missed you!).  There's probably only 50 developers in total working in this domain as their day job.

Meanwhile, there are literally hundreds of thousands of folks trudging away building more and more near-indistinguishable proprietary closed communication systems - trapping users inside ever more silos and fragmenting the basic ability to communicate on the ‘net.  It's like a world where the open web was pushed into a tiny underground resistance, and everyone else was trapped in the walled gardens of AOL and Compuserve (or more contemporarily: Facebook, Twitter, WhatsApp etc).

In other words: the whole world of decentralised communication desperately needs your support.  This is a clear case of user choice and freedom: to give users the ability to pick who they trust with their data and metadata, without being forced into unilaterally trusting the Silicon Valley megacorps.  And this, dear Reader, is your chance to fix the world for the greater good. Seriously, the Matrix team is one of a handful in the world in a position to continue to push things in the right direction and avoid us falling into a permanent dystopia where communication is even more closed and proprietary than the Public Telephone Network!

Finally, there's an even bigger issue at stake here than open communication.  As an open network, people can literally publish whatever content they like into Matrix - same as the web or the internet itself.  As a result, there's scope for spam; abusive/malicious content; propaganda; and generally the whole spectrum of the best and worst of humanity.  Now, if we were a centralised system like Facebook, we might hire thousands of content moderators to frantically impose a rulebook on ‘acceptable' content.  Or we might build invisible filter-bubbles for our users based on their social graph, cocooning them from scary unfamiliar content outside their social circles and reinforcing their preconceptions (whilst the resulting self-affirmation keeps them coming back, viewing more and more ads).

But we're decentralised, and we have no absolute moral arbiter, and nor should we - on an open network it should be up to users and users alone to define and manage their own worldview and alignment.  Plus we are not fiscally obligated to keep users coming back to view more ads no matter what.  Instead, we are forced to confront the fundamental problem: building tools which empower users to curate and visualise their own content filters; letting them filter out the stuff they're not interested in or find repellant… while still helping them be aware of their own viewpoint and the shape of the world beyond it.  We haven't really started building this yet, but in the long term our feeling is that these tools will literally be vital for the survival of the human race (e.g. exposing anti-climate-change propaganda for what it is or helping users opt out of World War 3) - let alone the success of decentralisation.  A world where users blindly consume propaganda is doomed, and it's a fascinating situation that the same tools which will allow Matrix users to tune out the rooms, users and conversations they're not interested in could be directly applied to the bigger global problem.

So: Matrix needs you. Please become a supporter on Patreon or Liberapay, and help us save the world :)

  • Matthew, Amandine & the whole Matrix.org team.

Synapse 0.19.3 released

2017-03-21 — GeneralMatthew Hodgson

Hi all,

We've released Synapse 0.19.3-rc2 as 0.19.3 with no changes. This is a slightly unusual release, as 0.19.3-rc2 dates from March 13th and a lot of stuff has landed on the develop branch since then - however, we'll be releasing that as 0.20.0 once it's ready. Instead, 0.19.3 has a set of intermediary performance and bug fixes; the only new feature is a set of admin APIs kindly contributed by @morteza-araby.

The changelog follows - please upgrade from https://github.com/matrix-org/synapse or your OS packages as normal :)

🔗Changes in synapse v0.19.3 (2017-03-20)

No changes since v0.19.3-rc2

🔗Changes in synapse v0.19.3-rc2 (2017-03-13)

Bug fixes:

  • Fix bug in handling of incoming device list updates over federation.

🔗Changes in synapse v0.19.3-rc1 (2017-03-08)

Features:

Changes: Bug fixes:
  • Fix synapse_port_db failure. Thanks to @Pneumaticat! (PR #1904)
  • Fix caching to not cache error responses (PR #1913)
  • Fix APIs to make kick & ban reasons work (PR #1917)
  • Fix bugs in the /keys/changes api (PR #1921)
  • Fix bug where users couldn't forget rooms they were banned from (PR #1922)
  • Fix issue with long language values in pushers API (PR #1925)
  • Fix a race in transaction queue (PR #1930)
  • Fix dynamic thumbnailing to preserve aspect ratio. Thanks to @jkolo! (PR #1945)
  • Fix device list update to not constantly resync (PR #1964)
  • Fix potential for huge memory usage when getting device that have changed (PR #1969)

An Adventure in IRC-Land

2017-03-14 — GeneralKegan Dougal

Hi everyone. I'm Kegan, one of the core developers at matrix.org. This is the first in a series on the matrix.org IRC bridge. The aim of this series is to try to give a behind the scenes look at how the IRC bridge works, what kinds of problems we encountered, and how we plan to scale in the future. This post looks at how the IRC bridge actually works.

Firstly, what is "bridging"? The simple answer is that it is a program which maps between different messaging protocols so that users on different protocols can communicate with each other. Some protocols may have features which are not supported in the other (typing notifications in Matrix, DCC - direct file transfers - in IRC). This means that bridging will always be "inferior" to just using the respective protocol. That being said, where there is common ground a bridge can work well; all messaging protocols support sending and receiving text messages for example. As we'll see however, the devil is in the detail...

A lot of existing IRC bridges for different protocols share one thing in common: they use a single global bot to bridge traffic. This bot listens to all messages from IRC, and sends them to the other network. The bot also listens for messages from users on the other network, and sends messages on their behalf to IRC. This is a lot easier than having to maintain dedicated TCP connections for each user. However, it isn't a great experience for IRC users as they:

  • Don't know who is reading messages on a channel as there is just 1 bot in the membership list.
  • Cannot PM users on the other network.
  • Cannot kick/ban users on the other network without affecting everyone else.
  • Cannot bing/mention users on the other network easily (tab completion).
We made the decision very early on that we would keep dedicated TCP connections for each Matrix user. This means every Matrix user has their own tiny IRC client. This has its own problems:
  • It involves multiple connections to the IRCd so you need special permission to set up an i:line.
  • You need to be able to support identification of individual users (via ident or unique IPv6 addresses).
  • With all these connections to the same IRC channels, you need to have some way to identify which incoming messages have already been handled and which have not.

🔗Mapping Rooms

So now that we have a way to send and receive messages, how do we map the rooms/channels between protocols? This isn't as easy as you may think. We can have a single static one-to-one mapping:

  • All messages to #channel go to !abcdef:matrix.org.
  • All messages from !abcdef:matrix.org go to #channel.
  • All PMs between @alice:matrix.org and Bob go to !wxyz:matrix.org and the respective PM on IRC.
In order to make PMs secure, we need to limit who can access the room. This is done by making the Matrix PM room "invite-only". This can cause problems though if the Matrix user ever leaves that room: they won't be able to ever re-join! The IRC bridges get around this by allowing Matrix users to replace their dedicated PM room with a new room, and by checking to make sure that the Matrix user is inside the room before sending messages.

Then you have problem of "ownership" of rooms. Who should be able to kick users in a bridged room? There are two main scenarios to consider:

  • The IRC channel has existed for a while and there are existing IRC channel operators.
  • The IRC channel does not exist, but there are existing Matrix moderators.
In the first case, we want to defer ownership to the channel operators. This is what happens by default for all bridged IRC channels on matrix.org. The Matrix users have no power in the room, and are at the mercy of the IRC channel operators. The channel operators are represented by virtual Matrix users in the room. However, they do not have any power level: they are at the same level as real Matrix users. Why? The bridge does this because, unlike IRC, it's not possible in Matrix to bring a user to the same level as yourself (e.g +o), and then downgrade them back to a regular user (e.g. -o). Instead, the bridge bot itself acts as a custodian for the room, and performs privileged IRC operations (topic changing, kickbans, etc) on the IRC channel operator's behalf.

In the second case, we want to defer ownership to the Matrix moderators. This is what happens when you "provision a room" in Matrix. The bridge will PM a currently online channel operator and ask for their permission to bridge to Matrix. If they accept, the bridge is made and the power levels in the pre-existing Matrix room are left untouched, giving moderators in Matrix control over the room. However, this power doesn't extend completely to IRC. If a Matrix moderator grants moderator powers to another Matrix user, this will not be mapped to IRC. Why? It's not possible for the bridge to give chanops to any random user on any random IRC channel, so it cannot always honour the request. This relies on the humans on either side of the bridge to communicate and map power accordingly. This is done on purpose as there is no 100% perfect mapping between IRC powers and Matrix powers: it's always going to need to compromise which only a human can make.

Finally, there is the problem of one-to-many mappings. It is possible to have two Matrix rooms bridged to the same IRC channel. The problem occurs when a Matrix user in one room speaks. The bridge can easily map that to IRC, but unless it also maps it back to Matrix, the message will never make it to the 2nd Matrix room. The bridge cannot control/puppet the Matrix user who spoke, so instead it creates a virtual Matrix user to represent that real Matrix user and then sends the message into the 2nd Matrix room. Needless to say, this can be quite confusing and we strongly discourage one-to-many mappings for this reason.

🔗Mapping Messages

Mapping Matrix messages to IRC is rather easy for the most part. Messages are passed from the Homeserver to the bridge via the AS API, and the bridge sends a textual representation of the message to IRC using the IRC connection for that Matrix user. The exact form of the text for images, videos and long text can be quite subjective, and there is inevitably some data loss along the way. For example, you can send big text headings, tables and lists in Matrix, but there is no equivalent on IRC. Thankfully, most Matrix users are sending the corresponding markdown and so the formatting can be reasonably preserved by just sending the plaintext (markdown) body.

Mapping IRC messages to Matrix is more difficult: not because it's hard to represent the message in Matrix, but because of the architecture of the bridge. The bridge maintains separate connections for each Matrix user. This means the bridge might have, for example, 5 users (and hence connections) on the same channel. When an IRC user sends a message, the bridge gets 5 copies of the message. How does the bridge know:

  • If the message has already been sent?
  • If the message is an intentional duplicate?
The IRC protocol does not have message IDs, so the bridge cannot de-duplicate messages as they arrive. Instead, it "nominates" a single user's connection to be responsible for delivering messages from that channel. This introduces another problem though. Long-lived TCP connections are fickle things, and can fail without any kind of visible warning until you try to send bytes down it. If a user's connection drops, another user needs to take over responsibility for delivering messages. This is what the "IRC Event Broker" class does. It allows users to "steal" messages if the bridge has any indication that the connection in charge has dropped. This technique has worked well for us, and gives us the ability to have more robust connections to the channel than with one TCP connection alone.

🔗Admin Rooms

Admin rooms are private Matrix rooms between a real Matrix user and the bridge bot. It allows the Matrix user to control their connection to IRC. It allows:

  • The IRC nick to be changed.
  • The ability to issue /whois commands.
  • The ability to bypass the bridge and send raw IRC commands directly down the TCP connection (e.g. MODE commands).
  • The ability to save a NickServ password for use when the bridge reconnects you.
  • The ability to disconnect from the network entirely.
To perform these actions, Matrix users send a text message which starts with a command name, e.g !whois $ARG. Like all commands, you expect to get a reply once you've issued it. However, IRC makes this extremely difficult to do. There is no request/response pair like there is with HTTP requests. Instead, the IRC server may:
  • Ignore the request entirely.
  • Send an error you're aware of (in the RFC/most servers)
  • Send some information which can be assumed to indicate success.
  • Send an error you're unaware of.
  • Send some information which sometimes indicates success.
This makes it very difficult to know if a request succeeded or failed, and I'll go into more detail in the next post which focuses on problems we've encountered when developing the IRC bridge. This room is also used to inform the Matrix user about general information about their IRC connection, such as when their connection has been lost, or if there are any errors (e.g. "requires chanops to do this action"). The bridge makes no effort to parse these errors, because it doesn't always know what caused the error to happen.

🔗Wrapup

Developing a comprehensive IRC bridge is a very difficult task. This post has outlined a few of the ways in which we've designed our bridge, and some of the general problems in this field. The bridge is constantly improving as we discover new edge cases with the plethora of IRCd implementations out there. The next post will look at some of these edge cases and look back at some previous outages and examine why they occurred.

New bridged IRC network: GIMPNet

2017-03-06 — GeneralKegan Dougal

Hey everyone! As of last week, we are now bridging irc.gimp.org (GIMPNet) for all your GTK+/GNOME needs! It's running a bleeding-edge version of the IRC bridge which supports basic chanops syncing from IRC to Matrix. This means that if an IRC user gives chanops to a Matrix connection, the bridge will give that Matrix user moderator privileges in the room, allowing them to set the room topic/avatar/alias/etc! We hope this will make customising Matrix-bridged rooms a lot easier.

For a more complete list of current and future bridged IRC networks, see the official wishlist.

Load problems on the Matrix.org homeserver

2017-02-17 — GeneralMatthew Hodgson

Hi folks,

Since FOSDEM we've seen even more interest in Matrix than normal, and we've been having some problems getting the Matrix.org homeserver to keep up with demand.  This has resulted in performance being slightly slower than normal at peak times, but the main impact has been the additional traffic exacerbating outages on the homeserver - either by revealing new failure modes, or making it harder to recover rapidly after something goes wrong.

Specifically: on Friday afternoon we had a service disruption caused by someone sending an unusual event into Matrix HQ.  It turns out that both matrix-android-sdk and matrix-ios-sdk based clients (e.g. Riot/Android and iOS) handled this naively by simply resyncing the room state... which has been fine in the past, but not when you have several hundred clients actively syncing the room, and resulted in a thundering herd effect which overloaded the server for ~10 mins or so whilst they all resynced the room (which, in turn, nowadays, involves calculating and syncing several MB of JSON state to each client).  The traffic load was then high enough that it took the server a further 10-20 minutes for the server to fully catch up and recover after the herd had dissipated.  We then had a repeat performance on Monday morning of the same failure mode.

Similarly, we had disruption last night after a user who hadn't used the service for ages logged on for the first time and rapidly caught up on a few rooms which literally had millions of unread messages in them.  Generally this would be okay, but the combination of loaded DB and the sheer number of notifications being deleted ended up with 4 long-running DB deletes in parallel.  This seems to have caused postgres to lock the event_actions_table more aggressively than we'd expect, blocking other queries which were trying to access it... causing most requests to block until the deletes were over.  At the current traffic volumes this meant that the main synapse process tried to serve thousands of simultaneous requests as they stacked up and ran out of filehandles within about 10 minutes and wedged the whole synapse solid before the DB could unblock.  Irritatingly, it turns out our end-to-end monitoring has a bug where it in turn can crash on receiving a 500 from synapse, so despite having PagerDuty all set up and running (and having been receiving pages for traffic delays over the last few weeks)... we didn't get paged when we got actual failed traffic rather than slow traffic, which delayed resolving the issue.  Finally, whilst rolling out a fix this afternoon, we again hit issues with the traffic load causing more problems than we were expecting, making a routine redeploy distinctly more disruptive.

So, what are we doing about this?

  1. Fix the root causes:
    • The 'android/iOS thundering herd' bug is being worked on both the android/iOS side (fixing the naive behaviour) and the server side.  A temporary mitigation is in now place which moves the server-side code to worker processes so that worst case it can't take out the main synapse process and can scale better.
    • The 'event_push_actions table is inefficient' bug had already been fixed - so this was a matter of rushing through the hotfix to matrix.org before we saw a recurrence.
  2. Move to faster hardware.  Our current DB master is a "fast when we bought it 5 years ago" machine whose IO is simply starting to saturate (6x 300GB 10krpm disks in RAID5, fwiw), which is maxing out at around 500IOPS and 20MB/s of random access, and acting as a *very* hard limit to the current synapse performance.  We're currently in the process of evaluating SSD-backed IO for the DB (in fact, we're already running a DB slave), and assuming this tests out okay we're hoping to migrate next week, which should give us a 10x-20x speed up on disk IO and buy considerable headroom.  Watch this space for details.
  3. Make synapse faster.  We're continuing to plug away at optimisations (e.g. stuff like this), but these are reaching the point of diminishing returns, especially relative to the win from faster hardware.
  4. Fix the end-to-end monitoring.  This already happened.
  5. Load-test before deploying.  This is hard, as you really need to test against precisely the same traffic profile as live traffic, and that's hard to simulate.  We're thinking about ways of fixing this, but the best solution is probably going to be clustering and being able to do incremental redeploys to gradually test new changes.  On which note:
  6. Fix synapse's architectural deficiencies to support clustering, allowing for rolling zero-downtime redeploys, and better horizontal scalability to handle traffic spikes like this.  We're choosing not to fix this in synapse, but we are currently in full swing implementing dendrite as a next-generation homeserver in Golang, architected from the outset for clustering and horizontal scalability.  N.B. most of the exciting stuff is happening on feature branches and gomatrixserverlib atm. Also, we're deliberately taking the time to try to get it right this time, unlike bits of synapse which were something of a rush job.  It'll be a few weeks before dendrite is functional enough to even send a message (let alone finish the implementation), but hopefully faster hardware will give the synapse deployment on matrix.org enough headroom for us to get dendrite ready to take over when the time comes!
The good news of course is that you can run your own synapse today to avoid getting caught up in this operational fun & games, and unless you're planning to put tens of thousands of daily active users on the server you should be okay!

Meanwhile, please accept our apologies for the instability and be assured that we're doing everything we can to get out this turbulence as rapidly as possible.

Matthew

Synapse 0.19.1 released

2017-02-14 — GeneralMatthew Hodgson

Hi folks,

We're a little late with this, but Synapse 0.19.1 was released last week. The only change is a bugfix to a regression in room state replication that snuck in during the performance improvements that landed in 0.19.0. Please upgrade if you haven't already. We've also fixed the Debian repository to make installing Synapse easier on Jessie by including backported packages for stuff like Twisted where we're forced to use the latest releases.

You can grab it from https://github.com/matrix-org/synapse/ as always.

🔗Changes in synapse v0.19.1 (2017-02-09)

  • Fix bug where state was incorrectly reset in a room when synapse received an event over federation that did not pass auth checks (PR #1892)

When Ericsson discovered Matrix...

2016-11-23 — GeneralMatthew Hodgson

As something completely different, we've invited Stefan Ålund and his team at Ericsson to write a guest blog post about the really cool stuff Ericsson is doing with Matrix.  This is a fascinating glimpse into how major folks are already launching commercial products on top of Matrix - whilst also making significant contributions back to the projects and the community.  We'd like to thank Stefan and Ericsson for all their support and perseverance, and we wish them the very best with the Ericsson Contextual Communication Cloud!

-- Matthew

At the end of 2014, my colleague Adam Bergkvist and I attended the WebRTC Expo in Paris, partly to promote our Open Source project OpenWebRTC, but also to meet the rest of the European WebRTC community and see what others were working on.

At Ericsson Research we had been working on WebRTC for quite some time. Not only on the client-side framework and how those could enable some truly experimental stuff, but more importantly how this emerging technology could be used to build new kinds of communication services where communication is not the service (A calling B), but is integrated as part of some other service or context. A simple example would be a health care solution, where the starting point could be the patient records and communication technologies are integrated to enable remote discussions between patients and their doctors.

Our research in this area, that we started calling “contextual communication”, pointed in a different direction from Ericsson's traditional communication business, therefore making it hard for us to transfer our ideas and technologies out from Ericsson Research. We increasingly had the feeling that we needed to build something new and start from a clean slate, so to speak.

Some of our guiding principles:

  • Flexibility - the communication should be able to integrate anywhere
  • Fast iterations - browsers and WebRTC are moving targets
  • Open - interoperability is important for communication systems
  • Low cost - operations for the core communication should approach 0
  • Trust - build on the Ericsson brand and technology leadership
We had a pretty good idea about what we wanted to build, but even though Ericsson is a big company, the team working in this area was relatively small and also had a number of other commitments that we couldn't abandon.

I think that is enough of a background, let's circle back to the WebRTC Expo and the reason why I am writing this post on the Matrix blog.

Adam and I were pretty busy in our booth talking to people and giving demos, so we actually missed when Matrix won the Best Innovation Award. Nonetheless we finally got some time to walk around and I started chatting with Matthew and Amandine who were manning the Matrix booth. Needless to say, I was really impressed with their vision and what they wanted to build. The comparison to email and how they wanted to make it possible to build an interoperable bridge between communication “islands”, all in an open (source) manner, really appealed to me.

To be honest, the altruistic aspects of decentralising communication was not the most important part for us, even if we were sympathetic to the cause, working for a company that was founded from "the belief that communication is a basic human need". We ultimately wanted to build a new kind of communication offering from Ericsson, and it looked like Matrix might be able to play a part in that.

I had recently hired a couple of interns and as soon as I came back from Paris, we set them to work evaluating Matrix. They were quickly able to port an existing WebRTC service (developed and used in-house) to use Matrix signalling and user management. We initially had some concerns about the maturity of the reference Home Server implementation (remember, this was almost 2 years ago) and we didn't want to start developing our own since we were still a small team. However, Matthew and the rest of the Matrix team worked closely with us, helping to answer all our (dumb) questions and we finally got to a point where we had the confidence to say “screw it, let's try this and see if it flies”. ?

Ericsson had recently launched the Ericsson Garage where employees could pitch ideas for how to build new business. So we decided to give the process a try and presented an idea on how Ericsson could start selling contextual communication as-a-Service, directly to enterprises that wanted help integrating communication into their business processes, but didn't necessarily have the competence or business interest to run their own communication services. We got accepted and moved (physically) out of Research to sit in the Garage for the next 4 months, developing a MVP.

Since the primary interface to our offering would be through SDKs on various platforms, we decided early on to develop our own. The SDKs were implementing the standard Matrix specification, but we put a lot of time in increasing the robustness and flexibility in the WebRTC call handling and eventually with added peer-2-peer data and collaboration features, on top of the secure WebRTC DataChannel. On the server side, our initialconcerns about Synapse were eventually removed completely as the Matrix team relentlessly kept working on fixing performance issues, patching security holes and provided a story on how to scale. Over the years we have contributed with several patches to Synapse (SAML auth and auth improvements; application service improvements) and provided input to the Matrix specification. We have always found the Matrix team be very inclusive and easy to work with.

The project graduated successfully from the Ericsson Garage and moved in to Ericsson's Business Unit IT & Cloud Products, where we started to increase the size of the team and just last month signed a contract with our first customer. We call the solution Ericsson Contextual Communication Cloud, or ECCC for short, and it can be summarised on a high level by the following picture:

ECCC in a nutshell

If you are interested in ECCC, feel free to reach out at https://discuss.c3.ericsson.net

As with any project developed in the open, it is essential to have a healthy community around it. We have received excellent support from the Matrix project and they have always been open for discussion, engaged our developers and listened to our needs. We depend on Matrix now and we see great potential for the future. We hope that others will adopt the technology and help make the community grow even stronger.

  • Stefan Ålund and the Ericsson ECCC Team

Matrix’s ‘Olm’ End-to-end Encryption security assessment released - and implemented cross-platform on Riot at last!

2016-11-21 — GeneralMatthew Hodgson

TL;DR: We're officially starting the cross-platform beta of end-to-end encryption in Matrix today, with matrix-js-sdk, matrix-android-sdk and matrix-ios-sdk all supporting e2e via the Olm and Megolm cryptographic ratchets.  Meanwhile, NCC Group have publicly released their security assessment of the underlying libolm library, kindly funded by the Open Technology Fund, giving a full and independent transparent report on where the core implementation is at. The assessment was incredibly useful, finding some interesting issues, which have all been solved either in libolm itself or at the Matrix client SDK level.

If you want to get experimenting with E2E, the flagship Matrix client Riot has been updated to use the new SDK on Web, Android and iOS… although the iOS App is currently stuck in “export compliance” review at Apple. However, iOS users can mail [email protected] to request being added to the TestFlight beta to help us test!  Update: iOS is now live and approved by Apple (as of Thursday Nov 24.  You can still mail us if you want to get beta builds though!)

We are ridiculously excited about adding an open decentralised e2e-encrypted pubsub data fabric to the internet, and we hope you are too! :D


Ever since the beginning of the Matrix we've been promising end-to-end (E2E) encryption, which is rather vital given conversations in Matrix are replicated over every server participating in a room.  This is no different to SMTP and IMAP, where emails are typically stored unencrypted in the IMAP spools of all the participating mail servers, but we can and should do much better with Matrix: there is no reason to have to trust all the participating servers not to snoop on your conversations.  Meanwhile, the internet is screaming out for an open decentralised e2e-encrypted pubsub data store - which we're now finally able to provide :)

Today marks the start of a formal public beta for our Megolm and Olm-based end-to-end encryption across Web, Android and iOS. New builds of the Riot matrix client have just been released on top of the newly Megolm -capable matrix-js-sdk, matrix-ios-sdk and matrix-android-sdk libraries .  The stuff that ships today is:

  • E2E encryption, based on the Olm Double Ratchet and Megolm ratchet, working in beta on all three platforms.  We're still chasing a few nasty bugs which can cause ‘unknown inbound session IDs', but in general it should be stable: please report these via Github if you see them.
  • Encrypted attachments are here! (limited to ~2MB on web, but as soon as https://github.com/matrix-org/matrix-react-sdk/pull/562 lands this limit will go away)
  • Encrypted VoIP signalling (and indeed any arbitrary Matrix events) are here!
  • Tracking whether the messages you receive are from ‘verified' devices or not.
  • Letting you block specific target devices from being able to decrypt your messages or not.
  • The Official Implementor's Guide.  If you're a developer wanting to add Olm into your Matrix client/bot/bridge etc, this is the place to start.
Stuff which remains includes:
  • Speeding up sending the first message after adds/removes a device from a room (this can be very slow currently - e.g. 10s, but we can absolutely do better).
  • Proper device verification.  Currently we compare out-of-band device fingerprints, which is a terrible UX.  Lots of work to be done here.
  • Turning on encryption for private rooms by default.  We're deliberately keeping E2E opt-in for now during beta given there is a small risk of undecryptable messages, and we don't want to lull folks into a false sense of security.  As soon as we're out of beta, we'll obviously be turning on E2E for any room with private history by default.  This also gives the rest of the Matrix ecosystem a chance to catch up, as we obviously don't want to lock out all the clients which aren't built on matrix-{'{'}js,ios,android{'}'}-sdk.
  • We're also considering building a simple Matrix proxy to aid migration that you can run on localhost that E2Es your traffic as required (so desktop clients like WeeChat, NaChat, Quaternion etc would just connect to the proxy on localhost via pre-E2E Matrix, which would then manage all your keys & sessions & ratchets and talk E2E through to your actual homeserver.
  • Matrix clients which can't speak E2E won't show encrypted messages at all.
  • ...lots and lots of bugs :D .  We'll be out of beta once these are all closed up.
In practice the system is working very usably, especially for 1:1 chats.  Big group chats with lots of joining/parting devices are a bit more prone to weirdness, as are edge cases like running multiple Riot/Webs in adjacent tabs on the same account.  Obviously we don't recommend using the E2E for anything mission critical requiring 100% guaranteed privacy whilst we're still in beta, but we do thoroughly recommend everyone to give it a try and file bugs!

In Riot you can turn it on a per-room basis if you're an administrator that room by flipping the little padlock button in Room Settings.  Warning: once enabled, you cannot turn it off again for that room (to avoid the race condition of people suddenly decrypting a room before someone says something sensitive):

screen-shot-2016-11-21-at-15-21-15

The journey to end-to-end encryption has been a bit convoluted, with work beginning in Feb 2015 by the Matrix team on Olm: an independent Apache-licensed implementation in C/C++11 of the Double Ratchet algorithm designed by Trevor Perrin and Moxie Marlinspike ( https://github.com/trevp/double_ratchet/wiki - then called ‘axolotl').  We picked the Double Ratchet in its capacity as the most ubiquitous, respected and widely studied e2e algorithm out there - mainly thanks to Open Whisper Systems implementing it in Signal, and subsequently licensing it to Facebook for WhatsApp and Messenger, Google for Allo, etc.  And we reasoned that if we are ever to link huge networks like WhatsApp into Matrix whilst preserving end-to-end encrypted semantics, we'd better be using at least roughly the same technology :D

One of the first things we did was to write a terse but formal spec for the Olm implementation of the Double Ratchet, fleshing out the original sketch from Trevor & Moxie, especially as at the time there wasn't a formal spec from Open Whisper Systems (until yesterday! Congratulations to Trevor & co for publishing their super-comprehensive spec :).  We wrote a first cut of the ratchet over the course of a few weeks, which looked pretty promising but then the team got pulled into improving Synapse performance and features as our traffic started to accelerate faster than we could have possibly hoped.  We then got back to it again in June-Aug 2015 and basically finished it off and added a basic implementation to matrix-react-sdk (and picked up by Vector, now Riot)… before getting side-tracked again.  After all, there wasn't any point in adding e2e to clients if the rest of the stack is on fire!

Work resumed again in May 2016 and has continued ever since - starting with the addition of a new ratchet to the mix.  The Double Ratchet (Olm) is great at encrypting conversations between pairs of devices, but it starts to get a bit unwieldy when you use it for a group conversation - especially the huge ones we have in Matrix.  Either each sender needs to encrypt each message N times for every device in the room (which doesn't scale), or you need some other mechanism.

For Matrix we also require the ability to explicitly decide how much conversation history may be shared with new devices.  In classic Double Ratchet implementations this is anathema: the very act of synchronising history to a new device is a huge potential privacy breach - as it's deliberately breaking perfect forward secrecy.  Who's to say that the device you're syncing your history onto is not an attacker?  However, in practice, this is a very common use case.  If a Matrix user switches to a new app or device, it's often very desirable that they can decrypt old conversation history on the new device.  So, we make it configurable per room.  (In today's implementation the ability to share history to new devices is still disabled, but it's coming shortly).

The end result is an entirely new ratchet that we've called Megolm - which is included in the same libolm library as Olm.  The way Megolm works is to give every sender in the room its own encrypted ratchet (‘outbound session'), so every device encrypts each message once based on the current key given by their ratchet (and then advances the ratchet to generate a new key).  Meanwhile, the device shares the state of their ‘outbound session' to every other device in the room via the normal Olm ratchet in a 1:1 exchange between the devices.  The other devices maintain an ‘inbound session' for each of the devices they know about, and so can decrypt their messages.  Meanwhile, when new devices join a room, senders can share their sessions according to taste to the new device - either giving access to old history or not depending on the configuration of the room.  You can read more in the formal spec for Megolm.

We finished the combination of Olm and Megolm back in September 2016, and shipped the very first implementation in the matrix-js-sdk and matrix-react-sdk as used in Riot with some major limitations (no encrypted attachments; no encrypted VoIP signalling; no history sharing to new devices).

Meanwhile, we were incredibly lucky to receive a public security assessment of the Olm & Megolm implementation in libolm from NCC Group Cryptography Services - famous for assessing the likes of Signal, Tor, OpenSSL, etc and other Double Ratchet implementations. The assessment was very generously funded by the Open Technology Fund (who specialise in paying for security audits for open source projects like Matrix).  Unlike other Double Ratchet audits (e.g. Signal), we also insisted that the end report was publicly released for complete transparency and to show the whole world the status of the implementation.

NCC Group have released the public report today - it's pretty hardcore, but if you're into the details please go check it out.  The version of libolm assessed was v1.3.0, and the report found 1 high risk issue, 1 medium risk, 6 low risk and 1 informational issues - of which 3 were in Olm and 6 in Megolm.  Two of these (‘Lack of Backward Secrecy in Group Chats' and ‘Weak Forward Secrecy in Group Chats') are actually features of the library which power the ‘configurable privacy per-room' behaviour mentioned a few paragraphs above - and it's up to the application (e.g. matrix-js-sdk) to correctly configure privacy-sensitive rooms with the appropriate level of backward or forward secrecy; the library doesn't enforce it however.  The most interesting findings were probably the fairly exotic Unknown Key Share attacks in both Megolm and Olm - check out NCC-Olm2016-009 and NCC-Olm2016-010 in the report for gory details!

Needless to say all of these issues have been solved with the release of libolm 2.0.0 on October 25th and included in today's releases of the client SDKs and Riot.  Most of the issues have been solved at the application layer (i.e. matrix-js-sdk, ios-sdk and android-sdk) rather than in libolm itself.  Given the assessment was specifically for libolm, this means that technically the risks still exist at libolm, but given the correct engineering choice was to fix them in the application layer we went and did it there. (This is explains why the report says that some of the issues are ‘not fixed' in libolm itself).

Huge thanks to Alex Balducci and Jake Meredith at NCC Group for all their work on the assessment - it was loads of fun to be working with them (over Matrix, of course) and we're really happy that they caught some nasty edge cases which otherwise we'd have missed.  And thanks again to Dan Meredith and Chad Hurley at OTF for funding it and making it possible!

Implementing decentralised E2E has been by far the hardest thing we've done yet in Matrix, ending up involving most of the core team.  Huge kudos go to: Mark Haines for writing the original Olm and matrix-js-sdk implementation and devising Megolm, designing attachment encryption and implementing it in matrix-{'{'}js,react{'}'}-sdk, Richard van der Hoff for taking over this year with implementing and speccing Megolm, finalising libolm, adding all the remaining server APIs (device management and to_device management for 1:1 device Olm sessions), writing the Implementor's Guide, handling the NCC assessment, and pulling together all the strands to land the final implementation in matrix-js-sdk and matrix-react-sdk.  Meanwhile on Mobile, iOS & Android wouldn't have happened without Emmanuel Rohée, who led the development of E2E in matrix-ios-sdk and OLMKit (the iOS wrappers for libolm based on the original work by Chris Ballinger at ChatSecure - many thanks to Chris for starting the ball rolling there!), Pedro Contreiras and Yannick Le Collen for doing all the Android work, Guillaume Foret for all the application layer iOS work and helping coordinate all the mobile work, and Dave Baker who got pulled in at the last minute to rush through encrypted attachments on iOS (thanks Dave!).  Finally, eternal thanks to everyone in the wider community who's patiently helped us test the E2E whilst it's been in development in #megolm:matrix.org; and to Moxie, Trevor and Open Whisper Systems for inventing the Double Ratchet and for allowing us to write our own implementation in Olm.

It's literally the beginning for end-to-end encryption in Matrix, and we're unspeakably excited to see where it goes.  More now than ever before the world needs an open communication platform that combines the freedom of decentralisation with strong privacy guarantees, and we hope this is a major step in the right direction.

-- Matthew, Amandine & the whole Matrix team.

Further reading: