Matthew Hodgson

161 posts tagged with "Matthew Hodgson" (See all Author)

Synapse 0.19.1 released

14.02.2017 00:00 — General Matthew Hodgson

Hi folks,

We're a little late with this, but Synapse 0.19.1 was released last week. The only change is a bugfix to a regression in room state replication that snuck in during the performance improvements that landed in 0.19.0. Please upgrade if you haven't already. We've also fixed the Debian repository to make installing Synapse easier on Jessie by including backported packages for stuff like Twisted where we're forced to use the latest releases.

You can grab it from https://github.com/matrix-org/synapse/ as always.

🔗Changes in synapse v0.19.1 (2017-02-09)

  • Fix bug where state was incorrectly reset in a room when synapse received an event over federation that did not pass auth checks (PR #1892)

FOSDEM 2017 report

06.02.2017 00:00 — General Matthew Hodgson

Hi all,

FOSDEM this year was even more crazy and incredible than ever - with attendance up from 6,000 to 9,000 folks, it's almost impossible to describe the atmosphere. Matt Jordan from Asterisk describes it as DisneyWorld for OSS Geeks, but it's even more than that: it's basically a corporeal representation of the whole FOSS movement.  There is no entrance fee; there is no intrusive sponsorship; there is no corporate presence: it's just a venue for huge numbers of FOSS projects and their users and communities to come together in one place (the Université Libre de Bruxelles) and talk and learn.  Imagine if someone built a virtual world with storefronts for every open source project imaginable, where you could chat to the core team, geek out with other users, or gather in auditoriums to hear updates on the latest projects & ideas.  Well, this is FOSDEM... except even better, it's in real life.  With copious amounts of Belgian beer.

Anyway: this year we had our normal stand on the 2nd floor of K building, sharing the Realtime Lounge chill-out space with the XMPP Standards Foundation.  This year we had a larger representation than ever before with Matthew, Erik and Luke from the London team as well as Manu & Yannick from Rennes - which is just as well given all 5 of us ended up speaking literally non-stop from 10am to 6pm on both Saturday & Sunday (and then into the night as proceedings deteriorated/evolved into an impromptu Matrix meetup with Coffee, uhoreg, tadzik, realitygaps and others!).  The level of interest at the Matrix booth was frankly phenomenal: a major change from the last two FOSDEMs in that this year pretty much everyone had already heard of Matrix, and were most likely to want to enthuse about features and bugs in Synapse or Riot, or geek out about writing new bridges/bots/clients, or trying to work out a way to incorporate Matrix into their own projects or companies.

#RTCLounge with @ara4n from @matrixdotorg busy demoing cool stuff pic.twitter.com/Vc0uLEceQP

— miconda (@miconda) February 4, 2017

Synapse 0.19 and Riot 0.9.7 were also released on Saturday to try to ensure that anyone joining Matrix for the best time at FOSDEM were on the latest & greatest code - especially given the performance and E2E fixes present in both.  Amazingly the last-minute release didn't backfire: if you haven't upgraded to Synapse 0.19 we recommend going so asap.  And if you're a Riot user, make sure you're on the latest version :)

We were very lucky to have two talks accepted this year: the main one in the Security Track on the Jansen main stage telling the tale of how we added end-to-end encryption to Matrix via Olm & Megolm - and the other in the Decentralised Internet room (AW1.125), focusing on the unsolved future problems of decentralised accounts, identity, reputation in Matrix.  Both talks were well attended, with huge queues for the Decentralised Internet room: we can only apologise to everyone who queued for 20+ minutes only to still not be able to get in.  Hopefully next year FOSDEM will allocate a larger room for decentralisation!  On the plus side, this year FOSDEM did an amazing job of videoing the sessions - livestreaming every talk, and automatically publishing the recordings (via a fantastic 'publish your own talk' web interface) - so many of the people who couldn't get into the room (as well as the rest of the world) were able to watch it live anyway by the stream.

This is how popular decentralised communication with @matrixdotorg is at #fosdem2017. pic.twitter.com/6T5PK6RRJE

— Jan Weisensee (@ilumium) February 5, 2017

Security track at #FOSDEM: @matrixdotorg project & @ara4n pic.twitter.com/QwroHSNh8Z

— miconda (@miconda) February 5, 2017

https://t.co/x0x7xuzlH2 presentation at the Decentralized Internet Devroom @fosdem pic.twitter.com/J2Wxo9SZ8H

— Tristan Nitot (@nitot) February 5, 2017

You can watch the video of the talks from the FOSDEM website here and here.  Both talks necessarily include the similar exposition for folks unfamiliar with Matrix, so apologies for the duplication - also, the "future of decentralised communication" talk ended up a bit rushed; 20 minutes is not a lot of time to both explain Matrix and give an overview of the challenges we face in fixing spam, identity, moderation etc.  But if you like hearing overenthusiastic people talking too fast about how amazing Matrix is, you may wish to check out the videos :)  You can also get at the slides as PDF here (E2E Encryption) and here (Future of Decentralisation).

Huge thanks to evevryone who came to the talks or came and spoke to us at the stand or around the campus.  We had an amazing time, and are already looking forward to next year!

Matthew & the team

Synapse 0.19 is here, just in time for FOSDEM!

04.02.2017 00:00 — Tech Matthew Hodgson

Hi all,

We're happy to announce the release of Synapse 0.19.0 (same as 0.19.0-rc4) today, just in time for anyone discovering Matrix for the first time at FOSDEM 2017!  In fact, here's Erik doing the release right now (with moral support from Luke):

release

This is a pretty big release, with a bunch of new features and lots and lots of debugging and optimisation work following on some of the dramas that we had with 0.18 over the Christmas break.  The biggest things are:

  • IPv6 Support (unless you have an IPv6 only resolver), thanks to contributions from Glyph from Twisted and Kyrias!
  • A new API for tracking the E2E devices present in a room (required for fixing many of the remaining E2E bugs...)
  • Rewrite the 'state resolution' algorithm to be orders of magnitude more performant
  • Lots of tuning to the caching logic.
If you're already running a server, please upgrade!  And if you're not, go grab yourself a brand new Synapse from Github. Debian packages will follow shortly (as soon as Erik can figure out the necessary backporting required for Twisted 16.6.0)

And here's the full changelog...

🔗Changes in synapse v0.19.0 (2017-02-04)

No changes since RC 4.

🔗Changes in synapse v0.19.0-rc4 (2017-02-02)

  • Bump cache sizes for common membership queries (PR #1879)

🔗Changes in synapse v0.19.0-rc3 (2017-02-02)

  • Fix email push in pusher worker (PR #1875)
  • Make presence.get_new_events a bit faster (PR #1876)
  • Make /keys/changes a bit more performant (PR #1877)

🔗Changes in synapse v0.19.0-rc2 (2017-02-02)

  • Include newly joined users in /keys/changes API (PR #1872)

🔗Changes in synapse v0.19.0-rc1 (2017-02-02)

Features:

  • Add support for specifying multiple bind addresses (PR #1709, #1712, #1795, #1835). Thanks to @kyrias!
  • Add /account/3pid/delete endpoint (PR #1714)
  • Add config option to configure the Riot URL used in notification emails (PR #1811). Thanks to @aperezdc!
  • Add username and password config options for turn server (PR #1832). Thanks to @xsteadfastx!
  • Implement device lists updates over federation (PR #1857, #1861, #1864)
  • Implement /keys/changes (PR #1869, #1872)
Changes:
  • Improve IPv6 support (PR #1696). Thanks to @kyrias and @glyph!
  • Log which files we saved attachments to in the media_repository (PR #1791)
  • Linearize updates to membership via PUT /state/ to better handle multiple joins (PR #1787)
  • Limit number of entries to prefill from cache on startup (PR #1792)
  • Remove full_twisted_stacktraces option (PR #1802)
  • Measure size of some caches by sum of the size of cached values (PR #1815)
  • Measure metrics of string_cache (PR #1821)
  • Reduce logging verbosity (PR #1822, #1823, #1824)
  • Don't clobber a displayname or avatar_url if provided by an m.room.member event (PR #1852)
  • Better handle 401/404 response for federation /send/ (PR #1866, #1871)
Fixes:
  • Fix ability to change password to a non-ascii one (PR #1711)
  • Fix push getting stuck due to looking at the wrong view of state (PR #1820)
  • Fix email address comparison to be case insensitive (PR #1827)
  • Fix occasional inconsistencies of room membership (PR #1836, #1840)
Performance:
  • Don't block messages sending on bumping presence (PR #1789)
  • Change device_inbox stream index to include user (PR #1793)
  • Optimise state resolution (PR #1818)
  • Use DB cache of joined users for presence (PR #1862)
  • Add an index to make membership queries faster (PR #1867)

Matrix.org homeserver outage (25th Jan 2017)

25.01.2017 00:00 — Tech Matthew Hodgson

Hi folks,

As many will have noticed there was a major outage on the Matrix homeserver for matrix.org last night (UK-time). This impacted anyone with an account on the matrix.org server, as well as anyone using matrix.org-hosted bots & bridges. As Matrix rooms are shared over all participants, rooms with participants on other servers were unaffected (for users on those servers). Here's a quick explanation of what went wrong (times are UTC):

  • 2017-01-24 16:00 - We notice that we're badly running out of diskspace on the matrix.org backup postgres replica. (Turns out the backup box, whilst identical hardware to the master, had been built out as RAID-10 rather than RAID-5 and so has less disk space).
  • 2017-01-24 17:00 - We decide to drop a large DB index: event_push_actions(room_id, event_id, user_id, profile_tag), which was taking up a disproportionate amount of disk space, on the basis that it didn't appear to be being used according to the postgres stats. All seems good.
  • 2017-01-24 ~23:00 - The core matrix.org team go to bed.
  • 2017-01-24 23:33 - Someone redacts an event in a very active room (probably #matrix:matrix.org) which necessitates redacting the associated push notification from the event_push_actions table. This takes out a lock within persist_event, which is then blocked on deleting the push notification. It turns out that this deletion requires the missing DB constraint, causing the query to run for hours whilst holding the transaction lock. The symptoms are that anything reading events from the DB was blocked on the transaction, causing messages not to be relayed to other clients or servers despite appearing to send correctly. Meanwhile, the fact that events are being received by the server fine (including over federation) makes the monitoring graphs look largely healthy.
  • 2017-01-24 23:35 - End-to-end monitoring detects problems, and sends alerts into pagerduty and various Matrix rooms. Unfortunately we'd failed to upgrade the pageduty trial into a paid account a few months ago, however, so the alerts are lost.
  • 2017-01-25 08:00 - Matrix team starts to wake up and spot problems, but confusion over the right escalation process (especially with Matthew on holiday) means folks assume that other members of the team must already be investigating.
  • 2017-01-25 09:00 - Server gets restarted, service starts to resume, although box suffers from load problems as traffic tries to catch up.
  • 2017-01-25 09:45 - Normal service on the homeserver itself is largely resumed (other than bridges; see below)
  • 2017-01-25 10:41 - Root cause established and the redaction path is patched on matrix.org to stop a recurrence.
  • 2017-01-25 11:15 - Bridges are seen to be lagging and taking much longer to recover than expected. Decision made to let them continue to catch up normally rather than risk further disruption (e.g. IRC join/part spam) by restarting them.
  • 2017-01-25 13:00 - All hosted bridges returned to normal.

Obviously this is rather embarrassing, and a huge pain for everyone using the matrix.org homeserver - many apologies indeed for the outage. On the plus side, all the other Matrix homeservers out there carried on without noticing any problems (which actually complicated spotting that things had broken, given many of the core team primarily use their personal homeservers).

In some ways the root cause here is that the core team has been focusing all its energy recently on improving the overall Matrix codebase rather than operational issues on matrix.org itself, and as a result our ops practices have fallen behind (especially as the health of the Matrix ecosystem as a whole is arguably more important than the health of a single homeserver deployment). However, we clearly need to improve things here given the number of people (>750K at the last count) dependent on the Matrix.org homeserver and its bridges & bots.

Lessons learnt on our side are:

  • Make sure that even though we had monitoring graphs & thresholds set up on all the right things... monitoring alerts actually have to be routed somewhere useful - i.e. phone calls to the team's phones. Pagerduty is now set up and running properly to this end.
  • Make sure that people know to wake up the right people anyway if the monitoring alerting system fails.
  • To be even more paranoid about hotfixes to production at 5pm, especially if they can wait 'til the next day (as this one could have).
  • To investigate ways to rapidly recover bridges without causing unnecessary disruption.

Apologies again to everyone who was bitten by this - we're doing everything we can to ensure it doesn't happen again.

Matthew & the team.

Synapse 0.18.7 is out - Please upgrade, especially if on 0.18.5 or 0.18.6.

06.01.2017 00:00 — Tech Matthew Hodgson

Hi all,

TL;DR: Please upgrade to Synapse 0.18.6, especially if you are on 0.18.5 which is a bad release.

TL;DR: Please upgrade to Synapse 0.18.7 - especially if you are on 0.18.5 or 0.18.6 which both have serious federation bugs.

Synapse 0.18.5 contained a really nasty regression in the federation code which causes servers to echo transactions that they receive back out to the other servers participating in a room. This has effectively resulted in a gradual amplification of federation traffic as more people have installed 0.18.5, causing every transaction to be received N times over where N is the number of servers in the room.

We'll do a full write-up once we're happy we've tracked down all the root problems here, but the short story is that this hit critical mass around Dec 26, where typical Synapses started to fail to keep up with the traffic - especially when requests hit some of the more inefficient or buggy codepaths in Synapse.  As servers started to overload with inbound connections, this in turn started to slow down and consume resources on the connecting servers - especially due to an architectural mistake in Synapse which blocks inbound connections until the request has been fully processed (which could require the receiving server in turn to make outbound connections), rather than releasing the inbound connection asap.  This hit the point that servers were running out of file descriptors due to all the outbound and inbound connections, at which point they started to entirely tarpit inbound connections, resulting in a slow feedback loop making the whole situation even worse.

We've spent the last two weeks hunting all the individual inefficient requests which were mysteriously starting to cause more problems than they ever had before; then trying to understand the feedback misbehaviour; before finally discovering the regression in 0.18.5 as the plausible root cause of the problem.  Troubleshooting has been complicated by most of the team having unplugged for the holidays, and because this is the first (and hopefully last!) failure mode to be distributed across the whole network, making debugging something of a nightmare - especially when the overloading was triggering a plethora of different exotic failure modes.  Huge thanks to everyone who has shared their server logs with the team to help debug this.

Some of these failure modes are still happening (and we're working on fixing them), but we believe that if everyone upgrades away from the bad 0.18.5 release most of the symptoms will go away, or at least go back to being as bad as they were before.  Meanwhile, if you find your server suddenly grinding to a halt after upgrading to 0.18.6 0.18.7 please come tell us in #matrix-dev:matrix.org.

We're enormously sorry if you've been bitten by the federation instability this has caused - and many many thanks for your patience whilst we've hunted it down.  On the plus side, it's given us a lot of very useful insight into how to implement federation in future homeservers to not suffer from any of these failure modes.  It's also revealed the root cause of why Synapse's RAM usage is quite so bad - it turns out that it actually idles at around 200MB with default caching, but there's a particular codepath which causes it to spike temporarily by 1GB or so - and that RAM is then not released back to the OS.  We're working on a fix for this too, but it'll come after 0.18.7.

Unfortunately the original release of 0.18.6 still exhibits the root bug, but 0.18.7 (originally released as 0.18.7-rc2) should finally fix this.  Sorry for all the upgrades :(

So please upgrade as soon as possible to 0.18.7. Debian packages are available as normal.

thanks,

Matthew

🔗Changes in synapse v0.18.7 (2017-01-09)

  • No changes from v0.18.7-rc2

🔗Changes in synapse v0.18.7-rc2 (2017-01-07)

Bug fixes:

  • Fix error in rc1's discarding invalid inbound traffic logic that was incorrectly discarding missing events

🔗Changes in synapse v0.18.7-rc1 (2017-01-06)

Bug fixes:

  • Fix error in #PR 1764 to actually fix the nightmare #1753 bug.
  • Improve deadlock logging further
  • Discard inbound federation traffic from invalid domains, to immunise against #1753

🔗Changes in synapse v0.18.6 (2017-01-06)

Bug fixes:

  • Fix bug when checking if a guest user is allowed to join a room - thanks to Patrik Oldsberg (PR #1772)

🔗Changes in synapse v0.18.6-rc3 (2017-01-05)

Bug fixes:

  • Fix bug where we failed to send ban events to the banned server (PR #1758)
  • Fix bug where we sent event that didn't originate on this server to other servers (PR #1764)
  • Fix bug where processing an event from a remote server took a long time because we were making long HTTP requests (PR #1765, PR #1744)
Changes:
  • Improve logging for debugging deadlocks (PR #1766, PR #1767)

🔗Changes in synapse v0.18.6-rc2 (2016-12-30)

Bug fixes:

  • Fix memory leak in twisted by initialising logging correctly (PR #1731)
  • Fix bug where fetching missing events took an unacceptable amount of time in large rooms (PR #1734)

🔗Changes in synapse v0.18.6-rc1 (2016-12-29)

Bug fixes:

  • Make sure that outbound connections are closed (PR #1725)

The Matrix Holiday Special! (2016 edition)

26.12.2016 00:00 — General Matthew Hodgson

We seem to have fallen into the pattern of giving seasonal 'state of the union' updates on the Matrix blog, despite best intentions to blog more frequently... although given the Autumn Update ended up being posted in November this one is going to be a relatively incremental update.  Let's jump straight in:

🔗E2E Encryption

Unless you've been in a coma for the last month you'll have hopefully noticed that we launched the formal beta for E2E Encryption across matrix-{'{'}js,ios,android{'}'}-sdk (and thus Riot/{'{'}Web, iOS, Android{'}'}) in November, complete with the successful independent public security assessment of our Olm and Megolm cryptography library from NCC Group.  So far the beta has gone well in parts: the core Olm/Megolm crypto library has held up well with no bugfixes at all required since the audit (yay!).  However, we've hit a lot of different edge cases in the wild where devices can fail to share their outbound session ratchet state to other devices present in the room.  This results in the infamous "Unknown Inbound Session ID" (UISI) errors which many folks will have seen (now renamed to the more meaningful "Unable to decrypt: The sender's device has not sent us the keys for this message" error).

Unfortunately there's a bunch of entirely different causes for this, both platform-specific and cross-platform, and we've been running around untangling all the error reports and getting to the bottom of it.  The good news is that we think we now know the vast majority of the causes, and fixes are starting to land.  We've also just finished a fairly time-consuming formal crypto code-review on the three application SDK implementations (JS, iOS & Android) to shake out any other issues.  Meanwhile some new features have also landed - e.g. the ability for guests to use E2E!  The remaining stuff at this point before we can consider declaring E2E out of beta is:

  • Finish fixing the UISI errors (in progress)
  • Warn when unverified devices are added to a room
  • Implement passphrased backup & restore for E2E state, so that folks can avoid losing their E2E history when they logout or switch to a new device
  • Improve device verification.
Thanks to everyone who's been using E2E and reporting issues - given the number of different UISI error causes out there, it's been really useful to go through the different bug reports that folks have submitted.  Please continue to submit them when you see unexpected problems (especially over the coming months as stability improves!)

🔗New Projects!

There have been a tonne of new projects popping up from all over the place since the last update.  Looking at the git history of the projects page, we've been adding one every few days!  Highlights include:

🔗Bridges:

🔗Clients

🔗Other projects

🔗Bots and Bridges

There's been a bunch of work from the core team on bots & bridges infrastructure over the last month:

Rearchitecting the slack and gitter bridges to optionally support 'puppeting users'.  This is in some ways the ultimate flavour of bridging - where you authenticate with the remote service using your "real" gitter/slack/... credentials, and then the bridge has access to synchronise your full spectrum of data with Matrix.  This is in contrast to the current implementations where the bridge creates virtual users (e.g. Slack webhook bots or IRC virtual user bots) or uses a predefined bot (e.g. matrixbridge on gitter) to link the rooms.

This has some huge advantages: e.g. ability to bridge Slack and Gitter DMs through properly to Matrix; bridging presence and typing notifications correctly, not requiring any custom bots or integrations to be configured; not proxying via a crappy bridge bot as per gitter today; letting Matrixed users be completely indistinguishable from their native selves on the remote platform - so supporting tab complete in Slack, profiles, presence, etc.  The main disadvantage is that you have to have an account on the platform already (although you could argue this is a feature, especially from the remote network's perspective!) and that you are delegating access to that account through to the bridge, so you'd better trust it.  However, you can always run your own bridge if trust is an issue.

The work on this is mid-progress currently, but we're really looking forward to seeing the official Slack, Gitter and other bridges support this mode of operation in the new year!

We've also been spending some time working with bridges written by the wider community (e.g. Half-Shot's twitter bridge) to get them deployed on the matrix.org homeserver itself, to help folks who can't run their own.

Meanwhile there's been a lot of work going into supporting the IRC bridge. Main highlights there are:

  • The release of matrix-appservice-irc 0.7, with all sorts of major new features
  • Turning on bi-directional membership list syncing at last for all networks other than Freenode!  In theory, at least, you should finally see the same list of room members in both IRC and Matrix!!
  • Handling IRC PM botspam from Freenode and OFTC, which bridge through as invite spam into Matrix.  Sorry if you've been bitten by this.  We've worked around it for now by setting appropriate umodes on the IRC bots, and by implementing a 'bulk reject' button on Riot (under in Settings).  This caused a few nasty outages on Freenode and OFTC. On the plus side, at least it shows that Riot scales up to receiving 2000+ invites without exhibiting ill effects...
  • Considering how to improve history visibility on IRC to avoid scenarios where channel history is shared between users in the same room (even if their IRC bot has temporarily disconnected).  This was a major problem during the Freenode/OFTC outages mentioned earlier.
Last but not least, we've just released gomatrix - a new official Matrix client SDK for golang!  Go-neb (the reference golang Matrix bot framework) has been entirely refactored to use gomatrix, which should keep it honest as a 1st class Matrix client SDK for those in the Golang community.  We highly recommend all Golang nuts to go read the documentation and give it a spin!

🔗Riot Desktop

Riot development has been largely preoccupied with E2E debugging in the respective Matrix Client SDKs, but 0.9.3 was released last week adding in Electron-based desktop app support.  (Remember, if you hate Electron-style desktop apps which provide a desktop app by embedded a webbrowser, you can always use another Matrix client!).  If you've been missing having Riot as a proper desktop app, go get involved!

screen-shot-2016-12-26-at-01-00-12

🔗Next Generation Homeservers

🔗Ruma

Ruma is a project led by Jimmy Cuadra to build a Matrix homeserver in Rust - the project has been ploughing steadily onwards through 2016 with a bit of an acceleration during December.  You can follow progress at the excellent This Week in Ruma blog, watching the project on Github, and tracking the API status dashboard.  Some of the latest PRs are looking very promising in terms of getting the core remaining CS APIs working, e.g:

Needless to say, we've been keeping an eye on Ruma with extreme interest, not least as some of the Matrix core team are rabid Rustaceans too :)  We can't wait to see it exposing a usable CS API in the hopefully not-too-distant future!!

🔗Dendrite

Meanwhile, in the core team, we've been doing some fairly serious experimentation on next-generation homeservers.  Synapse is in a relatively stable state currently, and we've implemented most of the horizontal scalability tricks available to us there (e.g. splitting out worker processes).  Instead we're starting to hit some fundamental limitations of the architecture: the fact that the whole codebase effectively assumes that it's talking to a single consistent database instance; python's single-threadedness and memory inefficiency; twisted's lack of profiling; being limited to sqlite's featureset; the fact that the schema has grown organically and is difficulty to refactor aggressively; the fact the app papers over SQL problems by caching everything in RAM (resulting in synapse's high RAM requirements); the constant bugs caused by lack of type safety; etc.

We started an experiment in Golang to fix some of this a year ago in the form of Dendron - a "strangler pattern" homeserver skeleton intended to sit in front of a synapse and slowly port endpoints over to Go.  In practice, Dendron ended up just being a rather dubious Matrix-aware loadbalancer, and meanwhile no endpoints got moved into it (other than /login, which then got moved out again due to the extreme confusion of having to maintain implementations in both Dendron & Synapse).  The main reasons for Dendron's failure are a) we had enough on our hands supporting Synapse; b) there were easier scalability improvements (e.g. workers) to be had on Synapse; c) the gradual migration approach looked like it would end up sharing the same storage backend as Synapse anyway, and potentially end up inheriting a bunch of Synapse's woes.

So instead, a month or so ago we started a new project codenamed Dendrite (aka Dendron done right ;D) - this time an entirely fresh standalone Golang codebase for rapid development and iteration on the platonic ideal of a next-generation homeserver (and an excuse to audit and better document & spec some of the murkier bits of Matrix).  The project is still very early and there's no doc or code to be seen yet, but it's looking cautiously optimistic (especially relative to Dendron!).  The project goals are broadly:

  1. To build a new HS capable of supporting the exponentially increasing load on matrix.org ASAP (which is currently at 600K accounts, 50K rooms, 5 messages/s and growing fast).
  2. To architecturally support full horizontal scalability through clustering and sharding from the outset - i.e. no single DBs or DB writer processes.
  3. To optimise for Postgres rather than be constrained by SQLite, whilst still aiming for a simple but optimal schema and storage layer.  Optimising for smaller resource footprints (e.g. environments where a Postgres is overkill) will happen later - but the good news is that the architecture will support it (unlike Synapse, which doesn't scale down nicely even with SQLite).
It's too early to share more at this stage, but thought we should give some visibility on where things are headed!  Needless to say, Synapse is here for the foreseeable - we think of it as being the Matrix equivalent of the role Apache httpd played for the Web.  It's not enormously efficient, but it's popular and relatively mature, and isn't going away.  Meanwhile, new generations of servers like Ruma and Dendrite will come along for those seeking a sleeker but more experimental beast, much as nginx and lighttpd etc have come along as alternatives to Apache.  Time will tell how the server ecosystem will evolve in the longer term, but it's obviously critical to the success of Matrix to have multiple active independent server implementations, and we look forward to seeing how Synapse, Ruma & Dendrite progress!

🔗2017

Looking back at where we were at this time last year, 2016 has been a critical year for Matrix as the ecosystem has matured - rolling out E2E encryption; building out proper bot & bridge infrastructure; stabilising and tuning Synapse to keep up with the exponential traffic growth; seeing the explosion of contributors and new projects; seeing Riot edging closer to becoming a viable mainstream communication app.

2017 is going to be all about scaling Matrix - both the network, the ecosystem, and the project.  Whilst we've hopefully transitioned from being a niche decentralisation initiative to a relatively mainstream FOSS project, our ambition is unashamedly to become a mainstream communication (meta)network usable for the widest possible audience (whilst obviously still supporting our current community of FOSS & privacy advocates!).  With this in mind, stuff on the menu for 2017 includes:

  • Getting E2E Encryption out of beta asap.
  • Ensuring we can scale beyond Synapse - see Dendrite, above.
  • Getting as many bots and bridges into Matrix as possible, and doing everything we can to support them, host them and help them be as high quality as possible - making the public federated Matrix network as useful and diverse as possible.
  • Supporting Riot's leap to the mainstream, ensuring Matrix has at least one killer app.
  • Adding the final major missing features:
    • Customisable User Profiles (this is almost done, actually)
    • Groups (i.e. ability to define groups of users, and perform invites, powerlevels etc per-group as well as per-user)
    • Threading
    • Editable events (and Reactions)
  • Maturing and polishing the spec (we are way overdue a new release)
  • Improving VoIP - especially conferencing.
  • Reputation/Moderation management (i.e. spam/abuse filtering).
  • Much-needed SDK performance work on matrix-{'{'}react,ios,android{'}'}-sdk.
  • ...and a few other things which would be premature to mention right now :D
This is going to be an incredibly exciting ride (right now, it feels a bit like being on a toboggan which has made its way onto a fairly steep ski slope...) and we can only thank you: the community, for getting the project to this point - whether you're hacking on Matrix, contributing pull requests, filing issues, testing apps, spreading the word, or just simply using it.

See you in 2017, and thanks again for flying Matrix.

  • Matthew, Amandine & the Matrix Team.

matrix-appservice-irc 0.7.0 is out!

19.12.2016 00:00 — Tech Matthew Hodgson

Also, we've just released a major update to the IRC bridge codebase after trialling it on the matrix.org-hosted bridges for the last few days.

The big news is:

  • The bridge uses Synapse 0.18.5's new APIs for managing the public room list (improving performance a bunch)
  • Much faster startup using the new /joined_rooms and /joined_members APIs in Synapse 0.18.5
  • The bridge will now remember your NickServ password (encrypted at rest) if you want it to via the !storepass command
  • You can now set arbitrary user modes for IRC clients on connection (to mitigate PM spam etc)
  • After a split, the bridge will drop Matrix->IRC messages older than N seconds, rather than try to catch the IRC room up on everything they missed on Matrix :S
  • Operational metrics are now implemented using Prometheus rather than statsd
  • New !quit command to nuke your user from the remote IRC network
  • Membership list syncing for IRC->Matrix is enormously improved, and enabled for all matrix.org-hosted bridges apart from Freenode.  <b>At last, membership lists should be in sync between IRC and Matrix; please let us know if they're not</b>.
  • Better error logging
For full details, please see the changelog.

With things like NickServ-pass storing, !quit support and full bi-directional membership list syncing, it's never been a better time to run your own IRC bridge.  Please install or upgrade today from https://github.com/matrix-org/matrix-appservice-irc!

Synapse 0.18.5 released!

19.12.2016 00:00 — Tech Matthew Hodgson

Hi folks,

We released synapse 0.18.5 on Friday.  This is mainly about fixing performance problems with the unread room counts and the public room directory; polishing the E2E endpoints based on beta feedback; and general minor bits and bobs.

Get it whilst it's (almost) hot from https://github.com/matrix-org/synapse!  Changelog follows:

🔗Changes in synapse v0.18.5 (2016-12-16)

Bug fixes:

  • Fix federation /backfill returning events it shouldn't (PR #1700)
  • Fix crash in url preview (PR #1701)

🔗Changes in synapse v0.18.5-rc3 (2016-12-13)

Features:

  • Add support for E2E for guests (PR #1653)
  • Add new API appservice specific public room list (PR #1676)
  • Add new room membership APIs (PR #1680)
Changes:
  • Enable guest access for private rooms by default (PR #653)
  • Limit the number of events that can be created on a given room concurrently (PR #1620)
  • Log the args that we have on UI auth completion (PR #1649)
  • Stop generating refresh_tokens (PR #1654)
  • Stop putting a time caveat on access tokens (PR #1656)
  • Remove unspecced GET endpoints for e2e keys (PR #1694)
Bug fixes:
  • Fix handling of 500 and 429's over federation (PR #1650)
  • Fix Content-Type header parsing (PR #1660)
  • Fix error when previewing sites that include unicode, thanks to @kyrias (PR #1664)
  • Fix some cases where we drop read receipts (PR #1678)
  • Fix bug where calls to /sync didn't correctly timeout (PR #1683)
  • Fix bug where E2E key query would fail if a single remote host failed (PR #1686)

🔗Changes in synapse v0.18.5-rc2 (2016-11-24)

Bug fixes:

  • Don't send old events over federation, fixes bug in -rc1.

🔗Changes in synapse v0.18.5-rc1 (2016-11-24)

Features:

  • Implement "event_fields" in filters (PR #1638)
Changes:
  • Use external ldap auth package (PR #1628)
  • Split out federation transaction sending to a worker (PR #1635)
  • Fail with a coherent error message if /sync?filter= is invalid (PR #1636)
  • More efficient notif count queries (PR #1644)

When Ericsson discovered Matrix...

23.11.2016 00:00 — General Matthew Hodgson

As something completely different, we've invited Stefan Ålund and his team at Ericsson to write a guest blog post about the really cool stuff Ericsson is doing with Matrix.  This is a fascinating glimpse into how major folks are already launching commercial products on top of Matrix - whilst also making significant contributions back to the projects and the community.  We'd like to thank Stefan and Ericsson for all their support and perseverance, and we wish them the very best with the Ericsson Contextual Communication Cloud!

-- Matthew

At the end of 2014, my colleague Adam Bergkvist and I attended the WebRTC Expo in Paris, partly to promote our Open Source project OpenWebRTC, but also to meet the rest of the European WebRTC community and see what others were working on.

At Ericsson Research we had been working on WebRTC for quite some time. Not only on the client-side framework and how those could enable some truly experimental stuff, but more importantly how this emerging technology could be used to build new kinds of communication services where communication is not the service (A calling B), but is integrated as part of some other service or context. A simple example would be a health care solution, where the starting point could be the patient records and communication technologies are integrated to enable remote discussions between patients and their doctors.

Our research in this area, that we started calling “contextual communication”, pointed in a different direction from Ericsson's traditional communication business, therefore making it hard for us to transfer our ideas and technologies out from Ericsson Research. We increasingly had the feeling that we needed to build something new and start from a clean slate, so to speak.

Some of our guiding principles:

  • Flexibility - the communication should be able to integrate anywhere
  • Fast iterations - browsers and WebRTC are moving targets
  • Open - interoperability is important for communication systems
  • Low cost - operations for the core communication should approach 0
  • Trust - build on the Ericsson brand and technology leadership
We had a pretty good idea about what we wanted to build, but even though Ericsson is a big company, the team working in this area was relatively small and also had a number of other commitments that we couldn't abandon.

I think that is enough of a background, let's circle back to the WebRTC Expo and the reason why I am writing this post on the Matrix blog.

Adam and I were pretty busy in our booth talking to people and giving demos, so we actually missed when Matrix won the Best Innovation Award. Nonetheless we finally got some time to walk around and I started chatting with Matthew and Amandine who were manning the Matrix booth. Needless to say, I was really impressed with their vision and what they wanted to build. The comparison to email and how they wanted to make it possible to build an interoperable bridge between communication “islands”, all in an open (source) manner, really appealed to me.

To be honest, the altruistic aspects of decentralising communication was not the most important part for us, even if we were sympathetic to the cause, working for a company that was founded from "the belief that communication is a basic human need". We ultimately wanted to build a new kind of communication offering from Ericsson, and it looked like Matrix might be able to play a part in that.

I had recently hired a couple of interns and as soon as I came back from Paris, we set them to work evaluating Matrix. They were quickly able to port an existing WebRTC service (developed and used in-house) to use Matrix signalling and user management. We initially had some concerns about the maturity of the reference Home Server implementation (remember, this was almost 2 years ago) and we didn't want to start developing our own since we were still a small team. However, Matthew and the rest of the Matrix team worked closely with us, helping to answer all our (dumb) questions and we finally got to a point where we had the confidence to say “screw it, let's try this and see if it flies”. ?

Ericsson had recently launched the Ericsson Garage where employees could pitch ideas for how to build new business. So we decided to give the process a try and presented an idea on how Ericsson could start selling contextual communication as-a-Service, directly to enterprises that wanted help integrating communication into their business processes, but didn't necessarily have the competence or business interest to run their own communication services. We got accepted and moved (physically) out of Research to sit in the Garage for the next 4 months, developing a MVP.

Since the primary interface to our offering would be through SDKs on various platforms, we decided early on to develop our own. The SDKs were implementing the standard Matrix specification, but we put a lot of time in increasing the robustness and flexibility in the WebRTC call handling and eventually with added peer-2-peer data and collaboration features, on top of the secure WebRTC DataChannel. On the server side, our initialconcerns about Synapse were eventually removed completely as the Matrix team relentlessly kept working on fixing performance issues, patching security holes and provided a story on how to scale. Over the years we have contributed with several patches to Synapse (SAML auth and auth improvements; application service improvements) and provided input to the Matrix specification. We have always found the Matrix team be very inclusive and easy to work with.

The project graduated successfully from the Ericsson Garage and moved in to Ericsson's Business Unit IT & Cloud Products, where we started to increase the size of the team and just last month signed a contract with our first customer. We call the solution Ericsson Contextual Communication Cloud, or ECCC for short, and it can be summarised on a high level by the following picture:

ECCC in a nutshell

If you are interested in ECCC, feel free to reach out at https://discuss.c3.ericsson.net

As with any project developed in the open, it is essential to have a healthy community around it. We have received excellent support from the Matrix project and they have always been open for discussion, engaged our developers and listened to our needs. We depend on Matrix now and we see great potential for the future. We hope that others will adopt the technology and help make the community grow even stronger.

  • Stefan Ålund and the Ericsson ECCC Team

Synapse 0.18.4

22.11.2016 00:00 — Tech Matthew Hodgson

Uncharacteristically, we're actually remembering to announce a new release of Synapse!

Major performance fixes on federation, as well as the changes required to support E2E encrypted attachments (yay!)

Please install or upgrade from https://github.com/matrix-org/synapse :)

🔗Changes in synapse v0.18.4 (2016-11-22)

Bug fixes:

  • Add workaround for buggy clients that the fail to register (PR #1632)

🔗Changes in synapse v0.18.4-rc1 (2016-11-14)

Changes:

  • Various database efficiency improvements (PR #1188, #1192)
  • Update default config to blacklist more internal IPs, thanks to Euan Kemp @euank (PR #1198)
  • Allow specifying duration in minutes in config, thanks to Daniel Dent @DanielDent (PR #1625)
Bug fixes:
  • Fix media repo to set CORs headers on responses (PR #1190)
  • Fix registration to not error on non-ascii passwords (PR #1191)
  • Fix create event code to limit the number of prev_events (PR #1615)
  • Fix bug in transaction ID deduplication (PR #1624)