Releases


Synapse 1.29.0 released

08.03.2021 22:26 — Releases Dan Callahan
Last update: 08.03.2021 17:42

Synapse 1.29.0 is now available!

This release includes several useful new configuration options for administrators of federated home servers. In all cases, the defaults match Synapse's prior behavior.

  • AndrewFerr implemented include_profile_data_on_invite and allow_profile_lookup_over_federation which can limit disclosure of your users' profile information. These both default to True.
  • We've also implemented user_directory.prefer_local_users which weights users on the same homeserver higher in directory searches. This defaults to False.

Synapse is now easier to run in proxied environments, with tzyl implementing support for the NO_PROXY environment variable, as well as recognizing lowercase variants of that and related proxy variables.
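As a rough illustration of the behaviour described above (this is not Synapse's actual implementation), a lookup that accepts both upper- and lowercase variable names and honours a NO_PROXY exclusion list might look like the sketch below. The function names and the simplified suffix-matching rule are assumptions made for this example.

```python
import os
from typing import Mapping, Optional


def get_env_either_case(name: str, env: Mapping[str, str]) -> Optional[str]:
    """Look up a proxy variable, accepting e.g. both NO_PROXY and no_proxy."""
    return env.get(name.upper()) or env.get(name.lower())


def should_bypass_proxy(host: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if `host` matches an entry in NO_PROXY (simplified matching)."""
    no_proxy = get_env_either_case("NO_PROXY", env)
    if not no_proxy:
        return False
    host = host.lower()
    for entry in no_proxy.split(","):
        # Treat ".example.com" and "example.com" the same way.
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False


env = {"no_proxy": "localhost,.example.com"}
print(should_bypass_proxy("matrix.example.com", env))  # True
print(should_bypass_proxy("matrix.org", env))          # False
```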

Under the hood, we've been steadily improving our type hints, especially in light of the recent release of Twisted 21.2.0 which includes its own type annotations. We've also landed some improvements which reduce the amount of work Synapse does when presence is enabled and you join a room for the first time. Oh, and the media repository now regenerates missing thumbnails on demand.

Lastly, if you deploy Synapse behind a reverse proxy, Synapse now expects to receive an X-Forwarded-Proto header on incoming requests and will log a warning if it is missing. See the upgrade notes for more information. The full changelog has more information on what's in this release.
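To make the expectation concrete, here is a minimal sketch of the kind of check described above: read X-Forwarded-Proto from the incoming headers and log a warning when it is absent. The function name, logger name and fallback scheme are hypothetical; Synapse's own handling may differ.

```python
import logging
from typing import Mapping

logger = logging.getLogger("proxy_check")  # hypothetical logger name


def request_scheme(headers: Mapping[str, str], default: str = "http") -> str:
    """Return the scheme the client used, as reported by the reverse proxy."""
    # Header names are case-insensitive, so normalise before looking up.
    normalised = {k.lower(): v for k, v in headers.items()}
    proto = normalised.get("x-forwarded-proto")
    if proto is None:
        logger.warning("No X-Forwarded-Proto header; assuming %s", default)
        return default
    return proto.strip().lower()


print(request_scheme({"X-Forwarded-Proto": "https"}))  # https
print(request_scheme({}))                              # http (and logs a warning)
```

In practice the fix belongs in the reverse proxy's configuration, which should set the header on every request it forwards.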

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including aaronraimist, AndrewFerr, dklimpel, ShadowJonathan, and tzyl.

Synapse 1.28.0 released

25.02.2021 00:00 — Releases Dan Callahan

Synapse 1.28.0 is now available!

This release comes with several further improvements to the user experience of single sign-on and numerous bugfixes and stability improvements.

For admins, Synapse 1.28 adds a new Admin API for retrieving event context and implements new spam checker hooks which enable checking file uploads and remote downloads. We've also improved memory usage of media repository workers.

Lastly, we have marked an undocumented Admin API for deprecation. If any of your tools use /_synapse/admin/v1/users/<user_id> to get account information, please replace that with the V2 List Accounts API, which has been available since Synapse 1.7.0.
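For tools that need migrating, a request against the V2 API could be built roughly as follows. This is a sketch only: the /_synapse/admin/v2/users path, the from and limit query parameters and the bearer-token header reflect our reading of the Admin API documentation and should be checked against it, while the server name and token are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_accounts_request(base_url: str, access_token: str, limit: int = 10) -> Request:
    """Build (but do not send) a GET request for the V2 List Accounts API."""
    query = urlencode({"from": 0, "limit": limit})
    url = f"{base_url.rstrip('/')}/_synapse/admin/v2/users?{query}"
    # Admin APIs require the access token of a server admin.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


req = build_list_accounts_request("https://synapse.example.org", "ADMIN_TOKEN")
print(req.full_url)
# https://synapse.example.org/_synapse/admin/v2/users?from=0&limit=10
```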

There are no special upgrade instructions for 1.28.0. See the full changelog for more details on what's in this release.

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including arya2331, auscompgeek, bubu, compu42, dklimpel, dykstranet, and shadowjonathan.

We'd also like to thank yoric for thoroughly reviewing and re-organizing the Synapse CONTRIBUTING.md file.

Synapse 1.27.0 released

18.02.2021 23:25 — Releases Dan Callahan

Synapse 1.27.0 is now available!

We're especially proud of this release, as this is the version of Synapse that powered FOSDEM 2021 on Matrix. As such, our main focus was on stability, performance, and long-awaited support for social login.

🔗What's New

To our surprise, nearly half of all people who created accounts on the FOSDEM homeserver did so via a social login method. Full support for those methods is included in Synapse 1.27.0, and already available for all users on the matrix.org homeserver.

We've also changed how we use Redis in larger deployments, making Synapse more resilient to lost connections and eliminating delays when restarting with multiple federation senders.

Our Server Admin APIs saw a few tweaks, including new APIs to query and delete forward extremities or list the current state of a room.

See the full changelog for more.

🔗Breaking Changes for SSO

If you use Single Sign-On (SSO) via SAML, OAuth2, or OpenID Connect you must adjust your provider's configuration before upgrading to Synapse 1.27.0, as some endpoint URLs have changed. See the upgrading notes for more information.

🔗Dropping ARMv7 Docker Images

We were unable to produce ARM-based Docker images for this release due to problems with cross-compilation. As a result, we have made the difficult decision to cease building 32-bit ARMv7 Docker images as part of our release process. We will resume publishing ARM64 images with the next Synapse release.

Users on ARMv7 platforms (most notably Raspberry Pis) should consider building images locally using Synapse's Dockerfile or switching to installing Synapse directly as a Python module. Users with a Raspberry Pi 3 or newer also have the option of installing a 64-bit Linux distribution and using an ARM64 Docker image.

🔗Thank you to our contributors

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including dklimpel, intelfx, jcgruenhage, Oliver-Hanikel, rht, and y-pankaj.

Synapse 1.26.0 released

28.01.2021 00:00 — Releases Dan Callahan

Synapse 1.26.0 is now available!

Note: This release includes a new database schema version. If you need to roll back to Synapse 1.25.0, you will also need to follow the associated database downgrade instructions.

In addition to a truckload of refactoring and general improvements, Synapse 1.26.0 includes three major new features:

  1. A brand new algorithm for calculating the auth chain difference, which should dramatically improve worst case performance during state resolution (#8622).
  2. Initial support for enabling multiple OpenID Connect providers, paving the way for proper multi-provider social login workflows.
  3. A significant speed-up to redaction performance in large rooms.

It also brings several improvements to Admin APIs:

We've also made it possible to offload several additional APIs to worker processes, including read receipts and account data persistence, further improving Synapse's scalability.

See the full changelog for more.

Lastly, a reminder: we have deprecated Python 3.5 and PostgreSQL 9.5 and will cease support at the end of March. Due to deprecations in our Python tooling, we were unable to produce a binary package for Ubuntu 16.04 LTS (Xenial) in time for this release. We have resolved this for 1.27.

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including 0xflotus, chris-ruecker, dklimpel, emelie-qis, jerinjtitus, and tzyl.

Synapse 1.25.0 released

13.01.2021 00:00 — Releases Dan Callahan

Synapse 1.25.0 is now available! With this release, we are deprecating Python 3.5 and PostgreSQL 9.5 and will cease producing binary packages for Debian 9 (Stretch) and Ubuntu 16.04 (Xenial) after a transition period which lasts through March 2021. See the changelog for further details.

We are also deprecating the Purge Room and Shutdown Room Admin APIs and will remove them in a future release. Please update your code to use the Delete Room Admin API instead.

Synapse 1.25.0 brings over a month's worth of improvements, including:

  • The ability for users to pick their own username when using Single Sign-On, right from within Synapse.
  • Support for async Python methods in custom spam checker modules.
  • New ways to restrict allowed IP address ranges for outgoing requests from Synapse.
  • Significantly faster v2 state resolution on rooms with large numbers of power level events, which are common in some types of bridged IRC rooms.

See the full changelog and upgrade notes for more.

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including @aaronraimist, @Bubu, @dklimpel, @edwargix, @fossterer, @jdreichmann, @jerinjtitus, and @MadLittleMods.

Synapse 1.24.0 and 1.23.1 released

09.12.2020 23:51 — Releases Dan Callahan

Synapse 1.24.0 is now available!

This release fixes a denial of service vulnerability (GHSA-hxmp-pqch-c8mm / CVE-2020-26257) in which a malicious homeserver could send malformed events into a room which would then break federation of that room.

This follows the disclosure of a denial of service vulnerability in OpenSSL (CVE-2020-1971). If you have installed Synapse from source, please ensure your host is up to date and then execute pip install 'cryptography>=3.3' inside your Synapse virtualenv.

We've also released Synapse 1.23.1 which includes that security fix and a small patch to maintain Python 3.5 compatibility. It is otherwise identical to 1.23.0. Note that Synapse 1.24.0 includes backwards incompatible changes which may affect a small number of users. See the notes on upgrading for more information.

Synapse 1.24.0 brings a pair of new Admin APIs, including a way to log in as users and to forcibly purge rooms when deleting them. We've also made numerous bug fixes and improvements to SSO support, especially around OpenID Connect and SAML providers.

This release includes an optional change to push notification badges: currently, the number in the badge is based on the count of rooms with unread messages. However, in some specialized cases you may want the badge to show the count of all unread messages, even if there are multiple unread messages in the same room. This behavior can now be toggled with a new configuration setting.
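The difference between the two badge behaviours can be sketched as below. The function and its group_by_room parameter are hypothetical stand-ins for illustration and are not the name of the actual configuration setting.

```python
from typing import Mapping


def badge_count(unread_by_room: Mapping[str, int], group_by_room: bool = True) -> int:
    """Compute a push badge from a map of room ID to unread message count."""
    if group_by_room:
        # Existing behaviour: count rooms that have at least one unread message.
        return sum(1 for n in unread_by_room.values() if n > 0)
    # Optional behaviour: count every unread message across all rooms.
    return sum(unread_by_room.values())


unread = {"!a:example.org": 3, "!b:example.org": 0, "!c:example.org": 1}
print(badge_count(unread))                       # 2
print(badge_count(unread, group_by_room=False))  # 4
```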

Additionally, for server admins, the deprecated /_matrix/client/*/admin Admin API endpoints have been removed. If you have tools which target these endpoints, please update them to use the /_synapse/admin URL prefix instead.

See the full changelog for more.

Installation instructions are available on GitHub, as is the v1.24.0 release tag.

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including @angdraug, @chagai95, @daenney, @dklimpel, @jordanbancino, @localguru, @nchamo, @ShadowJonathan, @TeFiLeDo, @tulir, and @waylon531.

Synapse 1.23.0 released

18.11.2020 00:00 — Releases Dan Callahan

Reminder: On Monday, we will be announcing a denial of service vulnerability which affects Synapse versions prior to 1.20.0. If you have not upgraded recently, please do so.

Synapse 1.23.0 is now available!

For Synapse admins, this release supports generating structured logs via the standard logging configuration (#8607, #8685). This may require changing your Synapse configuration; see the upgrade notes for more information.
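For readers unfamiliar with Python's standard logging configuration, the general pattern is a dictionary (usually loaded from YAML) that wires formatters to handlers. The sketch below uses a home-made JsonFormatter purely as a stand-in; Synapse ships its own formatter classes, and the exact names to use are given in the upgrade notes rather than here.

```python
import json
import logging
import logging.config


class JsonFormatter(logging.Formatter):
    """Stand-in formatter that renders each record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


# Same shape as a standard logging config file, expressed as a dict.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"structured": {"()": JsonFormatter}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "structured"}
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("example").info("structured logging enabled")
```

The point of the change is that structured output becomes just another formatter and handler combination, rather than a separate configuration mechanism.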

We've also added many new Admin APIs, contributed by @dklimpel:

  • Add API to get information about uploaded media (#8647)
  • Add API for local user media statistics (#8700)
  • Make it possible to delete files that were not used for a defined time (#8519)
  • Split API for reported events into detail and list endpoints. This is a breaking change to #8217 which was introduced in Synapse v1.21.0. Those who already use this API should check their scripts (#8539)
  • Allow server admins to list users' notification pushers (#8610, #8689)

Lastly, Synapse 1.23.0 addresses some significant bugs, including regressions in the SQLite-to-PostgreSQL database porting script (#8729, #8730, #8755) and an issue which could prevent Synapse from recovering after losing its connection to its database (#8726). Synapse will also reject ACL modifications from clients which would otherwise cause a server to ban itself from a room (#8708).

Installation instructions are available on GitHub, as is the v1.23.0 release tag.

Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including @chagai95 and @dklimpel.

The full changelog for 1.23.0 is as follows:

🔗Synapse 1.23.0 (2020-11-18)

This release changes the way structured logging is configured. See the upgrade notes for details.

Note: We are aware of a trivially exploitable denial of service vulnerability in versions of Synapse prior to 1.20.0. Complete details will be disclosed on Monday, November 23rd. If you have not upgraded recently, please do so.

🔗Bugfixes

  • Fix a dependency versioning bug in the Dockerfile that prevented Synapse from starting. (#8767)

🔗Synapse 1.23.0rc1 (2020-11-13)

🔗Features

  • Add a push rule that highlights when a jitsi conference is created in a room. (#8286)
  • Add an admin api to delete a single file or files that were not used for a defined time from server. Contributed by @dklimpel. (#8519)
  • Split admin API for reported events (GET /_synapse/admin/v1/event_reports) into detail and list endpoints. This is a breaking change to #8217 which was introduced in Synapse v1.21.0. Those who already use this API should check their scripts. Contributed by @dklimpel. (#8539)
  • Support generating structured logs via the standard logging configuration. (#8607, #8685)
  • Add an admin API to allow server admins to list users' pushers. Contributed by @dklimpel. (#8610, #8689)
  • Add an admin API GET /_synapse/admin/v1/users/<user_id>/media to get information about uploaded media. Contributed by @dklimpel. (#8647)
  • Add an admin API for local user media statistics. Contributed by @dklimpel. (#8700)
  • Add displayname to Shared-Secret Registration for admins. (#8722)

🔗Bugfixes

  • Fix fetching of E2E cross signing keys over federation when only one of the master key and device signing key is cached already. (#8455)
  • Fix a bug where Synapse would blindly forward bad responses from federation to clients when retrieving profile information. (#8580)
  • Fix a bug where the account validity endpoint would silently fail if the user ID did not have an expiration time. It now returns a 400 error. (#8620)
  • Fix email notifications for invites without local state. (#8627)
  • Fix handling of invalid group IDs to return a 400 rather than log an exception and return a 500. (#8628)
  • Fix handling of User-Agent headers that are invalid UTF-8, which caused user agents of users to not get correctly recorded. (#8632)
  • Fix a bug in the joined_rooms admin API if the user has never joined any rooms. The bug was introduced, along with the API, in v1.21.0. (#8643)
  • Fix exception during handling multiple concurrent requests for remote media when using multiple media repositories. (#8682)
  • Fix bug that prevented Synapse from recovering after losing connection to the database. (#8726)
  • Fix bug where the /_synapse/admin/v1/send_server_notice API could send notices to non-notice rooms. (#8728)
  • Fix PostgreSQL port script failing when DB has no backfilled events. Broke in v1.21.0. (#8729)
  • Fix PostgreSQL port script to correctly handle foreign key constraints. Broke in v1.21.0. (#8730)
  • Fix PostgreSQL port script so that it can be run again after a failure. Broke in v1.21.0. (#8755)

🔗Improved Documentation

  • Instructions for Azure AD in the OpenID Connect documentation. Contributed by peterk. (#8582)
  • Improve the sample configuration for single sign-on providers. (#8635)
  • Fix the filepath of Dex's example config and the link to Dex's Getting Started guide in the OpenID Connect docs. (#8657)
  • Note support for Python 3.9. (#8665)
  • Minor updates to docs on running tests. (#8666)
  • Interlink prometheus/grafana documentation. (#8667)
  • Notes on SSO logins and media_repository worker. (#8701)
  • Document experimental support for running multiple event persisters. (#8706)
  • Add information regarding the various sources of, and expected contributions to, Synapse's documentation to CONTRIBUTING.md. (#8714)
  • Migrate documentation docs/admin_api/event_reports to markdown. (#8742)
  • Add some helpful hints to the README for new Synapse developers. Contributed by @chagai95. (#8746)

🔗Internal Changes

  • Optimise /createRoom with multiple invited users. (#8559)
  • Implement and use an @lru_cache decorator. (#8595)
  • Don't instantiate Requester directly. (#8614)
  • Type hints for RegistrationStore. (#8615)
  • Change schema to support access tokens belonging to one user but granting access to another. (#8616)
  • Remove unused OPTIONS handlers. (#8621)
  • Run mypy as part of the lint.sh script. (#8633)
  • Correct Synapse's PyPI package name in the OpenID Connect installation instructions. (#8634)
  • Catch exceptions during initialization of password_providers. Contributed by Nicolai Søborg. (#8636)
  • Fix typos and spelling errors in the code. (#8639)
  • Reduce number of OpenTracing spans started. (#8640, #8668, #8670)
  • Add field total to device list in admin API. (#8644)
  • Add more type hints to the application services code. (#8655, #8693)
  • Tell Black to format code for Python 3.5. (#8664)
  • Don't pull event from DB when handling replication traffic. (#8669)
  • Abstract some invite-related code in preparation for landing knocking. (#8671, #8688)
  • Clarify representation of events in logfiles. (#8679)
  • Don't require hiredis package to be installed to run unit tests. (#8680)
  • Fix typing info on cache call signature to accept on_invalidate. (#8684)
  • Fail tests if they do not await coroutines. (#8690)
  • Improve start time by adding an index to e2e_cross_signing_keys.stream_id. (#8694)
  • Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. (#8697)
  • Use Python 3.8 in Docker images by default. (#8698)
  • Remove the "draft" status of the Room Details Admin API. (#8702)
  • Improve the error returned when a non-string displayname or avatar_url is used when updating a user's profile. (#8705)
  • Block attempts by clients to send server ACLs, or redactions of server ACLs, that would result in the local server being blocked from the room. (#8708)
  • Add metrics that allow the local sysadmin to track 3PID /requestToken requests. (#8712)
  • Consolidate duplicated lists of purged tables that are checked in tests. (#8713)
  • Add some mdui:UIInfo element examples for saml2_config in the homeserver config. (#8718)
  • Improve the error message returned when a remote server incorrectly sets the Content-Type header in response to a JSON request. (#8719)
  • Speed up repeated state resolutions on the same room by caching event ID to auth event ID lookups. (#8752)

Dendrite 0.3.0 released

16.11.2020 17:44 — Releases Matthew Hodgson

Hi all,

Heads up that we just cut another beta release of Dendrite - now at 0.3.0!

This is a really fun release given almost all the changes are contributed from the wider community - so huge thanks to S7evinK, MayeulC and felix!

The main new feature is full Read Receipt support thanks to S7evinK, which makes an enormous perceptual improvement when using Dendrite - so especial thanks are due there :)

So, if you're interested in helping us test, please spin up a copy from https://github.com/matrix-org/dendrite and let us know how it goes - and if you're already running one, now is an excellent time to upgrade!

Full changelog (including 0.2.1, which we forgot to blog about) follows:

🔗Dendrite 0.3.0 (2020-11-16)

🔗Features

  • Read receipts (both inbound and outbound) are now supported (contributed by S7evinK)
  • Forgetting rooms is now supported (contributed by S7evinK)
  • The -version command line flag has been added (contributed by S7evinK)

🔗Fixes

  • User accounts that contain the = character can now be registered
  • Backfilling should now work properly on rooms with world-readable history visibility (contributed by MayeulC)
  • The gjson dependency has been updated for correct JSON integer ranges
  • Some more client event fields have been marked as omit-when-empty (contributed by S7evinK)
  • The build.sh script has been updated to work properly on all POSIX platforms (contributed by felix)

🔗Dendrite 0.2.1 (2020-10-22)

🔗Fixes

  • Forward extremities are now calculated using only references from other extremities, rather than including outliers, which should fix cases where state can become corrupted (#1556)
  • Old state events will no longer be processed by the sync API as new, which should fix some cases where clients incorrectly believe they have joined or left rooms (#1548)
  • More SQLite database locking issues have been resolved in the latest events updater (#1554)
  • Internal HTTP API calls are now made using H2C (HTTP/2) in polylith mode, mitigating some potential head-of-line blocking issues (#1541)
  • Roomserver output events no longer incorrectly flag state rewrites (#1557)
  • Notification levels are now parsed correctly in power level events (gomatrixserverlib#228, contributed by Pestdoktor)
  • Invalid UTF-8 is now correctly rejected when making federation requests (gomatrixserverlib#229, contributed by Pestdoktor)

How we fixed Synapse's scalability!

03.11.2020 00:00 — Releases Matthew Hodgson

Hi all,

We had a major breakthrough in Synapse 1.22 which we want to talk about in more detail: Synapse now scales horizontally across multiple Python processes.

Horizontal scaling means that you can support more users and traffic by adding in more python processes (spread over more machines, if necessary) without there being a single bottleneck which all the traffic is passing through - as opposed to vertical scaling where you make things go faster overall by making the bottleneck go faster.

After many years of having to vertically scale Synapse (by trying to make the main process go faster) we’re now finally at the point where you can configure Synapse so that messages no longer flow through the main process - eliminating the bottleneck entirely. What’s more, the Matrix.org homeserver has now been successfully running in this config and enjoying the massive scalability improvements for the last 2 weeks! Huge kudos goes to Erik and the wider Synapse team for pulling this off.

Some readers might wonder how this ties in with Dendrite entering beta, given one of Dendrite’s design goals is full horizontal scalability. The answer is that we’re very much using Dendrite for experimentation and next-gen stuff at the moment (currently focused more on scaling downwards for P2P rather than scaling upwards for megaservers) - while Synapse is the stable and long-term supported option.

So, that’s the context - now over to Erik with more than you could possibly ever want to know about how we actually did it...

🔗Background

Synapse started life off back in 2014 as a single monolithic python process, and for quite a while we made it scale by adding more and more in-memory caches to speed things up by avoiding hitting the database (at the expense of RAM). It looked like this:

Eventually the caches stopped helping and we needed more than one thread of execution in order to spread CPU across multiple cores. Python’s Global Interpreter Lock (GIL) means that a Python process can effectively only use one CPU core at a time, so starting more threads doesn’t help with scalability - you have to run multiple processes.

Now, the vast majority of the work that Synapse does is related to “streams”. These are append only sequences of rows, such as the events stream, typing stream, receipts stream, etc. When a new event arrives (for example) we write it to the events stream, and then notify anything waiting that there has been an update. The /sync endpoint, for instance, will wait for updates to streams and send them down to long-polling Matrix clients.

Streams support being added to concurrently, so they have a concept of the “persisted up-to position”. This is the position up to which all rows have finished persisting. Readers only read up to the current “persisted up-to position”, so that they don’t skip updates that haven’t finished persisting at that point. (E.g. if two events A and B get assigned positions 5 and 6, but B finishes persisting first, then the persisted up-to position will remain at 4 until A finishes persisting, and then it jumps to 6.)
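The A/B example can be reproduced with a small sketch. The StreamPositions class and its method names are hypothetical and only model the bookkeeping described in the paragraph above; they are not taken from Synapse's code.

```python
from typing import Set


class StreamPositions:
    """Tracks in-flight writes to compute the "persisted up-to position"."""

    def __init__(self, current: int = 0) -> None:
        self._last_allocated = current
        self._in_flight: Set[int] = set()

    def allocate(self) -> int:
        """Assign the next stream position to a row that is about to be written."""
        self._last_allocated += 1
        self._in_flight.add(self._last_allocated)
        return self._last_allocated

    def mark_persisted(self, position: int) -> None:
        self._in_flight.discard(position)

    def persisted_upto(self) -> int:
        # Everything strictly before the oldest unfinished write is persisted.
        if self._in_flight:
            return min(self._in_flight) - 1
        return self._last_allocated


stream = StreamPositions(current=4)
a, b = stream.allocate(), stream.allocate()  # A gets 5, B gets 6
stream.mark_persisted(b)
print(stream.persisted_upto())  # 4: A is still being written
stream.mark_persisted(a)
print(stream.persisted_upto())  # 6
```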

To split any meaningful amount of work into separate processes, we need to add a mechanism where processes can be told that updates to streams have happened (otherwise they’d have to repeatedly poll the DB, which would be deeply inefficient). The architecture ended up being one where we had the “main” process that streams updates via a custom replication protocol (initially long-polling HTTP; later custom TCP) to any number of “worker” processes. This meant that we could move sync stream handling (and other read APIs) off the main process and onto workers, but also that all database writes had to go through the single main process (as it was a star topology, the main process could talk to all workers but workers could only talk to the main process and not each other).

2020-11-03-synapse2.png

As an aside: cache invalidations also had to be streamed down the replication connections, which has the side effect that we could only cache things that would only be invalidated on the main process.

We continued to move more and more read APIs out onto separate workers. We also added workers in front of the main process that would e.g. handle the creation of the new events, authenticating, etc, and then call out to the main process with the event for it to persist the event.

🔗Moving writes off the main process

Eventually we ran out of stuff to move out of the main process that didn’t involve writing to the DB. To write stuff from other processes we needed a way for the workers to stream updates to each other. The easiest and most obvious way was to just use Redis and its pub/sub support.

2020-11-03-synapse3.png

This almost allowed us to move writing of a particular stream to a different worker, except writing to streams generally also meant invalidating caches which in itself requires writing to a stream. We needed a way of writing to the cache invalidation stream from multiple workers at once.

Sharding the cache invalidation thankfully turned out to be easy, as workers would simply call the cache invalidation function whenever they get an invalidation notice over replication. In particular, the ordering of invalidations from different workers doesn’t matter and so there isn’t a need to calculate a single “persisted up-to position”. Sharding then just becomes a case of adding the name of the worker that is writing the update to the replication stream, and then workers reading from it can basically treat the cache stream the same as if they were multiple streams, one per worker.
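A minimal sketch of that idea follows: each replication row carries the name of the writer, invalidations are applied in whatever order they arrive, and the reader keeps one position per writer as if there were several independent streams. The class and parameter names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict


class CacheInvalidationReader:
    """Applies invalidations from many writers; ordering between writers is irrelevant."""

    def __init__(self, invalidate: Callable[[str], None]) -> None:
        self._invalidate = invalidate
        # One position per writer, as if each writer had its own stream.
        self.last_position: DefaultDict[str, int] = defaultdict(int)

    def on_row(self, writer: str, position: int, cache_key: str) -> None:
        self._invalidate(cache_key)
        self.last_position[writer] = max(self.last_position[writer], position)


cache = {"user:@alice": "profile", "room:!abc": "state"}
reader = CacheInvalidationReader(lambda key: cache.pop(key, None))
reader.on_row("worker2", 1, "room:!abc")
reader.on_row("worker1", 3, "user:@alice")
print(cache)                       # {}
print(dict(reader.last_position))  # {'worker2': 1, 'worker1': 3}
```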

This then unlocks the ability to move writing of streams off the main process and onto different workers - and so we added the “event persister” worker to move writing of the main event stream off the main process:

2020-11-03-synapse4.png

🔗Sharding the events stream

Eventually the worker responsible for doing nothing but persisting events started maxing out CPU. This meant that we had to look at sharding the events stream, i.e. writing to it from multiple workers.

This is more complicated than sharding the cache invalidation stream as the ordering of the events does matter; we send them down sync streams, in order, with a token that indicates where the sync stream is up to in the events stream. This means that workers need to be able to calculate a “persisted up-to position” when getting updates from different workers.

The easiest way of doing that is to simply set the persisted up-to position as the minimum position received over replication from all active writers. This works, except events would only be processed after all other writers have subsequently written events (to advance the persisted position past the point at which the event was written), which can add a lot of latency depending on how often events are written.
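The latency problem with this simple approach is easy to see in a sketch (writer names and positions are made up for illustration):

```python
from typing import Dict


def persisted_upto_min(last_seen: Dict[str, int]) -> int:
    """Naive rule: the minimum of the last position seen from each active writer."""
    return min(last_seen.values())


last_seen = {"persister1": 10, "persister2": 10}

last_seen["persister1"] = 15          # persister1 writes an event at position 15
print(persisted_upto_min(last_seen))  # 10: the event at 15 cannot be processed yet

last_seen["persister2"] = 16          # only once persister2 also writes...
print(persisted_upto_min(last_seen))  # 15: ...does the position move past the event
```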

A refinement is to note that if you have a persisted up-to position of 10, then receive updates at sequential positions 11, 12, 13 and 14, you know that everything between 10 and 14 has finished persisting (as you received updates about them), and so can set the persisted up-to position to 14. Annoyingly, it’s not required that positions are sequential without gaps (due to various technical considerations), and so in the worst case this still has the same problems as the naïve solution.
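As a sketch, the refinement amounts to advancing while the next position is known to have been received. The second call shows why legitimate gaps in the numbering defeat it. The function name is hypothetical.

```python
from typing import Iterable


def advance_if_sequential(persisted_upto: int, received: Iterable[int]) -> int:
    """Advance the persisted up-to position across a gap-free run of received positions."""
    seen = set(received)
    while persisted_upto + 1 in seen:
        persisted_upto += 1
    return persisted_upto


print(advance_if_sequential(10, [11, 12, 13, 14]))  # 14
print(advance_if_sequential(10, [11, 13, 14]))      # 11: stuck, position 12 may never exist
```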

To avoid these problems we change the persisted up-to position to be a vector clock of positions; tracking a vector of positions - one per writer. This still allows answering the query of “get all events after token X” (as events are written with the position and the name of the writer). The persisted up-to position is then calculated by just tracking the last position seen to arrive over replication from each writer.
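In code, a vector-clock token is just a map from writer name to position. The sketch below (with hypothetical Event, events_after and advance names) shows both halves of the paragraph above: answering “get all events after token X”, and advancing the token as rows arrive over replication.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    writer: str    # name of the worker that persisted the event
    position: int  # stream position assigned by that worker
    event_id: str


def events_after(events: List[Event], token: Dict[str, int]) -> List[Event]:
    """Return the events not yet covered by the vector-clock token."""
    return [e for e in events if e.position > token.get(e.writer, 0)]


def advance(token: Dict[str, int], event: Event) -> Dict[str, int]:
    """Track the last position seen from the event's writer."""
    new = dict(token)
    new[event.writer] = max(new.get(event.writer, 0), event.position)
    return new


events = [Event("p1", 11, "$a"), Event("p2", 12, "$b"), Event("p1", 13, "$c")]
token = {"p1": 11, "p2": 10}
print([e.event_id for e in events_after(events, token)])  # ['$b', '$c']
print(advance(token, events[2]))                          # {'p1': 13, 'p2': 10}
```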

This allows writing events from multiple workers, while ensuring that other workers can correctly keep track of a “persisted up-to position”. Then it's just a matter of inspecting the code to ensure that it does not assume that it is the only writer to the stream. In the case of writing to the events stream, we note that the function persisting events assumes it's the only writer for a given room, so when sharding we have to ensure that there are no concurrent writes to the same room. This is most easily done by sharding based on room ID, and ensuring that the mapping of room ID to worker does not change (without coordination).
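One simple way to get a stable room-to-worker mapping is to hash the room ID and take it modulo the number of persisters. The sketch below assumes SHA-256 and a fixed, ordered list of worker names; the details of Synapse's own mapping may differ.

```python
import hashlib
from typing import Sequence


def persister_for_room(room_id: str, persisters: Sequence[str]) -> str:
    """Deterministically map a room ID to one of the configured event persisters."""
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(persisters)
    return persisters[index]


persisters = ["event_persister1", "event_persister2"]
room = "!abc:example.org"
# The same room always maps to the same worker while the list is unchanged.
print(persister_for_room(room, persisters) == persister_for_room(room, persisters))  # True
```

Note that changing the list of persisters changes the mapping for many rooms at once, which is why it cannot be altered without coordination.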

The only thing left is to then encode the vector clock position into the sync tokens. We want to ensure that these tokens are not too long, as they get included as query string parameters (e.g. the since= parameter of /sync). By assigning persistent unique integer IDs to workers the vector clock can be persisted as a sequence of pairs of integers, which is relatively few bytes so long as we don’t have too many workers writing to the events stream. We can further reduce the size of the tokens by calculating an integer “persisted up-to position” as we did before, encoding that and only including positions for workers that are larger than the integer persisted up-to position. (The idea here is that most of the time only a small number of workers will be ahead of the calculated persisted up-to position, and so we only need to encode those).
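To illustrate the size saving, here is a sketch of one possible encoding: the minimum position first, followed by an entry only for each worker that is ahead of it. The "m<min>~<id>.<pos>" layout is an illustrative format chosen for this example, not a description of the exact token syntax Synapse emits.

```python
from typing import Dict, Iterable


def encode_token(positions: Dict[int, int]) -> str:
    """Encode {worker_id: position}, listing only workers ahead of the minimum."""
    minimum = min(positions.values())
    ahead = [f"{wid}.{pos}" for wid, pos in sorted(positions.items()) if pos > minimum]
    return "~".join([f"m{minimum}"] + ahead)


def decode_token(token: str, worker_ids: Iterable[int]) -> Dict[int, int]:
    """Rebuild the full vector clock; unlisted workers sit at the minimum."""
    head, *rest = token.split("~")
    minimum = int(head[1:])
    positions = {wid: minimum for wid in worker_ids}
    for part in rest:
        wid, pos = part.split(".")
        positions[int(wid)] = int(pos)
    return positions


clock = {1: 54, 2: 56, 3: 54}
token = encode_token(clock)
print(token)                           # m54~2.56
print(decode_token(token, [1, 2, 3]))  # {1: 54, 2: 56, 3: 54}
```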

And this is what we have today:

2020-11-03-synapse5.png

The major limitation of the current situation is that you can’t dynamically add/remove workers which persist events, as the sharding by room ID is calculated at startup, and so changing it requires restarting the whole system. This could be replaced by any system that allowed coordination over which persister is allowed to write to a room at any given point. That is likely tricky to get right in practice, but it would allow dynamic auto scaling of deployments, or automatically recovering from a worker that gets wedged/dies.

Finally, it’s worth noting that sharding event persisters isn’t the only performance work that’s been going on - switching everything over to python 3 and async twisted has helped, along with lots of smaller optimisations on the hot paths, and further rebalancing workers (e.g. moving background jobs off the master process to dedicated workers). We’ve also benefited a lot from the maintainability of rolling out mypy typing throughout the codebase. And next up, we’ll be going back to speeding up the codebase as a whole - starting with algorithmic state resolution improvements! 🎉

🔗Performance

So, how does it stack up?

Here’s the send time heatmap on Matrix.org showing the step change on Oct 16th when we rolled out the second event persister (full disclosure: this also coincides with moving background processes off the main Synapse process to a background worker). As you can see, we go from messages being spread over a huge range of durations (up to several seconds) to the sweet spot being 50ms or less - a spectacular improvement!

2020-11-03-synapse-heatmap.png

Meanwhile, here’s the actual CPU utilisation as we split the traffic from a single event persister (yellow) to two persisters (one yellow, one blue), showing the sharding beautifully horizontally balancing CPU between the two active/active worker processes:

2020-11-03-synapse-cpu.png

We’ve yet to load test to see just how fast we can go now (before we start hitting bottlenecks on the Postgres cluster), but it sure feels good to have all our CPU headroom back on Matrix.org again, ready for the next wave of users to arrive.

🔗Conclusion

So there you have it: folks running massive homeservers (50K+ concurrent users) like Matrix.org (and cough various high-profile public sector deployments) are no longer held hostage by the bottleneck of the main Synapse process and should feel free to experiment with setting up event persister workers to handle high traffic loads. Otherwise, if you can spread your users over smaller servers, that’s also a good bet (assuming they don’t have massively overlapping room membership, like we see on Matrix.org).

The current worker documentation is up-to-date, although it does assume you are already very familiar with how to administer Synapse. It’s also very much subject to change, as we keep adding new workers and improving the architecture. However, now is a pretty good time to get involved if you’re interested in large-scale Matrix deployments.

-- The Synapse Team

Synapse 1.22.0 released

27.10.2020 17:01 — Releases Dan Callahan

Synapse 1.22.0 is now available!

This release focused on improving Synapse's horizontal scalability, including:

  • Support for running background tasks in separate worker processes.
  • Fixes to sharded event persisters, which were experimentally introduced in 1.21.0.
  • Fixing a message duplication bug with worker-based deployments. (#8476)

Synapse 1.22.0 also has a few other notable changes:

  • Defaulting to version 6 rooms, per MSC2788.
  • Initial support for three new experimental MSCs:
    • MSC2732: Supporting olm fallback keys
    • MSC2697: Supporting device dehydration
    • MSC2409: Allowing appservices to receive ephemeral events like read receipts, presence, and typing indicators.
  • Multi-arch Docker images, covering arm64 and arm/v7 in addition to amd64.

Installation instructions are available on GitHub, as is the v1.22.0 release tag.

Lastly, Synapse is a Free and Open Source Software project, and we'd like to extend our thanks to everyone who contributed to this release, including @Akkowicz, @BBBSnowball, @maquis196, and @samuel-p.

The full changelog for 1.22.0 is as follows:

🔗Synapse 1.22.0 (2020-10-27)

No significant changes.

🔗Synapse 1.22.0rc2 (2020-10-26)

🔗Bugfixes

  • Fix bugs where ephemeral events were not sent to appservices. Broke in v1.22.0rc1. (#8648, #8656)
  • Fix user_daily_visits table to not have duplicate rows per user/device due to multiple user agents. Broke in v1.22.0rc1. (#8654)

🔗Synapse 1.22.0rc1 (2020-10-22)

🔗Features

  • Add a configuration option for always using the "userinfo endpoint" for OpenID Connect. This fixes support for some identity providers, e.g. GitLab. Contributed by Benjamin Koch. (#7658)
  • Add ability for ThirdPartyEventRules modules to query and manipulate whether a room is in the public rooms directory. (#8292, #8467)
  • Add support for olm fallback keys (MSC2732). (#8312, #8501)
  • Add support for running background tasks in a separate worker process. (#8369, #8458, #8489, #8513, #8544, #8599)
  • Add support for device dehydration (MSC2697). (#8380)
  • Add support for MSC2409, which allows sending typing, read receipts, and presence events to appservices. (#8437, #8590)
  • Change default room version to "6", per MSC2788. (#8461)
  • Add the ability to send non-membership events into a room via the ModuleApi. (#8479)
  • Increase default upload size limit from 10M to 50M. Contributed by @Akkowicz. (#8502)
  • Add support for modifying event content in ThirdPartyRules modules. (#8535, #8564)

🔗Bugfixes

  • Fix a longstanding bug where invalid ignored users in account data could break clients. (#8454)
  • Fix a bug where backfilling a room with an event that was missing the redacts field would break. (#8457)
  • Don't attempt to respond to some requests if the client has already disconnected. (#8465)
  • Fix message duplication if something goes wrong after persisting the event. (#8476)
  • Fix incremental sync returning an incorrect prev_batch token in the timeline section, which, when used to paginate, returned events that were already included in the incremental sync. Broken since v0.16.0. (#8486)
  • Expose the uk.half-shot.msc2778.login.application_service to clients from the login API. This feature was added in v1.21.0, but was not exposed as a potential login flow. (#8504)
  • Fix error code for /profile/{userId}/displayname to be M_BAD_JSON. (#8517)
  • Fix a bug introduced in v1.7.0 that could cause Synapse to insert values from non-state m.room.retention events into the room_retention database table. (#8527)
  • Fix not sending events over federation when using sharded event writers. (#8536)
  • Fix a longstanding bug where email notifications for encrypted messages were blank. (#8545)
  • Fix increase in the number of "There was no active span..." errors logged when using OpenTracing. (#8567)
  • Fix a bug that prevented errors encountered during execution of the synapse_port_db from being correctly printed. (#8585)
  • Fix appservice transactions to only include a maximum of 100 persistent and 100 ephemeral events. (#8606)

🔗Updates to the Docker image

  • Add multi-arch support (arm64, arm/v7) for the Docker images. Contributed by @maquis196. (#7921)
  • Add support for passing commandline args to the synapse process. Contributed by @samuel-p. (#8390)

🔗Improved Documentation

  • Update the directions for using the manhole with coroutines. (#8462)
  • Improve readme by adding new shield.io badges. (#8493)
  • Add a note about Docker in manhole.md regarding which IP address to bind to. Contributed by @Maquis196. (#8526)
  • Document the new behaviour of the allowed_lifetime_min and allowed_lifetime_max settings in the room retention configuration. (#8529)

🔗Deprecations and Removals

  • Drop unused device_max_stream_id table. (#8589)

🔗Internal Changes

  • Check for unreachable code with mypy. (#8432)
  • Add unit test for event persister sharding. (#8433)
  • Allow events to be sent to clients sooner when using sharded event persisters. (#8439, #8488, #8496, #8499)
  • Configure public_baseurl when using demo scripts. (#8443)
  • Add SQL logging on queries that happen during startup. (#8448)
  • Speed up unit tests when using PostgreSQL. (#8450)
  • Remove redundant database loads of stream_ordering for events we already have. (#8452)
  • Reduce inconsistencies between codepaths for membership and non-membership events. (#8463)
  • Combine SpamCheckerApi with the more generic ModuleApi. (#8464)
  • Additional testing for ThirdPartyEventRules. (#8468)
  • Add -d option to ./scripts-dev/lint.sh to lint files that have changed since the last git commit. (#8472)
  • Unblacklist some sytests. (#8474)
  • Include the log level in the phone home stats. (#8477)
  • Remove outdated sphinx documentation, scripts and configuration. (#8480)
  • Clarify error message when plugin config parsers raise an error. (#8492)
  • Remove the deprecated Handlers object. (#8494)
  • Fix a threadsafety bug in unit tests. (#8497)
  • Add user agent to user_daily_visits table. (#8503)
  • Add type hints to various parts of the code base. (#8407, #8505, #8507, #8547, #8562, #8609)
  • Remove unused code from the test framework. (#8514)
  • Apply some internal fixes to the HomeServer class to make its code more idiomatic and statically-verifiable. (#8515)
  • Factor out common code between RoomMemberHandler._locally_reject_invite and EventCreationHandler.create_event. (#8537)
  • Improve database performance by executing more queries without starting transactions. (#8542)
  • Rename Cache to DeferredCache, to better reflect its purpose. (#8548)
  • Move metric registration code down into LruCache. (#8561, #8591)
  • Replace DeferredCache with the lighter-weight LruCache where possible. (#8563)
  • Add virtualenv-generated folders to .gitignore. (#8566)
  • Add get_immediate method to DeferredCache. (#8568)
  • Fix mypy not properly checking across the codebase. Additionally, fix a typing assertion error in handlers/auth.py. (#8569)
  • Fix synmark benchmark runner. (#8571)
  • Modify DeferredCache.get() to return Deferreds instead of ObservableDeferreds. (#8572)
  • Adjust a protocol-type definition to fit sqlite3 assertions. (#8577)
  • Support macOS on the synmark benchmark runner. (#8578)
  • Update mypy static type checker to 0.790. (#8583, #8600)
  • Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. (#8587)
  • Remove extraneous unittest logging decorators from unit tests. (#8592)
  • Minor optimisations in caching code. (#8593, #8594)