Synapse 0.25 is out… as is Matrix Specification 0.3(!!!)

Hi all,

Today is a crazy release day here – not only do we have Synapse 0.25, but we’ve also made a formal release of the Matrix Specification (CS API) for the first time in 16 months!

Matrix CS API 0.3

Talking first about the spec update: the workflow of the Matrix spec is that new experimental features get added to an /unstable API prefix, and then whenever we release the Matrix spec, these get moved over to being part of the /r0 prefix (or whatever version we happen to be on).  We’ve been very constrained on manpower to work on the spec over the last ~18 months, but we’ve been keeping it up-to-date on a best-effort basis, with a bit of help from the wider community.  As such, this latest release does not contain all the latest APIs (and certainly not experimental ones like Groups/Communities which are still evolving), but it does release all of the unstable ones which we’ve managed to document and which are considered stable enough to become part of the ‘r0’ prefix.  Going forwards, we’re hoping that the wider community will help us fill in the remaining gaps (i.e. propose PRs against the matrix-org/matrix-doc repository to formalise the various spec drafts flying around the place) – and we’re also hoping (if/when the funding crisis abates) to find full-time folk to work on the spec.
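
To make the prefix convention concrete, here is a minimal Python sketch of how a client might build Client-Server API URLs for either prefix. The helper name client_api_url and the homeserver URL are illustrative assumptions, not something defined by the spec.

```python
# Hypothetical helper (not part of any Matrix SDK): build a Client-Server
# API URL for a given API prefix such as "r0" or "unstable".
def client_api_url(homeserver, endpoint, prefix="r0"):
    """Join the homeserver base URL, the API prefix and the endpoint path."""
    return "{}/_matrix/client/{}/{}".format(
        homeserver.rstrip("/"), prefix, endpoint.lstrip("/")
    )


# Assumed homeserver URL, for illustration only.
HOMESERVER = "https://matrix.example.org"

# The same endpoint addressed via the released and the experimental prefix.
stable = client_api_url(HOMESERVER, "joined_rooms")
experimental = client_api_url(HOMESERVER, "joined_rooms", prefix="unstable")
print(stable)
print(experimental)
```

A client written this way only needs to change the prefix argument once an endpoint it relies on graduates from /unstable to /r0.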

The full changelog for 0.3 of the spec is:

  • Breaking changes:
    • Change the rule kind of .m.rule.contains_display_name from underride to override. This works with all known clients which support push rules, but any other clients implementing the push rules API should be aware of this change. This makes it simple to mute rooms correctly in the API (#373).
    • Remove /tokenrefresh from the API (#395).
    • Remove requirement that tokens used in token-based login be macaroons (#395).
  • Changes to the API which will be backwards-compatible for clients:
    • Add filename parameter to POST /_matrix/media/r0/upload (#364).
    • Document CAS-based client login and the use of m.login.token in /login (#367).
    • Make origin_server_ts a mandatory field of room events (#379).
    • Add top-level account_data key to the responses to GET /sync and GET /initialSync (#380).
    • Add is_direct flag to POST /createRoom and invite member event. Add ‘Direct Messaging’ module (#389).
    • Add contains_url option to RoomEventFilter (#390).
    • Add filter optional query param to /messages (#390).
    • Add ‘Send-to-Device messaging’ module (#386).
    • Add ‘Device management’ module (#402).
    • Require that User-Interactive auth fallback pages call window.postMessage to notify apps of completion (#398).
    • Add pagination and filter support to /publicRooms. Change response to omit fields rather than return null. Add estimate of total number of rooms in list. (#388).
    • Allow guest accounts to use a number of endpoints which are required for end-to-end encryption. (#751).
    • Add key distribution APIs, for use with end-to-end encryption. (#894).
    • Add m.room.pinned_events state event for rooms. (#1007).
    • Add mention of ability to send Access Token via an Authorization Header.
    • New endpoints:
      • GET /joined_rooms (#999).
      • GET /rooms/{roomId}/joined_members (#999).
      • GET /account/whoami (#1063).
      • GET /media/{version}/preview_url (#1064).
  • Spec clarifications:
    • Add endpoints and logic for invites and third-party invites to the federation spec and update the JSON of the request sent by the identity server upon 3PID binding (#997)
    • Fix “membership” property on third-party invite upgrade example (#995)
    • Fix response format and 404 example for room alias lookup (#960)
    • Fix examples of m.room.member event and room state change, and added a clarification on the membership event sent upon profile update (#950).
    • Spell out the way that state is handled by POST /createRoom (#362).
    • Clarify the fields which are applicable to different types of push rule (#365).
    • A number of clarifications to authentication (#371).
    • Correct references to user_id which should have been sender (#376).
    • Correct inconsistent specification of redacted_because fields and their values (#378).
    • Mark required fields in response objects as such (#394).
    • Make m.notice description a bit harder in its phrasing to try to dissuade the same issues that occurred with IRC (#750).
    • GET /user/{userId}/filter/{filterId} requires authentication (#1003).
    • Add some clarifying notes on the behaviour of rooms with no m.room.power_levels event (#1026).
    • Clarify the relationship between username and user_id in the /register API (#1032).
    • Clarify rate limiting and security for content repository. (#1064).
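
As an illustration of two items from the changelog above (sending the access token via an Authorization header, and the new GET /account/whoami endpoint), here is a minimal Python sketch. It only builds the request and does not send it; the homeserver URL, the token value and the helper name whoami_request are made up for the example.

```python
import urllib.request


# Hypothetical helper: build (but do not send) a GET /account/whoami request
# that carries the access token in an Authorization header.
def whoami_request(homeserver, access_token):
    url = homeserver.rstrip("/") + "/_matrix/client/r0/account/whoami"
    req = urllib.request.Request(url, method="GET")
    # The token travels in a header rather than in the query string.
    req.add_header("Authorization", "Bearer " + access_token)
    return req


# Assumed homeserver and token, for illustration only.
req = whoami_request("https://matrix.example.org", "example_access_token")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

One practical reason to prefer the header is that the token then stays out of the request URL, which tends to end up in proxy and server logs.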

…and you can read the spec itself of course over at https://matrix.org/docs/spec.  It’s worth noting that we have slightly bent the rules by including three very minor ‘breaking changes’ in 0.3, but all for features which to our knowledge nobody is depending on in the wild.  Technically this should mean bumping the major version prefix (i.e. moving to r1), but given how minor and nonimpacting these are we’re turning a blind eye this time.

Meanwhile, Synapse 0.25 is out!

This is a medium-sized release; the main change is support for configurable room visibility within groups (so that whenever you add a room to a group, you’re not forced into sharing its existence with the general public, but can choose to tell only group members about it).  There’s also a bunch of useful bug fixes and some performance improvements, including lots of contributions from the community this release (thank you!).  Full release notes are:

Changes in synapse v0.25.0 (2017-11-15)

Bug fixes:

  • Fix port script (PR #2673)

Changes in synapse v0.25.0-rc1 (2017-11-14)

Features:

Changes:

  • Ignore tags when generating URL preview descriptions (PR #2576)
    Thanks to @maximevaillancourt!
  • Register some /unstable endpoints in /r0 as well (PR #2579) Thanks to
    @krombel!
  • Support /keys/upload on /r0 as well as /unstable (PR #2585)
  • Front-end proxy: pass through auth header (PR #2586)
  • Allow ASes to deactivate their own users (PR #2589)
  • Remove refresh tokens (PR #2613)
  • Automatically set default displayname on register (PR #2617)
  • Log login requests (PR #2618)
  • Always return is_public in the /groups/:group_id/rooms API (PR #2630)
  • Avoid no-op media deletes (PR #2637) Thanks to @spantaleev!
  • Fix various embarrassing typos around user_directory and add some doc. (PR
    #2643)
  • Return whether a user is an admin within a group (PR #2647)
  • Namespace visibility options for groups (PR #2657)
  • Downcase UserIDs on registration (PR #2662)
  • Cache failures when fetching URL previews (PR #2669)

Bug fixes:

  • Fix port script (PR #2577)
  • Fix error when running synapse with no logfile (PR #2581)
  • Fix UI auth when deleting devices (PR #2591)
  • Fix typo when checking if user is invited to group (PR #2599)
  • Fix the port script to drop NUL values in all tables (PR #2611)
  • Fix appservices being backlogged and not receiving new events due to a bug in
    notify_interested_services (PR #2631) Thanks to @xyzz!
  • Fix updating rooms avatar/display name when modified by admin (PR #2636)
    Thanks to @farialima!
  • Fix bug in state group storage (PR #2649)
  • Fix 500 on invalid utf-8 in request (PR #2663)

Finally…

If you haven’t noticed already, Riot/Web 0.13 is out today, as is Riot/iOS 0.6.2 and Riot/Android 0.7.4.  These contain massive improvements across the board – particularly mainstream Communities support at last on Riot/Web; CallKit/PushKit on Riot/iOS thanks to Denis Morozov (GSoC 2017 student for Matrix) and Share Extension on iOS thanks to Aram Sargsyan (also GSoC 2017 student!); and End-to-end Key Sharing on Riot/Android and a full rewrite of the VoIP calling subsystem on Android.

Rather than going on about it here, though, there’s a full write-up over on the Riot Blog.

And so there you go – new releases for eeeeeeeeveryone!  Enjoy! :)

–Matthew, Amandine & the team.

Synapse 0.22.0 released!

Hi Synapsefans,

Synapse 0.22.0 has just been released! This release lands a few interesting features:

  • The new User directory API, which lets Matrix clients provide a much more intuitive and effective user search by exposing a list of:
    • Everybody your user shares a room with, and
    • Everybody in a public room your homeserver knows about
  • New support for server admins, including a Shutdown Room API (to remove a room from a local server) and a Media Quarantine API (to render a media item inaccessible without actually deleting it)

As always there are lots of bug fixes and performance improvements, including increasing the default cache factor size from 0.1 to 0.5 (should improve performance for those running their own homeservers).

You can get Synapse 0.22.0 from https://github.com/matrix-org/synapse or https://github.com/matrix-org/synapse/releases/tag/v0.22.0 as normal.

Changes in synapse v0.22.0 (2017-07-06)

No changes since v0.22.0-rc2

Changes in synapse v0.22.0-rc2 (2017-07-04)

Changes:

  • Improve performance of storing user IPs (PR #2307, #2308)
  • Slightly improve performance of verifying access tokens (PR #2320)
  • Slightly improve performance of event persistence (PR #2321)
  • Increase default cache factor size from 0.1 to 0.5 (PR #2330)

Bug fixes:

  • Fix bug with storing registration sessions that caused frequent CPU churn
    (PR #2319)

Changes in synapse v0.22.0-rc1 (2017-06-26)

Features:

  • Add a user directory API (PR #2252, and many more)
  • Add shutdown room API to remove room from local server (PR #2291)
  • Add API to quarantine media (PR #2292)
  • Add new config option to not send event contents to push servers (PR #2301)
    Thanks to @cjdelisle!

Changes:

Bug fixes:

  • Fix users not getting notifications when AS listened to that user_id (PR
    #2216) Thanks to @slipeer!
  • Fix users without push set up not getting notifications after joining rooms
    (PR #2236)
  • Fix preview url API to trim long descriptions (PR #2243)
  • Fix bug where we used cached but unpersisted state group as prev group,
    resulting in broken state on restart (PR #2263)
  • Fix removing of pushers when using workers (PR #2267)
  • Fix CORS headers to allow Authorization header (PR #2285) Thanks to @krombel!

Synapse 0.21.1 released!

Hi folks – we forgot to mention that Synapse 0.21.1 was released last week.  This contains an important fix to the report-stats option, which was otherwise failing to report any usage stats for folks who had the option turned on.

This is a good opportunity to note that the report-stats option is really, really important for the ongoing health of the Matrix ecosystem: when raising funding to continue working on Matrix we have to be able to demonstrate how the ecosystem is growing and why it’s a good idea to support us to work on it. In practice, the data we collect is: hostname, synapse version & uptime, total_users, total_nonbridged_users, total_room_count, daily_active_users, daily_active_rooms, daily_messages and daily_sent_messages.

Folks: if you have turned off report-stats for whatever reason, please consider upgrading to 0.21.1 and turning it back on.

In return, the plan is that we’ll start to publish an official Grafana of the anonymised aggregate stats, probably embedded into the frontpage of Matrix.org, and then you and everyone else can have a view of the state of the Matrix universe. And critically, it’ll really help us continue to justify $ to spend on growing the project!

You can get Synapse 0.21.1 from https://github.com/matrix-org/synapse or https://github.com/matrix-org/synapse/releases/tag/v0.21.1 as normal.

Synapse 0.21.0 is released!

Hi all,

Synapse 0.21.0 was released a moment ago. This release lands a number of performance improvements and stability fixes, plus a couple of small features.

For those of you upgrading, https://github.com/matrix-org/synapse has the details as usual.  Full changelog follows:

Changes in synapse v0.21.0 (2017-05-17)

Features:

  • Add per user rate-limiting overrides (PR #2208)
  • Add config option to limit maximum number of events requested by /sync and /messages (PR #2221) Thanks to @psaavedra!

Changes:

  • Various small performance fixes (PR #2201, #2202, #2224, #2226, #2227, #2228, #2229)
  • Update username availability checker API (PR #2209, #2213)
  • When purging, don’t de-delta state groups we’re about to delete (PR #2214)
  • Documentation to check synapse version (PR #2215) Thanks to @hamber-dick!
  • Add an index to event_search to speed up purge history API (PR #2218)

Bug fixes:

  • Fix API to allow clients to upload one-time-keys with new sigs (PR #2206)

Changes in synapse v0.21.0-rc2 (2017-05-08)

Changes:

  • Always mark remotes as up if we receive a signed request from them (PR #2190)

Bug fixes:

  • Fix bug where users got pushed for rooms they had muted (PR #2200)

Changes in synapse v0.21.0-rc1 (2017-05-08)

Features:

  • Add username availability checker API (PR #2183)
  • Add read marker API (PR #2120)

Changes:

Bug fixes:

  • Fix nuke-room script to work with current schema (PR #1927) Thanks @zuckschwerdt!
  • Fix db port script to not assume postgres tables are in the public schema (PR #2024) Thanks @jerrykan!
  • Fix getting latest device IP for user with no devices (PR #2118)
  • Fix rejection of invites to unreachable servers (PR #2145)
  • Fix code for reporting old verify keys in synapse (PR #2156)
  • Fix invite state to always include all events (PR #2163)
  • Fix bug where synapse would always fetch state for any missing event (PR #2170)
  • Fix a leak with timed out HTTP connections (PR #2180)
  • Fix bug where we didn’t time out HTTP requests to ASes (PR #2192)

Docs:

  • Clarify doc for SQLite to PostgreSQL port (PR #1961) Thanks @benhylau!
  • Fix typo in synctl help (PR #2107) Thanks @HarHarLinks!
  • web_client_location documentation fix (PR #2131) Thanks @matthewjwolff!
  • Update README.rst with FreeBSD changes (PR #2132) Thanks @feld!
  • Clarify setting up metrics (PR #2149) Thanks @encks!

Update on Matrix.org homeserver reliability

Hi folks,

We’ve had a few outages over the last week on the Matrix.org homeserver which have caused problems for folks using bridges or accounts hosted on matrix.org itself – we’d like to apologise to everyone who’s been caught in the crossfire.  In the interests of giving everyone visibility on what’s going on and what we’re doing about it (and so folks can learn from our mistakes! :), here’s a quick writeup (all times are UTC):

  • 2017-05-04 21:05: The datacenter where we host matrix.org performs an emergency unscheduled upgrade of the VM host where the main matrix.org HS & DB master lives.  This means a live-migration of the VM onto another host, which freezes the (huge) VM for 9 minutes, during which service is (obviously) down.  Monitoring fires; we start investigating and try to get in on the console, but by the point we’re considering failing over to the hot-spare, the box has come back and recovers fine other than a load spike as all the traffic catches up.  The clock however is off by 9 minutes due to its world having paused.
  • 2017-05-04 22:30: We step NTP on the host to fix the clock (maximum clock skew on ISC ntpd is 500ppm, meaning it would take weeks to reconverge naturally, during which time we’re issuing messages with incorrect timestamps).
  • 2017-05-05 01:25: Network connectivity breaks between the matrix.org homeserver and the DC where all of our bridges/bots are hosted.
  • 2017-05-05 01:40: Monitoring alerts fire for bridge traffic activity and are picked up.  After trying to restart the VPN between the DCs a few times, it turns out that the IP routes needed for the VPN have mysteriously disappeared.
  • 2017-05-05 02:23: Routes are manually re-added, the VPN recovers and traffic starts flowing again.  It turns out that the routes are meant to be maintained by a post-up script in /etc/network/interfaces, which assumes that /sbin/ip is on the path.  Historically this hasn’t been a problem as the DHCP lease on the host has never expired (only been renewed every 6 hours) – but the time disruption caused by the live-migration earlier means that on this renewal cycle the lease actually expires and the routes are lost and not re-added.  Basic bridging traffic checks are done (e.g. Freenode->Matrix).
  • 2017-05-05 08:30: Turns out that whilst Freenode->Matrix traffic was working, Matrix->Freenode was wedged due to a missing HTTP timeout in the AS logic on Synapse.  Synapse is restarted and the bug fixed.
  • …the week goes by…
  • 2017-05-11 18:00: (Unconnected to the rest of this outage, an IRC DDoS on GIMPnet causes intermittent load problems and delayed messages on matrix.org; we turn off the bridge for a few hours until it subsides.)
  • 2017-05-12 02:50: The postgres partition on the matrix.org DB master diskfills and postgres halts.  Monitoring alerts fire (once, phone alerts), but the three folks on call manage to sleep through their phone ringing.
  • 2017-05-12 04:45: Folks get woken up and notice the outage; clear up disk space; restart postgres. Meanwhile, synapse appears to recover, other than mysteriously refusing to send messages from local users.  Investigation commences in the guts of the database to try to see what corruption has happened.
  • 2017-05-12 06:00: We realise that nobody has actually restarted synapse since the DB outage began, and the failure is probably a known issue where worker replication can fail and cause the master synapse process to fail to process writes.  Synapse is restarted; everything recovers (including bridges).
  • 2017-05-12 06:20: Investigation into the cause of the diskfill reveals it to be due to postgres replication logs (WALs) stacking up on the DB master, due to replication having broken to a DB slave during the previous networking outage.  Monitoring alerts triggered but weren’t escalated due to a problem in PagerDuty.
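
For the curious, the reasoning behind stepping the clock at 22:30 can be checked with a little arithmetic. The Python sketch below assumes that the 500ppm figure is the maximum rate at which ntpd will gradually adjust (slew) the clock, and that the offset to correct is the full 9 minutes of the VM freeze; the function name is made up for the example.

```python
# Hypothetical helper: how long would it take to correct a clock offset
# if the clock can only be adjusted at a limited rate (in parts per million)?
def slew_duration_seconds(offset_seconds, max_slew_ppm=500):
    # At N ppm the clock gains or loses N microseconds per elapsed second.
    return offset_seconds * 1e6 / max_slew_ppm


offset = 9 * 60  # the VM was frozen for 9 minutes
duration = slew_duration_seconds(offset)
days = duration / 86400  # seconds per day
print("Correcting {} s at 500ppm takes about {:.1f} days".format(offset, days))
```

Under these assumptions the gradual correction takes roughly 12.5 days, during which every event would carry a skewed timestamp, so stepping the clock is the pragmatic choice.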

Lessons learned:

  • Test your networking scripts and always check your box self-recovers after a restart (let alone a DHCP renewal).
  • Don’t use DHCP in production datacenters unless you really have no choice; it just adds potential ways for everything to break.
  • We need better end-to-end monitoring for bridged traffic.
  • We need to ensure HS<->Bridge traffic is more reliable (improved by fixing timeout logic in synapse).
  • We need better monitoring and alerting of DB replication traffic.
  • We need to escalate PagerDuty phone alerts more aggressively (done).
  • We need better alerting for disk fill thresholds (especially “time until fill”, remembering to take into account the emergency headroom reserved by the filesystem for the superuser).
  • We should probably have scripts to rapidly (or even automatically) switch between synapse master & hot-spare, and to promote DB slaves in the event of a master failure.
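
As a rough illustration of the “time until fill” idea from the list above, here is a minimal Python sketch. It assumes linear growth and a 5% superuser reserve (the usual ext-family default; the real value depends on the filesystem and how it was created). The function name, the 24-hour threshold and the sample numbers are all made up for the example and are not our actual monitoring configuration.

```python
# Hypothetical helper: estimate hours until a partition is full for a
# non-root process, given a linear growth rate.
def hours_until_full(total_bytes, used_bytes, growth_bytes_per_hour,
                     reserved_fraction=0.05):
    # Space reserved for the superuser is not available to e.g. postgres,
    # so the effective capacity is smaller than the raw partition size.
    usable = total_bytes * (1.0 - reserved_fraction)
    remaining = max(usable - used_bytes, 0.0)
    if growth_bytes_per_hour <= 0:
        return float("inf")  # not growing: no projected fill time
    return remaining / growth_bytes_per_hour


GIB = 2 ** 30
# Made-up numbers: 1000 GiB partition, 900 GiB used, growing 5 GiB per hour.
eta = hours_until_full(1000 * GIB, 900 * GIB, 5 * GIB)
ALERT_THRESHOLD_HOURS = 24  # assumed threshold, for illustration
print("Estimated hours until full: {:.1f}".format(eta))
print("Alert!" if eta < ALERT_THRESHOLD_HOURS else "OK")
```

Ignoring the reserve in this example would suggest 20 hours of headroom rather than about 10, which is exactly the kind of optimism that lets a partition fill up overnight.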

Hopefully this is the last we’ve seen of this root cause; we’ll be working through the todo list above.  Many apologies again for the instability – however please do remember that you can (and should!) run your own homeserver & bridges to stay immune to whatever operational dramas we have with the matrix.org instance!

Synapse 0.20.0 is released!

Hi folks,

Synapse 0.20.0 was released a few hours ago – this is a major new release with loads of stability and performance fixes and some new features too. The main headlines are:

  • Support for using phone numbers as 3rd party identifiers as well as email addresses!  This is huge: letting you discover other users on Matrix based on whether they’ve linked their phone number to their matrix account, and letting you log in using your phone number as your identifier if you so desire.  Users of systems like WhatsApp should find this both familiar and useful ;)
  • Fixes some very nasty failure modes where the state of a room could be reset if a homeserver received an event it couldn’t verify.  Folks who have suffered rooms suddenly losing their name/icon/topic should particularly upgrade – this won’t fix the rooms retrospectively (your server will need to rejoin the room), but it should fix the problem going forwards.
  • Improves the retry schedule over federation significantly – previously there were scenarios where synapse could try to retry aggressively on servers which were offline.  This fixes that.
  • Significant performance improvements to /publicRooms, /sync, and other endpoints.
  • Lots of juicy bug fixes.

We highly recommend upgrading (or installing!) asap – https://github.com/matrix-org/synapse has the details as usual.  Full changelog follows:

Changes in synapse v0.20.0 (2017-04-11)

Bug fixes:

  • Fix joining rooms over federation where not all servers in the room saw the
    new server had joined (PR #2094)

Changes in synapse v0.20.0-rc1 (2017-03-30)

Features:

  • Add delete_devices API (PR #1993)
  • Add phone number registration/login support (PR #1994, #2055)

Changes:

  • Use JSONSchema for validation of filters. Thanks @pik! (PR #1783)
  • Reread log config on SIGHUP (PR #1982)
  • Speed up public room list (PR #1989)
  • Add helpful texts to logger config options (PR #1990)
  • Minor /sync performance improvements. (PR #2002, #2013, #2022)
  • Add some debug to help diagnose weird federation issue (PR #2035)
  • Correctly limit retries for all federation requests (PR #2050, #2061)
  • Don’t lock table when persisting new one time keys (PR #2053)
  • Reduce some CPU work on DB threads (PR #2054)
  • Cache hosts in room (PR #2060)
  • Batch sending of device list pokes (PR #2063)
  • Speed up persist event path in certain edge cases (PR #2070)

Bug fixes:

  • Fix bug where current_state_events renamed to current_state_ids (PR #1849)
  • Fix routing loop when fetching remote media (PR #1992)
  • Fix current_state_events table to not lie (PR #1996)
  • Fix CAS login to handle PartialDownloadError (PR #1997)
  • Fix assertion to stop transaction queue getting wedged (PR #2010)
  • Fix presence to fallback to last_active_ts if it beats the last sync time.
    Thanks @Half-Shot! (PR #2014)
  • Fix bug when federation received a PDU while a room join is in progress (PR
    #2016)
  • Fix resetting state on rejected events (PR #2025)
  • Fix installation issues in readme. Thanks @ricco386 (PR #2037)
  • Fix caching of remote servers’ signature keys (PR #2042)
  • Fix some leaking log context (PR #2048, #2049, #2057, #2058)
  • Fix rejection of invites not reaching sync (PR #2056)